diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /drivers/powercap/dtpm_cpu.c | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'drivers/powercap/dtpm_cpu.c')
-rw-r--r-- | drivers/powercap/dtpm_cpu.c | 297 |
1 files changed, 297 insertions, 0 deletions
diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c new file mode 100644 index 000000000..2ff771753 --- /dev/null +++ b/drivers/powercap/dtpm_cpu.c @@ -0,0 +1,297 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright 2020 Linaro Limited + * + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> + * + * The DTPM CPU is based on the energy model. It hooks the CPU in the + * DTPM tree which in turns update the power number by propagating the + * power number from the CPU energy model information to the parents. + * + * The association between the power and the performance state, allows + * to set the power of the CPU at the OPP granularity. + * + * The CPU hotplug is supported and the power numbers will be updated + * if a CPU is hot plugged / unplugged. + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/cpumask.h> +#include <linux/cpufreq.h> +#include <linux/cpuhotplug.h> +#include <linux/dtpm.h> +#include <linux/energy_model.h> +#include <linux/of.h> +#include <linux/pm_qos.h> +#include <linux/slab.h> +#include <linux/units.h> + +struct dtpm_cpu { + struct dtpm dtpm; + struct freq_qos_request qos_req; + int cpu; +}; + +static DEFINE_PER_CPU(struct dtpm_cpu *, dtpm_per_cpu); + +static struct dtpm_cpu *to_dtpm_cpu(struct dtpm *dtpm) +{ + return container_of(dtpm, struct dtpm_cpu, dtpm); +} + +static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit) +{ + struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm); + struct em_perf_domain *pd = em_cpu_get(dtpm_cpu->cpu); + struct cpumask cpus; + unsigned long freq; + u64 power; + int i, nr_cpus; + + cpumask_and(&cpus, cpu_online_mask, to_cpumask(pd->cpus)); + nr_cpus = cpumask_weight(&cpus); + + for (i = 0; i < pd->nr_perf_states; i++) { + + power = pd->table[i].power * nr_cpus; + + if (power > power_limit) + break; + } + + freq = pd->table[i - 1].frequency; + + freq_qos_update_request(&dtpm_cpu->qos_req, freq); + + power_limit = pd->table[i - 1].power * nr_cpus; + + return power_limit; +} + +static u64 scale_pd_power_uw(struct cpumask *pd_mask, u64 power) +{ + unsigned long max, sum_util = 0; + int cpu; + + /* + * The capacity is the same for all CPUs belonging to + * the same perf domain. + */ + max = arch_scale_cpu_capacity(cpumask_first(pd_mask)); + + for_each_cpu_and(cpu, pd_mask, cpu_online_mask) + sum_util += sched_cpu_util(cpu); + + return (power * ((sum_util << 10) / max)) >> 10; +} + +static u64 get_pd_power_uw(struct dtpm *dtpm) +{ + struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm); + struct em_perf_domain *pd; + struct cpumask *pd_mask; + unsigned long freq; + int i; + + pd = em_cpu_get(dtpm_cpu->cpu); + + pd_mask = em_span_cpus(pd); + + freq = cpufreq_quick_get(dtpm_cpu->cpu); + + for (i = 0; i < pd->nr_perf_states; i++) { + + if (pd->table[i].frequency < freq) + continue; + + return scale_pd_power_uw(pd_mask, pd->table[i].power * + MICROWATT_PER_MILLIWATT); + } + + return 0; +} + +static int update_pd_power_uw(struct dtpm *dtpm) +{ + struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm); + struct em_perf_domain *em = em_cpu_get(dtpm_cpu->cpu); + struct cpumask cpus; + int nr_cpus; + + cpumask_and(&cpus, cpu_online_mask, to_cpumask(em->cpus)); + nr_cpus = cpumask_weight(&cpus); + + dtpm->power_min = em->table[0].power; + dtpm->power_min *= MICROWATT_PER_MILLIWATT; + dtpm->power_min *= nr_cpus; + + dtpm->power_max = em->table[em->nr_perf_states - 1].power; + dtpm->power_max *= MICROWATT_PER_MILLIWATT; + dtpm->power_max *= nr_cpus; + + return 0; +} + +static void pd_release(struct dtpm *dtpm) +{ + struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm); + struct cpufreq_policy *policy; + + if (freq_qos_request_active(&dtpm_cpu->qos_req)) + freq_qos_remove_request(&dtpm_cpu->qos_req); + + policy = cpufreq_cpu_get(dtpm_cpu->cpu); + if (policy) { + for_each_cpu(dtpm_cpu->cpu, policy->related_cpus) + per_cpu(dtpm_per_cpu, dtpm_cpu->cpu) = NULL; + } + + kfree(dtpm_cpu); +} + +static struct dtpm_ops dtpm_ops = { + .set_power_uw = set_pd_power_limit, + .get_power_uw = get_pd_power_uw, + .update_power_uw = update_pd_power_uw, + .release = pd_release, +}; + +static int cpuhp_dtpm_cpu_offline(unsigned int cpu) +{ + struct dtpm_cpu *dtpm_cpu; + + dtpm_cpu = per_cpu(dtpm_per_cpu, cpu); + if (dtpm_cpu) + dtpm_update_power(&dtpm_cpu->dtpm); + + return 0; +} + +static int cpuhp_dtpm_cpu_online(unsigned int cpu) +{ + struct dtpm_cpu *dtpm_cpu; + + dtpm_cpu = per_cpu(dtpm_per_cpu, cpu); + if (dtpm_cpu) + return dtpm_update_power(&dtpm_cpu->dtpm); + + return 0; +} + +static int __dtpm_cpu_setup(int cpu, struct dtpm *parent) +{ + struct dtpm_cpu *dtpm_cpu; + struct cpufreq_policy *policy; + struct em_perf_domain *pd; + char name[CPUFREQ_NAME_LEN]; + int ret = -ENOMEM; + + dtpm_cpu = per_cpu(dtpm_per_cpu, cpu); + if (dtpm_cpu) + return 0; + + policy = cpufreq_cpu_get(cpu); + if (!policy) + return 0; + + pd = em_cpu_get(cpu); + if (!pd || em_is_artificial(pd)) + return -EINVAL; + + dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL); + if (!dtpm_cpu) + return -ENOMEM; + + dtpm_init(&dtpm_cpu->dtpm, &dtpm_ops); + dtpm_cpu->cpu = cpu; + + for_each_cpu(cpu, policy->related_cpus) + per_cpu(dtpm_per_cpu, cpu) = dtpm_cpu; + + snprintf(name, sizeof(name), "cpu%d-cpufreq", dtpm_cpu->cpu); + + ret = dtpm_register(name, &dtpm_cpu->dtpm, parent); + if (ret) + goto out_kfree_dtpm_cpu; + + ret = freq_qos_add_request(&policy->constraints, + &dtpm_cpu->qos_req, FREQ_QOS_MAX, + pd->table[pd->nr_perf_states - 1].frequency); + if (ret) + goto out_dtpm_unregister; + + return 0; + +out_dtpm_unregister: + dtpm_unregister(&dtpm_cpu->dtpm); + dtpm_cpu = NULL; + +out_kfree_dtpm_cpu: + for_each_cpu(cpu, policy->related_cpus) + per_cpu(dtpm_per_cpu, cpu) = NULL; + kfree(dtpm_cpu); + + return ret; +} + +static int dtpm_cpu_setup(struct dtpm *dtpm, struct device_node *np) +{ + int cpu; + + cpu = of_cpu_node_to_id(np); + if (cpu < 0) + return 0; + + return __dtpm_cpu_setup(cpu, dtpm); +} + +static int dtpm_cpu_init(void) +{ + int ret; + + /* + * The callbacks at CPU hotplug time are calling + * dtpm_update_power() which in turns calls update_pd_power(). + * + * The function update_pd_power() uses the online mask to + * figure out the power consumption limits. + * + * At CPUHP_AP_ONLINE_DYN, the CPU is present in the CPU + * online mask when the cpuhp_dtpm_cpu_online function is + * called, but the CPU is still in the online mask for the + * tear down callback. So the power can not be updated when + * the CPU is unplugged. + * + * At CPUHP_AP_DTPM_CPU_DEAD, the situation is the opposite as + * above. The CPU online mask is not up to date when the CPU + * is plugged in. + * + * For this reason, we need to call the online and offline + * callbacks at different moments when the CPU online mask is + * consistent with the power numbers we want to update. + */ + ret = cpuhp_setup_state(CPUHP_AP_DTPM_CPU_DEAD, "dtpm_cpu:offline", + NULL, cpuhp_dtpm_cpu_offline); + if (ret < 0) + return ret; + + ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "dtpm_cpu:online", + cpuhp_dtpm_cpu_online, NULL); + if (ret < 0) + return ret; + + return 0; +} + +static void dtpm_cpu_exit(void) +{ + cpuhp_remove_state_nocalls(CPUHP_AP_ONLINE_DYN); + cpuhp_remove_state_nocalls(CPUHP_AP_DTPM_CPU_DEAD); +} + +struct dtpm_subsys_ops dtpm_cpu_ops = { + .name = KBUILD_MODNAME, + .init = dtpm_cpu_init, + .exit = dtpm_cpu_exit, + .setup = dtpm_cpu_setup, +}; |