author     2023-02-21 18:24:12 -0800
committer  2023-02-21 18:24:12 -0800
commit     5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch)
tree       cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /arch/arm/kernel/perf_event_v6.c
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs
(see the usage sketch after this list).
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
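One way to exercise the new RPS default from user space is to write a CPU
mask to the new net.core sysctl before creating devices. A minimal sketch in
C, assuming the knob is exposed as /proc/sys/net/core/rps_default_mask
(hedged; check the final sysctl name in your 6.3 tree):

/* Sketch: set the default RPS mask inherited by newly created netdevs.
 * Roughly equivalent to: sysctl -w net.core.rps_default_mask=f
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/proc/sys/net/core/rps_default_mask", O_WRONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* "f" = CPUs 0-3 handle receive packet steering on new devices. */
        if (write(fd, "f", 1) != 1)
                perror("write");
        close(fd);
        return 0;
}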
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control the local port
range on a per-socket basis (see the usage sketch after this list).
- Track and report in procfs the number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
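The IP_LOCAL_PORT_RANGE item above can be sketched as follows. The option
takes a u32 packing the range as low | (high << 16); the constant is defined
here as a fallback for older uapi headers, so treat this as an illustrative
sketch rather than canonical usage:

/* Sketch: pin this socket's ephemeral ports to [40000, 40100]. */
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

#ifndef IP_LOCAL_PORT_RANGE
#define IP_LOCAL_PORT_RANGE 51  /* fallback for pre-6.3 uapi headers */
#endif

int main(void)
{
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        /* Lower 16 bits: low port; upper 16 bits: high port. */
        unsigned int range = 40000 | (40100 << 16);

        if (fd < 0) {
                perror("socket");
                return 1;
        }
        if (setsockopt(fd, IPPROTO_IP, IP_LOCAL_PORT_RANGE,
                       &range, sizeof(range)))
                perror("setsockopt");
        /* A subsequent connect() auto-binds within the range. */
        return 0;
}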
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs, with initial support for RX hash and
timestamp metadata (see the BPF sketch after this list).
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in
collect-metadata mode.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
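A minimal BPF-side sketch of the XDP hints item above, assuming the initial
6.3 kfunc shape (int return, value written through an output pointer) and a
driver that actually implements the RX-hash hint:

/* xdp_rx_hash.bpf.c - hypothetical example program */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Kfunc exposed by drivers implementing the RX-hash hint. */
extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx,
                                    __u32 *hash) __ksym;

SEC("xdp")
int rx_hash_prog(struct xdp_md *ctx)
{
        __u32 hash;

        /* 0 on success; -errno if this device offers no RX-hash hint. */
        if (!bpf_xdp_metadata_rx_hash(ctx, &hash))
                bpf_printk("rx hash: 0x%x", hash);

        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";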
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019), a modern version of
shared-medium Ethernet.
- Support for the MAC Merge layer (IEEE 802.3-2018 clause 99), allowing
preemption of low-priority frames by high-priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if the old Wireless Extensions user space interface is used
with modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
the nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, and update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support a new version of Bluetooth co-existence
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'arch/arm/kernel/perf_event_v6.c')
-rw-r--r--  arch/arm/kernel/perf_event_v6.c  590
1 file changed, 590 insertions(+), 0 deletions(-)
diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
new file mode 100644
index 000000000..1ae99deee
--- /dev/null
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -0,0 +1,590 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARMv6 Performance counter handling code.
+ *
+ * Copyright (C) 2009 picoChip Designs, Ltd., Jamie Iles
+ *
+ * ARMv6 has 2 configurable performance counters and a single cycle counter.
+ * They all share a single reset bit but can be written to zero so we can use
+ * that for a reset.
+ *
+ * The counters can't be individually enabled or disabled so when we remove
+ * one event and replace it with another we could get spurious counts from the
+ * wrong event. However, we can take advantage of the fact that the
+ * performance counters can export events to the event bus, and the event bus
+ * itself can be monitored. This requires that we *don't* export the events to
+ * the event bus. The procedure for disabling a configurable counter is:
+ *   - change the counter to count the ETMEXTOUT[0] signal (0x20). This
+ *     effectively stops the counter from counting.
+ *   - disable the counter's interrupt generation (each counter has it's
+ *     own interrupt enable bit).
+ * Once stopped, the counter value can be written as 0 to reset.
+ *
+ * To enable a counter:
+ *   - enable the counter's interrupt generation.
+ *   - set the new event type.
+ *
+ * Note: the dedicated cycle counter only counts cycles and can't be
+ * enabled/disabled independently of the others. When we want to disable the
+ * cycle counter, we have to just disable the interrupt reporting and start
+ * ignoring that counter. When re-enabling, we have to reset the value and
+ * enable the interrupt.
+ */
+
+#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K)
+
+#include <asm/cputype.h>
+#include <asm/irq_regs.h>
+
+#include <linux/of.h>
+#include <linux/perf/arm_pmu.h>
+#include <linux/platform_device.h>
+
+enum armv6_perf_types {
+        ARMV6_PERFCTR_ICACHE_MISS       = 0x0,
+        ARMV6_PERFCTR_IBUF_STALL        = 0x1,
+        ARMV6_PERFCTR_DDEP_STALL        = 0x2,
+        ARMV6_PERFCTR_ITLB_MISS         = 0x3,
+        ARMV6_PERFCTR_DTLB_MISS         = 0x4,
+        ARMV6_PERFCTR_BR_EXEC           = 0x5,
+        ARMV6_PERFCTR_BR_MISPREDICT     = 0x6,
+        ARMV6_PERFCTR_INSTR_EXEC        = 0x7,
+        ARMV6_PERFCTR_DCACHE_HIT        = 0x9,
+        ARMV6_PERFCTR_DCACHE_ACCESS     = 0xA,
+        ARMV6_PERFCTR_DCACHE_MISS       = 0xB,
+        ARMV6_PERFCTR_DCACHE_WBACK      = 0xC,
+        ARMV6_PERFCTR_SW_PC_CHANGE      = 0xD,
+        ARMV6_PERFCTR_MAIN_TLB_MISS     = 0xF,
+        ARMV6_PERFCTR_EXPL_D_ACCESS     = 0x10,
+        ARMV6_PERFCTR_LSU_FULL_STALL    = 0x11,
+        ARMV6_PERFCTR_WBUF_DRAINED      = 0x12,
+        ARMV6_PERFCTR_CPU_CYCLES        = 0xFF,
+        ARMV6_PERFCTR_NOP               = 0x20,
+};
+
+enum armv6_counters {
+        ARMV6_CYCLE_COUNTER = 0,
+        ARMV6_COUNTER0,
+        ARMV6_COUNTER1,
+};
+
+/*
+ * The hardware events that we support. We do support cache operations but
+ * we have harvard caches and no way to combine instruction and data
+ * accesses/misses in hardware.
+ */
+static const unsigned armv6_perf_map[PERF_COUNT_HW_MAX] = {
+        PERF_MAP_ALL_UNSUPPORTED,
+        [PERF_COUNT_HW_CPU_CYCLES]              = ARMV6_PERFCTR_CPU_CYCLES,
+        [PERF_COUNT_HW_INSTRUCTIONS]            = ARMV6_PERFCTR_INSTR_EXEC,
+        [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]     = ARMV6_PERFCTR_BR_EXEC,
+        [PERF_COUNT_HW_BRANCH_MISSES]           = ARMV6_PERFCTR_BR_MISPREDICT,
+        [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = ARMV6_PERFCTR_IBUF_STALL,
+        [PERF_COUNT_HW_STALLED_CYCLES_BACKEND]  = ARMV6_PERFCTR_LSU_FULL_STALL,
+};
+
+static const unsigned armv6_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
+                                          [PERF_COUNT_HW_CACHE_OP_MAX]
+                                          [PERF_COUNT_HW_CACHE_RESULT_MAX] = {
+        PERF_CACHE_MAP_ALL_UNSUPPORTED,
+
+        /*
+         * The performance counters don't differentiate between read and write
+         * accesses/misses so this isn't strictly correct, but it's the best we
+         * can do. Writes and reads get combined.
+         */
+        [C(L1D)][C(OP_READ)][C(RESULT_ACCESS)]  = ARMV6_PERFCTR_DCACHE_ACCESS,
+        [C(L1D)][C(OP_READ)][C(RESULT_MISS)]    = ARMV6_PERFCTR_DCACHE_MISS,
+        [C(L1D)][C(OP_WRITE)][C(RESULT_ACCESS)] = ARMV6_PERFCTR_DCACHE_ACCESS,
+        [C(L1D)][C(OP_WRITE)][C(RESULT_MISS)]   = ARMV6_PERFCTR_DCACHE_MISS,
+
+        [C(L1I)][C(OP_READ)][C(RESULT_MISS)]    = ARMV6_PERFCTR_ICACHE_MISS,
+
+        /*
+         * The ARM performance counters can count micro DTLB misses, micro ITLB
+         * misses and main TLB misses. There isn't an event for TLB misses, so
+         * use the micro misses here and if users want the main TLB misses they
+         * can use a raw counter.
+         */
+        [C(DTLB)][C(OP_READ)][C(RESULT_MISS)]   = ARMV6_PERFCTR_DTLB_MISS,
+        [C(DTLB)][C(OP_WRITE)][C(RESULT_MISS)]  = ARMV6_PERFCTR_DTLB_MISS,
+
+        [C(ITLB)][C(OP_READ)][C(RESULT_MISS)]   = ARMV6_PERFCTR_ITLB_MISS,
+        [C(ITLB)][C(OP_WRITE)][C(RESULT_MISS)]  = ARMV6_PERFCTR_ITLB_MISS,
+};
+
+enum armv6mpcore_perf_types {
+        ARMV6MPCORE_PERFCTR_ICACHE_MISS     = 0x0,
+        ARMV6MPCORE_PERFCTR_IBUF_STALL      = 0x1,
+        ARMV6MPCORE_PERFCTR_DDEP_STALL      = 0x2,
+        ARMV6MPCORE_PERFCTR_ITLB_MISS       = 0x3,
+        ARMV6MPCORE_PERFCTR_DTLB_MISS       = 0x4,
+        ARMV6MPCORE_PERFCTR_BR_EXEC         = 0x5,
+        ARMV6MPCORE_PERFCTR_BR_NOTPREDICT   = 0x6,
+        ARMV6MPCORE_PERFCTR_BR_MISPREDICT   = 0x7,
+        ARMV6MPCORE_PERFCTR_INSTR_EXEC      = 0x8,
+        ARMV6MPCORE_PERFCTR_DCACHE_RDACCESS = 0xA,
+        ARMV6MPCORE_PERFCTR_DCACHE_RDMISS   = 0xB,
+        ARMV6MPCORE_PERFCTR_DCACHE_WRACCESS = 0xC,
+        ARMV6MPCORE_PERFCTR_DCACHE_WRMISS   = 0xD,
+        ARMV6MPCORE_PERFCTR_DCACHE_EVICTION = 0xE,
+        ARMV6MPCORE_PERFCTR_SW_PC_CHANGE    = 0xF,
+        ARMV6MPCORE_PERFCTR_MAIN_TLB_MISS   = 0x10,
+        ARMV6MPCORE_PERFCTR_EXPL_MEM_ACCESS = 0x11,
+        ARMV6MPCORE_PERFCTR_LSU_FULL_STALL  = 0x12,
+        ARMV6MPCORE_PERFCTR_WBUF_DRAINED    = 0x13,
+        ARMV6MPCORE_PERFCTR_CPU_CYCLES      = 0xFF,
+};
+
+/*
+ * The hardware events that we support. We do support cache operations but
+ * we have harvard caches and no way to combine instruction and data
+ * accesses/misses in hardware.
+ */
+static const unsigned armv6mpcore_perf_map[PERF_COUNT_HW_MAX] = {
+        PERF_MAP_ALL_UNSUPPORTED,
+        [PERF_COUNT_HW_CPU_CYCLES]              = ARMV6MPCORE_PERFCTR_CPU_CYCLES,
+        [PERF_COUNT_HW_INSTRUCTIONS]            = ARMV6MPCORE_PERFCTR_INSTR_EXEC,
+        [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]     = ARMV6MPCORE_PERFCTR_BR_EXEC,
+        [PERF_COUNT_HW_BRANCH_MISSES]           = ARMV6MPCORE_PERFCTR_BR_MISPREDICT,
+        [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = ARMV6MPCORE_PERFCTR_IBUF_STALL,
+        [PERF_COUNT_HW_STALLED_CYCLES_BACKEND]  = ARMV6MPCORE_PERFCTR_LSU_FULL_STALL,
+};
+
+static const unsigned armv6mpcore_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
+                                                [PERF_COUNT_HW_CACHE_OP_MAX]
+                                                [PERF_COUNT_HW_CACHE_RESULT_MAX] = {
+        PERF_CACHE_MAP_ALL_UNSUPPORTED,
+
+        [C(L1D)][C(OP_READ)][C(RESULT_ACCESS)]  = ARMV6MPCORE_PERFCTR_DCACHE_RDACCESS,
+        [C(L1D)][C(OP_READ)][C(RESULT_MISS)]    = ARMV6MPCORE_PERFCTR_DCACHE_RDMISS,
+        [C(L1D)][C(OP_WRITE)][C(RESULT_ACCESS)] = ARMV6MPCORE_PERFCTR_DCACHE_WRACCESS,
+        [C(L1D)][C(OP_WRITE)][C(RESULT_MISS)]   = ARMV6MPCORE_PERFCTR_DCACHE_WRMISS,
+
+        [C(L1I)][C(OP_READ)][C(RESULT_MISS)]    = ARMV6MPCORE_PERFCTR_ICACHE_MISS,
+
+        /*
+         * The ARM performance counters can count micro DTLB misses, micro ITLB
+         * misses and main TLB misses. There isn't an event for TLB misses, so
+         * use the micro misses here and if users want the main TLB misses they
+         * can use a raw counter.
+         */
+        [C(DTLB)][C(OP_READ)][C(RESULT_MISS)]   = ARMV6MPCORE_PERFCTR_DTLB_MISS,
+        [C(DTLB)][C(OP_WRITE)][C(RESULT_MISS)]  = ARMV6MPCORE_PERFCTR_DTLB_MISS,
+
+        [C(ITLB)][C(OP_READ)][C(RESULT_MISS)]   = ARMV6MPCORE_PERFCTR_ITLB_MISS,
+        [C(ITLB)][C(OP_WRITE)][C(RESULT_MISS)]  = ARMV6MPCORE_PERFCTR_ITLB_MISS,
+};
+
+static inline unsigned long
+armv6_pmcr_read(void)
+{
+        u32 val;
+        asm volatile("mrc p15, 0, %0, c15, c12, 0" : "=r"(val));
+        return val;
+}
+
+static inline void
+armv6_pmcr_write(unsigned long val)
+{
+        asm volatile("mcr p15, 0, %0, c15, c12, 0" : : "r"(val));
+}
+
+#define ARMV6_PMCR_ENABLE               (1 << 0)
+#define ARMV6_PMCR_CTR01_RESET          (1 << 1)
+#define ARMV6_PMCR_CCOUNT_RESET         (1 << 2)
+#define ARMV6_PMCR_CCOUNT_DIV           (1 << 3)
+#define ARMV6_PMCR_COUNT0_IEN           (1 << 4)
+#define ARMV6_PMCR_COUNT1_IEN           (1 << 5)
+#define ARMV6_PMCR_CCOUNT_IEN           (1 << 6)
+#define ARMV6_PMCR_COUNT0_OVERFLOW      (1 << 8)
+#define ARMV6_PMCR_COUNT1_OVERFLOW      (1 << 9)
+#define ARMV6_PMCR_CCOUNT_OVERFLOW      (1 << 10)
+#define ARMV6_PMCR_EVT_COUNT0_SHIFT     20
+#define ARMV6_PMCR_EVT_COUNT0_MASK      (0xFF << ARMV6_PMCR_EVT_COUNT0_SHIFT)
+#define ARMV6_PMCR_EVT_COUNT1_SHIFT     12
+#define ARMV6_PMCR_EVT_COUNT1_MASK      (0xFF << ARMV6_PMCR_EVT_COUNT1_SHIFT)
+
+#define ARMV6_PMCR_OVERFLOWED_MASK \
+        (ARMV6_PMCR_COUNT0_OVERFLOW | ARMV6_PMCR_COUNT1_OVERFLOW | \
+         ARMV6_PMCR_CCOUNT_OVERFLOW)
+
+static inline int
+armv6_pmcr_has_overflowed(unsigned long pmcr)
+{
+        return pmcr & ARMV6_PMCR_OVERFLOWED_MASK;
+}
+
+static inline int
+armv6_pmcr_counter_has_overflowed(unsigned long pmcr,
+                                  enum armv6_counters counter)
+{
+        int ret = 0;
+
+        if (ARMV6_CYCLE_COUNTER == counter)
+                ret = pmcr & ARMV6_PMCR_CCOUNT_OVERFLOW;
+        else if (ARMV6_COUNTER0 == counter)
+                ret = pmcr & ARMV6_PMCR_COUNT0_OVERFLOW;
+        else if (ARMV6_COUNTER1 == counter)
+                ret = pmcr & ARMV6_PMCR_COUNT1_OVERFLOW;
+        else
+                WARN_ONCE(1, "invalid counter number (%d)\n", counter);
+
+        return ret;
+}
+
+static inline u64 armv6pmu_read_counter(struct perf_event *event)
+{
+        struct hw_perf_event *hwc = &event->hw;
+        int counter = hwc->idx;
+        unsigned long value = 0;
+
+        if (ARMV6_CYCLE_COUNTER == counter)
+                asm volatile("mrc p15, 0, %0, c15, c12, 1" : "=r"(value));
+        else if (ARMV6_COUNTER0 == counter)
+                asm volatile("mrc p15, 0, %0, c15, c12, 2" : "=r"(value));
+        else if (ARMV6_COUNTER1 == counter)
+                asm volatile("mrc p15, 0, %0, c15, c12, 3" : "=r"(value));
+        else
+                WARN_ONCE(1, "invalid counter number (%d)\n", counter);
+
+        return value;
+}
+
+static inline void armv6pmu_write_counter(struct perf_event *event, u64 value)
+{
+        struct hw_perf_event *hwc = &event->hw;
+        int counter = hwc->idx;
+
+        if (ARMV6_CYCLE_COUNTER == counter)
+                asm volatile("mcr p15, 0, %0, c15, c12, 1" : : "r"(value));
+        else if (ARMV6_COUNTER0 == counter)
+                asm volatile("mcr p15, 0, %0, c15, c12, 2" : : "r"(value));
+        else if (ARMV6_COUNTER1 == counter)
+                asm volatile("mcr p15, 0, %0, c15, c12, 3" : : "r"(value));
+        else
+                WARN_ONCE(1, "invalid counter number (%d)\n", counter);
+}
+
+static void armv6pmu_enable_event(struct perf_event *event)
+{
+        unsigned long val, mask, evt, flags;
+        struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+        struct hw_perf_event *hwc = &event->hw;
+        struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+        int idx = hwc->idx;
+
+        if (ARMV6_CYCLE_COUNTER == idx) {
+                mask = 0;
+                evt = ARMV6_PMCR_CCOUNT_IEN;
+        } else if (ARMV6_COUNTER0 == idx) {
+                mask = ARMV6_PMCR_EVT_COUNT0_MASK;
+                evt = (hwc->config_base << ARMV6_PMCR_EVT_COUNT0_SHIFT) |
+                      ARMV6_PMCR_COUNT0_IEN;
+        } else if (ARMV6_COUNTER1 == idx) {
+                mask = ARMV6_PMCR_EVT_COUNT1_MASK;
+                evt = (hwc->config_base << ARMV6_PMCR_EVT_COUNT1_SHIFT) |
+                      ARMV6_PMCR_COUNT1_IEN;
+        } else {
+                WARN_ONCE(1, "invalid counter number (%d)\n", idx);
+                return;
+        }
+
+        /*
+         * Mask out the current event and set the counter to count the event
+         * that we're interested in.
+         */
+        raw_spin_lock_irqsave(&events->pmu_lock, flags);
+        val = armv6_pmcr_read();
+        val &= ~mask;
+        val |= evt;
+        armv6_pmcr_write(val);
+        raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
+static irqreturn_t
+armv6pmu_handle_irq(struct arm_pmu *cpu_pmu)
+{
+        unsigned long pmcr = armv6_pmcr_read();
+        struct perf_sample_data data;
+        struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
+        struct pt_regs *regs;
+        int idx;
+
+        if (!armv6_pmcr_has_overflowed(pmcr))
+                return IRQ_NONE;
+
+        regs = get_irq_regs();
+
+        /*
+         * The interrupts are cleared by writing the overflow flags back to
+         * the control register. All of the other bits don't have any effect
+         * if they are rewritten, so write the whole value back.
+         */
+        armv6_pmcr_write(pmcr);
+
+        for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
+                struct perf_event *event = cpuc->events[idx];
+                struct hw_perf_event *hwc;
+
+                /* Ignore if we don't have an event. */
+                if (!event)
+                        continue;
+
+                /*
+                 * We have a single interrupt for all counters. Check that
+                 * each counter has overflowed before we process it.
+                 */
+                if (!armv6_pmcr_counter_has_overflowed(pmcr, idx))
+                        continue;
+
+                hwc = &event->hw;
+                armpmu_event_update(event);
+                perf_sample_data_init(&data, 0, hwc->last_period);
+                if (!armpmu_event_set_period(event))
+                        continue;
+
+                if (perf_event_overflow(event, &data, regs))
+                        cpu_pmu->disable(event);
+        }
+
+        /*
+         * Handle the pending perf events.
+         *
+         * Note: this call *must* be run with interrupts disabled. For
+         * platforms that can have the PMU interrupts raised as an NMI, this
+         * will not work.
+         */
+        irq_work_run();
+
+        return IRQ_HANDLED;
+}
+
+static void armv6pmu_start(struct arm_pmu *cpu_pmu)
+{
+        unsigned long flags, val;
+        struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+
+        raw_spin_lock_irqsave(&events->pmu_lock, flags);
+        val = armv6_pmcr_read();
+        val |= ARMV6_PMCR_ENABLE;
+        armv6_pmcr_write(val);
+        raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
+static void armv6pmu_stop(struct arm_pmu *cpu_pmu)
+{
+        unsigned long flags, val;
+        struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+
+        raw_spin_lock_irqsave(&events->pmu_lock, flags);
+        val = armv6_pmcr_read();
+        val &= ~ARMV6_PMCR_ENABLE;
+        armv6_pmcr_write(val);
+        raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
+static int
+armv6pmu_get_event_idx(struct pmu_hw_events *cpuc,
+                       struct perf_event *event)
+{
+        struct hw_perf_event *hwc = &event->hw;
+        /* Always place a cycle counter into the cycle counter. */
+        if (ARMV6_PERFCTR_CPU_CYCLES == hwc->config_base) {
+                if (test_and_set_bit(ARMV6_CYCLE_COUNTER, cpuc->used_mask))
+                        return -EAGAIN;
+
+                return ARMV6_CYCLE_COUNTER;
+        } else {
+                /*
+                 * For anything other than a cycle counter, try and use
+                 * counter0 and counter1.
+                 */
+                if (!test_and_set_bit(ARMV6_COUNTER1, cpuc->used_mask))
+                        return ARMV6_COUNTER1;
+
+                if (!test_and_set_bit(ARMV6_COUNTER0, cpuc->used_mask))
+                        return ARMV6_COUNTER0;
+
+                /* The counters are all in use. */
+                return -EAGAIN;
+        }
+}
+
+static void armv6pmu_clear_event_idx(struct pmu_hw_events *cpuc,
+                                     struct perf_event *event)
+{
+        clear_bit(event->hw.idx, cpuc->used_mask);
+}
+
+static void armv6pmu_disable_event(struct perf_event *event)
+{
+        unsigned long val, mask, evt, flags;
+        struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+        struct hw_perf_event *hwc = &event->hw;
+        struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+        int idx = hwc->idx;
+
+        if (ARMV6_CYCLE_COUNTER == idx) {
+                mask = ARMV6_PMCR_CCOUNT_IEN;
+                evt = 0;
+        } else if (ARMV6_COUNTER0 == idx) {
+                mask = ARMV6_PMCR_COUNT0_IEN | ARMV6_PMCR_EVT_COUNT0_MASK;
+                evt = ARMV6_PERFCTR_NOP << ARMV6_PMCR_EVT_COUNT0_SHIFT;
+        } else if (ARMV6_COUNTER1 == idx) {
+                mask = ARMV6_PMCR_COUNT1_IEN | ARMV6_PMCR_EVT_COUNT1_MASK;
+                evt = ARMV6_PERFCTR_NOP << ARMV6_PMCR_EVT_COUNT1_SHIFT;
+        } else {
+                WARN_ONCE(1, "invalid counter number (%d)\n", idx);
+                return;
+        }
+
+        /*
+         * Mask out the current event and set the counter to count the number
+         * of ETM bus signal assertion cycles. The external reporting should
+         * be disabled and so this should never increment.
+         */
+        raw_spin_lock_irqsave(&events->pmu_lock, flags);
+        val = armv6_pmcr_read();
+        val &= ~mask;
+        val |= evt;
+        armv6_pmcr_write(val);
+        raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
+static void armv6mpcore_pmu_disable_event(struct perf_event *event)
+{
+        unsigned long val, mask, flags, evt = 0;
+        struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+        struct hw_perf_event *hwc = &event->hw;
+        struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
+        int idx = hwc->idx;
+
+        if (ARMV6_CYCLE_COUNTER == idx) {
+                mask = ARMV6_PMCR_CCOUNT_IEN;
+        } else if (ARMV6_COUNTER0 == idx) {
+                mask = ARMV6_PMCR_COUNT0_IEN;
+        } else if (ARMV6_COUNTER1 == idx) {
+                mask = ARMV6_PMCR_COUNT1_IEN;
+        } else {
+                WARN_ONCE(1, "invalid counter number (%d)\n", idx);
+                return;
+        }
+
+        /*
+         * Unlike UP ARMv6, we don't have a way of stopping the counters. We
+         * simply disable the interrupt reporting.
+         */
+        raw_spin_lock_irqsave(&events->pmu_lock, flags);
+        val = armv6_pmcr_read();
+        val &= ~mask;
+        val |= evt;
+        armv6_pmcr_write(val);
+        raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
+static int armv6_map_event(struct perf_event *event)
+{
+        return armpmu_map_event(event, &armv6_perf_map,
+                                &armv6_perf_cache_map, 0xFF);
+}
+
+static void armv6pmu_init(struct arm_pmu *cpu_pmu)
+{
+        cpu_pmu->handle_irq     = armv6pmu_handle_irq;
+        cpu_pmu->enable         = armv6pmu_enable_event;
+        cpu_pmu->disable        = armv6pmu_disable_event;
+        cpu_pmu->read_counter   = armv6pmu_read_counter;
+        cpu_pmu->write_counter  = armv6pmu_write_counter;
+        cpu_pmu->get_event_idx  = armv6pmu_get_event_idx;
+        cpu_pmu->clear_event_idx = armv6pmu_clear_event_idx;
+        cpu_pmu->start          = armv6pmu_start;
+        cpu_pmu->stop           = armv6pmu_stop;
+        cpu_pmu->map_event      = armv6_map_event;
+        cpu_pmu->num_events     = 3;
+}
+
+static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu)
+{
+        armv6pmu_init(cpu_pmu);
+        cpu_pmu->name           = "armv6_1136";
+        return 0;
+}
+
+static int armv6_1156_pmu_init(struct arm_pmu *cpu_pmu)
+{
+        armv6pmu_init(cpu_pmu);
+        cpu_pmu->name           = "armv6_1156";
+        return 0;
+}
+
+static int armv6_1176_pmu_init(struct arm_pmu *cpu_pmu)
+{
+        armv6pmu_init(cpu_pmu);
+        cpu_pmu->name           = "armv6_1176";
+        return 0;
+}
+
+/*
+ * ARMv6mpcore is almost identical to single core ARMv6 with the exception
+ * that some of the events have different enumerations and that there is no
+ * *hack* to stop the programmable counters. To stop the counters we simply
+ * disable the interrupt reporting and update the event. When unthrottling we
+ * reset the period and enable the interrupt reporting.
+ */
+
+static int armv6mpcore_map_event(struct perf_event *event)
+{
+        return armpmu_map_event(event, &armv6mpcore_perf_map,
+                                &armv6mpcore_perf_cache_map, 0xFF);
+}
+
+static int armv6mpcore_pmu_init(struct arm_pmu *cpu_pmu)
+{
+        cpu_pmu->name           = "armv6_11mpcore";
+        cpu_pmu->handle_irq     = armv6pmu_handle_irq;
+        cpu_pmu->enable         = armv6pmu_enable_event;
+        cpu_pmu->disable        = armv6mpcore_pmu_disable_event;
+        cpu_pmu->read_counter   = armv6pmu_read_counter;
+        cpu_pmu->write_counter  = armv6pmu_write_counter;
+        cpu_pmu->get_event_idx  = armv6pmu_get_event_idx;
+        cpu_pmu->clear_event_idx = armv6pmu_clear_event_idx;
+        cpu_pmu->start          = armv6pmu_start;
+        cpu_pmu->stop           = armv6pmu_stop;
+        cpu_pmu->map_event      = armv6mpcore_map_event;
+        cpu_pmu->num_events     = 3;
+
+        return 0;
+}
+
+static const struct of_device_id armv6_pmu_of_device_ids[] = {
+        {.compatible = "arm,arm11mpcore-pmu",   .data = armv6mpcore_pmu_init},
+        {.compatible = "arm,arm1176-pmu",       .data = armv6_1176_pmu_init},
+        {.compatible = "arm,arm1136-pmu",       .data = armv6_1136_pmu_init},
+        { /* sentinel value */ }
+};
+
+static const struct pmu_probe_info armv6_pmu_probe_table[] = {
+        ARM_PMU_PROBE(ARM_CPU_PART_ARM1136, armv6_1136_pmu_init),
+        ARM_PMU_PROBE(ARM_CPU_PART_ARM1156, armv6_1156_pmu_init),
+        ARM_PMU_PROBE(ARM_CPU_PART_ARM1176, armv6_1176_pmu_init),
+        ARM_PMU_PROBE(ARM_CPU_PART_ARM11MPCORE, armv6mpcore_pmu_init),
+        { /* sentinel value */ }
+};
+
+static int armv6_pmu_device_probe(struct platform_device *pdev)
+{
+        return arm_pmu_device_probe(pdev, armv6_pmu_of_device_ids,
+                                    armv6_pmu_probe_table);
+}
+
+static struct platform_driver armv6_pmu_driver = {
+        .driver         = {
+                .name   = "armv6-pmu",
+                .of_match_table = armv6_pmu_of_device_ids,
+        },
+        .probe          = armv6_pmu_device_probe,
+};
+
+builtin_platform_driver(armv6_pmu_driver);
+#endif /* CONFIG_CPU_V6 || CONFIG_CPU_V6K */
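For context, the driver above plugs into the generic perf ABI:
PERF_COUNT_HW_CPU_CYCLES resolves through armv6_perf_map to
ARMV6_PERFCTR_CPU_CYCLES and lands in the dedicated cycle counter. A minimal
user-space sketch of consuming such a PMU (standard perf_event_open usage,
not specific to this driver):

/* Sketch: count CPU cycles over a workload via perf_event_open(2). */
#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
        struct perf_event_attr attr;
        long long count;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES; /* cycle counter on ARMv6 */
        attr.disabled = 1;

        /* No glibc wrapper: measure this task on any CPU. */
        fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) {
                perror("perf_event_open");
                return 1;
        }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        /* ... workload under measurement ... */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        if (read(fd, &count, sizeof(count)) == sizeof(count))
                printf("cycles: %lld\n", count);
        close(fd);
        return 0;
}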