aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/Documentation/perf-diff.txt
diff options
context:
space:
mode:
authorLibravatar Linus Torvalds <torvalds@linux-foundation.org>2023-02-21 18:24:12 -0800
committerLibravatar Linus Torvalds <torvalds@linux-foundation.org>2023-02-21 18:24:12 -0800
commit5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch)
treecc5c2d0a898769fd59549594fedb3ee6f84e59a0 /tools/perf/Documentation/perf-diff.txt
downloadlinux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz
linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ...
Diffstat (limited to 'tools/perf/Documentation/perf-diff.txt')
-rw-r--r--tools/perf/Documentation/perf-diff.txt305
1 files changed, 305 insertions, 0 deletions
diff --git a/tools/perf/Documentation/perf-diff.txt b/tools/perf/Documentation/perf-diff.txt
new file mode 100644
index 000000000..f3067a4af
--- /dev/null
+++ b/tools/perf/Documentation/perf-diff.txt
@@ -0,0 +1,305 @@
+perf-diff(1)
+============
+
+NAME
+----
+perf-diff - Read perf.data files and display the differential profile
+
+SYNOPSIS
+--------
+[verse]
+'perf diff' [baseline file] [data file1] [[data file2] ... ]
+
+DESCRIPTION
+-----------
+This command displays the performance difference amongst two or more perf.data
+files captured via perf record.
+
+If no parameters are passed it will assume perf.data.old and perf.data.
+
+The differential profile is displayed only for events matching both
+specified perf.data files.
+
+If no parameters are passed the samples will be sorted by dso and symbol.
+As the perf.data files could come from different binaries, the symbols addresses
+could vary. So perf diff is based on the comparison of the files and
+symbols name.
+
+OPTIONS
+-------
+-D::
+--dump-raw-trace::
+ Dump raw trace in ASCII.
+
+--kallsyms=<file>::
+ kallsyms pathname
+
+-m::
+--modules::
+ Load module symbols. WARNING: use only with -k and LIVE kernel
+
+-d::
+--dsos=::
+ Only consider symbols in these dsos. CSV that understands
+ file://filename entries. This option will affect the percentage
+ of the Baseline/Delta column. See --percentage for more info.
+
+-C::
+--comms=::
+ Only consider symbols in these comms. CSV that understands
+ file://filename entries. This option will affect the percentage
+ of the Baseline/Delta column. See --percentage for more info.
+
+-S::
+--symbols=::
+ Only consider these symbols. CSV that understands
+ file://filename entries. This option will affect the percentage
+ of the Baseline/Delta column. See --percentage for more info.
+
+-s::
+--sort=::
+ Sort by key(s): pid, comm, dso, symbol, cpu, parent, srcline.
+ Please see description of --sort in the perf-report man page.
+
+-t::
+--field-separator=::
+
+ Use a special separator character and don't pad with spaces, replacing
+ all occurrences of this separator in symbol names (and other output)
+ with a '.' character, that thus it's the only non valid separator.
+
+-v::
+--verbose::
+ Be verbose, for instance, show the raw counts in addition to the
+ diff.
+
+-q::
+--quiet::
+ Do not show any warnings or messages. (Suppress -v)
+
+-f::
+--force::
+ Don't do ownership validation.
+
+--symfs=<directory>::
+ Look for files with symbols relative to this directory.
+
+-b::
+--baseline-only::
+ Show only items with match in baseline.
+
+-c::
+--compute::
+ Differential computation selection - delta, ratio, wdiff, cycles,
+ delta-abs (default is delta-abs). Default can be changed using
+ diff.compute config option. See COMPARISON METHODS section for
+ more info.
+
+--cycles-hist::
+ Report a histogram and the standard deviation for cycles data.
+ It can help us to judge if the reported cycles data is noisy or
+ not. This option should be used with '-c cycles'.
+
+-p::
+--period::
+ Show period values for both compared hist entries.
+
+-F::
+--formula::
+ Show formula for given computation.
+
+-o::
+--order::
+ Specify compute sorting column number. 0 means sorting by baseline
+ overhead and 1 (default) means sorting by computed value of column 1
+ (data from the first file other base baseline). Values more than 1
+ can be used only if enough data files are provided.
+ The default value can be set using the diff.order config option.
+
+--percentage::
+ Determine how to display the overhead percentage of filtered entries.
+ Filters can be applied by --comms, --dsos and/or --symbols options.
+
+ "relative" means it's relative to filtered entries only so that the
+ sum of shown entries will be always 100%. "absolute" means it retains
+ the original value before and after the filter is applied.
+
+--time::
+ Analyze samples within given time window. It supports time
+ percent with multiple time ranges. Time string is 'a%/n,b%/m,...'
+ or 'a%-b%,c%-%d,...'.
+
+ For example:
+
+ Select the second 10% time slice to diff:
+
+ perf diff --time 10%/2
+
+ Select from 0% to 10% time slice to diff:
+
+ perf diff --time 0%-10%
+
+ Select the first and the second 10% time slices to diff:
+
+ perf diff --time 10%/1,10%/2
+
+ Select from 0% to 10% and 30% to 40% slices to diff:
+
+ perf diff --time 0%-10%,30%-40%
+
+ It also supports analyzing samples within a given time window
+ <start>,<stop>. Times have the format seconds.nanoseconds. If 'start'
+ is not given (i.e. time string is ',x.y') then analysis starts at
+ the beginning of the file. If stop time is not given (i.e. time
+ string is 'x.y,') then analysis goes to the end of the file.
+ Multiple ranges can be separated by spaces, which requires the argument
+ to be quoted e.g. --time "1234.567,1234.789 1235,"
+ Time string is'a1.b1,c1.d1:a2.b2,c2.d2'. Use ':' to separate timestamps
+ for different perf.data files.
+
+ For example, we get the timestamp information from 'perf script'.
+
+ perf script -i perf.data.old
+ mgen 13940 [000] 3946.361400: ...
+
+ perf script -i perf.data
+ mgen 13940 [000] 3971.150589 ...
+
+ perf diff --time 3946.361400,:3971.150589,
+
+ It analyzes the perf.data.old from the timestamp 3946.361400 to
+ the end of perf.data.old and analyzes the perf.data from the
+ timestamp 3971.150589 to the end of perf.data.
+
+--cpu:: Only diff samples for the list of CPUs provided. Multiple CPUs can
+ be provided as a comma-separated list with no space: 0,1. Ranges of
+ CPUs are specified with -: 0-2. Default is to report samples on all
+ CPUs.
+
+--pid=::
+ Only diff samples for given process ID (comma separated list).
+
+--tid=::
+ Only diff samples for given thread ID (comma separated list).
+
+--stream::
+ Enable hot streams comparison. Stream can be a callchain which is
+ aggregated by the branch records from samples.
+
+COMPARISON
+----------
+The comparison is governed by the baseline file. The baseline perf.data
+file is iterated for samples. All other perf.data files specified on
+the command line are searched for the baseline sample pair. If the pair
+is found, specified computation is made and result is displayed.
+
+All samples from non-baseline perf.data files, that do not match any
+baseline entry, are displayed with empty space within baseline column
+and possible computation results (delta) in their related column.
+
+Example files samples:
+- file A with samples f1, f2, f3, f4, f6
+- file B with samples f2, f4, f5
+- file C with samples f1, f2, f5
+
+Example output:
+ x - computation takes place for pair
+ b - baseline sample percentage
+
+- perf diff A B C
+
+ baseline/A compute/B compute/C samples
+ ---------------------------------------
+ b x f1
+ b x x f2
+ b f3
+ b x f4
+ b f6
+ x x f5
+
+- perf diff B A C
+
+ baseline/B compute/A compute/C samples
+ ---------------------------------------
+ b x x f2
+ b x f4
+ b x f5
+ x x f1
+ x f3
+ x f6
+
+- perf diff C B A
+
+ baseline/C compute/B compute/A samples
+ ---------------------------------------
+ b x f1
+ b x x f2
+ b x f5
+ x f3
+ x x f4
+ x f6
+
+COMPARISON METHODS
+------------------
+delta
+~~~~~
+If specified the 'Delta' column is displayed with value 'd' computed as:
+
+ d = A->period_percent - B->period_percent
+
+with:
+ - A/B being matching hist entry from data/baseline file specified
+ (or perf.data/perf.data.old) respectively.
+
+ - period_percent being the % of the hist entry period value within
+ single data file
+
+ - with filtering by -C, -d and/or -S, period_percent might be changed
+ relative to how entries are filtered. Use --percentage=absolute to
+ prevent such fluctuation.
+
+delta-abs
+~~~~~~~~~
+Same as 'delta` method, but sort the result with the absolute values.
+
+ratio
+~~~~~
+If specified the 'Ratio' column is displayed with value 'r' computed as:
+
+ r = A->period / B->period
+
+with:
+ - A/B being matching hist entry from data/baseline file specified
+ (or perf.data/perf.data.old) respectively.
+
+ - period being the hist entry period value
+
+wdiff:WEIGHT-B,WEIGHT-A
+~~~~~~~~~~~~~~~~~~~~~~~
+If specified the 'Weighted diff' column is displayed with value 'd' computed as:
+
+ d = B->period * WEIGHT-A - A->period * WEIGHT-B
+
+ - A/B being matching hist entry from data/baseline file specified
+ (or perf.data/perf.data.old) respectively.
+
+ - period being the hist entry period value
+
+ - WEIGHT-A/WEIGHT-B being user supplied weights in the the '-c' option
+ behind ':' separator like '-c wdiff:1,2'.
+ - WEIGHT-A being the weight of the data file
+ - WEIGHT-B being the weight of the baseline data file
+
+cycles
+~~~~~~
+If specified the '[Program Block Range] Cycles Diff' column is displayed.
+It displays the cycles difference of same program basic block amongst
+two perf.data. The program basic block is the code between two branches.
+
+'[Program Block Range]' indicates the range of a program basic block.
+Source line is reported if it can be found otherwise uses symbol+offset
+instead.
+
+SEE ALSO
+--------
+linkperf:perf-record[1], linkperf:perf-report[1]