diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /tools/perf/Documentation/jitdump-specification.txt | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'tools/perf/Documentation/jitdump-specification.txt')
-rw-r--r-- | tools/perf/Documentation/jitdump-specification.txt | 170 |
1 files changed, 170 insertions, 0 deletions
diff --git a/tools/perf/Documentation/jitdump-specification.txt b/tools/perf/Documentation/jitdump-specification.txt new file mode 100644 index 000000000..79936355d --- /dev/null +++ b/tools/perf/Documentation/jitdump-specification.txt @@ -0,0 +1,170 @@ +JITDUMP specification version 2 +Last Revised: 09/15/2016 +Author: Stephane Eranian <eranian@gmail.com> + +-------------------------------------------------------- +| Revision | Date | Description | +-------------------------------------------------------- +| 1 | 09/07/2016 | Initial revision | +-------------------------------------------------------- +| 2 | 09/15/2016 | Add JIT_CODE_UNWINDING_INFO | +-------------------------------------------------------- + + +I/ Introduction + + +This document describes the jitdump file format. The file is generated by Just-In-time compiler runtimes to save meta-data information about the generated code, such as address, size, and name of generated functions, the native code generated, the source line information. The data may then be used by performance tools, such as Linux perf to generate function and assembly level profiles. + +The format is not specific to any particular programming language. It can be extended as need be. + +The format of the file is binary. It is self-describing in terms of endianness and is portable across multiple processor architectures. + + +II/ Overview of the format + + +The format requires only sequential accesses, i.e., append only mode. The file starts with a fixed size file header describing the version of the specification, the endianness. + +The header is followed by a series of records, each starting with a fixed size header describing the type of record and its size. It is, itself, followed by the payload for the record. Records can have a variable size even for a given type. + +Each entry in the file is timestamped. All timestamps must use the same clock source. The CLOCK_MONOTONIC clock source is recommended. + + +III/ Jitdump file header format + +Each jitdump file starts with a fixed size header containing the following fields in order: + + +* uint32_t magic : a magic number tagging the file type. The value is 4-byte long and represents the string "JiTD" in ASCII form. It written is as 0x4A695444. The reader will detect an endian mismatch when it reads 0x4454694a. +* uint32_t version : a 4-byte value representing the format version. It is currently set to 1 +* uint32_t total_size: size in bytes of file header +* uint32_t elf_mach : ELF architecture encoding (ELF e_machine value as specified in /usr/include/elf.h) +* uint32_t pad1 : padding. Reserved for future use +* uint32_t pid : JIT runtime process identification (OS specific) +* uint64_t timestamp : timestamp of when the file was created +* uint64_t flags : a bitmask of flags + +The flags currently defined are as follows: + * bit 0: JITDUMP_FLAGS_ARCH_TIMESTAMP : set if the jitdump file is using an architecture-specific timestamp clock source. For instance, on x86, one could use TSC directly + +IV/ Record header + +The file header is immediately followed by records. Each record starts with a fixed size header describing the record that follows. + +The record header is specified in order as follows: +* uint32_t id : a value identifying the record type (see below) +* uint32_t total_size: the size in bytes of the record including the header. +* uint64_t timestamp : a timestamp of when the record was created. + +The following record types are defined: + * Value 0 : JIT_CODE_LOAD : record describing a jitted function + * Value 1 : JIT_CODE_MOVE : record describing an already jitted function which is moved + * Value 2 : JIT_CODE_DEBUG_INFO: record describing the debug information for a jitted function + * Value 3 : JIT_CODE_CLOSE : record marking the end of the jit runtime (optional) + * Value 4 : JIT_CODE_UNWINDING_INFO: record describing a function unwinding information + + The payload of the record must immediately follow the record header without padding. + +V/ JIT_CODE_LOAD record + + + The record has the following fields following the fixed-size record header in order: + * uint32_t pid: OS process id of the runtime generating the jitted code + * uint32_t tid: OS thread identification of the runtime thread generating the jitted code + * uint64_t vma: virtual address of jitted code start + * uint64_t code_addr: code start address for the jitted code. By default vma = code_addr + * uint64_t code_size: size in bytes of the generated jitted code + * uint64_t code_index: unique identifier for the jitted code (see below) + * char[n]: function name in ASCII including the null termination + * native code: raw byte encoding of the jitted code + + The record header total_size field is inclusive of all components: + * record header + * fixed-sized fields + * function name string, including termination + * native code length + * record specific variable data (e.g., array of data entries) + +The code_index is used to uniquely identify each jitted function. The index can be a monotonically increasing 64-bit value. Each time a function is jitted it gets a new number. This value is used in case the code for a function is moved and avoids having to issue another JIT_CODE_LOAD record. + +The format supports empty functions with no native code. + + +VI/ JIT_CODE_MOVE record + + The record type is optional. + + The record has the following fields following the fixed-size record header in order: + * uint32_t pid : OS process id of the runtime generating the jitted code + * uint32_t tid : OS thread identification of the runtime thread generating the jitted code + * uint64_t vma : new virtual address of jitted code start + * uint64_t old_code_addr: previous code address for the same function + * uint64_t new_code_addr: alternate new code started address for the jitted code. By default it should be equal to the vma address. + * uint64_t code_size : size in bytes of the jitted code + * uint64_t code_index : index referring to the JIT_CODE_LOAD code_index record of when the function was initially jitted + + +The MOVE record can be used in case an already jitted function is simply moved by the runtime inside the code cache. + +The JIT_CODE_MOVE record cannot come before the JIT_CODE_LOAD record for the same function name. The function cannot have changed name, otherwise a new JIT_CODE_LOAD record must be emitted. + +The code size of the function cannot change. + + +VII/ JIT_DEBUG_INFO record + +The record type is optional. + +The record contains source lines debug information, i.e., a way to map a code address back to a source line. This information may be used by the performance tool. + +The record has the following fields following the fixed-size record header in order: + * uint64_t code_addr: address of function for which the debug information is generated + * uint64_t nr_entry : number of debug entries for the function + * debug_entry[n]: array of nr_entry debug entries for the function + +The debug_entry describes the source line information. It is defined as follows in order: +* uint64_t code_addr: address of function for which the debug information is generated +* uint32_t line : source file line number (starting at 1) +* uint32_t discrim : column discriminator, 0 is default +* char name[n] : source file name in ASCII, including null termination + +The debug_entry entries are saved in sequence but given that they have variable sizes due to the file name string, they cannot be indexed directly. +They need to be walked sequentially. The next debug_entry is found at sizeof(debug_entry) + strlen(name) + 1. + +IMPORTANT: + The JIT_CODE_DEBUG for a given function must always be generated BEFORE the JIT_CODE_LOAD for the function. This facilitates greatly the parser for the jitdump file. + + +VIII/ JIT_CODE_CLOSE record + + +The record type is optional. + +The record is used as a marker for the end of the jitted runtime. It can be replaced by the end of the file. + +The JIT_CODE_CLOSE record does not have any specific fields, the record header contains all the information needed. + + +IX/ JIT_CODE_UNWINDING_INFO + + +The record type is optional. + +The record is used to describe the unwinding information for a jitted function. + +The record has the following fields following the fixed-size record header in order: + +uint64_t unwind_data_size : the size in bytes of the unwinding data table at the end of the record +uint64_t eh_frame_hdr_size : the size in bytes of the DWARF EH Frame Header at the start of the unwinding data table at the end of the record +uint64_t mapped_size : the size of the unwinding data mapped in memory +const char unwinding_data[n]: an array of unwinding data, consisting of the EH Frame Header, followed by the actual EH Frame + + +The EH Frame header follows the Linux Standard Base (LSB) specification as described in the document at https://refspecs.linuxfoundation.org/LSB_1.3.0/gLSB/gLSB/ehframehdr.html + + +The EH Frame follows the LSB specification as described in the document at https://refspecs.linuxbase.org/LSB_3.0.0/LSB-PDA/LSB-PDA/ehframechpt.html + + +NOTE: The mapped_size is generally either the same as unwind_data_size (if the unwinding data was mapped in memory by the running process) or zero (if the unwinding data is not mapped by the process). If the unwinding data was not mapped, then only the EH Frame Header will be read, which can be used to specify FP based unwinding for a function which does not have unwinding information. |