diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /Documentation/core-api/genericirq.rst | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'Documentation/core-api/genericirq.rst')
-rw-r--r-- | Documentation/core-api/genericirq.rst | 444 |
1 files changed, 444 insertions, 0 deletions
diff --git a/Documentation/core-api/genericirq.rst b/Documentation/core-api/genericirq.rst new file mode 100644 index 000000000..f959c9b53 --- /dev/null +++ b/Documentation/core-api/genericirq.rst @@ -0,0 +1,444 @@ +.. include:: <isonum.txt> + +========================== +Linux generic IRQ handling +========================== + +:Copyright: |copy| 2005-2010: Thomas Gleixner +:Copyright: |copy| 2005-2006: Ingo Molnar + +Introduction +============ + +The generic interrupt handling layer is designed to provide a complete +abstraction of interrupt handling for device drivers. It is able to +handle all the different types of interrupt controller hardware. Device +drivers use generic API functions to request, enable, disable and free +interrupts. The drivers do not have to know anything about interrupt +hardware details, so they can be used on different platforms without +code changes. + +This documentation is provided to developers who want to implement an +interrupt subsystem based for their architecture, with the help of the +generic IRQ handling layer. + +Rationale +========= + +The original implementation of interrupt handling in Linux uses the +__do_IRQ() super-handler, which is able to deal with every type of +interrupt logic. + +Originally, Russell King identified different types of handlers to build +a quite universal set for the ARM interrupt handler implementation in +Linux 2.5/2.6. He distinguished between: + +- Level type + +- Edge type + +- Simple type + +During the implementation we identified another type: + +- Fast EOI type + +In the SMP world of the __do_IRQ() super-handler another type was +identified: + +- Per CPU type + +This split implementation of high-level IRQ handlers allows us to +optimize the flow of the interrupt handling for each specific interrupt +type. This reduces complexity in that particular code path and allows +the optimized handling of a given type. + +The original general IRQ implementation used hw_interrupt_type +structures and their ``->ack``, ``->end`` [etc.] callbacks to differentiate +the flow control in the super-handler. This leads to a mix of flow logic +and low-level hardware logic, and it also leads to unnecessary code +duplication: for example in i386, there is an ``ioapic_level_irq`` and an +``ioapic_edge_irq`` IRQ-type which share many of the low-level details but +have different flow handling. + +A more natural abstraction is the clean separation of the 'irq flow' and +the 'chip details'. + +Analysing a couple of architecture's IRQ subsystem implementations +reveals that most of them can use a generic set of 'irq flow' methods +and only need to add the chip-level specific code. The separation is +also valuable for (sub)architectures which need specific quirks in the +IRQ flow itself but not in the chip details - and thus provides a more +transparent IRQ subsystem design. + +Each interrupt descriptor is assigned its own high-level flow handler, +which is normally one of the generic implementations. (This high-level +flow handler implementation also makes it simple to provide +demultiplexing handlers which can be found in embedded platforms on +various architectures.) + +The separation makes the generic interrupt handling layer more flexible +and extensible. For example, an (sub)architecture can use a generic +IRQ-flow implementation for 'level type' interrupts and add a +(sub)architecture specific 'edge type' implementation. + +To make the transition to the new model easier and prevent the breakage +of existing implementations, the __do_IRQ() super-handler is still +available. This leads to a kind of duality for the time being. Over time +the new model should be used in more and more architectures, as it +enables smaller and cleaner IRQ subsystems. It's deprecated for three +years now and about to be removed. + +Known Bugs And Assumptions +========================== + +None (knock on wood). + +Abstraction layers +================== + +There are three main levels of abstraction in the interrupt code: + +1. High-level driver API + +2. High-level IRQ flow handlers + +3. Chip-level hardware encapsulation + +Interrupt control flow +---------------------- + +Each interrupt is described by an interrupt descriptor structure +irq_desc. The interrupt is referenced by an 'unsigned int' numeric +value which selects the corresponding interrupt description structure in +the descriptor structures array. The descriptor structure contains +status information and pointers to the interrupt flow method and the +interrupt chip structure which are assigned to this interrupt. + +Whenever an interrupt triggers, the low-level architecture code calls +into the generic interrupt code by calling desc->handle_irq(). This +high-level IRQ handling function only uses desc->irq_data.chip +primitives referenced by the assigned chip descriptor structure. + +High-level Driver API +--------------------- + +The high-level Driver API consists of following functions: + +- request_irq() + +- request_threaded_irq() + +- free_irq() + +- disable_irq() + +- enable_irq() + +- disable_irq_nosync() (SMP only) + +- synchronize_irq() (SMP only) + +- irq_set_irq_type() + +- irq_set_irq_wake() + +- irq_set_handler_data() + +- irq_set_chip() + +- irq_set_chip_data() + +See the autogenerated function documentation for details. + +High-level IRQ flow handlers +---------------------------- + +The generic layer provides a set of pre-defined irq-flow methods: + +- handle_level_irq() + +- handle_edge_irq() + +- handle_fasteoi_irq() + +- handle_simple_irq() + +- handle_percpu_irq() + +- handle_edge_eoi_irq() + +- handle_bad_irq() + +The interrupt flow handlers (either pre-defined or architecture +specific) are assigned to specific interrupts by the architecture either +during bootup or during device initialization. + +Default flow implementations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Helper functions +^^^^^^^^^^^^^^^^ + +The helper functions call the chip primitives and are used by the +default flow implementations. The following helper functions are +implemented (simplified excerpt):: + + default_enable(struct irq_data *data) + { + desc->irq_data.chip->irq_unmask(data); + } + + default_disable(struct irq_data *data) + { + if (!delay_disable(data)) + desc->irq_data.chip->irq_mask(data); + } + + default_ack(struct irq_data *data) + { + chip->irq_ack(data); + } + + default_mask_ack(struct irq_data *data) + { + if (chip->irq_mask_ack) { + chip->irq_mask_ack(data); + } else { + chip->irq_mask(data); + chip->irq_ack(data); + } + } + + noop(struct irq_data *data)) + { + } + + + +Default flow handler implementations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Default Level IRQ flow handler +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +handle_level_irq provides a generic implementation for level-triggered +interrupts. + +The following control flow is implemented (simplified excerpt):: + + desc->irq_data.chip->irq_mask_ack(); + handle_irq_event(desc->action); + desc->irq_data.chip->irq_unmask(); + + +Default Fast EOI IRQ flow handler +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +handle_fasteoi_irq provides a generic implementation for interrupts, +which only need an EOI at the end of the handler. + +The following control flow is implemented (simplified excerpt):: + + handle_irq_event(desc->action); + desc->irq_data.chip->irq_eoi(); + + +Default Edge IRQ flow handler +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +handle_edge_irq provides a generic implementation for edge-triggered +interrupts. + +The following control flow is implemented (simplified excerpt):: + + if (desc->status & running) { + desc->irq_data.chip->irq_mask_ack(); + desc->status |= pending | masked; + return; + } + desc->irq_data.chip->irq_ack(); + desc->status |= running; + do { + if (desc->status & masked) + desc->irq_data.chip->irq_unmask(); + desc->status &= ~pending; + handle_irq_event(desc->action); + } while (status & pending); + desc->status &= ~running; + + +Default simple IRQ flow handler +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +handle_simple_irq provides a generic implementation for simple +interrupts. + +.. note:: + + The simple flow handler does not call any handler/chip primitives. + +The following control flow is implemented (simplified excerpt):: + + handle_irq_event(desc->action); + + +Default per CPU flow handler +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +handle_percpu_irq provides a generic implementation for per CPU +interrupts. + +Per CPU interrupts are only available on SMP and the handler provides a +simplified version without locking. + +The following control flow is implemented (simplified excerpt):: + + if (desc->irq_data.chip->irq_ack) + desc->irq_data.chip->irq_ack(); + handle_irq_event(desc->action); + if (desc->irq_data.chip->irq_eoi) + desc->irq_data.chip->irq_eoi(); + + +EOI Edge IRQ flow handler +^^^^^^^^^^^^^^^^^^^^^^^^^ + +handle_edge_eoi_irq provides an abnomination of the edge handler +which is solely used to tame a badly wreckaged irq controller on +powerpc/cell. + +Bad IRQ flow handler +^^^^^^^^^^^^^^^^^^^^ + +handle_bad_irq is used for spurious interrupts which have no real +handler assigned.. + +Quirks and optimizations +~~~~~~~~~~~~~~~~~~~~~~~~ + +The generic functions are intended for 'clean' architectures and chips, +which have no platform-specific IRQ handling quirks. If an architecture +needs to implement quirks on the 'flow' level then it can do so by +overriding the high-level irq-flow handler. + +Delayed interrupt disable +~~~~~~~~~~~~~~~~~~~~~~~~~ + +This per interrupt selectable feature, which was introduced by Russell +King in the ARM interrupt implementation, does not mask an interrupt at +the hardware level when disable_irq() is called. The interrupt is kept +enabled and is masked in the flow handler when an interrupt event +happens. This prevents losing edge interrupts on hardware which does not +store an edge interrupt event while the interrupt is disabled at the +hardware level. When an interrupt arrives while the IRQ_DISABLED flag +is set, then the interrupt is masked at the hardware level and the +IRQ_PENDING bit is set. When the interrupt is re-enabled by +enable_irq() the pending bit is checked and if it is set, the interrupt +is resent either via hardware or by a software resend mechanism. (It's +necessary to enable CONFIG_HARDIRQS_SW_RESEND when you want to use +the delayed interrupt disable feature and your hardware is not capable +of retriggering an interrupt.) The delayed interrupt disable is not +configurable. + +Chip-level hardware encapsulation +--------------------------------- + +The chip-level hardware descriptor structure :c:type:`irq_chip` contains all +the direct chip relevant functions, which can be utilized by the irq flow +implementations. + +- ``irq_ack`` + +- ``irq_mask_ack`` - Optional, recommended for performance + +- ``irq_mask`` + +- ``irq_unmask`` + +- ``irq_eoi`` - Optional, required for EOI flow handlers + +- ``irq_retrigger`` - Optional + +- ``irq_set_type`` - Optional + +- ``irq_set_wake`` - Optional + +These primitives are strictly intended to mean what they say: ack means +ACK, masking means masking of an IRQ line, etc. It is up to the flow +handler(s) to use these basic units of low-level functionality. + +__do_IRQ entry point +==================== + +The original implementation __do_IRQ() was an alternative entry point +for all types of interrupts. It no longer exists. + +This handler turned out to be not suitable for all interrupt hardware +and was therefore reimplemented with split functionality for +edge/level/simple/percpu interrupts. This is not only a functional +optimization. It also shortens code paths for interrupts. + +Locking on SMP +============== + +The locking of chip registers is up to the architecture that defines the +chip primitives. The per-irq structure is protected via desc->lock, by +the generic layer. + +Generic interrupt chip +====================== + +To avoid copies of identical implementations of IRQ chips the core +provides a configurable generic interrupt chip implementation. +Developers should check carefully whether the generic chip fits their +needs before implementing the same functionality slightly differently +themselves. + +.. kernel-doc:: kernel/irq/generic-chip.c + :export: + +Structures +========== + +This chapter contains the autogenerated documentation of the structures +which are used in the generic IRQ layer. + +.. kernel-doc:: include/linux/irq.h + :internal: + +.. kernel-doc:: include/linux/interrupt.h + :internal: + +Public Functions Provided +========================= + +This chapter contains the autogenerated documentation of the kernel API +functions which are exported. + +.. kernel-doc:: kernel/irq/manage.c + +.. kernel-doc:: kernel/irq/chip.c + :export: + +Internal Functions Provided +=========================== + +This chapter contains the autogenerated documentation of the internal +functions. + +.. kernel-doc:: kernel/irq/irqdesc.c + +.. kernel-doc:: kernel/irq/handle.c + +.. kernel-doc:: kernel/irq/chip.c + :internal: + +Credits +======= + +The following people have contributed to this document: + +1. Thomas Gleixner tglx@linutronix.de + +2. Ingo Molnar mingo@elte.hu |