Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted

Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2023-02-21 18:24:12 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2023-02-21 18:24:12 -0800
commit: 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch)
tree: cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /Documentation/driver-api/surface_aggregator/internal.rst
download: linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz
linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip
1 files changed, 578 insertions, 0 deletions
diff --git a/Documentation/driver-api/surface_aggregator/internal.rst b/Documentation/driver-api/surface_aggregator/internal.rst
new file mode 100644
index 000000000..8c7c80c9f
--- /dev/null
+++ b/Documentation/driver-api/surface_aggregator/internal.rst
@@ -0,0 +1,578 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+.. |ssh_ptl| replace:: :c:type:`struct ssh_ptl <ssh_ptl>`
+.. |ssh_ptl_submit| replace:: :c:func:`ssh_ptl_submit`
+.. |ssh_ptl_cancel| replace:: :c:func:`ssh_ptl_cancel`
+.. |ssh_ptl_shutdown| replace:: :c:func:`ssh_ptl_shutdown`
+.. |ssh_ptl_rx_rcvbuf| replace:: :c:func:`ssh_ptl_rx_rcvbuf`
+.. |ssh_rtl| replace:: :c:type:`struct ssh_rtl <ssh_rtl>`
+.. |ssh_rtl_submit| replace:: :c:func:`ssh_rtl_submit`
+.. |ssh_rtl_cancel| replace:: :c:func:`ssh_rtl_cancel`
+.. |ssh_rtl_shutdown| replace:: :c:func:`ssh_rtl_shutdown`
+.. |ssh_packet| replace:: :c:type:`struct ssh_packet <ssh_packet>`
+.. |ssh_packet_get| replace:: :c:func:`ssh_packet_get`
+.. |ssh_packet_put| replace:: :c:func:`ssh_packet_put`
+.. |ssh_packet_ops| replace:: :c:type:`struct ssh_packet_ops <ssh_packet_ops>`
+.. |ssh_packet_base_priority| replace:: :c:type:`enum ssh_packet_base_priority <ssh_packet_base_priority>`
+.. |ssh_packet_flags| replace:: :c:type:`enum ssh_packet_flags <ssh_packet_flags>`
+.. |SSH_PACKET_PRIORITY| replace:: :c:func:`SSH_PACKET_PRIORITY`
+.. |ssh_frame| replace:: :c:type:`struct ssh_frame <ssh_frame>`
+.. |ssh_command| replace:: :c:type:`struct ssh_command <ssh_command>`
+.. |ssh_request| replace:: :c:type:`struct ssh_request <ssh_request>`
+.. |ssh_request_get| replace:: :c:func:`ssh_request_get`
+.. |ssh_request_put| replace:: :c:func:`ssh_request_put`
+.. |ssh_request_ops| replace:: :c:type:`struct ssh_request_ops <ssh_request_ops>`
+.. |ssh_request_init| replace:: :c:func:`ssh_request_init`
+.. |ssh_request_flags| replace:: :c:type:`enum ssh_request_flags <ssh_request_flags>`
+.. |ssam_controller| replace:: :c:type:`struct ssam_controller <ssam_controller>`
+.. |ssam_device| replace:: :c:type:`struct ssam_device <ssam_device>`
+.. |ssam_device_driver| replace:: :c:type:`struct ssam_device_driver <ssam_device_driver>`
+.. |ssam_client_bind| replace:: :c:func:`ssam_client_bind`
+.. |ssam_client_link| replace:: :c:func:`ssam_client_link`
+.. |ssam_request_sync| replace:: :c:type:`struct ssam_request_sync <ssam_request_sync>`
+.. |ssam_event_registry| replace:: :c:type:`struct ssam_event_registry <ssam_event_registry>`
+.. |ssam_event_id| replace:: :c:type:`struct ssam_event_id <ssam_event_id>`
+.. |ssam_nf| replace:: :c:type:`struct ssam_nf <ssam_nf>`
+.. |ssam_nf_refcount_inc| replace:: :c:func:`ssam_nf_refcount_inc`
+.. |ssam_nf_refcount_dec| replace:: :c:func:`ssam_nf_refcount_dec`
+.. |ssam_notifier_register| replace:: :c:func:`ssam_notifier_register`
+.. |ssam_notifier_unregister| replace:: :c:func:`ssam_notifier_unregister`
+.. |ssam_cplt| replace:: :c:type:`struct ssam_cplt <ssam_cplt>`
+.. |ssam_event_queue| replace:: :c:type:`struct ssam_event_queue <ssam_event_queue>`
+.. |ssam_request_sync_submit| replace:: :c:func:`ssam_request_sync_submit`
+
+=====================
+Core Driver Internals
+=====================
+
+Architectural overview of the Surface System Aggregator Module (SSAM) core
+and Surface Serial Hub (SSH) driver. For the API documentation, refer to:
+
+.. toctree::
+   :maxdepth: 2
+
+   internal-api
+
+
+Overview
+========
+
+The SSAM core implementation is structured in layers, somewhat following the
+SSH protocol structure:
+
+Lower-level packet transport is implemented in the *packet transport layer
+(PTL)*, directly building on top of the serial device (serdev)
+infrastructure of the kernel. As the name indicates, this layer deals with
+the packet transport logic and handles things like packet validation, packet
+acknowledgment (ACKing), packet (retransmission) timeouts, and relaying
+packet payloads to higher-level layers.
+
+Above this sits the *request transport layer (RTL)*. This layer is centered
+around command-type packet payloads, i.e. requests (sent from host to EC),
+responses of the EC to those requests, and events (sent from EC to host).
+It, specifically, distinguishes events from request responses, matches
+responses to their corresponding requests, and implements request timeouts.
+
+The *controller* layer is building on top of this and essentially decides
+how request responses and, especially, events are dealt with. It provides an
+event notifier system, handles event activation/deactivation, provides a
+workqueue for event and asynchronous request completion, and also manages
+the message counters required for building command messages (``SEQ``,
+``RQID``). This layer basically provides a fundamental interface to the SAM
+EC for use in other kernel drivers.
+
+While the controller layer already provides an interface for other kernel
+drivers, the client *bus* extends this interface to provide support for
+native SSAM devices, i.e. devices that are not defined in ACPI and not
+implemented as platform devices, via |ssam_device| and |ssam_device_driver|
+simplify management of client devices and client drivers.
+
+Refer to Documentation/driver-api/surface_aggregator/client.rst for
+documentation regarding the client device/driver API and interface options
+for other kernel drivers. It is recommended to familiarize oneself with
+that chapter and the Documentation/driver-api/surface_aggregator/ssh.rst
+before continuing with the architectural overview below.
+
+
+Packet Transport Layer
+======================
+
+The packet transport layer is represented via |ssh_ptl| and is structured
+around the following key concepts:
+
+Packets
+-------
+
+Packets are the fundamental transmission unit of the SSH protocol. They are
+managed by the packet transport layer, which is essentially the lowest layer
+of the driver and is built upon by other components of the SSAM core.
+Packets to be transmitted by the SSAM core are represented via |ssh_packet|
+(in contrast, packets received by the core do not have any specific
+structure and are managed entirely via the raw |ssh_frame|).
+
+This structure contains the required fields to manage the packet inside the
+transport layer, as well as a reference to the buffer containing the data to
+be transmitted (i.e. the message wrapped in |ssh_frame|). Most notably, it
+contains an internal reference count, which is used for managing its
+lifetime (accessible via |ssh_packet_get| and |ssh_packet_put|). When this
+counter reaches zero, the ``release()`` callback provided to the packet via
+its |ssh_packet_ops| reference is executed, which may then deallocate the
+packet or its enclosing structure (e.g. |ssh_request|).
+
+In addition to the ``release`` callback, the |ssh_packet_ops| reference also
+provides a ``complete()`` callback, which is run once the packet has been
+completed and provides the status of this completion, i.e. zero on success
+or a negative errno value in case of an error. Once the packet has been
+submitted to the packet transport layer, the ``complete()`` callback is
+always guaranteed to be executed before the ``release()`` callback, i.e. the
+packet will always be completed, either successfully, with an error, or due
+to cancellation, before it will be released.
+
+The state of a packet is managed via its ``state`` flags
+(|ssh_packet_flags|), which also contains the packet type. In particular,
+the following bits are noteworthy:
+
+* ``SSH_PACKET_SF_LOCKED_BIT``: This bit is set when completion, either
+  through error or success, is imminent. It indicates that no further
+  references of the packet should be taken and any existing references
+  should be dropped as soon as possible. The process setting this bit is
+  responsible for removing any references to this packet from the packet
+  queue and pending set.
+
+* ``SSH_PACKET_SF_COMPLETED_BIT``: This bit is set by the process running the
+  ``complete()`` callback and is used to ensure that this callback only runs
+  once.
+
+* ``SSH_PACKET_SF_QUEUED_BIT``: This bit is set when the packet is queued on
+  the packet queue and cleared when it is dequeued.
+
+* ``SSH_PACKET_SF_PENDING_BIT``: This bit is set when the packet is added to
+  the pending set and cleared when it is removed from it.
+
+Packet Queue
+------------
+
+The packet queue is the first of the two fundamental collections in the
+packet transport layer. It is a priority queue, with priority of the
+respective packets based on the packet type (major) and number of tries
+(minor). See |SSH_PACKET_PRIORITY| for more details on the priority value.
+
+All packets to be transmitted by the transport layer must be submitted to
+this queue via |ssh_ptl_submit|. Note that this includes control packets
+sent by the transport layer itself. Internally, data packets can be
+re-submitted to this queue due to timeouts or NAK packets sent by the EC.
+
+Pending Set
+-----------
+
+The pending set is the second of the two fundamental collections in the
+packet transport layer. It stores references to packets that have already
+been transmitted, but wait for acknowledgment (e.g. the corresponding ACK
+packet) by the EC.
+
+Note that a packet may both be pending and queued if it has been
+re-submitted due to a packet acknowledgment timeout or NAK. On such a
+re-submission, packets are not removed from the pending set.
+
+Transmitter Thread
+------------------
+
+The transmitter thread is responsible for most of the actual work regarding
+packet transmission. In each iteration, it (waits for and) checks if the
+next packet on the queue (if any) can be transmitted and, if so, removes it
+from the queue and increments its counter for the number of transmission
+attempts, i.e. tries. If the packet is sequenced, i.e. requires an ACK by
+the EC, the packet is added to the pending set. Next, the packet's data is
+submitted to the serdev subsystem. In case of an error or timeout during
+this submission, the packet is completed by the transmitter thread with the
+status value of the callback set accordingly. In case the packet is
+unsequenced, i.e. does not require an ACK by the EC, the packet is completed
+with success on the transmitter thread.
+
+Transmission of sequenced packets is limited by the number of concurrently
+pending packets, i.e. a limit on how many packets may be waiting for an ACK
+from the EC in parallel. This limit is currently set to one (see
+Documentation/driver-api/surface_aggregator/ssh.rst for the reasoning behind
+this). Control packets (i.e. ACK and NAK) can always be transmitted.
+
+Receiver Thread
+---------------
+
+Any data received from the EC is put into a FIFO buffer for further
+processing. This processing happens on the receiver thread. The receiver
+thread parses and validates the received message into its |ssh_frame| and
+corresponding payload. It prepares and submits the necessary ACK (and on
+validation error or invalid data NAK) packets for the received messages.
+
+This thread also handles further processing, such as matching ACK messages
+to the corresponding pending packet (via sequence ID) and completing it, as
+well as initiating re-submission of all currently pending packets on
+receival of a NAK message (re-submission in case of a NAK is similar to
+re-submission due to timeout, see below for more details on that). Note that
+the successful completion of a sequenced packet will always run on the
+receiver thread (whereas any failure-indicating completion will run on the
+process where the failure occurred).
+
+Any payload data is forwarded via a callback to the next upper layer, i.e.
+the request transport layer.
+
+Timeout Reaper
+--------------
+
+The packet acknowledgment timeout is a per-packet timeout for sequenced
+packets, started when the respective packet begins (re-)transmission (i.e.
+this timeout is armed once per transmission attempt on the transmitter
+thread). It is used to trigger re-submission or, when the number of tries
+has been exceeded, cancellation of the packet in question.
+
+This timeout is handled via a dedicated reaper task, which is essentially a
+work item (re-)scheduled to run when the next packet is set to time out. The
+work item then checks the set of pending packets for any packets that have
+exceeded the timeout and, if there are any remaining packets, re-schedules
+itself to the next appropriate point in time.
+
+If a timeout has been detected by the reaper, the packet will either be
+re-submitted if it still has some remaining tries left, or completed with
+``-ETIMEDOUT`` as status if not. Note that re-submission, in this case and
+triggered by receival of a NAK, means that the packet is added to the queue
+with a now incremented number of tries, yielding a higher priority. The
+timeout for the packet will be disabled until the next transmission attempt
+and the packet remains on the pending set.
+
+Note that due to transmission and packet acknowledgment timeouts, the packet
+transport layer is always guaranteed to make progress, if only through
+timing out packets, and will never fully block.
+
+Concurrency and Locking
+-----------------------
+
+There are two main locks in the packet transport layer: One guarding access
+to the packet queue and one guarding access to the pending set. These
+collections may only be accessed and modified under the respective lock. If
+access to both collections is needed, the pending lock must be acquired
+before the queue lock to avoid deadlocks.
+
+In addition to guarding the collections, after initial packet submission
+certain packet fields may only be accessed under one of the locks.
+Specifically, the packet priority must only be accessed while holding the
+queue lock and the packet timestamp must only be accessed while holding the
+pending lock.
+
+Other parts of the packet transport layer are guarded independently. State
+flags are managed by atomic bit operations and, if necessary, memory
+barriers. Modifications to the timeout reaper work item and expiration date
+are guarded by their own lock.
+
+The reference of the packet to the packet transport layer (``ptl``) is
+somewhat special. It is either set when the upper layer request is submitted
+or, if there is none, when the packet is first submitted. After it is set,
+it will not change its value. Functions that may run concurrently with
+submission, i.e. cancellation, can not rely on the ``ptl`` reference to be
+set. Access to it in these functions is guarded by ``READ_ONCE()``, whereas
+setting ``ptl`` is equally guarded with ``WRITE_ONCE()`` for symmetry.
+
+Some packet fields may be read outside of the respective locks guarding
+them, specifically priority and state for tracing. In those cases, proper
+access is ensured by employing ``WRITE_ONCE()`` and ``READ_ONCE()``. Such
+read-only access is only allowed when stale values are not critical.
+
+With respect to the interface for higher layers, packet submission
+(|ssh_ptl_submit|), packet cancellation (|ssh_ptl_cancel|), data receival
+(|ssh_ptl_rx_rcvbuf|), and layer shutdown (|ssh_ptl_shutdown|) may always be
+executed concurrently with respect to each other. Note that packet
+submission may not run concurrently with itself for the same packet.
+Equally, shutdown and data receival may also not run concurrently with
+themselves (but may run concurrently with each other).
+
+
+Request Transport Layer
+=======================
+
+The request transport layer is represented via |ssh_rtl| and builds on top
+of the packet transport layer. It deals with requests, i.e. SSH packets sent
+by the host containing a |ssh_command| as frame payload. This layer
+separates responses to requests from events, which are also sent by the EC
+via a |ssh_command| payload. While responses are handled in this layer,
+events are relayed to the next upper layer, i.e. the controller layer, via
+the corresponding callback. The request transport layer is structured around
+the following key concepts:
+
+Request
+-------
+
+Requests are packets with a command-type payload, sent from host to EC to
+query data from or trigger an action on it (or both simultaneously). They
+are represented by |ssh_request|, wrapping the underlying |ssh_packet|
+storing its message data (i.e. SSH frame with command payload). Note that
+all top-level representations, e.g. |ssam_request_sync| are built upon this
+struct.
+
+As |ssh_request| extends |ssh_packet|, its lifetime is also managed by the
+reference counter inside the packet struct (which can be accessed via
+|ssh_request_get| and |ssh_request_put|). Once the counter reaches zero, the
+``release()`` callback of the |ssh_request_ops| reference of the request is
+called.
+
+Requests can have an optional response that is equally sent via a SSH
+message with command-type payload (from EC to host). The party constructing
+the request must know if a response is expected and mark this in the request
+flags provided to |ssh_request_init|, so that the request transport layer
+can wait for this response.
+
+Similar to |ssh_packet|, |ssh_request| also has a ``complete()`` callback
+provided via its request ops reference and is guaranteed to be completed
+before it is released once it has been submitted to the request transport
+layer via |ssh_rtl_submit|. For a request without a response, successful
+completion will occur once the underlying packet has been successfully
+transmitted by the packet transport layer (i.e. from within the packet
+completion callback). For a request with response, successful completion
+will occur once the response has been received and matched to the request
+via its request ID (which happens on the packet layer's data-received
+callback running on the receiver thread). If the request is completed with
+an error, the status value will be set to the corresponding (negative) errno
+value.
+
+The state of a request is again managed via its ``state`` flags
+(|ssh_request_flags|), which also encode the request type. In particular,
+the following bits are noteworthy:
+
+* ``SSH_REQUEST_SF_LOCKED_BIT``: This bit is set when completion, either
+  through error or success, is imminent. It indicates that no further
+  references of the request should be taken and any existing references
+  should be dropped as soon as possible. The process setting this bit is
+  responsible for removing any references to this request from the request
+  queue and pending set.
+
+* ``SSH_REQUEST_SF_COMPLETED_BIT``: This bit is set by the process running the
+  ``complete()`` callback and is used to ensure that this callback only runs
+  once.
+
+* ``SSH_REQUEST_SF_QUEUED_BIT``: This bit is set when the request is queued on
+  the request queue and cleared when it is dequeued.
+
+* ``SSH_REQUEST_SF_PENDING_BIT``: This bit is set when the request is added to
+  the pending set and cleared when it is removed from it.
+
+Request Queue
+-------------
+
+The request queue is the first of the two fundamental collections in the
+request transport layer. In contrast to the packet queue of the packet
+transport layer, it is not a priority queue and the simple first come first
+serve principle applies.
+
+All requests to be transmitted by the request transport layer must be
+submitted to this queue via |ssh_rtl_submit|. Once submitted, requests may
+not be re-submitted, and will not be re-submitted automatically on timeout.
+Instead, the request is completed with a timeout error. If desired, the
+caller can create and submit a new request for another try, but it must not
+submit the same request again.
+
+Pending Set
+-----------
+
+The pending set is the second of the two fundamental collections in the
+request transport layer. This collection stores references to all pending
+requests, i.e. requests awaiting a response from the EC (similar to what the
+pending set of the packet transport layer does for packets).
+
+Transmitter Task
+----------------
+
+The transmitter task is scheduled when a new request is available for
+transmission. It checks if the next request on the request queue can be
+transmitted and, if so, submits its underlying packet to the packet
+transport layer. This check ensures that only a limited number of
+requests can be pending, i.e. waiting for a response, at the same time. If
+the request requires a response, the request is added to the pending set
+before its packet is submitted.
+
+Packet Completion Callback
+--------------------------
+
+The packet completion callback is executed once the underlying packet of a
+request has been completed. In case of an error completion, the
+corresponding request is completed with the error value provided in this
+callback.
+
+On successful packet completion, further processing depends on the request.
+If the request expects a response, it is marked as transmitted and the
+request timeout is started. If the request does not expect a response, it is
+completed with success.
+
+Data-Received Callback
+----------------------
+
+The data received callback notifies the request transport layer of data
+being received by the underlying packet transport layer via a data-type
+frame. In general, this is expected to be a command-type payload.
+
+If the request ID of the command is one of the request IDs reserved for
+events (one to ``SSH_NUM_EVENTS``, inclusively), it is forwarded to the
+event callback registered in the request transport layer. If the request ID
+indicates a response to a request, the respective request is looked up in
+the pending set and, if found and marked as transmitted, completed with
+success.
+
+Timeout Reaper
+--------------
+
+The request-response-timeout is a per-request timeout for requests expecting
+a response. It is used to ensure that a request does not wait indefinitely
+on a response from the EC and is started after the underlying packet has
+been successfully completed.
+
+This timeout is, similar to the packet acknowledgment timeout on the packet
+transport layer, handled via a dedicated reaper task. This task is
+essentially a work-item (re-)scheduled to run when the next request is set
+to time out. The work item then scans the set of pending requests for any
+requests that have timed out and completes them with ``-ETIMEDOUT`` as
+status. Requests will not be re-submitted automatically. Instead, the issuer
+of the request must construct and submit a new request, if so desired.
+
+Note that this timeout, in combination with packet transmission and
+acknowledgment timeouts, guarantees that the request layer will always make
+progress, even if only through timing out packets, and never fully block.
+
+Concurrency and Locking
+-----------------------
+
+Similar to the packet transport layer, there are two main locks in the
+request transport layer: One guarding access to the request queue and one
+guarding access to the pending set. These collections may only be accessed
+and modified under the respective lock.
+
+Other parts of the request transport layer are guarded independently. State
+flags are (again) managed by atomic bit operations and, if necessary, memory
+barriers. Modifications to the timeout reaper work item and expiration date
+are guarded by their own lock.
+
+Some request fields may be read outside of the respective locks guarding
+them, specifically the state for tracing. In those cases, proper access is
+ensured by employing ``WRITE_ONCE()`` and ``READ_ONCE()``. Such read-only
+access is only allowed when stale values are not critical.
+
+With respect to the interface for higher layers, request submission
+(|ssh_rtl_submit|), request cancellation (|ssh_rtl_cancel|), and layer
+shutdown (|ssh_rtl_shutdown|) may always be executed concurrently with
+respect to each other. Note that request submission may not run concurrently
+with itself for the same request (and also may only be called once per
+request). Equally, shutdown may also not run concurrently with itself.
+
+
+Controller Layer
+================
+
+The controller layer extends on the request transport layer to provide an
+easy-to-use interface for client drivers. It is represented by
+|ssam_controller| and the SSH driver. While the lower level transport layers
+take care of transmitting and handling packets and requests, the controller
+layer takes on more of a management role. Specifically, it handles device
+initialization, power management, and event handling, including event
+delivery and registration via the (event) completion system (|ssam_cplt|).
+
+Event Registration
+------------------
+
+In general, an event (or rather a class of events) has to be explicitly
+requested by the host before the EC will send it (HID input events seem to
+be the exception). This is done via an event-enable request (similarly,
+events should be disabled via an event-disable request once no longer
+desired).
+
+The specific request used to enable (or disable) an event is given via an
+event registry, i.e. the governing authority of this event (so to speak),
+represented by |ssam_event_registry|. As parameters to this request, the
+target category and, depending on the event registry, instance ID of the
+event to be enabled must be provided. This (optional) instance ID must be
+zero if the registry does not use it. Together, target category and instance
+ID form the event ID, represented by |ssam_event_id|. In short, both, event
+registry and event ID, are required to uniquely identify a respective class
+of events.
+
+Note that a further *request ID* parameter must be provided for the
+enable-event request. This parameter does not influence the class of events
+being enabled, but instead is set as the request ID (RQID) on each event of
+this class sent by the EC. It is used to identify events (as a limited
+number of request IDs is reserved for use in events only, specifically one
+to ``SSH_NUM_EVENTS`` inclusively) and also map events to their specific
+class. Currently, the controller always sets this parameter to the target
+category specified in |ssam_event_id|.
+
+As multiple client drivers may rely on the same (or overlapping) classes of
+events and enable/disable calls are strictly binary (i.e. on/off), the
+controller has to manage access to these events. It does so via reference
+counting, storing the counter inside an RB-tree based mapping with event
+registry and ID as key (there is no known list of valid event registry and
+event ID combinations). See |ssam_nf|, |ssam_nf_refcount_inc|, and
+|ssam_nf_refcount_dec| for details.
+
+This management is done together with notifier registration (described in
+the next section) via the top-level |ssam_notifier_register| and
+|ssam_notifier_unregister| functions.
+
+Event Delivery
+--------------
+
+To receive events, a client driver has to register an event notifier via
+|ssam_notifier_register|. This increments the reference counter for that
+specific class of events (as detailed in the previous section), enables the
+class on the EC (if it has not been enabled already), and installs the
+provided notifier callback.
+
+Notifier callbacks are stored in lists, with one (RCU) list per target
+category (provided via the event ID; NB: there is a fixed known number of
+target categories). There is no known association from the combination of
+event registry and event ID to the command data (target ID, target category,
+command ID, and instance ID) that can be provided by an event class, apart
+from target category and instance ID given via the event ID.
+
+Note that due to the way notifiers are (or rather have to be) stored, client
+drivers may receive events that they have not requested and need to account
+for them. Specifically, they will, by default, receive all events from the
+same target category. To simplify dealing with this, filtering of events by
+target ID (provided via the event registry) and instance ID (provided via
+the event ID) can be requested when registering a notifier. This filtering
+is applied when iterating over the notifiers at the time they are executed.
+
+All notifier callbacks are executed on a dedicated workqueue, the so-called
+completion workqueue. After an event has been received via the callback
+installed in the request layer (running on the receiver thread of the packet
+transport layer), it will be put on its respective event queue
+(|ssam_event_queue|). From this event queue the completion work item of that
+queue (running on the completion workqueue) will pick up the event and
+execute the notifier callback. This is done to avoid blocking on the
+receiver thread.
+
+There is one event queue per combination of target ID and target category.
+This is done to ensure that notifier callbacks are executed in sequence for
+events of the same target ID and target category. Callbacks can be executed
+in parallel for events with a different combination of target ID and target
+category.
+
+Concurrency and Locking
+-----------------------
+
+Most of the concurrency related safety guarantees of the controller are
+provided by the lower-level request transport layer. In addition to this,
+event (un-)registration is guarded by its own lock.
+
+Access to the controller state is guarded by the state lock. This lock is a
+read/write semaphore. The reader part can be used to ensure that the state
+does not change while functions depending on the state to stay the same
+(e.g. |ssam_notifier_register|, |ssam_notifier_unregister|,
+|ssam_request_sync_submit|, and derivatives) are executed and this guarantee
+is not already provided otherwise (e.g. through |ssam_client_bind| or
+|ssam_client_link|). The writer part guards any transitions that will change
+the state, i.e. initialization, destruction, suspension, and resumption.
+
+The controller state may be accessed (read-only) outside the state lock for
+smoke-testing against invalid API usage (e.g. in |ssam_request_sync_submit|).
+Note that such checks are not supposed to (and will not) protect against all
+invalid usages, but rather aim to help catch them. In those cases, proper
+variable access is ensured by employing ``WRITE_ONCE()`` and ``READ_ONCE()``.
+
+Assuming any preconditions on the state not changing have been satisfied,
+all non-initialization and non-shutdown functions may run concurrently with
+each other. This includes |ssam_notifier_register|, |ssam_notifier_unregister|,
+|ssam_request_sync_submit|, as well as all functions building on top of those.
author	Linus Torvalds <torvalds@linux-foundation.org>	2023-02-21 18:24:12 -0800
committer	Linus Torvalds <torvalds@linux-foundation.org>	2023-02-21 18:24:12 -0800
commit	5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch)
tree	cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /Documentation/driver-api/surface_aggregator/internal.rst
download	linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip