From 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 21 Feb 2023 18:24:12 -0800 Subject: Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ... --- .../driver-api/surface_aggregator/ssh.rst | 346 +++++++++++++++++++++ 1 file changed, 346 insertions(+) create mode 100644 Documentation/driver-api/surface_aggregator/ssh.rst (limited to 'Documentation/driver-api/surface_aggregator/ssh.rst') diff --git a/Documentation/driver-api/surface_aggregator/ssh.rst b/Documentation/driver-api/surface_aggregator/ssh.rst new file mode 100644 index 000000000..18fd0f0ae --- /dev/null +++ b/Documentation/driver-api/surface_aggregator/ssh.rst @@ -0,0 +1,346 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +.. |u8| replace:: :c:type:`u8 ` +.. |u16| replace:: :c:type:`u16 ` +.. |TYPE| replace:: ``TYPE`` +.. |LEN| replace:: ``LEN`` +.. |SEQ| replace:: ``SEQ`` +.. |SYN| replace:: ``SYN`` +.. |NAK| replace:: ``NAK`` +.. |ACK| replace:: ``ACK`` +.. |DATA| replace:: ``DATA`` +.. |DATA_SEQ| replace:: ``DATA_SEQ`` +.. |DATA_NSQ| replace:: ``DATA_NSQ`` +.. |TC| replace:: ``TC`` +.. |TID| replace:: ``TID`` +.. |SID| replace:: ``SID`` +.. |IID| replace:: ``IID`` +.. |RQID| replace:: ``RQID`` +.. |CID| replace:: ``CID`` + +=========================== +Surface Serial Hub Protocol +=========================== + +The Surface Serial Hub (SSH) is the central communication interface for the +embedded Surface Aggregator Module controller (SAM or EC), found on newer +Surface generations. We will refer to this protocol and interface as +SAM-over-SSH, as opposed to SAM-over-HID for the older generations. + +On Surface devices with SAM-over-SSH, SAM is connected to the host via UART +and defined in ACPI as device with ID ``MSHW0084``. On these devices, +significant functionality is provided via SAM, including access to battery +and power information and events, thermal read-outs and events, and many +more. For Surface Laptops, keyboard input is handled via HID directed +through SAM, on the Surface Laptop 3 and Surface Book 3 this also includes +touchpad input. + +Note that the standard disclaimer for this subsystem also applies to this +document: All of this has been reverse-engineered and may thus be erroneous +and/or incomplete. + +All CRCs used in the following are two-byte ``crc_ccitt_false(0xffff, ...)``. +All multi-byte values are little-endian, there is no implicit padding between +values. + + +SSH Packet Protocol: Definitions +================================ + +The fundamental communication unit of the SSH protocol is a frame +(:c:type:`struct ssh_frame `). A frame consists of the following +fields, packed together and in order: + +.. flat-table:: SSH Frame + :widths: 1 1 4 + :header-rows: 1 + + * - Field + - Type + - Description + + * - |TYPE| + - |u8| + - Type identifier of the frame. + + * - |LEN| + - |u16| + - Length of the payload associated with the frame. + + * - |SEQ| + - |u8| + - Sequence ID (see explanation below). + +Each frame structure is followed by a CRC over this structure. The CRC over +the frame structure (|TYPE|, |LEN|, and |SEQ| fields) is placed directly +after the frame structure and before the payload. The payload is followed by +its own CRC (over all payload bytes). If the payload is not present (i.e. +the frame has ``LEN=0``), the CRC of the payload is still present and will +evaluate to ``0xffff``. The |LEN| field does not include any of the CRCs, it +equals the number of bytes inbetween the CRC of the frame and the CRC of the +payload. + +Additionally, the following fixed two-byte sequences are used: + +.. flat-table:: SSH Byte Sequences + :widths: 1 1 4 + :header-rows: 1 + + * - Name + - Value + - Description + + * - |SYN| + - ``[0xAA, 0x55]`` + - Synchronization bytes. + +A message consists of |SYN|, followed by the frame (|TYPE|, |LEN|, |SEQ| and +CRC) and, if specified in the frame (i.e. ``LEN > 0``), payload bytes, +followed finally, regardless if the payload is present, the payload CRC. The +messages corresponding to an exchange are, in part, identified by having the +same sequence ID (|SEQ|), stored inside the frame (more on this in the next +section). The sequence ID is a wrapping counter. + +A frame can have the following types +(:c:type:`enum ssh_frame_type `): + +.. flat-table:: SSH Frame Types + :widths: 1 1 4 + :header-rows: 1 + + * - Name + - Value + - Short Description + + * - |NAK| + - ``0x04`` + - Sent on error in previously received message. + + * - |ACK| + - ``0x40`` + - Sent to acknowledge receival of |DATA| frame. + + * - |DATA_SEQ| + - ``0x80`` + - Sent to transfer data. Sequenced. + + * - |DATA_NSQ| + - ``0x00`` + - Same as |DATA_SEQ|, but does not need to be ACKed. + +Both |NAK|- and |ACK|-type frames are used to control flow of messages and +thus do not carry a payload. |DATA_SEQ|- and |DATA_NSQ|-type frames on the +other hand must carry a payload. The flow sequence and interaction of +different frame types will be described in more depth in the next section. + + +SSH Packet Protocol: Flow Sequence +================================== + +Each exchange begins with |SYN|, followed by a |DATA_SEQ|- or +|DATA_NSQ|-type frame, followed by its CRC, payload, and payload CRC. In +case of a |DATA_NSQ|-type frame, the exchange is then finished. In case of a +|DATA_SEQ|-type frame, the receiving party has to acknowledge receival of +the frame by responding with a message containing an |ACK|-type frame with +the same sequence ID of the |DATA| frame. In other words, the sequence ID of +the |ACK| frame specifies the |DATA| frame to be acknowledged. In case of an +error, e.g. an invalid CRC, the receiving party responds with a message +containing an |NAK|-type frame. As the sequence ID of the previous data +frame, for which an error is indicated via the |NAK| frame, cannot be relied +upon, the sequence ID of the |NAK| frame should not be used and is set to +zero. After receival of an |NAK| frame, the sending party should re-send all +outstanding (non-ACKed) messages. + +Sequence IDs are not synchronized between the two parties, meaning that they +are managed independently for each party. Identifying the messages +corresponding to a single exchange thus relies on the sequence ID as well as +the type of the message, and the context. Specifically, the sequence ID is +used to associate an ``ACK`` with its ``DATA_SEQ``-type frame, but not +``DATA_SEQ``- or ``DATA_NSQ``-type frames with other ``DATA``- type frames. + +An example exchange might look like this: + +:: + + tx: -- SYN FRAME(D) CRC(F) PAYLOAD CRC(P) ----------------------------- + rx: ------------------------------------- SYN FRAME(A) CRC(F) CRC(P) -- + +where both frames have the same sequence ID (``SEQ``). Here, ``FRAME(D)`` +indicates a |DATA_SEQ|-type frame, ``FRAME(A)`` an ``ACK``-type frame, +``CRC(F)`` the CRC over the previous frame, ``CRC(P)`` the CRC over the +previous payload. In case of an error, the exchange would look like this: + +:: + + tx: -- SYN FRAME(D) CRC(F) PAYLOAD CRC(P) ----------------------------- + rx: ------------------------------------- SYN FRAME(N) CRC(F) CRC(P) -- + +upon which the sender should re-send the message. ``FRAME(N)`` indicates an +|NAK|-type frame. Note that the sequence ID of the |NAK|-type frame is fixed +to zero. For |DATA_NSQ|-type frames, both exchanges are the same: + +:: + + tx: -- SYN FRAME(DATA_NSQ) CRC(F) PAYLOAD CRC(P) ---------------------- + rx: ------------------------------------------------------------------- + +Here, an error can be detected, but not corrected or indicated to the +sending party. These exchanges are symmetric, i.e. switching ``rx`` and +``tx`` results again in a valid exchange. Currently, no longer exchanges are +known. + + +Commands: Requests, Responses, and Events +========================================= + +Commands are sent as payload inside a data frame. Currently, this is the +only known payload type of |DATA| frames, with a payload-type value of +``0x80`` (:c:type:`SSH_PLD_TYPE_CMD `). + +The command-type payload (:c:type:`struct ssh_command `) +consists of an eight-byte command structure, followed by optional and +variable length command data. The length of this optional data is derived +from the frame payload length given in the corresponding frame, i.e. it is +``frame.len - sizeof(struct ssh_command)``. The command struct contains the +following fields, packed together and in order: + +.. flat-table:: SSH Command + :widths: 1 1 4 + :header-rows: 1 + + * - Field + - Type + - Description + + * - |TYPE| + - |u8| + - Type of the payload. For commands always ``0x80``. + + * - |TC| + - |u8| + - Target category. + + * - |TID| + - |u8| + - Target ID for commands/messages. + + * - |SID| + - |u8| + - Source ID for commands/messages. + + * - |IID| + - |u8| + - Instance ID. + + * - |RQID| + - |u16| + - Request ID. + + * - |CID| + - |u8| + - Command ID. + +The command struct and data, in general, does not contain any failure +detection mechanism (e.g. CRCs), this is solely done on the frame level. + +Command-type payloads are used by the host to send commands and requests to +the EC as well as by the EC to send responses and events back to the host. +We differentiate between requests (sent by the host), responses (sent by the +EC in response to a request), and events (sent by the EC without a preceding +request). + +Commands and events are uniquely identified by their target category +(``TC``) and command ID (``CID``). The target category specifies a general +category for the command (e.g. system in general, vs. battery and AC, vs. +temperature, and so on), while the command ID specifies the command inside +that category. Only the combination of |TC| + |CID| is unique. Additionally, +commands have an instance ID (``IID``), which is used to differentiate +between different sub-devices. For example ``TC=3`` ``CID=1`` is a +request to get the temperature on a thermal sensor, where |IID| specifies +the respective sensor. If the instance ID is not used, it should be set to +zero. If instance IDs are used, they, in general, start with a value of one, +whereas zero may be used for instance independent queries, if applicable. A +response to a request should have the same target category, command ID, and +instance ID as the corresponding request. + +Responses are matched to their corresponding request via the request ID +(``RQID``) field. This is a 16 bit wrapping counter similar to the sequence +ID on the frames. Note that the sequence ID of the frames for a +request-response pair does not match. Only the request ID has to match. +Frame-protocol wise these are two separate exchanges, and may even be +separated, e.g. by an event being sent after the request but before the +response. Not all commands produce a response, and this is not detectable by +|TC| + |CID|. It is the responsibility of the issuing party to wait for a +response (or signal this to the communication framework, as is done in +SAN/ACPI via the ``SNC`` flag). + +Events are identified by unique and reserved request IDs. These IDs should +not be used by the host when sending a new request. They are used on the +host to, first, detect events and, second, match them with a registered +event handler. Request IDs for events are chosen by the host and directed to +the EC when setting up and enabling an event source (via the +enable-event-source request). The EC then uses the specified request ID for +events sent from the respective source. Note that an event should still be +identified by its target category, command ID, and, if applicable, instance +ID, as a single event source can send multiple different event types. In +general, however, a single target category should map to a single reserved +event request ID. + +Furthermore, requests, responses, and events have an associated target ID +(``TID``) and source ID (``SID``). These two fields indicate where a message +originates from (``SID``) and what the intended target of the message is +(``TID``). Note that a response to a specific request therefore has the source +and target IDs swapped when compared to the original request (i.e. the request +target is the response source and the request source is the response target). +See (:c:type:`enum ssh_request_id `) for possible values of +both. + +Note that, even though requests and events should be uniquely identifiable by +target category and command ID alone, the EC may require specific target ID and +instance ID values to accept a command. A command that is accepted for +``TID=1``, for example, may not be accepted for ``TID=2`` and vice versa. While +this may not always hold in reality, you can think of different target/source +IDs indicating different physical ECs with potentially different feature sets. + + +Limitations and Observations +============================ + +The protocol can, in theory, handle up to ``U8_MAX`` frames in parallel, +with up to ``U16_MAX`` pending requests (neglecting request IDs reserved for +events). In practice, however, this is more limited. From our testing +(although via a python and thus a user-space program), it seems that the EC +can handle up to four requests (mostly) reliably in parallel at a certain +time. With five or more requests in parallel, consistent discarding of +commands (ACKed frame but no command response) has been observed. For five +simultaneous commands, this reproducibly resulted in one command being +dropped and four commands being handled. + +However, it has also been noted that, even with three requests in parallel, +occasional frame drops happen. Apart from this, with a limit of three +pending requests, no dropped commands (i.e. command being dropped but frame +carrying command being ACKed) have been observed. In any case, frames (and +possibly also commands) should be re-sent by the host if a certain timeout +is exceeded. This is done by the EC for frames with a timeout of one second, +up to two re-tries (i.e. three transmissions in total). The limit of +re-tries also applies to received NAKs, and, in a worst case scenario, can +lead to entire messages being dropped. + +While this also seems to work fine for pending data frames as long as no +transmission failures occur, implementation and handling of these seems to +depend on the assumption that there is only one non-acknowledged data frame. +In particular, the detection of repeated frames relies on the last sequence +number. This means that, if a frame that has been successfully received by +the EC is sent again, e.g. due to the host not receiving an |ACK|, the EC +will only detect this if it has the sequence ID of the last frame received +by the EC. As an example: Sending two frames with ``SEQ=0`` and ``SEQ=1`` +followed by a repetition of ``SEQ=0`` will not detect the second ``SEQ=0`` +frame as such, and thus execute the command in this frame each time it has +been received, i.e. twice in this example. Sending ``SEQ=0``, ``SEQ=1`` and +then repeating ``SEQ=1`` will detect the second ``SEQ=1`` as repetition of +the first one and ignore it, thus executing the contained command only once. + +In conclusion, this suggests a limit of at most one pending un-ACKed frame +(per party, effectively leading to synchronous communication regarding +frames) and at most three pending commands. The limit to synchronous frame +transfers seems to be consistent with behavior observed on Windows. -- cgit v1.2.3