From 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 21 Feb 2023 18:24:12 -0800 Subject: Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ... --- .../device_drivers/ethernet/intel/ice.rst | 1042 ++++++++++++++++++++ 1 file changed, 1042 insertions(+) create mode 100644 Documentation/networking/device_drivers/ethernet/intel/ice.rst (limited to 'Documentation/networking/device_drivers/ethernet/intel/ice.rst') diff --git a/Documentation/networking/device_drivers/ethernet/intel/ice.rst b/Documentation/networking/device_drivers/ethernet/intel/ice.rst new file mode 100644 index 000000000..5efea4dd1 --- /dev/null +++ b/Documentation/networking/device_drivers/ethernet/intel/ice.rst @@ -0,0 +1,1042 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +================================================================= +Linux Base Driver for the Intel(R) Ethernet Controller 800 Series +================================================================= + +Intel ice Linux driver. +Copyright(c) 2018-2021 Intel Corporation. + +Contents +======== + +- Overview +- Identifying Your Adapter +- Important Notes +- Additional Features & Configurations +- Performance Optimization + + +The associated Virtual Function (VF) driver for this driver is iavf. + +Driver information can be obtained using ethtool and lspci. + +For questions related to hardware requirements, refer to the documentation +supplied with your Intel adapter. All hardware requirements listed apply to use +with Linux. + +This driver supports XDP (Express Data Path) and AF_XDP zero-copy. Note that +XDP is blocked for frame sizes larger than 3KB. + + +Identifying Your Adapter +======================== +For information on how to identify your adapter, and for the latest Intel +network drivers, refer to the Intel Support website: +https://www.intel.com/support + + +Important Notes +=============== + +Packet drops may occur under receive stress +------------------------------------------- +Devices based on the Intel(R) Ethernet Controller 800 Series are designed to +tolerate a limited amount of system latency during PCIe and DMA transactions. +If these transactions take longer than the tolerated latency, it can impact the +length of time the packets are buffered in the device and associated memory, +which may result in dropped packets. These packets drops typically do not have +a noticeable impact on throughput and performance under standard workloads. + +If these packet drops appear to affect your workload, the following may improve +the situation: + +1) Make sure that your system's physical memory is in a high-performance + configuration, as recommended by the platform vendor. A common + recommendation is for all channels to be populated with a single DIMM + module. +2) In your system's BIOS/UEFI settings, select the "Performance" profile. +3) Your distribution may provide tools like "tuned," which can help tweak + kernel settings to achieve better standard settings for different workloads. + + +Configuring SR-IOV for improved network security +------------------------------------------------ +In a virtualized environment, on Intel(R) Ethernet Network Adapters that +support SR-IOV, the virtual function (VF) may be subject to malicious behavior. +Software-generated layer two frames, like IEEE 802.3x (link flow control), IEEE +802.1Qbb (priority based flow-control), and others of this type, are not +expected and can throttle traffic between the host and the virtual switch, +reducing performance. To resolve this issue, and to ensure isolation from +unintended traffic streams, configure all SR-IOV enabled ports for VLAN tagging +from the administrative interface on the PF. This configuration allows +unexpected, and potentially malicious, frames to be dropped. + +See "Configuring VLAN Tagging on SR-IOV Enabled Adapter Ports" later in this +README for configuration instructions. + + +Do not unload port driver if VF with active VM is bound to it +------------------------------------------------------------- +Do not unload a port's driver if a Virtual Function (VF) with an active Virtual +Machine (VM) is bound to it. Doing so will cause the port to appear to hang. +Once the VM shuts down, or otherwise releases the VF, the command will +complete. + + +Important notes for SR-IOV and Link Aggregation +----------------------------------------------- +Link Aggregation is mutually exclusive with SR-IOV. + +- If Link Aggregation is active, SR-IOV VFs cannot be created on the PF. +- If SR-IOV is active, you cannot set up Link Aggregation on the interface. + +Bridging and MACVLAN are also affected by this. If you wish to use bridging or +MACVLAN with SR-IOV, you must set up bridging or MACVLAN before enabling +SR-IOV. If you are using bridging or MACVLAN in conjunction with SR-IOV, and +you want to remove the interface from the bridge or MACVLAN, you must follow +these steps: + +1. Destroy SR-IOV VFs if they exist +2. Remove the interface from the bridge or MACVLAN +3. Recreate SRIOV VFs as needed + + +Additional Features and Configurations +====================================== + +ethtool +------- +The driver utilizes the ethtool interface for driver configuration and +diagnostics, as well as displaying statistical information. The latest ethtool +version is required for this functionality. Download it at: +https://kernel.org/pub/software/network/ethtool/ + +NOTE: The rx_bytes value of ethtool does not match the rx_bytes value of +Netdev, due to the 4-byte CRC being stripped by the device. The difference +between the two rx_bytes values will be 4 x the number of Rx packets. For +example, if Rx packets are 10 and Netdev (software statistics) displays +rx_bytes as "X", then ethtool (hardware statistics) will display rx_bytes as +"X+40" (4 bytes CRC x 10 packets). + + +Viewing Link Messages +--------------------- +Link messages will not be displayed to the console if the distribution is +restricting system messages. In order to see network driver link messages on +your console, set dmesg to eight by entering the following:: + + # dmesg -n 8 + +NOTE: This setting is not saved across reboots. + + +Dynamic Device Personalization +------------------------------ +Dynamic Device Personalization (DDP) allows you to change the packet processing +pipeline of a device by applying a profile package to the device at runtime. +Profiles can be used to, for example, add support for new protocols, change +existing protocols, or change default settings. DDP profiles can also be rolled +back without rebooting the system. + +The DDP package loads during device initialization. The driver looks for +``intel/ice/ddp/ice.pkg`` in your firmware root (typically ``/lib/firmware/`` +or ``/lib/firmware/updates/``) and checks that it contains a valid DDP package +file. + +NOTE: Your distribution should likely have provided the latest DDP file, but if +ice.pkg is missing, you can find it in the linux-firmware repository or from +intel.com. + +If the driver is unable to load the DDP package, the device will enter Safe +Mode. Safe Mode disables advanced and performance features and supports only +basic traffic and minimal functionality, such as updating the NVM or +downloading a new driver or DDP package. Safe Mode only applies to the affected +physical function and does not impact any other PFs. See the "Intel(R) Ethernet +Adapters and Devices User Guide" for more details on DDP and Safe Mode. + +NOTES: + +- If you encounter issues with the DDP package file, you may need to download + an updated driver or DDP package file. See the log messages for more + information. + +- The ice.pkg file is a symbolic link to the default DDP package file. + +- You cannot update the DDP package if any PF drivers are already loaded. To + overwrite a package, unload all PFs and then reload the driver with the new + package. + +- Only the first loaded PF per device can download a package for that device. + +You can install specific DDP package files for different physical devices in +the same system. To install a specific DDP package file: + +1. Download the DDP package file you want for your device. + +2. Rename the file ice-xxxxxxxxxxxxxxxx.pkg, where 'xxxxxxxxxxxxxxxx' is the + unique 64-bit PCI Express device serial number (in hex) of the device you + want the package downloaded on. The filename must include the complete + serial number (including leading zeros) and be all lowercase. For example, + if the 64-bit serial number is b887a3ffffca0568, then the file name would be + ice-b887a3ffffca0568.pkg. + + To find the serial number from the PCI bus address, you can use the + following command:: + + # lspci -vv -s af:00.0 | grep -i Serial + Capabilities: [150 v1] Device Serial Number b8-87-a3-ff-ff-ca-05-68 + + You can use the following command to format the serial number without the + dashes:: + + # lspci -vv -s af:00.0 | grep -i Serial | awk '{print $7}' | sed s/-//g + b887a3ffffca0568 + +3. Copy the renamed DDP package file to + ``/lib/firmware/updates/intel/ice/ddp/``. If the directory does not yet + exist, create it before copying the file. + +4. Unload all of the PFs on the device. + +5. Reload the driver with the new package. + +NOTE: The presence of a device-specific DDP package file overrides the loading +of the default DDP package file (ice.pkg). + + +Intel(R) Ethernet Flow Director +------------------------------- +The Intel Ethernet Flow Director performs the following tasks: + +- Directs receive packets according to their flows to different queues +- Enables tight control on routing a flow in the platform +- Matches flows and CPU cores for flow affinity + +NOTE: This driver supports the following flow types: + +- IPv4 +- TCPv4 +- UDPv4 +- SCTPv4 +- IPv6 +- TCPv6 +- UDPv6 +- SCTPv6 + +Each flow type supports valid combinations of IP addresses (source or +destination) and UDP/TCP/SCTP ports (source and destination). You can supply +only a source IP address, a source IP address and a destination port, or any +combination of one or more of these four parameters. + +NOTE: This driver allows you to filter traffic based on a user-defined flexible +two-byte pattern and offset by using the ethtool user-def and mask fields. Only +L3 and L4 flow types are supported for user-defined flexible filters. For a +given flow type, you must clear all Intel Ethernet Flow Director filters before +changing the input set (for that flow type). + + +Flow Director Filters +--------------------- +Flow Director filters are used to direct traffic that matches specified +characteristics. They are enabled through ethtool's ntuple interface. To enable +or disable the Intel Ethernet Flow Director and these filters:: + + # ethtool -K ntuple + +NOTE: When you disable ntuple filters, all the user programmed filters are +flushed from the driver cache and hardware. All needed filters must be re-added +when ntuple is re-enabled. + +To display all of the active filters:: + + # ethtool -u + +To add a new filter:: + + # ethtool -U flow-type src-ip [m ] dst-ip + [m ] src-port [m ] dst-port [m ] + action + + Where: + - the Ethernet device to program + - can be ip4, tcp4, udp4, sctp4, ip6, tcp6, udp6, sctp6 + - the IP address to match on + - the IPv4 address to mask on + NOTE: These filters use inverted masks. + - the port number to match on + - the 16-bit integer for masking + NOTE: These filters use inverted masks. + - the queue to direct traffic toward (-1 discards the + matched traffic) + +To delete a filter:: + + # ethtool -U delete + + Where is the filter ID displayed when printing all the active filters, + and may also have been specified using "loc " when adding the filter. + +EXAMPLES: + +To add a filter that directs packet to queue 2:: + + # ethtool -U flow-type tcp4 src-ip 192.168.10.1 dst-ip \ + 192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1] + +To set a filter using only the source and destination IP address:: + + # ethtool -U flow-type tcp4 src-ip 192.168.10.1 dst-ip \ + 192.168.10.2 action 2 [loc 1] + +To set a filter based on a user-defined pattern and offset:: + + # ethtool -U flow-type tcp4 src-ip 192.168.10.1 dst-ip \ + 192.168.10.2 user-def 0x4FFFF action 2 [loc 1] + + where the value of the user-def field contains the offset (4 bytes) and + the pattern (0xffff). + +To match TCP traffic sent from 192.168.0.1, port 5300, directed to 192.168.0.5, +port 80, and then send it to queue 7:: + + # ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 + src-port 5300 dst-port 80 action 7 + +To add a TCPv4 filter with a partial mask for a source IP subnet:: + + # ethtool -U flow-type tcp4 src-ip 192.168.0.0 m 0.255.255.255 dst-ip + 192.168.5.12 src-port 12600 dst-port 31 action 12 + +NOTES: + +For each flow-type, the programmed filters must all have the same matching +input set. For example, issuing the following two commands is acceptable:: + + # ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7 + # ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10 + +Issuing the next two commands, however, is not acceptable, since the first +specifies src-ip and the second specifies dst-ip:: + + # ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7 + # ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10 + +The second command will fail with an error. You may program multiple filters +with the same fields, using different values, but, on one device, you may not +program two tcp4 filters with different matching fields. + +The ice driver does not support matching on a subportion of a field, thus +partial mask fields are not supported. + + +Flex Byte Flow Director Filters +------------------------------- +The driver also supports matching user-defined data within the packet payload. +This flexible data is specified using the "user-def" field of the ethtool +command in the following way: + +.. table:: + + ============================== ============================ + ``31 28 24 20 16`` ``15 12 8 4 0`` + ``offset into packet payload`` ``2 bytes of flexible data`` + ============================== ============================ + +For example, + +:: + + ... user-def 0x4FFFF ... + +tells the filter to look 4 bytes into the payload and match that value against +0xFFFF. The offset is based on the beginning of the payload, and not the +beginning of the packet. Thus + +:: + + flow-type tcp4 ... user-def 0x8BEAF ... + +would match TCP/IPv4 packets which have the value 0xBEAF 8 bytes into the +TCP/IPv4 payload. + +Note that ICMP headers are parsed as 4 bytes of header and 4 bytes of payload. +Thus to match the first byte of the payload, you must actually add 4 bytes to +the offset. Also note that ip4 filters match both ICMP frames as well as raw +(unknown) ip4 frames, where the payload will be the L3 payload of the IP4 +frame. + +The maximum offset is 64. The hardware will only read up to 64 bytes of data +from the payload. The offset must be even because the flexible data is 2 bytes +long and must be aligned to byte 0 of the packet payload. + +The user-defined flexible offset is also considered part of the input set and +cannot be programmed separately for multiple filters of the same type. However, +the flexible data is not part of the input set and multiple filters may use the +same offset but match against different data. + + +RSS Hash Flow +------------- +Allows you to set the hash bytes per flow type and any combination of one or +more options for Receive Side Scaling (RSS) hash byte configuration. + +:: + + # ethtool -N rx-flow-hash