From 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 21 Feb 2023 18:24:12 -0800 Subject: Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. 
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. 
Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. 
- Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) 
- ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth coexistence - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload 
depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ... --- drivers/net/ethernet/intel/ice/ice_arfs.c | 656 ++++++++++++++++++++++++++++++ 1 file changed, 656 insertions(+) create mode 100644 drivers/net/ethernet/intel/ice/ice_arfs.c (limited to 'drivers/net/ethernet/intel/ice/ice_arfs.c') diff --git a/drivers/net/ethernet/intel/ice/ice_arfs.c b/drivers/net/ethernet/intel/ice/ice_arfs.c new file mode 100644 index 000000000..fba178e07 --- /dev/null +++ b/drivers/net/ethernet/intel/ice/ice_arfs.c @@ -0,0 +1,656 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2018-2020, Intel Corporation. */ + +#include "ice.h" + +/** + * ice_is_arfs_active - helper to check if aRFS is active + * @vsi: VSI to check + */ +static bool ice_is_arfs_active(struct ice_vsi *vsi) +{ + return !!vsi->arfs_fltr_list; +} + +/** + * ice_is_arfs_using_perfect_flow - check if aRFS has active perfect filters + * @hw: pointer to the HW structure + * @flow_type: flow type as Flow Director understands it + * + * Flow Director will query this function to see if aRFS is currently using + * the specified flow_type for perfect (4-tuple) filters. 
+ */ +bool +ice_is_arfs_using_perfect_flow(struct ice_hw *hw, enum ice_fltr_ptype flow_type) +{ + struct ice_arfs_active_fltr_cntrs *arfs_fltr_cntrs; + struct ice_pf *pf = hw->back; + struct ice_vsi *vsi; + + vsi = ice_get_main_vsi(pf); + if (!vsi) + return false; + + arfs_fltr_cntrs = vsi->arfs_fltr_cntrs; + + /* active counters can be updated by multiple CPUs */ + smp_mb__before_atomic(); + switch (flow_type) { + case ICE_FLTR_PTYPE_NONF_IPV4_UDP: + return atomic_read(&arfs_fltr_cntrs->active_udpv4_cnt) > 0; + case ICE_FLTR_PTYPE_NONF_IPV6_UDP: + return atomic_read(&arfs_fltr_cntrs->active_udpv6_cnt) > 0; + case ICE_FLTR_PTYPE_NONF_IPV4_TCP: + return atomic_read(&arfs_fltr_cntrs->active_tcpv4_cnt) > 0; + case ICE_FLTR_PTYPE_NONF_IPV6_TCP: + return atomic_read(&arfs_fltr_cntrs->active_tcpv6_cnt) > 0; + default: + return false; + } +} + +/** + * ice_arfs_update_active_fltr_cntrs - update active filter counters for aRFS + * @vsi: VSI that aRFS is active on + * @entry: aRFS entry used to change counters + * @add: true to increment counter, false to decrement + */ +static void +ice_arfs_update_active_fltr_cntrs(struct ice_vsi *vsi, + struct ice_arfs_entry *entry, bool add) +{ + struct ice_arfs_active_fltr_cntrs *fltr_cntrs = vsi->arfs_fltr_cntrs; + + switch (entry->fltr_info.flow_type) { + case ICE_FLTR_PTYPE_NONF_IPV4_TCP: + if (add) + atomic_inc(&fltr_cntrs->active_tcpv4_cnt); + else + atomic_dec(&fltr_cntrs->active_tcpv4_cnt); + break; + case ICE_FLTR_PTYPE_NONF_IPV6_TCP: + if (add) + atomic_inc(&fltr_cntrs->active_tcpv6_cnt); + else + atomic_dec(&fltr_cntrs->active_tcpv6_cnt); + break; + case ICE_FLTR_PTYPE_NONF_IPV4_UDP: + if (add) + atomic_inc(&fltr_cntrs->active_udpv4_cnt); + else + atomic_dec(&fltr_cntrs->active_udpv4_cnt); + break; + case ICE_FLTR_PTYPE_NONF_IPV6_UDP: + if (add) + atomic_inc(&fltr_cntrs->active_udpv6_cnt); + else + atomic_dec(&fltr_cntrs->active_udpv6_cnt); + break; + default: + dev_err(ice_pf_to_dev(vsi->back), "aRFS: Failed to update filter 
counters, invalid filter type %d\n", + entry->fltr_info.flow_type); + } +} + +/** + * ice_arfs_del_flow_rules - delete the rules passed in from HW + * @vsi: VSI for the flow rules that need to be deleted + * @del_list_head: head of the list of ice_arfs_entry(s) for rule deletion + * + * Loop through the delete list passed in and remove the rules from HW. After + * each rule is deleted, disconnect and free the ice_arfs_entry because it is no + * longer being referenced by the aRFS hash table. + */ +static void +ice_arfs_del_flow_rules(struct ice_vsi *vsi, struct hlist_head *del_list_head) +{ + struct ice_arfs_entry *e; + struct hlist_node *n; + struct device *dev; + + dev = ice_pf_to_dev(vsi->back); + + hlist_for_each_entry_safe(e, n, del_list_head, list_entry) { + int result; + + result = ice_fdir_write_fltr(vsi->back, &e->fltr_info, false, + false); + if (!result) + ice_arfs_update_active_fltr_cntrs(vsi, e, false); + else + dev_dbg(dev, "Unable to delete aRFS entry, err %d fltr_state %d fltr_id %d flow_id %d Q %d\n", + result, e->fltr_state, e->fltr_info.fltr_id, + e->flow_id, e->fltr_info.q_index); + + /* The aRFS hash table is no longer referencing this entry */ + hlist_del(&e->list_entry); + devm_kfree(dev, e); + } +} + +/** + * ice_arfs_add_flow_rules - add the rules passed in to HW + * @vsi: VSI for the flow rules that need to be added + * @add_list_head: head of the list of ice_arfs_entry_ptr(s) for rule addition + * + * Loop through the add list passed in and add the rules to HW. After each + * rule is added, disconnect and free the ice_arfs_entry_ptr node. Don't free + * the ice_arfs_entry(s) because they are still being referenced in the aRFS + * hash table. 
+ */ +static void +ice_arfs_add_flow_rules(struct ice_vsi *vsi, struct hlist_head *add_list_head) +{ + struct ice_arfs_entry_ptr *ep; + struct hlist_node *n; + struct device *dev; + + dev = ice_pf_to_dev(vsi->back); + + hlist_for_each_entry_safe(ep, n, add_list_head, list_entry) { + int result; + + result = ice_fdir_write_fltr(vsi->back, + &ep->arfs_entry->fltr_info, true, + false); + if (!result) + ice_arfs_update_active_fltr_cntrs(vsi, ep->arfs_entry, + true); + else + dev_dbg(dev, "Unable to add aRFS entry, err %d fltr_state %d fltr_id %d flow_id %d Q %d\n", + result, ep->arfs_entry->fltr_state, + ep->arfs_entry->fltr_info.fltr_id, + ep->arfs_entry->flow_id, + ep->arfs_entry->fltr_info.q_index); + + hlist_del(&ep->list_entry); + devm_kfree(dev, ep); + } +} + +/** + * ice_arfs_is_flow_expired - check if the aRFS entry has expired + * @vsi: VSI containing the aRFS entry + * @arfs_entry: aRFS entry that's being checked for expiration + * + * Return true if the flow has expired, else false. This function should be used + * to determine whether or not an aRFS entry should be removed from the hardware + * and software structures. + */ +static bool +ice_arfs_is_flow_expired(struct ice_vsi *vsi, struct ice_arfs_entry *arfs_entry) +{ +#define ICE_ARFS_TIME_DELTA_EXPIRATION msecs_to_jiffies(5000) + if (rps_may_expire_flow(vsi->netdev, arfs_entry->fltr_info.q_index, + arfs_entry->flow_id, + arfs_entry->fltr_info.fltr_id)) + return true; + + /* expiration timer only used for UDP filters */ + if (arfs_entry->fltr_info.flow_type != ICE_FLTR_PTYPE_NONF_IPV4_UDP && + arfs_entry->fltr_info.flow_type != ICE_FLTR_PTYPE_NONF_IPV6_UDP) + return false; + + return time_in_range64(arfs_entry->time_activated + + ICE_ARFS_TIME_DELTA_EXPIRATION, + arfs_entry->time_activated, get_jiffies_64()); +} + +/** + * ice_arfs_update_flow_rules - add/delete aRFS rules in HW + * @vsi: the VSI to be forwarded to + * @idx: index into the table of aRFS filter lists. 
Obtained from skb->hash + * @add_list: list to populate with filters to be added to Flow Director + * @del_list: list to populate with filters to be deleted from Flow Director + * + * Iterate over the hlist at the index given in the aRFS hash table and + * determine if there are any aRFS entries that need to be either added or + * deleted in the HW. If the aRFS entry is marked as ICE_ARFS_INACTIVE the + * filter needs to be added to HW, else if it's marked as ICE_ARFS_ACTIVE and + * the flow has expired delete the filter from HW. The caller of this function + * is expected to add/delete rules on the add_list/del_list respectively. + */ +static void +ice_arfs_update_flow_rules(struct ice_vsi *vsi, u16 idx, + struct hlist_head *add_list, + struct hlist_head *del_list) +{ + struct ice_arfs_entry *e; + struct hlist_node *n; + struct device *dev; + + dev = ice_pf_to_dev(vsi->back); + + /* go through the aRFS hlist at this idx and check for needed updates */ + hlist_for_each_entry_safe(e, n, &vsi->arfs_fltr_list[idx], list_entry) + /* check if filter needs to be added to HW */ + if (e->fltr_state == ICE_ARFS_INACTIVE) { + enum ice_fltr_ptype flow_type = e->fltr_info.flow_type; + struct ice_arfs_entry_ptr *ep = + devm_kzalloc(dev, sizeof(*ep), GFP_ATOMIC); + + if (!ep) + continue; + INIT_HLIST_NODE(&ep->list_entry); + /* reference aRFS entry to add HW filter */ + ep->arfs_entry = e; + hlist_add_head(&ep->list_entry, add_list); + e->fltr_state = ICE_ARFS_ACTIVE; + /* expiration timer only used for UDP flows */ + if (flow_type == ICE_FLTR_PTYPE_NONF_IPV4_UDP || + flow_type == ICE_FLTR_PTYPE_NONF_IPV6_UDP) + e->time_activated = get_jiffies_64(); + } else if (e->fltr_state == ICE_ARFS_ACTIVE) { + /* check if filter needs to be removed from HW */ + if (ice_arfs_is_flow_expired(vsi, e)) { + /* remove aRFS entry from hash table for delete + * and to prevent referencing it the next time + * through this hlist index + */ + hlist_del(&e->list_entry); + e->fltr_state = 
ICE_ARFS_TODEL; + /* save reference to aRFS entry for delete */ + hlist_add_head(&e->list_entry, del_list); + } + } +} + +/** + * ice_sync_arfs_fltrs - update all aRFS filters + * @pf: board private structure + */ +void ice_sync_arfs_fltrs(struct ice_pf *pf) +{ + HLIST_HEAD(tmp_del_list); + HLIST_HEAD(tmp_add_list); + struct ice_vsi *pf_vsi; + unsigned int i; + + pf_vsi = ice_get_main_vsi(pf); + if (!pf_vsi) + return; + + if (!ice_is_arfs_active(pf_vsi)) + return; + + spin_lock_bh(&pf_vsi->arfs_lock); + /* Once we process aRFS for the PF VSI get out */ + for (i = 0; i < ICE_MAX_ARFS_LIST; i++) + ice_arfs_update_flow_rules(pf_vsi, i, &tmp_add_list, + &tmp_del_list); + spin_unlock_bh(&pf_vsi->arfs_lock); + + /* use list of ice_arfs_entry(s) for delete */ + ice_arfs_del_flow_rules(pf_vsi, &tmp_del_list); + + /* use list of ice_arfs_entry_ptr(s) for add */ + ice_arfs_add_flow_rules(pf_vsi, &tmp_add_list); +} + +/** + * ice_arfs_build_entry - builds an aRFS entry based on input + * @vsi: destination VSI for this flow + * @fk: flow dissector keys for creating the tuple + * @rxq_idx: Rx queue to steer this flow to + * @flow_id: passed down from the stack and saved for flow expiration + * + * returns an aRFS entry on success and NULL on failure + */ +static struct ice_arfs_entry * +ice_arfs_build_entry(struct ice_vsi *vsi, const struct flow_keys *fk, + u16 rxq_idx, u32 flow_id) +{ + struct ice_arfs_entry *arfs_entry; + struct ice_fdir_fltr *fltr_info; + u8 ip_proto; + + arfs_entry = devm_kzalloc(ice_pf_to_dev(vsi->back), + sizeof(*arfs_entry), + GFP_ATOMIC | __GFP_NOWARN); + if (!arfs_entry) + return NULL; + + fltr_info = &arfs_entry->fltr_info; + fltr_info->q_index = rxq_idx; + fltr_info->dest_ctl = ICE_FLTR_PRGM_DESC_DEST_DIRECT_PKT_QINDEX; + fltr_info->dest_vsi = vsi->idx; + ip_proto = fk->basic.ip_proto; + + if (fk->basic.n_proto == htons(ETH_P_IP)) { + fltr_info->ip.v4.proto = ip_proto; + fltr_info->flow_type = (ip_proto == IPPROTO_TCP) ? 
+ ICE_FLTR_PTYPE_NONF_IPV4_TCP : + ICE_FLTR_PTYPE_NONF_IPV4_UDP; + fltr_info->ip.v4.src_ip = fk->addrs.v4addrs.src; + fltr_info->ip.v4.dst_ip = fk->addrs.v4addrs.dst; + fltr_info->ip.v4.src_port = fk->ports.src; + fltr_info->ip.v4.dst_port = fk->ports.dst; + } else { /* ETH_P_IPV6 */ + fltr_info->ip.v6.proto = ip_proto; + fltr_info->flow_type = (ip_proto == IPPROTO_TCP) ? + ICE_FLTR_PTYPE_NONF_IPV6_TCP : + ICE_FLTR_PTYPE_NONF_IPV6_UDP; + memcpy(&fltr_info->ip.v6.src_ip, &fk->addrs.v6addrs.src, + sizeof(struct in6_addr)); + memcpy(&fltr_info->ip.v6.dst_ip, &fk->addrs.v6addrs.dst, + sizeof(struct in6_addr)); + fltr_info->ip.v6.src_port = fk->ports.src; + fltr_info->ip.v6.dst_port = fk->ports.dst; + } + + arfs_entry->flow_id = flow_id; + fltr_info->fltr_id = + atomic_inc_return(vsi->arfs_last_fltr_id) % RPS_NO_FILTER; + + return arfs_entry; +} + +/** + * ice_arfs_is_perfect_flow_set - Check to see if perfect flow is set + * @hw: pointer to HW structure + * @l3_proto: ETH_P_IP or ETH_P_IPV6 in network order + * @l4_proto: IPPROTO_UDP or IPPROTO_TCP + * + * We only support perfect (4-tuple) filters for aRFS. This function allows aRFS + * to check if perfect (4-tuple) flow rules are currently in place by Flow + * Director. 
+ */ +static bool +ice_arfs_is_perfect_flow_set(struct ice_hw *hw, __be16 l3_proto, u8 l4_proto) +{ + unsigned long *perfect_fltr = hw->fdir_perfect_fltr; + + /* advanced Flow Director disabled, perfect filters always supported */ + if (!perfect_fltr) + return true; + + if (l3_proto == htons(ETH_P_IP) && l4_proto == IPPROTO_UDP) + return test_bit(ICE_FLTR_PTYPE_NONF_IPV4_UDP, perfect_fltr); + else if (l3_proto == htons(ETH_P_IP) && l4_proto == IPPROTO_TCP) + return test_bit(ICE_FLTR_PTYPE_NONF_IPV4_TCP, perfect_fltr); + else if (l3_proto == htons(ETH_P_IPV6) && l4_proto == IPPROTO_UDP) + return test_bit(ICE_FLTR_PTYPE_NONF_IPV6_UDP, perfect_fltr); + else if (l3_proto == htons(ETH_P_IPV6) && l4_proto == IPPROTO_TCP) + return test_bit(ICE_FLTR_PTYPE_NONF_IPV6_TCP, perfect_fltr); + + return false; +} + +/** + * ice_rx_flow_steer - steer the Rx flow to where application is being run + * @netdev: ptr to the netdev being adjusted + * @skb: buffer with required header information + * @rxq_idx: queue to which the flow needs to move + * @flow_id: flow identifier provided by the netdev + * + * Based on the skb, rxq_idx, and flow_id passed in add/update an entry in the + * aRFS hash table. Iterate over one of the hlists in the aRFS hash table and + * if the flow_id already exists in the hash table but the rxq_idx has changed + * mark the entry as ICE_ARFS_INACTIVE so it can get updated in HW, else + * if the entry is marked as ICE_ARFS_TODEL delete it from the aRFS hash table. + * If neither of the previous conditions is true then add a new entry in the + * aRFS hash table, which gets set to ICE_ARFS_INACTIVE by default so it can be + * added to HW. 
+ */ +int +ice_rx_flow_steer(struct net_device *netdev, const struct sk_buff *skb, + u16 rxq_idx, u32 flow_id) +{ + struct ice_netdev_priv *np = netdev_priv(netdev); + struct ice_arfs_entry *arfs_entry; + struct ice_vsi *vsi = np->vsi; + struct flow_keys fk; + struct ice_pf *pf; + __be16 n_proto; + u8 ip_proto; + u16 idx; + int ret; + + /* failed to allocate memory for aRFS so don't crash */ + if (unlikely(!vsi->arfs_fltr_list)) + return -ENODEV; + + pf = vsi->back; + + if (skb->encapsulation) + return -EPROTONOSUPPORT; + + if (!skb_flow_dissect_flow_keys(skb, &fk, 0)) + return -EPROTONOSUPPORT; + + n_proto = fk.basic.n_proto; + /* Support only IPV4 and IPV6 */ + if ((n_proto == htons(ETH_P_IP) && !ip_is_fragment(ip_hdr(skb))) || + n_proto == htons(ETH_P_IPV6)) + ip_proto = fk.basic.ip_proto; + else + return -EPROTONOSUPPORT; + + /* Support only TCP and UDP */ + if (ip_proto != IPPROTO_TCP && ip_proto != IPPROTO_UDP) + return -EPROTONOSUPPORT; + + /* only support 4-tuple filters for aRFS */ + if (!ice_arfs_is_perfect_flow_set(&pf->hw, n_proto, ip_proto)) + return -EOPNOTSUPP; + + /* choose the aRFS list bucket based on skb hash */ + idx = skb_get_hash_raw(skb) & ICE_ARFS_LST_MASK; + /* search for entry in the bucket */ + spin_lock_bh(&vsi->arfs_lock); + hlist_for_each_entry(arfs_entry, &vsi->arfs_fltr_list[idx], + list_entry) { + struct ice_fdir_fltr *fltr_info; + + /* keep searching for the already existing arfs_entry flow */ + if (arfs_entry->flow_id != flow_id) + continue; + + fltr_info = &arfs_entry->fltr_info; + ret = fltr_info->fltr_id; + + if (fltr_info->q_index == rxq_idx || + arfs_entry->fltr_state != ICE_ARFS_ACTIVE) + goto out; + + /* update the queue to forward to on an already existing flow */ + fltr_info->q_index = rxq_idx; + arfs_entry->fltr_state = ICE_ARFS_INACTIVE; + ice_arfs_update_active_fltr_cntrs(vsi, arfs_entry, false); + goto out_schedule_service_task; + } + + arfs_entry = ice_arfs_build_entry(vsi, &fk, rxq_idx, flow_id); + if (!arfs_entry) 
{ + ret = -ENOMEM; + goto out; + } + + ret = arfs_entry->fltr_info.fltr_id; + INIT_HLIST_NODE(&arfs_entry->list_entry); + hlist_add_head(&arfs_entry->list_entry, &vsi->arfs_fltr_list[idx]); +out_schedule_service_task: + ice_service_task_schedule(pf); +out: + spin_unlock_bh(&vsi->arfs_lock); + return ret; +} + +/** + * ice_init_arfs_cntrs - initialize aRFS counter values + * @vsi: VSI that aRFS counters need to be initialized on + */ +static int ice_init_arfs_cntrs(struct ice_vsi *vsi) +{ + if (!vsi || vsi->type != ICE_VSI_PF) + return -EINVAL; + + vsi->arfs_fltr_cntrs = kzalloc(sizeof(*vsi->arfs_fltr_cntrs), + GFP_KERNEL); + if (!vsi->arfs_fltr_cntrs) + return -ENOMEM; + + vsi->arfs_last_fltr_id = kzalloc(sizeof(*vsi->arfs_last_fltr_id), + GFP_KERNEL); + if (!vsi->arfs_last_fltr_id) { + kfree(vsi->arfs_fltr_cntrs); + vsi->arfs_fltr_cntrs = NULL; + return -ENOMEM; + } + + return 0; +} + +/** + * ice_init_arfs - initialize aRFS resources + * @vsi: the VSI to be forwarded to + */ +void ice_init_arfs(struct ice_vsi *vsi) +{ + struct hlist_head *arfs_fltr_list; + unsigned int i; + + if (!vsi || vsi->type != ICE_VSI_PF) + return; + + arfs_fltr_list = kcalloc(ICE_MAX_ARFS_LIST, sizeof(*arfs_fltr_list), + GFP_KERNEL); + if (!arfs_fltr_list) + return; + + if (ice_init_arfs_cntrs(vsi)) + goto free_arfs_fltr_list; + + for (i = 0; i < ICE_MAX_ARFS_LIST; i++) + INIT_HLIST_HEAD(&arfs_fltr_list[i]); + + spin_lock_init(&vsi->arfs_lock); + + vsi->arfs_fltr_list = arfs_fltr_list; + + return; + +free_arfs_fltr_list: + kfree(arfs_fltr_list); +} + +/** + * ice_clear_arfs - clear the aRFS hash table and any memory used for aRFS + * @vsi: the VSI to be forwarded to + */ +void ice_clear_arfs(struct ice_vsi *vsi) +{ + struct device *dev; + unsigned int i; + + if (!vsi || vsi->type != ICE_VSI_PF || !vsi->back || + !vsi->arfs_fltr_list) + return; + + dev = ice_pf_to_dev(vsi->back); + for (i = 0; i < ICE_MAX_ARFS_LIST; i++) { + struct ice_arfs_entry *r; + struct hlist_node *n; + + 
spin_lock_bh(&vsi->arfs_lock); + hlist_for_each_entry_safe(r, n, &vsi->arfs_fltr_list[i], + list_entry) { + hlist_del(&r->list_entry); + devm_kfree(dev, r); + } + spin_unlock_bh(&vsi->arfs_lock); + } + + kfree(vsi->arfs_fltr_list); + vsi->arfs_fltr_list = NULL; + kfree(vsi->arfs_last_fltr_id); + vsi->arfs_last_fltr_id = NULL; + kfree(vsi->arfs_fltr_cntrs); + vsi->arfs_fltr_cntrs = NULL; +} + +/** + * ice_free_cpu_rx_rmap - free setup CPU reverse map + * @vsi: the VSI to be forwarded to + */ +void ice_free_cpu_rx_rmap(struct ice_vsi *vsi) +{ + struct net_device *netdev; + + if (!vsi || vsi->type != ICE_VSI_PF) + return; + + netdev = vsi->netdev; + if (!netdev || !netdev->rx_cpu_rmap) + return; + + free_irq_cpu_rmap(netdev->rx_cpu_rmap); + netdev->rx_cpu_rmap = NULL; +} + +/** + * ice_set_cpu_rx_rmap - setup CPU reverse map for each queue + * @vsi: the VSI to be forwarded to + */ +int ice_set_cpu_rx_rmap(struct ice_vsi *vsi) +{ + struct net_device *netdev; + struct ice_pf *pf; + int base_idx, i; + + if (!vsi || vsi->type != ICE_VSI_PF) + return 0; + + pf = vsi->back; + netdev = vsi->netdev; + if (!pf || !netdev || !vsi->num_q_vectors) + return -EINVAL; + + netdev_dbg(netdev, "Setup CPU RMAP: vsi type 0x%x, ifname %s, q_vectors %d\n", + vsi->type, netdev->name, vsi->num_q_vectors); + + netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(vsi->num_q_vectors); + if (unlikely(!netdev->rx_cpu_rmap)) + return -EINVAL; + + base_idx = vsi->base_vector; + ice_for_each_q_vector(vsi, i) + if (irq_cpu_rmap_add(netdev->rx_cpu_rmap, + pf->msix_entries[base_idx + i].vector)) { + ice_free_cpu_rx_rmap(vsi); + return -EINVAL; + } + + return 0; +} + +/** + * ice_remove_arfs - remove/clear all aRFS resources + * @pf: device private structure + */ +void ice_remove_arfs(struct ice_pf *pf) +{ + struct ice_vsi *pf_vsi; + + pf_vsi = ice_get_main_vsi(pf); + if (!pf_vsi) + return; + + ice_clear_arfs(pf_vsi); +} + +/** + * ice_rebuild_arfs - remove/clear all aRFS resources and rebuild after reset + * @pf: 
device private structure + */ +void ice_rebuild_arfs(struct ice_pf *pf) +{ + struct ice_vsi *pf_vsi; + + pf_vsi = ice_get_main_vsi(pf); + if (!pf_vsi) + return; + + ice_remove_arfs(pf); + ice_init_arfs(pf_vsi); +} -- cgit v1.2.3
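
The filter lifecycle in the patch above (bucket selection from the skb hash, INACTIVE to ACTIVE on sync, ACTIVE to TODEL on expiry, with a 5000 ms timer that applies to UDP flows only) can be modelled in user space. The following is a minimal sketch, not the driver code: every `sketch_*` / `SKETCH_*` name is hypothetical, the bucket count of 1024 is an assumption (the patch does not show the value of ICE_MAX_ARFS_LIST), time is a plain millisecond counter rather than jiffies, and the `stack_expired` flag stands in for the result of rps_may_expire_flow().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the aRFS entry lifecycle; not the ice driver API. */
#define SKETCH_BUCKETS        1024u                  /* assumed power of two */
#define SKETCH_BUCKET_MASK    (SKETCH_BUCKETS - 1u)
#define SKETCH_UDP_TIMEOUT_MS 5000u                  /* mirrors the 5000 ms delta */

enum sketch_state { SKETCH_INACTIVE, SKETCH_ACTIVE, SKETCH_TODEL };

struct sketch_entry {
	enum sketch_state state;
	bool is_udp;              /* the expiration timer applies to UDP only */
	uint64_t time_activated;  /* milliseconds, stand-in for jiffies */
};

/* Pick a hash bucket by masking the packet hash, as the steer path does. */
static unsigned int sketch_bucket(uint32_t hash)
{
	return hash & SKETCH_BUCKET_MASK;
}

/* Timer-based expiry: true once the timeout has elapsed, UDP flows only. */
static bool sketch_udp_expired(const struct sketch_entry *e, uint64_t now_ms)
{
	if (!e->is_udp)
		return false;
	return now_ms >= e->time_activated + SKETCH_UDP_TIMEOUT_MS;
}

/*
 * One sync pass over a single entry:
 *   INACTIVE -> ACTIVE (queued for programming; UDP entries get a timestamp)
 *   ACTIVE   -> TODEL  (if the stack or the UDP timer says the flow expired)
 */
static void sketch_sync(struct sketch_entry *e, uint64_t now_ms,
			bool stack_expired)
{
	assert(e);

	if (e->state == SKETCH_INACTIVE) {
		e->state = SKETCH_ACTIVE;
		if (e->is_udp)
			e->time_activated = now_ms;
	} else if (e->state == SKETCH_ACTIVE) {
		if (stack_expired || sketch_udp_expired(e, now_ms))
			e->state = SKETCH_TODEL;
	}
}
```

In this model a TCP entry leaves the ACTIVE state only when the stack reports the flow as expired, while a UDP entry is also retired once the timeout has passed since activation, which matches the early return for non-UDP flow types in ice_arfs_is_flow_expired().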