author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /net/sched/em_meta.c | |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth coexistence
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'net/sched/em_meta.c')
-rw-r--r-- | net/sched/em_meta.c | 1014 |
1 files changed, 1014 insertions, 0 deletions
diff --git a/net/sched/em_meta.c b/net/sched/em_meta.c
new file mode 100644
index 000000000..49bae3d50
--- /dev/null
+++ b/net/sched/em_meta.c
@@ -0,0 +1,1014 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * net/sched/em_meta.c	Metadata ematch
+ *
+ * Authors:	Thomas Graf <tgraf@suug.ch>
+ *
+ * ==========================================================================
+ *
+ *	The metadata ematch compares two meta objects where each object
+ *	represents either a meta value stored in the kernel or a static
+ *	value provided by userspace. The objects are not provided by
+ *	userspace itself but rather a definition providing the information
+ *	to build them. Every object is of a certain type which must be
+ *	equal to the object it is being compared to.
+ *
+ *	The definition of a objects conists of the type (meta type), a
+ *	identifier (meta id) and additional type specific information.
+ *	The meta id is either TCF_META_TYPE_VALUE for values provided by
+ *	userspace or a index to the meta operations table consisting of
+ *	function pointers to type specific meta data collectors returning
+ *	the value of the requested meta value.
+ *
+ *	         lvalue                                   rvalue
+ *	      +-----------+                           +-----------+
+ *	      | type: INT |                           | type: INT |
+ *	 def  | id: DEV   |                           | id: VALUE |
+ *	      | data:     |                           | data: 3   |
+ *	      +-----------+                           +-----------+
+ *	            |                                       |
+ *	            ---> meta_ops[INT][DEV](...)            |
+ *	                      |                             |
+ *	            -----------                             |
+ *	            V                                       V
+ *	      +-----------+                           +-----------+
+ *	      | type: INT |                           | type: INT |
+ *	 obj  | id: DEV   |                           | id: VALUE |
+ *	      | data: 2   |<--data got filled out     | data: 3   |
+ *	      +-----------+                           +-----------+
+ *	            |                                       |
+ *	            --------------> 2 equals 3 <--------------
+ *
+ *	This is a simplified schema, the complexity varies depending
+ *	on the meta type. Obviously, the length of the data must also
+ *	be provided for non-numeric types.
+ *
+ *	Additionally, type dependent modifiers such as shift operators
+ *	or mask may be applied to extend the functionaliy. As of now,
+ *	the variable length type supports shifting the byte string to
+ *	the right, eating up any number of octets and thus supporting
+ *	wildcard interface name comparisons such as "ppp%" matching
+ *	ppp0..9.
+ *
+ *	NOTE: Certain meta values depend on other subsystems and are
+ *	      only available if that subsystem is enabled in the kernel.
+ */
+
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/sched/loadavg.h>
+#include <linux/string.h>
+#include <linux/skbuff.h>
+#include <linux/random.h>
+#include <linux/if_vlan.h>
+#include <linux/tc_ematch/tc_em_meta.h>
+#include <net/dst.h>
+#include <net/route.h>
+#include <net/pkt_cls.h>
+#include <net/sock.h>
+
+struct meta_obj {
+	unsigned long value;
+	unsigned int len;
+};
+
+struct meta_value {
+	struct tcf_meta_val hdr;
+	unsigned long val;
+	unsigned int len;
+};
+
+struct meta_match {
+	struct meta_value lvalue;
+	struct meta_value rvalue;
+};
+
+static inline int meta_id(struct meta_value *v)
+{
+	return TCF_META_ID(v->hdr.kind);
+}
+
+static inline int meta_type(struct meta_value *v)
+{
+	return TCF_META_TYPE(v->hdr.kind);
+}
+
+#define META_COLLECTOR(FUNC) static void meta_##FUNC(struct sk_buff *skb, \
+	struct tcf_pkt_info *info, struct meta_value *v, \
+	struct meta_obj *dst, int *err)
+
+/**************************************************************************
+ * System status & misc
+ **************************************************************************/
+
+META_COLLECTOR(int_random)
+{
+	get_random_bytes(&dst->value, sizeof(dst->value));
+}
+
+static inline unsigned long fixed_loadavg(int load)
+{
+	int rnd_load = load + (FIXED_1/200);
+	int rnd_frac = ((rnd_load & (FIXED_1-1)) * 100) >> FSHIFT;
+
+	return ((rnd_load >> FSHIFT) * 100) + rnd_frac;
+}
+
+META_COLLECTOR(int_loadavg_0)
+{
+	dst->value = fixed_loadavg(avenrun[0]);
+}
+
+META_COLLECTOR(int_loadavg_1)
+{
+	dst->value = fixed_loadavg(avenrun[1]);
+}
+
+META_COLLECTOR(int_loadavg_2)
+{
+	dst->value = fixed_loadavg(avenrun[2]);
+}
+
+/**************************************************************************
+ * Device names & indices
+ **************************************************************************/
+
+static inline int int_dev(struct net_device *dev, struct meta_obj *dst)
+{
+	if (unlikely(dev == NULL))
+		return -1;
+
+	dst->value = dev->ifindex;
+	return 0;
+}
+
+static inline int var_dev(struct net_device *dev, struct meta_obj *dst)
+{
+	if (unlikely(dev == NULL))
+		return -1;
+
+	dst->value = (unsigned long) dev->name;
+	dst->len = strlen(dev->name);
+	return 0;
+}
+
+META_COLLECTOR(int_dev)
+{
+	*err = int_dev(skb->dev, dst);
+}
+
+META_COLLECTOR(var_dev)
+{
+	*err = var_dev(skb->dev, dst);
+}
+
+/**************************************************************************
+ * vlan tag
+ **************************************************************************/
+
+META_COLLECTOR(int_vlan_tag)
+{
+	unsigned short tag;
+
+	if (skb_vlan_tag_present(skb))
+		dst->value = skb_vlan_tag_get(skb);
+	else if (!__vlan_get_tag(skb, &tag))
+		dst->value = tag;
+	else
+		*err = -1;
+}
+
+
+
+/**************************************************************************
+ * skb attributes
+ **************************************************************************/
+
+META_COLLECTOR(int_priority)
+{
+	dst->value = skb->priority;
+}
+
+META_COLLECTOR(int_protocol)
+{
+	/* Let userspace take care of the byte ordering */
+	dst->value = skb_protocol(skb, false);
+}
+
+META_COLLECTOR(int_pkttype)
+{
+	dst->value = skb->pkt_type;
+}
+
+META_COLLECTOR(int_pktlen)
+{
+	dst->value = skb->len;
+}
+
+META_COLLECTOR(int_datalen)
+{
+	dst->value = skb->data_len;
+}
+
+META_COLLECTOR(int_maclen)
+{
+	dst->value = skb->mac_len;
+}
+
+META_COLLECTOR(int_rxhash)
+{
+	dst->value = skb_get_hash(skb);
+}
+
+/**************************************************************************
+ * Netfilter
+ **************************************************************************/
+
+META_COLLECTOR(int_mark)
+{
+	dst->value = skb->mark;
+}
+
+/**************************************************************************
+ * Traffic Control
+ **************************************************************************/
+
+META_COLLECTOR(int_tcindex)
+{
+	dst->value = skb->tc_index;
+}
+
+/**************************************************************************
+ * Routing
+ **************************************************************************/
+
+META_COLLECTOR(int_rtclassid)
+{
+	if (unlikely(skb_dst(skb) == NULL))
+		*err = -1;
+	else
+#ifdef CONFIG_IP_ROUTE_CLASSID
+		dst->value = skb_dst(skb)->tclassid;
+#else
+		dst->value = 0;
+#endif
+}
+
+META_COLLECTOR(int_rtiif)
+{
+	if (unlikely(skb_rtable(skb) == NULL))
+		*err = -1;
+	else
+		dst->value = inet_iif(skb);
+}
+
+/**************************************************************************
+ * Socket Attributes
+ **************************************************************************/
+
+#define skip_nonlocal(skb) \
+	(unlikely(skb->sk == NULL))
+
+META_COLLECTOR(int_sk_family)
+{
+	if (skip_nonlocal(skb)) {
+		*err = -1;
+		return;
+	}
+	dst->value = skb->sk->sk_family;
+}
+
+META_COLLECTOR(int_sk_state)
+{
+	if (skip_nonlocal(skb)) {
+		*err = -1;
+		return;
+	}
+	dst->value = skb->sk->sk_state;
+}
+
+META_COLLECTOR(int_sk_reuse)
+{
+	if (skip_nonlocal(skb)) {
+		*err = -1;
+		return;
+	}
+	dst->value = skb->sk->sk_reuse;
+}
+
+META_COLLECTOR(int_sk_bound_if)
+{
+	if (skip_nonlocal(skb)) {
+		*err = -1;
+		return;
+	}
+	/* No error if bound_dev_if is 0, legal userspace check */
+	dst->value = skb->sk->sk_bound_dev_if;
+}
+
+META_COLLECTOR(var_sk_bound_if)
+{
+	int bound_dev_if;
+
+	if (skip_nonlocal(skb)) {
+		*err = -1;
+		return;
+	}
+
+	bound_dev_if = READ_ONCE(skb->sk->sk_bound_dev_if);
+	if (bound_dev_if == 0) {
+		dst->value = (unsigned long) "any";
+		dst->len = 3;
+	} else {
+		struct net_device *dev;
+
+		rcu_read_lock();
+		dev = dev_get_by_index_rcu(sock_net(skb->sk),
+					   bound_dev_if);
+		*err = var_dev(dev, dst);
+		rcu_read_unlock();
+	}
+}
+
+META_COLLECTOR(int_sk_refcnt)
+{
+	if (skip_nonlocal(skb)) {
+		*err = -1;
+		return;
+	}
+	dst->value = refcount_read(&skb->sk->sk_refcnt);
+}
+
+META_COLLECTOR(int_sk_rcvbuf)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_rcvbuf;
+}
+
+META_COLLECTOR(int_sk_shutdown)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_shutdown;
+}
+
+META_COLLECTOR(int_sk_proto)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_protocol;
+}
+
+META_COLLECTOR(int_sk_type)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_type;
+}
+
+META_COLLECTOR(int_sk_rmem_alloc)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk_rmem_alloc_get(sk);
+}
+
+META_COLLECTOR(int_sk_wmem_alloc)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk_wmem_alloc_get(sk);
+}
+
+META_COLLECTOR(int_sk_omem_alloc)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = atomic_read(&sk->sk_omem_alloc);
+}
+
+META_COLLECTOR(int_sk_rcv_qlen)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_receive_queue.qlen;
+}
+
+META_COLLECTOR(int_sk_snd_qlen)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_write_queue.qlen;
+}
+
+META_COLLECTOR(int_sk_wmem_queued)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = READ_ONCE(sk->sk_wmem_queued);
+}
+
+META_COLLECTOR(int_sk_fwd_alloc)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk_forward_alloc_get(sk);
+}
+
+META_COLLECTOR(int_sk_sndbuf)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_sndbuf;
+}
+
+META_COLLECTOR(int_sk_alloc)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = (__force int) sk->sk_allocation;
+}
+
+META_COLLECTOR(int_sk_hash)
+{
+	if (skip_nonlocal(skb)) {
+		*err = -1;
+		return;
+	}
+	dst->value = skb->sk->sk_hash;
+}
+
+META_COLLECTOR(int_sk_lingertime)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_lingertime / HZ;
+}
+
+META_COLLECTOR(int_sk_err_qlen)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_error_queue.qlen;
+}
+
+META_COLLECTOR(int_sk_ack_bl)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = READ_ONCE(sk->sk_ack_backlog);
+}
+
+META_COLLECTOR(int_sk_max_ack_bl)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = READ_ONCE(sk->sk_max_ack_backlog);
+}
+
+META_COLLECTOR(int_sk_prio)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_priority;
+}
+
+META_COLLECTOR(int_sk_rcvlowat)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = READ_ONCE(sk->sk_rcvlowat);
+}
+
+META_COLLECTOR(int_sk_rcvtimeo)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_rcvtimeo / HZ;
+}
+
+META_COLLECTOR(int_sk_sndtimeo)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_sndtimeo / HZ;
+}
+
+META_COLLECTOR(int_sk_sendmsg_off)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_frag.offset;
+}
+
+META_COLLECTOR(int_sk_write_pend)
+{
+	const struct sock *sk = skb_to_full_sk(skb);
+
+	if (!sk) {
+		*err = -1;
+		return;
+	}
+	dst->value = sk->sk_write_pending;
+}
+
+/**************************************************************************
+ * Meta value collectors assignment table
+ **************************************************************************/
+
+struct meta_ops {
+	void (*get)(struct sk_buff *, struct tcf_pkt_info *,
+		    struct meta_value *, struct meta_obj *, int *);
+};
+
+#define META_ID(name) TCF_META_ID_##name
+#define META_FUNC(name) { .get = meta_##name }
+
+/* Meta value operations table listing all meta value collectors and
+ * assigns them to a type and meta id. */
+static struct meta_ops __meta_ops[TCF_META_TYPE_MAX + 1][TCF_META_ID_MAX + 1] = {
+	[TCF_META_TYPE_VAR] = {
+		[META_ID(DEV)] = META_FUNC(var_dev),
+		[META_ID(SK_BOUND_IF)] = META_FUNC(var_sk_bound_if),
+	},
+	[TCF_META_TYPE_INT] = {
+		[META_ID(RANDOM)] = META_FUNC(int_random),
+		[META_ID(LOADAVG_0)] = META_FUNC(int_loadavg_0),
+		[META_ID(LOADAVG_1)] = META_FUNC(int_loadavg_1),
+		[META_ID(LOADAVG_2)] = META_FUNC(int_loadavg_2),
+		[META_ID(DEV)] = META_FUNC(int_dev),
+		[META_ID(PRIORITY)] = META_FUNC(int_priority),
+		[META_ID(PROTOCOL)] = META_FUNC(int_protocol),
+		[META_ID(PKTTYPE)] = META_FUNC(int_pkttype),
+		[META_ID(PKTLEN)] = META_FUNC(int_pktlen),
+		[META_ID(DATALEN)] = META_FUNC(int_datalen),
+		[META_ID(MACLEN)] = META_FUNC(int_maclen),
+		[META_ID(NFMARK)] = META_FUNC(int_mark),
+		[META_ID(TCINDEX)] = META_FUNC(int_tcindex),
+		[META_ID(RTCLASSID)] = META_FUNC(int_rtclassid),
+		[META_ID(RTIIF)] = META_FUNC(int_rtiif),
+		[META_ID(SK_FAMILY)] = META_FUNC(int_sk_family),
+		[META_ID(SK_STATE)] = META_FUNC(int_sk_state),
+		[META_ID(SK_REUSE)] = META_FUNC(int_sk_reuse),
+		[META_ID(SK_BOUND_IF)] = META_FUNC(int_sk_bound_if),
+		[META_ID(SK_REFCNT)] = META_FUNC(int_sk_refcnt),
+		[META_ID(SK_RCVBUF)] = META_FUNC(int_sk_rcvbuf),
+		[META_ID(SK_SNDBUF)] = META_FUNC(int_sk_sndbuf),
+		[META_ID(SK_SHUTDOWN)] = META_FUNC(int_sk_shutdown),
+		[META_ID(SK_PROTO)] = META_FUNC(int_sk_proto),
+		[META_ID(SK_TYPE)] = META_FUNC(int_sk_type),
+		[META_ID(SK_RMEM_ALLOC)] = META_FUNC(int_sk_rmem_alloc),
+		[META_ID(SK_WMEM_ALLOC)] = META_FUNC(int_sk_wmem_alloc),
+		[META_ID(SK_OMEM_ALLOC)] = META_FUNC(int_sk_omem_alloc),
+		[META_ID(SK_WMEM_QUEUED)] = META_FUNC(int_sk_wmem_queued),
+		[META_ID(SK_RCV_QLEN)] = META_FUNC(int_sk_rcv_qlen),
+		[META_ID(SK_SND_QLEN)] = META_FUNC(int_sk_snd_qlen),
+		[META_ID(SK_ERR_QLEN)] = META_FUNC(int_sk_err_qlen),
+		[META_ID(SK_FORWARD_ALLOCS)] = META_FUNC(int_sk_fwd_alloc),
+		[META_ID(SK_ALLOCS)] = META_FUNC(int_sk_alloc),
+		[META_ID(SK_HASH)] = META_FUNC(int_sk_hash),
+		[META_ID(SK_LINGERTIME)] = META_FUNC(int_sk_lingertime),
+		[META_ID(SK_ACK_BACKLOG)] = META_FUNC(int_sk_ack_bl),
+		[META_ID(SK_MAX_ACK_BACKLOG)] = META_FUNC(int_sk_max_ack_bl),
+		[META_ID(SK_PRIO)] = META_FUNC(int_sk_prio),
+		[META_ID(SK_RCVLOWAT)] = META_FUNC(int_sk_rcvlowat),
+		[META_ID(SK_RCVTIMEO)] = META_FUNC(int_sk_rcvtimeo),
+		[META_ID(SK_SNDTIMEO)] = META_FUNC(int_sk_sndtimeo),
+		[META_ID(SK_SENDMSG_OFF)] = META_FUNC(int_sk_sendmsg_off),
+		[META_ID(SK_WRITE_PENDING)] = META_FUNC(int_sk_write_pend),
+		[META_ID(VLAN_TAG)] = META_FUNC(int_vlan_tag),
+		[META_ID(RXHASH)] = META_FUNC(int_rxhash),
+	}
+};
+
+static inline struct meta_ops *meta_ops(struct meta_value *val)
+{
+	return &__meta_ops[meta_type(val)][meta_id(val)];
+}
+
+/**************************************************************************
+ * Type specific operations for TCF_META_TYPE_VAR
+ **************************************************************************/
+
+static int meta_var_compare(struct meta_obj *a, struct meta_obj *b)
+{
+	int r = a->len - b->len;
+
+	if (r == 0)
+		r = memcmp((void *) a->value, (void *) b->value, a->len);
+
+	return r;
+}
+
+static int meta_var_change(struct meta_value *dst, struct nlattr *nla)
+{
+	int len = nla_len(nla);
+
+	dst->val = (unsigned long)kmemdup(nla_data(nla), len, GFP_KERNEL);
+	if (dst->val == 0UL)
+		return -ENOMEM;
+	dst->len = len;
+	return 0;
+}
+
+static void meta_var_destroy(struct meta_value *v)
+{
+	kfree((void *) v->val);
+}
+
+static void meta_var_apply_extras(struct meta_value *v,
+				  struct meta_obj *dst)
+{
+	int shift = v->hdr.shift;
+
+	if (shift && shift < dst->len)
+		dst->len -= shift;
+}
+
+static int meta_var_dump(struct sk_buff *skb, struct meta_value *v, int tlv)
+{
+	if (v->val && v->len &&
+	    nla_put(skb, tlv, v->len, (void *) v->val))
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
+/**************************************************************************
+ * Type specific operations for TCF_META_TYPE_INT
+ **************************************************************************/
+
+static int meta_int_compare(struct meta_obj *a, struct meta_obj *b)
+{
+	/* Let gcc optimize it, the unlikely is not really based on
+	 * some numbers but jump free code for mismatches seems
+	 * more logical. */
+	if (unlikely(a->value == b->value))
+		return 0;
+	else if (a->value < b->value)
+		return -1;
+	else
+		return 1;
+}
+
+static int meta_int_change(struct meta_value *dst, struct nlattr *nla)
+{
+	if (nla_len(nla) >= sizeof(unsigned long)) {
+		dst->val = *(unsigned long *) nla_data(nla);
+		dst->len = sizeof(unsigned long);
+	} else if (nla_len(nla) == sizeof(u32)) {
+		dst->val = nla_get_u32(nla);
+		dst->len = sizeof(u32);
+	} else
+		return -EINVAL;
+
+	return 0;
+}
+
+static void meta_int_apply_extras(struct meta_value *v,
+				  struct meta_obj *dst)
+{
+	if (v->hdr.shift)
+		dst->value >>= v->hdr.shift;
+
+	if (v->val)
+		dst->value &= v->val;
+}
+
+static int meta_int_dump(struct sk_buff *skb, struct meta_value *v, int tlv)
+{
+	if (v->len == sizeof(unsigned long)) {
+		if (nla_put(skb, tlv, sizeof(unsigned long), &v->val))
+			goto nla_put_failure;
+	} else if (v->len == sizeof(u32)) {
+		if (nla_put_u32(skb, tlv, v->val))
+			goto nla_put_failure;
+	}
+
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
+/**************************************************************************
+ * Type specific operations table
+ **************************************************************************/
+
+struct meta_type_ops {
+	void (*destroy)(struct meta_value *);
+	int (*compare)(struct meta_obj *, struct meta_obj *);
+	int (*change)(struct meta_value *, struct nlattr *);
+	void (*apply_extras)(struct meta_value *, struct meta_obj *);
+	int (*dump)(struct sk_buff *, struct meta_value *, int);
+};
+
+static const struct meta_type_ops __meta_type_ops[TCF_META_TYPE_MAX + 1] = {
+	[TCF_META_TYPE_VAR] = {
+		.destroy = meta_var_destroy,
+		.compare = meta_var_compare,
+		.change = meta_var_change,
+		.apply_extras = meta_var_apply_extras,
+		.dump = meta_var_dump
+	},
+	[TCF_META_TYPE_INT] = {
+		.compare = meta_int_compare,
+		.change = meta_int_change,
+		.apply_extras = meta_int_apply_extras,
+		.dump = meta_int_dump
+	}
+};
+
+static inline const struct meta_type_ops *meta_type_ops(struct meta_value *v)
+{
+	return &__meta_type_ops[meta_type(v)];
+}
+
+/**************************************************************************
+ * Core
+ **************************************************************************/
+
+static int meta_get(struct sk_buff *skb, struct tcf_pkt_info *info,
+		    struct meta_value *v, struct meta_obj *dst)
+{
+	int err = 0;
+
+	if (meta_id(v) == TCF_META_ID_VALUE) {
+		dst->value = v->val;
+		dst->len = v->len;
+		return 0;
+	}
+
+	meta_ops(v)->get(skb, info, v, dst, &err);
+	if (err < 0)
+		return err;
+
+	if (meta_type_ops(v)->apply_extras)
+		meta_type_ops(v)->apply_extras(v, dst);
+
+	return 0;
+}
+
+static int em_meta_match(struct sk_buff *skb, struct tcf_ematch *m,
+			 struct tcf_pkt_info *info)
+{
+	int r;
+	struct meta_match *meta = (struct meta_match *) m->data;
+	struct meta_obj l_value, r_value;
+
+	if (meta_get(skb, info, &meta->lvalue, &l_value) < 0 ||
+	    meta_get(skb, info, &meta->rvalue, &r_value) < 0)
+		return 0;
+
+	r = meta_type_ops(&meta->lvalue)->compare(&l_value, &r_value);
+
+	switch (meta->lvalue.hdr.op) {
+	case TCF_EM_OPND_EQ:
+		return !r;
+	case TCF_EM_OPND_LT:
+		return r < 0;
+	case TCF_EM_OPND_GT:
+		return r > 0;
+	}
+
+	return 0;
+}
+
+static void meta_delete(struct meta_match *meta)
+{
+	if (meta) {
+		const struct meta_type_ops *ops = meta_type_ops(&meta->lvalue);
+
+		if (ops && ops->destroy) {
+			ops->destroy(&meta->lvalue);
+			ops->destroy(&meta->rvalue);
+		}
+	}
+
+	kfree(meta);
+}
+
+static inline int meta_change_data(struct meta_value *dst, struct nlattr *nla)
+{
+	if (nla) {
+		if (nla_len(nla) == 0)
+			return -EINVAL;
+
+		return meta_type_ops(dst)->change(dst, nla);
+	}
+
+	return 0;
+}
+
+static inline int meta_is_supported(struct meta_value *val)
+{
+	return !meta_id(val) || meta_ops(val)->get;
+}
+
+static const struct nla_policy meta_policy[TCA_EM_META_MAX + 1] = {
+	[TCA_EM_META_HDR] = { .len = sizeof(struct tcf_meta_hdr) },
+};
+
+static int em_meta_change(struct net *net, void *data, int len,
+			  struct tcf_ematch *m)
+{
+	int err;
+	struct nlattr *tb[TCA_EM_META_MAX + 1];
+	struct tcf_meta_hdr *hdr;
+	struct meta_match *meta = NULL;
+
+	err = nla_parse_deprecated(tb, TCA_EM_META_MAX, data, len,
+				   meta_policy, NULL);
+	if (err < 0)
+		goto errout;
+
+	err = -EINVAL;
+	if (tb[TCA_EM_META_HDR] == NULL)
+		goto errout;
+	hdr = nla_data(tb[TCA_EM_META_HDR]);
+
+	if (TCF_META_TYPE(hdr->left.kind) != TCF_META_TYPE(hdr->right.kind) ||
+	    TCF_META_TYPE(hdr->left.kind) > TCF_META_TYPE_MAX ||
+	    TCF_META_ID(hdr->left.kind) > TCF_META_ID_MAX ||
+	    TCF_META_ID(hdr->right.kind) > TCF_META_ID_MAX)
+		goto errout;
+
+	meta = kzalloc(sizeof(*meta), GFP_KERNEL);
+	if (meta == NULL) {
+		err = -ENOMEM;
+		goto errout;
+	}
+
+	memcpy(&meta->lvalue.hdr, &hdr->left, sizeof(hdr->left));
+	memcpy(&meta->rvalue.hdr, &hdr->right, sizeof(hdr->right));
+
+	if (!meta_is_supported(&meta->lvalue) ||
+	    !meta_is_supported(&meta->rvalue)) {
+		err = -EOPNOTSUPP;
+		goto errout;
+	}
+
+	if (meta_change_data(&meta->lvalue, tb[TCA_EM_META_LVALUE]) < 0 ||
+	    meta_change_data(&meta->rvalue, tb[TCA_EM_META_RVALUE]) < 0)
+		goto errout;
+
+	m->datalen = sizeof(*meta);
+	m->data = (unsigned long) meta;
+
+	err = 0;
+errout:
+	if (err && meta)
+		meta_delete(meta);
+	return err;
+}
+
+static void em_meta_destroy(struct tcf_ematch *m)
+{
+	if (m)
+		meta_delete((struct meta_match *) m->data);
+}
+
+static int em_meta_dump(struct sk_buff *skb, struct tcf_ematch *em)
+{
+	struct meta_match *meta = (struct meta_match *) em->data;
+	struct tcf_meta_hdr hdr;
+	const struct meta_type_ops *ops;
+
+	memset(&hdr, 0, sizeof(hdr));
+	memcpy(&hdr.left, &meta->lvalue.hdr, sizeof(hdr.left));
+	memcpy(&hdr.right, &meta->rvalue.hdr, sizeof(hdr.right));
+
+	if (nla_put(skb, TCA_EM_META_HDR, sizeof(hdr), &hdr))
+		goto nla_put_failure;
+
+	ops = meta_type_ops(&meta->lvalue);
+	if (ops->dump(skb, &meta->lvalue, TCA_EM_META_LVALUE) < 0 ||
+	    ops->dump(skb, &meta->rvalue, TCA_EM_META_RVALUE) < 0)
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
+static struct tcf_ematch_ops em_meta_ops = {
+	.kind = TCF_EM_META,
+	.change = em_meta_change,
+	.match = em_meta_match,
+	.destroy = em_meta_destroy,
+	.dump = em_meta_dump,
+	.owner = THIS_MODULE,
+	.link = LIST_HEAD_INIT(em_meta_ops.link)
+};
+
+static int __init init_em_meta(void)
+{
+	return tcf_em_register(&em_meta_ops);
+}
+
+static void __exit exit_em_meta(void)
+{
+	tcf_em_unregister(&em_meta_ops);
+}
+
+MODULE_LICENSE("GPL");
+
+module_init(init_em_meta);
+module_exit(exit_em_meta);
+
+MODULE_ALIAS_TCF_EMATCH(TCF_EM_META);
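The integer path of the ematch (collect the lvalue, apply shift and mask, compare against the rvalue, then interpret the result by operator) can be sketched in plain userspace C. This is an illustrative sketch only, not code from the patch: every `sketch_*` and `SKETCH_*` name is hypothetical and stands in for the kernel's `meta_int_apply_extras()`, `meta_int_compare()`, `em_meta_match()` and the `TCF_EM_OPND_*` operators, and it assumes the collected value is already available as an `unsigned long`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the TCF_EM_OPND_* operators in the patch. */
enum sketch_op { SKETCH_EQ, SKETCH_GT, SKETCH_LT };

/* Hypothetical lvalue definition: mirrors hdr.shift and val (used as mask). */
struct sketch_int_def {
	unsigned int shift;	/* assumed smaller than the bit width of long */
	unsigned long mask;	/* 0 means "no mask", as in the patch */
};

/* Shift first, then mask, in the same order as meta_int_apply_extras(). */
static unsigned long sketch_apply_extras(unsigned long value,
					 const struct sketch_int_def *def)
{
	if (def->shift)
		value >>= def->shift;
	if (def->mask)
		value &= def->mask;
	return value;
}

/* Three-way compare: negative, zero or positive, like meta_int_compare(). */
static int sketch_compare(unsigned long a, unsigned long b)
{
	if (a == b)
		return 0;
	return a < b ? -1 : 1;
}

/* The rvalue is a static value, so no extras are applied to it. */
static bool sketch_match(unsigned long collected,
			 const struct sketch_int_def *def,
			 unsigned long rvalue, enum sketch_op op)
{
	int r = sketch_compare(sketch_apply_extras(collected, def), rvalue);

	switch (op) {
	case SKETCH_EQ:
		return r == 0;
	case SKETCH_LT:
		return r < 0;
	case SKETCH_GT:
		return r > 0;
	}
	return false;
}
```

With a shift of 8 and a mask of 0xff, a collected value of 0x1234 becomes 0x12 before comparison, so an equality match against 0x12 succeeds. A value of 2 compared with 3 matches only under the less-than operator, which corresponds to the "2 equals 3" mismatch in the header comment's schema.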