diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c')
-rw-r--r-- | tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c | 402 |
1 files changed, 402 insertions, 0 deletions
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c new file mode 100644 index 000000000..e980188d4 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c @@ -0,0 +1,402 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2019 Facebook */ + +#include <linux/err.h> +#include <netinet/tcp.h> +#include <test_progs.h> +#include "network_helpers.h" +#include "bpf_dctcp.skel.h" +#include "bpf_cubic.skel.h" +#include "bpf_tcp_nogpl.skel.h" +#include "bpf_dctcp_release.skel.h" +#include "tcp_ca_write_sk_pacing.skel.h" +#include "tcp_ca_incompl_cong_ops.skel.h" +#include "tcp_ca_unsupp_cong_op.skel.h" + +#ifndef ENOTSUPP +#define ENOTSUPP 524 +#endif + +static const unsigned int total_bytes = 10 * 1024 * 1024; +static int expected_stg = 0xeB9F; +static int stop, duration; + +static int settcpca(int fd, const char *tcp_ca) +{ + int err; + + err = setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, tcp_ca, strlen(tcp_ca)); + if (CHECK(err == -1, "setsockopt(fd, TCP_CONGESTION)", "errno:%d\n", + errno)) + return -1; + + return 0; +} + +static void *server(void *arg) +{ + int lfd = (int)(long)arg, err = 0, fd; + ssize_t nr_sent = 0, bytes = 0; + char batch[1500]; + + fd = accept(lfd, NULL, NULL); + while (fd == -1) { + if (errno == EINTR) + continue; + err = -errno; + goto done; + } + + if (settimeo(fd, 0)) { + err = -errno; + goto done; + } + + while (bytes < total_bytes && !READ_ONCE(stop)) { + nr_sent = send(fd, &batch, + MIN(total_bytes - bytes, sizeof(batch)), 0); + if (nr_sent == -1 && errno == EINTR) + continue; + if (nr_sent == -1) { + err = -errno; + break; + } + bytes += nr_sent; + } + + CHECK(bytes != total_bytes, "send", "%zd != %u nr_sent:%zd errno:%d\n", + bytes, total_bytes, nr_sent, errno); + +done: + if (fd >= 0) + close(fd); + if (err) { + WRITE_ONCE(stop, 1); + return ERR_PTR(err); + } + return NULL; +} + +static void do_test(const char *tcp_ca, const struct bpf_map *sk_stg_map) +{ + struct sockaddr_in6 sa6 = {}; + ssize_t nr_recv = 0, bytes = 0; + int lfd = -1, fd = -1; + pthread_t srv_thread; + socklen_t addrlen = sizeof(sa6); + void *thread_ret; + char batch[1500]; + int err; + + WRITE_ONCE(stop, 0); + + lfd = socket(AF_INET6, SOCK_STREAM, 0); + if (CHECK(lfd == -1, "socket", "errno:%d\n", errno)) + return; + fd = socket(AF_INET6, SOCK_STREAM, 0); + if (CHECK(fd == -1, "socket", "errno:%d\n", errno)) { + close(lfd); + return; + } + + if (settcpca(lfd, tcp_ca) || settcpca(fd, tcp_ca) || + settimeo(lfd, 0) || settimeo(fd, 0)) + goto done; + + /* bind, listen and start server thread to accept */ + sa6.sin6_family = AF_INET6; + sa6.sin6_addr = in6addr_loopback; + err = bind(lfd, (struct sockaddr *)&sa6, addrlen); + if (CHECK(err == -1, "bind", "errno:%d\n", errno)) + goto done; + err = getsockname(lfd, (struct sockaddr *)&sa6, &addrlen); + if (CHECK(err == -1, "getsockname", "errno:%d\n", errno)) + goto done; + err = listen(lfd, 1); + if (CHECK(err == -1, "listen", "errno:%d\n", errno)) + goto done; + + if (sk_stg_map) { + err = bpf_map_update_elem(bpf_map__fd(sk_stg_map), &fd, + &expected_stg, BPF_NOEXIST); + if (CHECK(err, "bpf_map_update_elem(sk_stg_map)", + "err:%d errno:%d\n", err, errno)) + goto done; + } + + /* connect to server */ + err = connect(fd, (struct sockaddr *)&sa6, addrlen); + if (CHECK(err == -1, "connect", "errno:%d\n", errno)) + goto done; + + if (sk_stg_map) { + int tmp_stg; + + err = bpf_map_lookup_elem(bpf_map__fd(sk_stg_map), &fd, + &tmp_stg); + if (CHECK(!err || errno != ENOENT, + "bpf_map_lookup_elem(sk_stg_map)", + "err:%d errno:%d\n", err, errno)) + goto done; + } + + err = pthread_create(&srv_thread, NULL, server, (void *)(long)lfd); + if (CHECK(err != 0, "pthread_create", "err:%d errno:%d\n", err, errno)) + goto done; + + /* recv total_bytes */ + while (bytes < total_bytes && !READ_ONCE(stop)) { + nr_recv = recv(fd, &batch, + MIN(total_bytes - bytes, sizeof(batch)), 0); + if (nr_recv == -1 && errno == EINTR) + continue; + if (nr_recv == -1) + break; + bytes += nr_recv; + } + + CHECK(bytes != total_bytes, "recv", "%zd != %u nr_recv:%zd errno:%d\n", + bytes, total_bytes, nr_recv, errno); + + WRITE_ONCE(stop, 1); + pthread_join(srv_thread, &thread_ret); + CHECK(IS_ERR(thread_ret), "pthread_join", "thread_ret:%ld", + PTR_ERR(thread_ret)); +done: + close(lfd); + close(fd); +} + +static void test_cubic(void) +{ + struct bpf_cubic *cubic_skel; + struct bpf_link *link; + + cubic_skel = bpf_cubic__open_and_load(); + if (CHECK(!cubic_skel, "bpf_cubic__open_and_load", "failed\n")) + return; + + link = bpf_map__attach_struct_ops(cubic_skel->maps.cubic); + if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) { + bpf_cubic__destroy(cubic_skel); + return; + } + + do_test("bpf_cubic", NULL); + + bpf_link__destroy(link); + bpf_cubic__destroy(cubic_skel); +} + +static void test_dctcp(void) +{ + struct bpf_dctcp *dctcp_skel; + struct bpf_link *link; + + dctcp_skel = bpf_dctcp__open_and_load(); + if (CHECK(!dctcp_skel, "bpf_dctcp__open_and_load", "failed\n")) + return; + + link = bpf_map__attach_struct_ops(dctcp_skel->maps.dctcp); + if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops")) { + bpf_dctcp__destroy(dctcp_skel); + return; + } + + do_test("bpf_dctcp", dctcp_skel->maps.sk_stg_map); + CHECK(dctcp_skel->bss->stg_result != expected_stg, + "Unexpected stg_result", "stg_result (%x) != expected_stg (%x)\n", + dctcp_skel->bss->stg_result, expected_stg); + + bpf_link__destroy(link); + bpf_dctcp__destroy(dctcp_skel); +} + +static char *err_str; +static bool found; + +static int libbpf_debug_print(enum libbpf_print_level level, + const char *format, va_list args) +{ + const char *prog_name, *log_buf; + + if (level != LIBBPF_WARN || + !strstr(format, "-- BEGIN PROG LOAD LOG --")) { + vprintf(format, args); + return 0; + } + + prog_name = va_arg(args, char *); + log_buf = va_arg(args, char *); + if (!log_buf) + goto out; + if (err_str && strstr(log_buf, err_str) != NULL) + found = true; +out: + printf(format, prog_name, log_buf); + return 0; +} + +static void test_invalid_license(void) +{ + libbpf_print_fn_t old_print_fn; + struct bpf_tcp_nogpl *skel; + + err_str = "struct ops programs must have a GPL compatible license"; + found = false; + old_print_fn = libbpf_set_print(libbpf_debug_print); + + skel = bpf_tcp_nogpl__open_and_load(); + ASSERT_NULL(skel, "bpf_tcp_nogpl"); + ASSERT_EQ(found, true, "expected_err_msg"); + + bpf_tcp_nogpl__destroy(skel); + libbpf_set_print(old_print_fn); +} + +static void test_dctcp_fallback(void) +{ + int err, lfd = -1, cli_fd = -1, srv_fd = -1; + struct network_helper_opts opts = { + .cc = "cubic", + }; + struct bpf_dctcp *dctcp_skel; + struct bpf_link *link = NULL; + char srv_cc[16]; + socklen_t cc_len = sizeof(srv_cc); + + dctcp_skel = bpf_dctcp__open(); + if (!ASSERT_OK_PTR(dctcp_skel, "dctcp_skel")) + return; + strcpy(dctcp_skel->rodata->fallback, "cubic"); + if (!ASSERT_OK(bpf_dctcp__load(dctcp_skel), "bpf_dctcp__load")) + goto done; + + link = bpf_map__attach_struct_ops(dctcp_skel->maps.dctcp); + if (!ASSERT_OK_PTR(link, "dctcp link")) + goto done; + + lfd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0); + if (!ASSERT_GE(lfd, 0, "lfd") || + !ASSERT_OK(settcpca(lfd, "bpf_dctcp"), "lfd=>bpf_dctcp")) + goto done; + + cli_fd = connect_to_fd_opts(lfd, &opts); + if (!ASSERT_GE(cli_fd, 0, "cli_fd")) + goto done; + + srv_fd = accept(lfd, NULL, 0); + if (!ASSERT_GE(srv_fd, 0, "srv_fd")) + goto done; + ASSERT_STREQ(dctcp_skel->bss->cc_res, "cubic", "cc_res"); + ASSERT_EQ(dctcp_skel->bss->tcp_cdg_res, -ENOTSUPP, "tcp_cdg_res"); + /* All setsockopt(TCP_CONGESTION) in the recurred + * bpf_dctcp->init() should fail with -EBUSY. + */ + ASSERT_EQ(dctcp_skel->bss->ebusy_cnt, 3, "ebusy_cnt"); + + err = getsockopt(srv_fd, SOL_TCP, TCP_CONGESTION, srv_cc, &cc_len); + if (!ASSERT_OK(err, "getsockopt(srv_fd, TCP_CONGESTION)")) + goto done; + ASSERT_STREQ(srv_cc, "cubic", "srv_fd cc"); + +done: + bpf_link__destroy(link); + bpf_dctcp__destroy(dctcp_skel); + if (lfd != -1) + close(lfd); + if (srv_fd != -1) + close(srv_fd); + if (cli_fd != -1) + close(cli_fd); +} + +static void test_rel_setsockopt(void) +{ + struct bpf_dctcp_release *rel_skel; + libbpf_print_fn_t old_print_fn; + + err_str = "unknown func bpf_setsockopt"; + found = false; + + old_print_fn = libbpf_set_print(libbpf_debug_print); + rel_skel = bpf_dctcp_release__open_and_load(); + libbpf_set_print(old_print_fn); + + ASSERT_ERR_PTR(rel_skel, "rel_skel"); + ASSERT_TRUE(found, "expected_err_msg"); + + bpf_dctcp_release__destroy(rel_skel); +} + +static void test_write_sk_pacing(void) +{ + struct tcp_ca_write_sk_pacing *skel; + struct bpf_link *link; + + skel = tcp_ca_write_sk_pacing__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + return; + + link = bpf_map__attach_struct_ops(skel->maps.write_sk_pacing); + ASSERT_OK_PTR(link, "attach_struct_ops"); + + bpf_link__destroy(link); + tcp_ca_write_sk_pacing__destroy(skel); +} + +static void test_incompl_cong_ops(void) +{ + struct tcp_ca_incompl_cong_ops *skel; + struct bpf_link *link; + + skel = tcp_ca_incompl_cong_ops__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + return; + + /* That cong_avoid() and cong_control() are missing is only reported at + * this point: + */ + link = bpf_map__attach_struct_ops(skel->maps.incompl_cong_ops); + ASSERT_ERR_PTR(link, "attach_struct_ops"); + + bpf_link__destroy(link); + tcp_ca_incompl_cong_ops__destroy(skel); +} + +static void test_unsupp_cong_op(void) +{ + libbpf_print_fn_t old_print_fn; + struct tcp_ca_unsupp_cong_op *skel; + + err_str = "attach to unsupported member get_info"; + found = false; + old_print_fn = libbpf_set_print(libbpf_debug_print); + + skel = tcp_ca_unsupp_cong_op__open_and_load(); + ASSERT_NULL(skel, "open_and_load"); + ASSERT_EQ(found, true, "expected_err_msg"); + + tcp_ca_unsupp_cong_op__destroy(skel); + libbpf_set_print(old_print_fn); +} + +void test_bpf_tcp_ca(void) +{ + if (test__start_subtest("dctcp")) + test_dctcp(); + if (test__start_subtest("cubic")) + test_cubic(); + if (test__start_subtest("invalid_license")) + test_invalid_license(); + if (test__start_subtest("dctcp_fallback")) + test_dctcp_fallback(); + if (test__start_subtest("rel_setsockopt")) + test_rel_setsockopt(); + if (test__start_subtest("write_sk_pacing")) + test_write_sk_pacing(); + if (test__start_subtest("incompl_cong_ops")) + test_incompl_cong_ops(); + if (test__start_subtest("unsupp_cong_op")) + test_unsupp_cong_op(); +} |