author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /net/rxrpc/conn_client.c | |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existence
- Mobile:
- rmnet: support TX aggregation"
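The IP_LOCAL_PORT_RANGE item under "Protocols" above can be illustrated with a short user-space sketch. It assumes the option takes a 32-bit value with the lower bound in the low 16 bits and the upper bound in the high 16 bits, as described in ip(7) for Linux 6.3 and later. The fallback #define of 51 and the helper names pack_port_range() and set_local_port_range() are assumptions made for illustration; they do not come from the pull message.

```c
/* Hedged user-space sketch; helper names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

#ifndef IP_LOCAL_PORT_RANGE
/* Assumed value, for libc headers that predate Linux 6.3. */
#define IP_LOCAL_PORT_RANGE 51
#endif

/*
 * Pack an inclusive [lo, hi] range: low 16 bits carry the lower bound,
 * high 16 bits the upper bound.  A zero bound is assumed to mean "leave
 * that bound at the system default".
 */
static inline uint32_t pack_port_range(uint16_t lo, uint16_t hi)
{
	/* Assumed to mirror the kernel check: both set implies lo <= hi. */
	assert(lo == 0 || hi == 0 || lo <= hi);
	return ((uint32_t)hi << 16) | lo;
}

/*
 * Apply the range to one socket.  Returns 0 on success or -errno, for
 * example -ENOPROTOOPT on a kernel without the option.
 */
static inline int set_local_port_range(int fd, uint16_t lo, uint16_t hi)
{
	uint32_t range = pack_port_range(lo, hi);

	if (setsockopt(fd, IPPROTO_IP, IP_LOCAL_PORT_RANGE,
		       &range, sizeof(range)) < 0)
		return -errno;
	return 0;
}
```

A caller would typically invoke set_local_port_range(fd, 40000, 40100) before bind() or connect(), and treat -ENOPROTOOPT as "fall back to the system-wide range".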
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to '')
-rw-r--r-- | net/rxrpc/conn_client.c | 816 |
1 files changed, 816 insertions, 0 deletions
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
new file mode 100644
index 000000000..981ca5b98
--- /dev/null
+++ b/net/rxrpc/conn_client.c
@@ -0,0 +1,816 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Client connection-specific management code.
+ *
+ * Copyright (C) 2016, 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * Client connections need to be cached for a little while after they've made a
+ * call so as to handle retransmitted DATA packets in case the server didn't
+ * receive the final ACK or terminating ABORT we sent it.
+ *
+ * There are flags of relevance to the cache:
+ *
+ * (2) DONT_REUSE - The connection should be discarded as soon as possible and
+ * should not be reused. This is set when an exclusive connection is used
+ * or a call ID counter overflows.
+ *
+ * The caching state may only be changed if the cache lock is held.
+ *
+ * There are two idle client connection expiry durations. If the total number
+ * of connections is below the reap threshold, we use the normal duration; if
+ * it's above, we use the fast duration.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <linux/timer.h>
+#include <linux/sched/signal.h>
+
+#include "ar-internal.h"
+
+__read_mostly unsigned int rxrpc_reap_client_connections = 900;
+__read_mostly unsigned long rxrpc_conn_idle_client_expiry = 2 * 60 * HZ;
+__read_mostly unsigned long rxrpc_conn_idle_client_fast_expiry = 2 * HZ;
+
+static void rxrpc_activate_bundle(struct rxrpc_bundle *bundle)
+{
+	atomic_inc(&bundle->active);
+}
+
+/*
+ * Release a connection ID for a client connection.
+ */
+static void rxrpc_put_client_connection_id(struct rxrpc_local *local,
+					   struct rxrpc_connection *conn)
+{
+	idr_remove(&local->conn_ids, conn->proto.cid >> RXRPC_CIDSHIFT);
+}
+
+/*
+ * Destroy the client connection ID tree.
+ */
+static void rxrpc_destroy_client_conn_ids(struct rxrpc_local *local)
+{
+	struct rxrpc_connection *conn;
+	int id;
+
+	if (!idr_is_empty(&local->conn_ids)) {
+		idr_for_each_entry(&local->conn_ids, conn, id) {
+			pr_err("AF_RXRPC: Leaked client conn %p {%d}\n",
+			       conn, refcount_read(&conn->ref));
+		}
+		BUG();
+	}
+
+	idr_destroy(&local->conn_ids);
+}
+
+/*
+ * Allocate a connection bundle.
+ */
+static struct rxrpc_bundle *rxrpc_alloc_bundle(struct rxrpc_call *call,
+					       gfp_t gfp)
+{
+	struct rxrpc_bundle *bundle;
+
+	bundle = kzalloc(sizeof(*bundle), gfp);
+	if (bundle) {
+		bundle->local = call->local;
+		bundle->peer = rxrpc_get_peer(call->peer, rxrpc_peer_get_bundle);
+		bundle->key = key_get(call->key);
+		bundle->security = call->security;
+		bundle->exclusive = test_bit(RXRPC_CALL_EXCLUSIVE, &call->flags);
+		bundle->upgrade = test_bit(RXRPC_CALL_UPGRADE, &call->flags);
+		bundle->service_id = call->dest_srx.srx_service;
+		bundle->security_level = call->security_level;
+		refcount_set(&bundle->ref, 1);
+		atomic_set(&bundle->active, 1);
+		INIT_LIST_HEAD(&bundle->waiting_calls);
+		trace_rxrpc_bundle(bundle->debug_id, 1, rxrpc_bundle_new);
+	}
+	return bundle;
+}
+
+struct rxrpc_bundle *rxrpc_get_bundle(struct rxrpc_bundle *bundle,
+				      enum rxrpc_bundle_trace why)
+{
+	int r;
+
+	__refcount_inc(&bundle->ref, &r);
+	trace_rxrpc_bundle(bundle->debug_id, r + 1, why);
+	return bundle;
+}
+
+static void rxrpc_free_bundle(struct rxrpc_bundle *bundle)
+{
+	trace_rxrpc_bundle(bundle->debug_id, 1, rxrpc_bundle_free);
+	rxrpc_put_peer(bundle->peer, rxrpc_peer_put_bundle);
+	key_put(bundle->key);
+	kfree(bundle);
+}
+
+void rxrpc_put_bundle(struct rxrpc_bundle *bundle, enum rxrpc_bundle_trace why)
+{
+	unsigned int id;
+	bool dead;
+	int r;
+
+	if (bundle) {
+		id = bundle->debug_id;
+		dead = __refcount_dec_and_test(&bundle->ref, &r);
+		trace_rxrpc_bundle(id, r - 1, why);
+		if (dead)
+			rxrpc_free_bundle(bundle);
+	}
+}
+
+/*
+ * Get rid of outstanding client connection preallocations when a local
+ * endpoint is destroyed.
+ */
+void rxrpc_purge_client_connections(struct rxrpc_local *local)
+{
+	rxrpc_destroy_client_conn_ids(local);
+}
+
+/*
+ * Allocate a client connection.
+ */
+static struct rxrpc_connection *
+rxrpc_alloc_client_connection(struct rxrpc_bundle *bundle)
+{
+	struct rxrpc_connection *conn;
+	struct rxrpc_local *local = bundle->local;
+	struct rxrpc_net *rxnet = local->rxnet;
+	int id;
+
+	_enter("");
+
+	conn = rxrpc_alloc_connection(rxnet, GFP_ATOMIC | __GFP_NOWARN);
+	if (!conn)
+		return ERR_PTR(-ENOMEM);
+
+	id = idr_alloc_cyclic(&local->conn_ids, conn, 1, 0x40000000,
+			      GFP_ATOMIC | __GFP_NOWARN);
+	if (id < 0) {
+		kfree(conn);
+		return ERR_PTR(id);
+	}
+
+	refcount_set(&conn->ref, 1);
+	conn->proto.cid = id << RXRPC_CIDSHIFT;
+	conn->proto.epoch = local->rxnet->epoch;
+	conn->out_clientflag = RXRPC_CLIENT_INITIATED;
+	conn->bundle = rxrpc_get_bundle(bundle, rxrpc_bundle_get_client_conn);
+	conn->local = rxrpc_get_local(bundle->local, rxrpc_local_get_client_conn);
+	conn->peer = rxrpc_get_peer(bundle->peer, rxrpc_peer_get_client_conn);
+	conn->key = key_get(bundle->key);
+	conn->security = bundle->security;
+	conn->exclusive = bundle->exclusive;
+	conn->upgrade = bundle->upgrade;
+	conn->orig_service_id = bundle->service_id;
+	conn->security_level = bundle->security_level;
+	conn->state = RXRPC_CONN_CLIENT_UNSECURED;
+	conn->service_id = conn->orig_service_id;
+
+	if (conn->security == &rxrpc_no_security)
+		conn->state = RXRPC_CONN_CLIENT;
+
+	atomic_inc(&rxnet->nr_conns);
+	write_lock(&rxnet->conn_lock);
+	list_add_tail(&conn->proc_link, &rxnet->conn_proc_list);
+	write_unlock(&rxnet->conn_lock);
+
+	rxrpc_see_connection(conn, rxrpc_conn_new_client);
+
+	atomic_inc(&rxnet->nr_client_conns);
+	trace_rxrpc_client(conn, -1, rxrpc_client_alloc);
+	return conn;
+}
+
+/*
+ * Determine if a connection may be reused.
+ */
+static bool rxrpc_may_reuse_conn(struct rxrpc_connection *conn)
+{
+	struct rxrpc_net *rxnet;
+	int id_cursor, id, distance, limit;
+
+	if (!conn)
+		goto dont_reuse;
+
+	rxnet = conn->rxnet;
+	if (test_bit(RXRPC_CONN_DONT_REUSE, &conn->flags))
+		goto dont_reuse;
+
+	if ((conn->state != RXRPC_CONN_CLIENT_UNSECURED &&
+	     conn->state != RXRPC_CONN_CLIENT) ||
+	    conn->proto.epoch != rxnet->epoch)
+		goto mark_dont_reuse;
+
+	/* The IDR tree gets very expensive on memory if the connection IDs are
+	 * widely scattered throughout the number space, so we shall want to
+	 * kill off connections that, say, have an ID more than about four
+	 * times the maximum number of client conns away from the current
+	 * allocation point to try and keep the IDs concentrated.
+	 */
+	id_cursor = idr_get_cursor(&conn->local->conn_ids);
+	id = conn->proto.cid >> RXRPC_CIDSHIFT;
+	distance = id - id_cursor;
+	if (distance < 0)
+		distance = -distance;
+	limit = max_t(unsigned long, atomic_read(&rxnet->nr_conns) * 4, 1024);
+	if (distance > limit)
+		goto mark_dont_reuse;
+
+	return true;
+
+mark_dont_reuse:
+	set_bit(RXRPC_CONN_DONT_REUSE, &conn->flags);
+dont_reuse:
+	return false;
+}
+
+/*
+ * Look up the conn bundle that matches the connection parameters, adding it if
+ * it doesn't yet exist.
+ */
+int rxrpc_look_up_bundle(struct rxrpc_call *call, gfp_t gfp)
+{
+	static atomic_t rxrpc_bundle_id;
+	struct rxrpc_bundle *bundle, *candidate;
+	struct rxrpc_local *local = call->local;
+	struct rb_node *p, **pp, *parent;
+	long diff;
+	bool upgrade = test_bit(RXRPC_CALL_UPGRADE, &call->flags);
+
+	_enter("{%px,%x,%u,%u}",
+	       call->peer, key_serial(call->key), call->security_level,
+	       upgrade);
+
+	if (test_bit(RXRPC_CALL_EXCLUSIVE, &call->flags)) {
+		call->bundle = rxrpc_alloc_bundle(call, gfp);
+		return call->bundle ? 0 : -ENOMEM;
+	}
+
+	/* First, see if the bundle is already there. */
+	_debug("search 1");
+	spin_lock(&local->client_bundles_lock);
+	p = local->client_bundles.rb_node;
+	while (p) {
+		bundle = rb_entry(p, struct rxrpc_bundle, local_node);
+
+#define cmp(X, Y) ((long)(X) - (long)(Y))
+		diff = (cmp(bundle->peer, call->peer) ?:
+			cmp(bundle->key, call->key) ?:
+			cmp(bundle->security_level, call->security_level) ?:
+			cmp(bundle->upgrade, upgrade));
+#undef cmp
+		if (diff < 0)
+			p = p->rb_left;
+		else if (diff > 0)
+			p = p->rb_right;
+		else
+			goto found_bundle;
+	}
+	spin_unlock(&local->client_bundles_lock);
+	_debug("not found");
+
+	/* It wasn't. We need to add one. */
+	candidate = rxrpc_alloc_bundle(call, gfp);
+	if (!candidate)
+		return -ENOMEM;
+
+	_debug("search 2");
+	spin_lock(&local->client_bundles_lock);
+	pp = &local->client_bundles.rb_node;
+	parent = NULL;
+	while (*pp) {
+		parent = *pp;
+		bundle = rb_entry(parent, struct rxrpc_bundle, local_node);
+
+#define cmp(X, Y) ((long)(X) - (long)(Y))
+		diff = (cmp(bundle->peer, call->peer) ?:
+			cmp(bundle->key, call->key) ?:
+			cmp(bundle->security_level, call->security_level) ?:
+			cmp(bundle->upgrade, upgrade));
+#undef cmp
+		if (diff < 0)
+			pp = &(*pp)->rb_left;
+		else if (diff > 0)
+			pp = &(*pp)->rb_right;
+		else
+			goto found_bundle_free;
+	}
+
+	_debug("new bundle");
+	candidate->debug_id = atomic_inc_return(&rxrpc_bundle_id);
+	rb_link_node(&candidate->local_node, parent, pp);
+	rb_insert_color(&candidate->local_node, &local->client_bundles);
+	call->bundle = rxrpc_get_bundle(candidate, rxrpc_bundle_get_client_call);
+	spin_unlock(&local->client_bundles_lock);
+	_leave(" = B=%u [new]", call->bundle->debug_id);
+	return 0;
+
+found_bundle_free:
+	rxrpc_free_bundle(candidate);
+found_bundle:
+	call->bundle = rxrpc_get_bundle(bundle, rxrpc_bundle_get_client_call);
+	rxrpc_activate_bundle(bundle);
+	spin_unlock(&local->client_bundles_lock);
+	_leave(" = B=%u [found]", call->bundle->debug_id);
+	return 0;
+}
+
+/*
+ * Allocate a new connection and add it into a bundle.
+ */
+static bool rxrpc_add_conn_to_bundle(struct rxrpc_bundle *bundle,
+				     unsigned int slot)
+{
+	struct rxrpc_connection *conn, *old;
+	unsigned int shift = slot * RXRPC_MAXCALLS;
+	unsigned int i;
+
+	old = bundle->conns[slot];
+	if (old) {
+		bundle->conns[slot] = NULL;
+		trace_rxrpc_client(old, -1, rxrpc_client_replace);
+		rxrpc_put_connection(old, rxrpc_conn_put_noreuse);
+	}
+
+	conn = rxrpc_alloc_client_connection(bundle);
+	if (IS_ERR(conn)) {
+		bundle->alloc_error = PTR_ERR(conn);
+		return false;
+	}
+
+	rxrpc_activate_bundle(bundle);
+	conn->bundle_shift = shift;
+	bundle->conns[slot] = conn;
+	for (i = 0; i < RXRPC_MAXCALLS; i++)
+		set_bit(shift + i, &bundle->avail_chans);
+	return true;
+}
+
+/*
+ * Add a connection to a bundle if there are no usable connections or we have
+ * connections waiting for extra capacity.
+ */
+static bool rxrpc_bundle_has_space(struct rxrpc_bundle *bundle)
+{
+	int slot = -1, i, usable;
+
+	_enter("");
+
+	bundle->alloc_error = 0;
+
+	/* See if there are any usable connections. */
+	usable = 0;
+	for (i = 0; i < ARRAY_SIZE(bundle->conns); i++) {
+		if (rxrpc_may_reuse_conn(bundle->conns[i]))
+			usable++;
+		else if (slot == -1)
+			slot = i;
+	}
+
+	if (!usable && bundle->upgrade)
+		bundle->try_upgrade = true;
+
+	if (!usable)
+		goto alloc_conn;
+
+	if (!bundle->avail_chans &&
+	    !bundle->try_upgrade &&
+	    usable < ARRAY_SIZE(bundle->conns))
+		goto alloc_conn;
+
+	_leave("");
+	return usable;
+
+alloc_conn:
+	return slot >= 0 ? rxrpc_add_conn_to_bundle(bundle, slot) : false;
+}
+
+/*
+ * Assign a channel to the call at the front of the queue and wake the call up.
+ * We don't increment the callNumber counter until this number has been exposed
+ * to the world.
+ */
+static void rxrpc_activate_one_channel(struct rxrpc_connection *conn,
+				       unsigned int channel)
+{
+	struct rxrpc_channel *chan = &conn->channels[channel];
+	struct rxrpc_bundle *bundle = conn->bundle;
+	struct rxrpc_call *call = list_entry(bundle->waiting_calls.next,
+					     struct rxrpc_call, wait_link);
+	u32 call_id = chan->call_counter + 1;
+
+	_enter("C=%x,%u", conn->debug_id, channel);
+
+	list_del_init(&call->wait_link);
+
+	trace_rxrpc_client(conn, channel, rxrpc_client_chan_activate);
+
+	/* Cancel the final ACK on the previous call if it hasn't been sent yet
+	 * as the DATA packet will implicitly ACK it.
+	 */
+	clear_bit(RXRPC_CONN_FINAL_ACK_0 + channel, &conn->flags);
+	clear_bit(conn->bundle_shift + channel, &bundle->avail_chans);
+
+	rxrpc_see_call(call, rxrpc_call_see_activate_client);
+	call->conn = rxrpc_get_connection(conn, rxrpc_conn_get_activate_call);
+	call->cid = conn->proto.cid | channel;
+	call->call_id = call_id;
+	call->dest_srx.srx_service = conn->service_id;
+	call->cong_ssthresh = call->peer->cong_ssthresh;
+	if (call->cong_cwnd >= call->cong_ssthresh)
+		call->cong_mode = RXRPC_CALL_CONGEST_AVOIDANCE;
+	else
+		call->cong_mode = RXRPC_CALL_SLOW_START;
+
+	chan->call_id = call_id;
+	chan->call_debug_id = call->debug_id;
+	chan->call = call;
+
+	rxrpc_see_call(call, rxrpc_call_see_connected);
+	trace_rxrpc_connect_call(call);
+	call->tx_last_sent = ktime_get_real();
+	rxrpc_start_call_timer(call);
+	rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_SEND_REQUEST);
+	wake_up(&call->waitq);
+}
+
+/*
+ * Remove a connection from the idle list if it's on it.
+ */
+static void rxrpc_unidle_conn(struct rxrpc_connection *conn)
+{
+	if (!list_empty(&conn->cache_link)) {
+		list_del_init(&conn->cache_link);
+		rxrpc_put_connection(conn, rxrpc_conn_put_unidle);
+	}
+}
+
+/*
+ * Assign channels and callNumbers to waiting calls.
+ */
+static void rxrpc_activate_channels(struct rxrpc_bundle *bundle)
+{
+	struct rxrpc_connection *conn;
+	unsigned long avail, mask;
+	unsigned int channel, slot;
+
+	trace_rxrpc_client(NULL, -1, rxrpc_client_activate_chans);
+
+	if (bundle->try_upgrade)
+		mask = 1;
+	else
+		mask = ULONG_MAX;
+
+	while (!list_empty(&bundle->waiting_calls)) {
+		avail = bundle->avail_chans & mask;
+		if (!avail)
+			break;
+		channel = __ffs(avail);
+		clear_bit(channel, &bundle->avail_chans);
+
+		slot = channel / RXRPC_MAXCALLS;
+		conn = bundle->conns[slot];
+		if (!conn)
+			break;
+
+		if (bundle->try_upgrade)
+			set_bit(RXRPC_CONN_PROBING_FOR_UPGRADE, &conn->flags);
+		rxrpc_unidle_conn(conn);
+
+		channel &= (RXRPC_MAXCALLS - 1);
+		conn->act_chans |= 1 << channel;
+		rxrpc_activate_one_channel(conn, channel);
+	}
+}
+
+/*
+ * Connect waiting channels (called from the I/O thread).
+ */
+void rxrpc_connect_client_calls(struct rxrpc_local *local)
+{
+	struct rxrpc_call *call;
+
+	while ((call = list_first_entry_or_null(&local->new_client_calls,
+						struct rxrpc_call, wait_link))
+	       ) {
+		struct rxrpc_bundle *bundle = call->bundle;
+
+		spin_lock(&local->client_call_lock);
+		list_move_tail(&call->wait_link, &bundle->waiting_calls);
+		spin_unlock(&local->client_call_lock);
+
+		if (rxrpc_bundle_has_space(bundle))
+			rxrpc_activate_channels(bundle);
+	}
+}
+
+/*
+ * Note that a call, and thus a connection, is about to be exposed to the
+ * world.
+ */
+void rxrpc_expose_client_call(struct rxrpc_call *call)
+{
+	unsigned int channel = call->cid & RXRPC_CHANNELMASK;
+	struct rxrpc_connection *conn = call->conn;
+	struct rxrpc_channel *chan = &conn->channels[channel];
+
+	if (!test_and_set_bit(RXRPC_CALL_EXPOSED, &call->flags)) {
+		/* Mark the call ID as being used. If the callNumber counter
+		 * exceeds ~2 billion, we kill the connection after its
+		 * outstanding calls have finished so that the counter doesn't
+		 * wrap.
+		 */
+		chan->call_counter++;
+		if (chan->call_counter >= INT_MAX)
+			set_bit(RXRPC_CONN_DONT_REUSE, &conn->flags);
+		trace_rxrpc_client(conn, channel, rxrpc_client_exposed);
+
+		spin_lock(&call->peer->lock);
+		hlist_add_head(&call->error_link, &call->peer->error_targets);
+		spin_unlock(&call->peer->lock);
+	}
+}
+
+/*
+ * Set the reap timer.
+ */
+static void rxrpc_set_client_reap_timer(struct rxrpc_local *local)
+{
+	if (!local->kill_all_client_conns) {
+		unsigned long now = jiffies;
+		unsigned long reap_at = now + rxrpc_conn_idle_client_expiry;
+
+		if (local->rxnet->live)
+			timer_reduce(&local->client_conn_reap_timer, reap_at);
+	}
+}
+
+/*
+ * Disconnect a client call.
+ */
+void rxrpc_disconnect_client_call(struct rxrpc_bundle *bundle, struct rxrpc_call *call)
+{
+	struct rxrpc_connection *conn;
+	struct rxrpc_channel *chan = NULL;
+	struct rxrpc_local *local = bundle->local;
+	unsigned int channel;
+	bool may_reuse;
+	u32 cid;
+
+	_enter("c=%x", call->debug_id);
+
+	/* Calls that have never actually been assigned a channel can simply be
+	 * discarded.
+	 */
+	conn = call->conn;
+	if (!conn) {
+		_debug("call is waiting");
+		ASSERTCMP(call->call_id, ==, 0);
+		ASSERT(!test_bit(RXRPC_CALL_EXPOSED, &call->flags));
+		list_del_init(&call->wait_link);
+		return;
+	}
+
+	cid = call->cid;
+	channel = cid & RXRPC_CHANNELMASK;
+	chan = &conn->channels[channel];
+	trace_rxrpc_client(conn, channel, rxrpc_client_chan_disconnect);
+
+	if (WARN_ON(chan->call != call))
+		return;
+
+	may_reuse = rxrpc_may_reuse_conn(conn);
+
+	/* If a client call was exposed to the world, we save the result for
+	 * retransmission.
+	 *
+	 * We use a barrier here so that the call number and abort code can be
+	 * read without needing to take a lock.
+	 *
+	 * TODO: Make the incoming packet handler check this and handle
+	 * terminal retransmission without requiring access to the call.
+	 */
+	if (test_bit(RXRPC_CALL_EXPOSED, &call->flags)) {
+		_debug("exposed %u,%u", call->call_id, call->abort_code);
+		__rxrpc_disconnect_call(conn, call);
+
+		if (test_and_clear_bit(RXRPC_CONN_PROBING_FOR_UPGRADE, &conn->flags)) {
+			trace_rxrpc_client(conn, channel, rxrpc_client_to_active);
+			bundle->try_upgrade = false;
+			if (may_reuse)
+				rxrpc_activate_channels(bundle);
+		}
+	}
+
+	/* See if we can pass the channel directly to another call. */
+	if (may_reuse && !list_empty(&bundle->waiting_calls)) {
+		trace_rxrpc_client(conn, channel, rxrpc_client_chan_pass);
+		rxrpc_activate_one_channel(conn, channel);
+		return;
+	}
+
+	/* Schedule the final ACK to be transmitted in a short while so that it
+	 * can be skipped if we find a follow-on call. The first DATA packet
+	 * of the follow on call will implicitly ACK this call.
+	 */
+	if (call->completion == RXRPC_CALL_SUCCEEDED &&
+	    test_bit(RXRPC_CALL_EXPOSED, &call->flags)) {
+		unsigned long final_ack_at = jiffies + 2;
+
+		WRITE_ONCE(chan->final_ack_at, final_ack_at);
+		smp_wmb(); /* vs rxrpc_process_delayed_final_acks() */
+		set_bit(RXRPC_CONN_FINAL_ACK_0 + channel, &conn->flags);
+		rxrpc_reduce_conn_timer(conn, final_ack_at);
+	}
+
+	/* Deactivate the channel. */
+	chan->call = NULL;
+	set_bit(conn->bundle_shift + channel, &conn->bundle->avail_chans);
+	conn->act_chans &= ~(1 << channel);
+
+	/* If no channels remain active, then put the connection on the idle
+	 * list for a short while. Give it a ref to stop it going away if it
+	 * becomes unbundled.
+	 */
+	if (!conn->act_chans) {
+		trace_rxrpc_client(conn, channel, rxrpc_client_to_idle);
+		conn->idle_timestamp = jiffies;
+
+		rxrpc_get_connection(conn, rxrpc_conn_get_idle);
+		list_move_tail(&conn->cache_link, &local->idle_client_conns);
+
+		rxrpc_set_client_reap_timer(local);
+	}
+}
+
+/*
+ * Remove a connection from a bundle.
+ */
+static void rxrpc_unbundle_conn(struct rxrpc_connection *conn)
+{
+	struct rxrpc_bundle *bundle = conn->bundle;
+	unsigned int bindex;
+	int i;
+
+	_enter("C=%x", conn->debug_id);
+
+	if (conn->flags & RXRPC_CONN_FINAL_ACK_MASK)
+		rxrpc_process_delayed_final_acks(conn, true);
+
+	bindex = conn->bundle_shift / RXRPC_MAXCALLS;
+	if (bundle->conns[bindex] == conn) {
+		_debug("clear slot %u", bindex);
+		bundle->conns[bindex] = NULL;
+		for (i = 0; i < RXRPC_MAXCALLS; i++)
+			clear_bit(conn->bundle_shift + i, &bundle->avail_chans);
+		rxrpc_put_client_connection_id(bundle->local, conn);
+		rxrpc_deactivate_bundle(bundle);
+		rxrpc_put_connection(conn, rxrpc_conn_put_unbundle);
+	}
+}
+
+/*
+ * Drop the active count on a bundle.
+ */
+void rxrpc_deactivate_bundle(struct rxrpc_bundle *bundle)
+{
+	struct rxrpc_local *local;
+	bool need_put = false;
+
+	if (!bundle)
+		return;
+
+	local = bundle->local;
+	if (atomic_dec_and_lock(&bundle->active, &local->client_bundles_lock)) {
+		if (!bundle->exclusive) {
+			_debug("erase bundle");
+			rb_erase(&bundle->local_node, &local->client_bundles);
+			need_put = true;
+		}
+
+		spin_unlock(&local->client_bundles_lock);
+		if (need_put)
+			rxrpc_put_bundle(bundle, rxrpc_bundle_put_discard);
+	}
+}
+
+/*
+ * Clean up a dead client connection.
+ */
+void rxrpc_kill_client_conn(struct rxrpc_connection *conn)
+{
+	struct rxrpc_local *local = conn->local;
+	struct rxrpc_net *rxnet = local->rxnet;
+
+	_enter("C=%x", conn->debug_id);
+
+	trace_rxrpc_client(conn, -1, rxrpc_client_cleanup);
+	atomic_dec(&rxnet->nr_client_conns);
+
+	rxrpc_put_client_connection_id(local, conn);
+}
+
+/*
+ * Discard expired client connections from the idle list. Each conn in the
+ * idle list has been exposed and holds an extra ref because of that.
+ *
+ * This may be called from conn setup or from a work item so cannot be
+ * considered non-reentrant.
+ */
+void rxrpc_discard_expired_client_conns(struct rxrpc_local *local)
+{
+	struct rxrpc_connection *conn;
+	unsigned long expiry, conn_expires_at, now;
+	unsigned int nr_conns;
+
+	_enter("");
+
+	/* We keep an estimate of what the number of conns ought to be after
+	 * we've discarded some so that we don't overdo the discarding.
+	 */
+	nr_conns = atomic_read(&local->rxnet->nr_client_conns);
+
+next:
+	conn = list_first_entry_or_null(&local->idle_client_conns,
+					struct rxrpc_connection, cache_link);
+	if (!conn)
+		return;
+
+	if (!local->kill_all_client_conns) {
+		/* If the number of connections is over the reap limit, we
+		 * expedite discard by reducing the expiry timeout. We must,
+		 * however, have at least a short grace period to be able to do
+		 * final-ACK or ABORT retransmission.
+		 */
+		expiry = rxrpc_conn_idle_client_expiry;
+		if (nr_conns > rxrpc_reap_client_connections)
+			expiry = rxrpc_conn_idle_client_fast_expiry;
+		if (conn->local->service_closed)
+			expiry = rxrpc_closed_conn_expiry * HZ;
+
+		conn_expires_at = conn->idle_timestamp + expiry;
+
+		now = READ_ONCE(jiffies);
+		if (time_after(conn_expires_at, now))
+			goto not_yet_expired;
+	}
+
+	atomic_dec(&conn->active);
+	trace_rxrpc_client(conn, -1, rxrpc_client_discard);
+	list_del_init(&conn->cache_link);
+
+	rxrpc_unbundle_conn(conn);
+	/* Drop the ->cache_link ref */
+	rxrpc_put_connection(conn, rxrpc_conn_put_discard_idle);
+
+	nr_conns--;
+	goto next;
+
+not_yet_expired:
+	/* The connection at the front of the queue hasn't yet expired, so
+	 * schedule the work item for that point if we discarded something.
+	 *
+	 * We don't worry if the work item is already scheduled - it can look
+	 * after rescheduling itself at a later time. We could cancel it, but
+	 * then things get messier.
+	 */
+	_debug("not yet");
+	if (!local->kill_all_client_conns)
+		timer_reduce(&local->client_conn_reap_timer, conn_expires_at);
+
+	_leave("");
+}
+
+/*
+ * Clean up the client connections on a local endpoint.
+ */
+void rxrpc_clean_up_local_conns(struct rxrpc_local *local)
+{
+	struct rxrpc_connection *conn;
+
+	_enter("");
+
+	local->kill_all_client_conns = true;
+
+	del_timer_sync(&local->client_conn_reap_timer);
+
+	while ((conn = list_first_entry_or_null(&local->idle_client_conns,
+						struct rxrpc_connection, cache_link))) {
+		list_del_init(&conn->cache_link);
+		atomic_dec(&conn->active);
+		trace_rxrpc_client(conn, -1, rxrpc_client_discard);
+		rxrpc_unbundle_conn(conn);
+		rxrpc_put_connection(conn, rxrpc_conn_put_local_dead);
+	}
+
+	_leave(" [culled]");
+}
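
The channel bookkeeping done by rxrpc_add_conn_to_bundle(), rxrpc_activate_channels() and rxrpc_activate_one_channel() in the diff above can be sketched in user space as follows. This is an illustrative sketch, not kernel code: every sketch_* name is hypothetical, and the constants assume RXRPC_MAXCALLS is 4 and RXRPC_CIDSHIFT is 2, values the diff itself does not state. A plain loop stands in for the kernel's __ffs().

```c
/* User-space sketch of the bundle's avail_chans bitmap; not kernel code. */
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAXCALLS 4u                     /* assumed RXRPC_MAXCALLS */
#define SKETCH_CIDSHIFT 2u                     /* assumed RXRPC_CIDSHIFT */
#define SKETCH_CHANMASK (SKETCH_MAXCALLS - 1u) /* like RXRPC_CHANNELMASK */

struct sketch_pick {
	unsigned int slot;    /* index into the bundle's conns[] array */
	unsigned int channel; /* channel within that connection */
};

/* Mark all channels of the connection placed in @slot as available. */
static inline unsigned long sketch_add_conn(unsigned long avail,
					    unsigned int slot)
{
	unsigned int shift = slot * SKETCH_MAXCALLS;
	unsigned int i;

	for (i = 0; i < SKETCH_MAXCALLS; i++)
		avail |= 1UL << (shift + i);
	return avail;
}

/*
 * Take the lowest available channel.  While probing for a service
 * upgrade only bit 0 is eligible, matching the "mask = 1" case in the
 * diff.  Returns 1 and fills @pick on success, 0 if nothing is free.
 */
static inline int sketch_take_channel(unsigned long *avail, int try_upgrade,
				      struct sketch_pick *pick)
{
	unsigned long usable = *avail & (try_upgrade ? 1UL : ~0UL);
	unsigned int bit = 0;

	if (!usable)
		return 0;
	while (!(usable & (1UL << bit))) /* stand-in for __ffs() */
		bit++;
	*avail &= ~(1UL << bit);
	pick->slot = bit / SKETCH_MAXCALLS;
	pick->channel = bit & SKETCH_CHANMASK;
	return 1;
}

/* Combine a connection ID and a channel number into a call's CID. */
static inline uint32_t sketch_call_cid(uint32_t conn_id, unsigned int channel)
{
	assert(channel < SKETCH_MAXCALLS);
	return (conn_id << SKETCH_CIDSHIFT) | channel;
}
```

With a single connection in slot 1, sketch_add_conn(0, 1) sets bits 4 to 7, and the first sketch_take_channel() call hands out slot 1, channel 0. Keeping one flat bitmap per bundle lets a single find-first-set pick both the connection and the channel.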