diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /net/rxrpc/sendmsg.c | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'net/rxrpc/sendmsg.c')
-rw-r--r-- | net/rxrpc/sendmsg.c | 824 |
1 files changed, 824 insertions, 0 deletions
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c new file mode 100644 index 000000000..da49fcf1c --- /dev/null +++ b/net/rxrpc/sendmsg.c @@ -0,0 +1,824 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* AF_RXRPC sendmsg() implementation. + * + * Copyright (C) 2007, 2016 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/net.h> +#include <linux/gfp.h> +#include <linux/skbuff.h> +#include <linux/export.h> +#include <linux/sched/signal.h> + +#include <net/sock.h> +#include <net/af_rxrpc.h> +#include "ar-internal.h" + +/* + * Propose an abort to be made in the I/O thread. + */ +bool rxrpc_propose_abort(struct rxrpc_call *call, s32 abort_code, int error, + enum rxrpc_abort_reason why) +{ + _enter("{%d},%d,%d,%u", call->debug_id, abort_code, error, why); + + if (!call->send_abort && !rxrpc_call_is_complete(call)) { + call->send_abort_why = why; + call->send_abort_err = error; + call->send_abort_seq = 0; + /* Request abort locklessly vs rxrpc_input_call_event(). */ + smp_store_release(&call->send_abort, abort_code); + rxrpc_poke_call(call, rxrpc_call_poke_abort); + return true; + } + + return false; +} + +/* + * Wait for a call to become connected. Interruption here doesn't cause the + * call to be aborted. + */ +static int rxrpc_wait_to_be_connected(struct rxrpc_call *call, long *timeo) +{ + DECLARE_WAITQUEUE(myself, current); + int ret = 0; + + _enter("%d", call->debug_id); + + if (rxrpc_call_state(call) != RXRPC_CALL_CLIENT_AWAIT_CONN) + return call->error; + + add_wait_queue_exclusive(&call->waitq, &myself); + + for (;;) { + ret = call->error; + if (ret < 0) + break; + + switch (call->interruptibility) { + case RXRPC_INTERRUPTIBLE: + case RXRPC_PREINTERRUPTIBLE: + set_current_state(TASK_INTERRUPTIBLE); + break; + case RXRPC_UNINTERRUPTIBLE: + default: + set_current_state(TASK_UNINTERRUPTIBLE); + break; + } + if (rxrpc_call_state(call) != RXRPC_CALL_CLIENT_AWAIT_CONN) { + ret = call->error; + break; + } + if ((call->interruptibility == RXRPC_INTERRUPTIBLE || + call->interruptibility == RXRPC_PREINTERRUPTIBLE) && + signal_pending(current)) { + ret = sock_intr_errno(*timeo); + break; + } + *timeo = schedule_timeout(*timeo); + } + + remove_wait_queue(&call->waitq, &myself); + __set_current_state(TASK_RUNNING); + + if (ret == 0 && rxrpc_call_is_complete(call)) + ret = call->error; + + _leave(" = %d", ret); + return ret; +} + +/* + * Return true if there's sufficient Tx queue space. + */ +static bool rxrpc_check_tx_space(struct rxrpc_call *call, rxrpc_seq_t *_tx_win) +{ + if (_tx_win) + *_tx_win = call->tx_bottom; + return call->tx_prepared - call->tx_bottom < 256; +} + +/* + * Wait for space to appear in the Tx queue or a signal to occur. + */ +static int rxrpc_wait_for_tx_window_intr(struct rxrpc_sock *rx, + struct rxrpc_call *call, + long *timeo) +{ + for (;;) { + set_current_state(TASK_INTERRUPTIBLE); + if (rxrpc_check_tx_space(call, NULL)) + return 0; + + if (rxrpc_call_is_complete(call)) + return call->error; + + if (signal_pending(current)) + return sock_intr_errno(*timeo); + + trace_rxrpc_txqueue(call, rxrpc_txqueue_wait); + *timeo = schedule_timeout(*timeo); + } +} + +/* + * Wait for space to appear in the Tx queue uninterruptibly, but with + * a timeout of 2*RTT if no progress was made and a signal occurred. + */ +static int rxrpc_wait_for_tx_window_waitall(struct rxrpc_sock *rx, + struct rxrpc_call *call) +{ + rxrpc_seq_t tx_start, tx_win; + signed long rtt, timeout; + + rtt = READ_ONCE(call->peer->srtt_us) >> 3; + rtt = usecs_to_jiffies(rtt) * 2; + if (rtt < 2) + rtt = 2; + + timeout = rtt; + tx_start = smp_load_acquire(&call->acks_hard_ack); + + for (;;) { + set_current_state(TASK_UNINTERRUPTIBLE); + + if (rxrpc_check_tx_space(call, &tx_win)) + return 0; + + if (rxrpc_call_is_complete(call)) + return call->error; + + if (timeout == 0 && + tx_win == tx_start && signal_pending(current)) + return -EINTR; + + if (tx_win != tx_start) { + timeout = rtt; + tx_start = tx_win; + } + + trace_rxrpc_txqueue(call, rxrpc_txqueue_wait); + timeout = schedule_timeout(timeout); + } +} + +/* + * Wait for space to appear in the Tx queue uninterruptibly. + */ +static int rxrpc_wait_for_tx_window_nonintr(struct rxrpc_sock *rx, + struct rxrpc_call *call, + long *timeo) +{ + for (;;) { + set_current_state(TASK_UNINTERRUPTIBLE); + if (rxrpc_check_tx_space(call, NULL)) + return 0; + + if (rxrpc_call_is_complete(call)) + return call->error; + + trace_rxrpc_txqueue(call, rxrpc_txqueue_wait); + *timeo = schedule_timeout(*timeo); + } +} + +/* + * wait for space to appear in the transmit/ACK window + * - caller holds the socket locked + */ +static int rxrpc_wait_for_tx_window(struct rxrpc_sock *rx, + struct rxrpc_call *call, + long *timeo, + bool waitall) +{ + DECLARE_WAITQUEUE(myself, current); + int ret; + + _enter(",{%u,%u,%u,%u}", + call->tx_bottom, call->acks_hard_ack, call->tx_top, call->tx_winsize); + + add_wait_queue(&call->waitq, &myself); + + switch (call->interruptibility) { + case RXRPC_INTERRUPTIBLE: + if (waitall) + ret = rxrpc_wait_for_tx_window_waitall(rx, call); + else + ret = rxrpc_wait_for_tx_window_intr(rx, call, timeo); + break; + case RXRPC_PREINTERRUPTIBLE: + case RXRPC_UNINTERRUPTIBLE: + default: + ret = rxrpc_wait_for_tx_window_nonintr(rx, call, timeo); + break; + } + + remove_wait_queue(&call->waitq, &myself); + set_current_state(TASK_RUNNING); + _leave(" = %d", ret); + return ret; +} + +/* + * Notify the owner of the call that the transmit phase is ended and the last + * packet has been queued. + */ +static void rxrpc_notify_end_tx(struct rxrpc_sock *rx, struct rxrpc_call *call, + rxrpc_notify_end_tx_t notify_end_tx) +{ + if (notify_end_tx) + notify_end_tx(&rx->sk, call, call->user_call_ID); +} + +/* + * Queue a DATA packet for transmission, set the resend timeout and send + * the packet immediately. Returns the error from rxrpc_send_data_packet() + * in case the caller wants to do something with it. + */ +static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call, + struct rxrpc_txbuf *txb, + rxrpc_notify_end_tx_t notify_end_tx) +{ + rxrpc_seq_t seq = txb->seq; + bool last = test_bit(RXRPC_TXBUF_LAST, &txb->flags), poke; + + rxrpc_inc_stat(call->rxnet, stat_tx_data); + + ASSERTCMP(txb->seq, ==, call->tx_prepared + 1); + + /* We have to set the timestamp before queueing as the retransmit + * algorithm can see the packet as soon as we queue it. + */ + txb->last_sent = ktime_get_real(); + + if (last) + trace_rxrpc_txqueue(call, rxrpc_txqueue_queue_last); + else + trace_rxrpc_txqueue(call, rxrpc_txqueue_queue); + + /* Add the packet to the call's output buffer */ + spin_lock(&call->tx_lock); + poke = list_empty(&call->tx_sendmsg); + list_add_tail(&txb->call_link, &call->tx_sendmsg); + call->tx_prepared = seq; + if (last) + rxrpc_notify_end_tx(rx, call, notify_end_tx); + spin_unlock(&call->tx_lock); + + if (poke) + rxrpc_poke_call(call, rxrpc_call_poke_start); +} + +/* + * send data through a socket + * - must be called in process context + * - The caller holds the call user access mutex, but not the socket lock. + */ +static int rxrpc_send_data(struct rxrpc_sock *rx, + struct rxrpc_call *call, + struct msghdr *msg, size_t len, + rxrpc_notify_end_tx_t notify_end_tx, + bool *_dropped_lock) +{ + struct rxrpc_txbuf *txb; + struct sock *sk = &rx->sk; + enum rxrpc_call_state state; + long timeo; + bool more = msg->msg_flags & MSG_MORE; + int ret, copied = 0; + + timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); + + ret = rxrpc_wait_to_be_connected(call, &timeo); + if (ret < 0) + return ret; + + if (call->conn->state == RXRPC_CONN_CLIENT_UNSECURED) { + ret = rxrpc_init_client_conn_security(call->conn); + if (ret < 0) + return ret; + } + + /* this should be in poll */ + sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk); + +reload: + ret = -EPIPE; + if (sk->sk_shutdown & SEND_SHUTDOWN) + goto maybe_error; + state = rxrpc_call_state(call); + ret = -ESHUTDOWN; + if (state >= RXRPC_CALL_COMPLETE) + goto maybe_error; + ret = -EPROTO; + if (state != RXRPC_CALL_CLIENT_SEND_REQUEST && + state != RXRPC_CALL_SERVER_ACK_REQUEST && + state != RXRPC_CALL_SERVER_SEND_REPLY) { + /* Request phase complete for this client call */ + trace_rxrpc_abort(call->debug_id, rxrpc_sendmsg_late_send, + call->cid, call->call_id, call->rx_consumed, + 0, -EPROTO); + goto maybe_error; + } + + ret = -EMSGSIZE; + if (call->tx_total_len != -1) { + if (len - copied > call->tx_total_len) + goto maybe_error; + if (!more && len - copied != call->tx_total_len) + goto maybe_error; + } + + txb = call->tx_pending; + call->tx_pending = NULL; + if (txb) + rxrpc_see_txbuf(txb, rxrpc_txbuf_see_send_more); + + do { + if (!txb) { + size_t remain, bufsize, chunk, offset; + + _debug("alloc"); + + if (!rxrpc_check_tx_space(call, NULL)) + goto wait_for_space; + + /* Work out the maximum size of a packet. Assume that + * the security header is going to be in the padded + * region (enc blocksize), but the trailer is not. + */ + remain = more ? INT_MAX : msg_data_left(msg); + ret = call->conn->security->how_much_data(call, remain, + &bufsize, &chunk, &offset); + if (ret < 0) + goto maybe_error; + + _debug("SIZE: %zu/%zu @%zu", chunk, bufsize, offset); + + /* create a buffer that we can retain until it's ACK'd */ + ret = -ENOMEM; + txb = rxrpc_alloc_txbuf(call, RXRPC_PACKET_TYPE_DATA, + GFP_KERNEL); + if (!txb) + goto maybe_error; + + txb->offset = offset; + txb->space -= offset; + txb->space = min_t(size_t, chunk, txb->space); + } + + _debug("append"); + + /* append next segment of data to the current buffer */ + if (msg_data_left(msg) > 0) { + size_t copy = min_t(size_t, txb->space, msg_data_left(msg)); + + _debug("add %zu", copy); + if (!copy_from_iter_full(txb->data + txb->offset, copy, + &msg->msg_iter)) + goto efault; + _debug("added"); + txb->space -= copy; + txb->len += copy; + txb->offset += copy; + copied += copy; + if (call->tx_total_len != -1) + call->tx_total_len -= copy; + } + + /* check for the far side aborting the call or a network error + * occurring */ + if (rxrpc_call_is_complete(call)) + goto call_terminated; + + /* add the packet to the send queue if it's now full */ + if (!txb->space || + (msg_data_left(msg) == 0 && !more)) { + if (msg_data_left(msg) == 0 && !more) { + txb->wire.flags |= RXRPC_LAST_PACKET; + __set_bit(RXRPC_TXBUF_LAST, &txb->flags); + } + else if (call->tx_top - call->acks_hard_ack < + call->tx_winsize) + txb->wire.flags |= RXRPC_MORE_PACKETS; + + ret = call->security->secure_packet(call, txb); + if (ret < 0) + goto out; + + rxrpc_queue_packet(rx, call, txb, notify_end_tx); + txb = NULL; + } + } while (msg_data_left(msg) > 0); + +success: + ret = copied; + if (rxrpc_call_is_complete(call) && + call->error < 0) + ret = call->error; +out: + call->tx_pending = txb; + _leave(" = %d", ret); + return ret; + +call_terminated: + rxrpc_put_txbuf(txb, rxrpc_txbuf_put_send_aborted); + _leave(" = %d", call->error); + return call->error; + +maybe_error: + if (copied) + goto success; + goto out; + +efault: + ret = -EFAULT; + goto out; + +wait_for_space: + ret = -EAGAIN; + if (msg->msg_flags & MSG_DONTWAIT) + goto maybe_error; + mutex_unlock(&call->user_mutex); + *_dropped_lock = true; + ret = rxrpc_wait_for_tx_window(rx, call, &timeo, + msg->msg_flags & MSG_WAITALL); + if (ret < 0) + goto maybe_error; + if (call->interruptibility == RXRPC_INTERRUPTIBLE) { + if (mutex_lock_interruptible(&call->user_mutex) < 0) { + ret = sock_intr_errno(timeo); + goto maybe_error; + } + } else { + mutex_lock(&call->user_mutex); + } + *_dropped_lock = false; + goto reload; +} + +/* + * extract control messages from the sendmsg() control buffer + */ +static int rxrpc_sendmsg_cmsg(struct msghdr *msg, struct rxrpc_send_params *p) +{ + struct cmsghdr *cmsg; + bool got_user_ID = false; + int len; + + if (msg->msg_controllen == 0) + return -EINVAL; + + for_each_cmsghdr(cmsg, msg) { + if (!CMSG_OK(msg, cmsg)) + return -EINVAL; + + len = cmsg->cmsg_len - sizeof(struct cmsghdr); + _debug("CMSG %d, %d, %d", + cmsg->cmsg_level, cmsg->cmsg_type, len); + + if (cmsg->cmsg_level != SOL_RXRPC) + continue; + + switch (cmsg->cmsg_type) { + case RXRPC_USER_CALL_ID: + if (msg->msg_flags & MSG_CMSG_COMPAT) { + if (len != sizeof(u32)) + return -EINVAL; + p->call.user_call_ID = *(u32 *)CMSG_DATA(cmsg); + } else { + if (len != sizeof(unsigned long)) + return -EINVAL; + p->call.user_call_ID = *(unsigned long *) + CMSG_DATA(cmsg); + } + got_user_ID = true; + break; + + case RXRPC_ABORT: + if (p->command != RXRPC_CMD_SEND_DATA) + return -EINVAL; + p->command = RXRPC_CMD_SEND_ABORT; + if (len != sizeof(p->abort_code)) + return -EINVAL; + p->abort_code = *(unsigned int *)CMSG_DATA(cmsg); + if (p->abort_code == 0) + return -EINVAL; + break; + + case RXRPC_CHARGE_ACCEPT: + if (p->command != RXRPC_CMD_SEND_DATA) + return -EINVAL; + p->command = RXRPC_CMD_CHARGE_ACCEPT; + if (len != 0) + return -EINVAL; + break; + + case RXRPC_EXCLUSIVE_CALL: + p->exclusive = true; + if (len != 0) + return -EINVAL; + break; + + case RXRPC_UPGRADE_SERVICE: + p->upgrade = true; + if (len != 0) + return -EINVAL; + break; + + case RXRPC_TX_LENGTH: + if (p->call.tx_total_len != -1 || len != sizeof(__s64)) + return -EINVAL; + p->call.tx_total_len = *(__s64 *)CMSG_DATA(cmsg); + if (p->call.tx_total_len < 0) + return -EINVAL; + break; + + case RXRPC_SET_CALL_TIMEOUT: + if (len & 3 || len < 4 || len > 12) + return -EINVAL; + memcpy(&p->call.timeouts, CMSG_DATA(cmsg), len); + p->call.nr_timeouts = len / 4; + if (p->call.timeouts.hard > INT_MAX / HZ) + return -ERANGE; + if (p->call.nr_timeouts >= 2 && p->call.timeouts.idle > 60 * 60 * 1000) + return -ERANGE; + if (p->call.nr_timeouts >= 3 && p->call.timeouts.normal > 60 * 60 * 1000) + return -ERANGE; + break; + + default: + return -EINVAL; + } + } + + if (!got_user_ID) + return -EINVAL; + if (p->call.tx_total_len != -1 && p->command != RXRPC_CMD_SEND_DATA) + return -EINVAL; + _leave(" = 0"); + return 0; +} + +/* + * Create a new client call for sendmsg(). + * - Called with the socket lock held, which it must release. + * - If it returns a call, the call's lock will need releasing by the caller. + */ +static struct rxrpc_call * +rxrpc_new_client_call_for_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, + struct rxrpc_send_params *p) + __releases(&rx->sk.sk_lock.slock) + __acquires(&call->user_mutex) +{ + struct rxrpc_conn_parameters cp; + struct rxrpc_call *call; + struct key *key; + + DECLARE_SOCKADDR(struct sockaddr_rxrpc *, srx, msg->msg_name); + + _enter(""); + + if (!msg->msg_name) { + release_sock(&rx->sk); + return ERR_PTR(-EDESTADDRREQ); + } + + key = rx->key; + if (key && !rx->key->payload.data[0]) + key = NULL; + + memset(&cp, 0, sizeof(cp)); + cp.local = rx->local; + cp.key = rx->key; + cp.security_level = rx->min_sec_level; + cp.exclusive = rx->exclusive | p->exclusive; + cp.upgrade = p->upgrade; + cp.service_id = srx->srx_service; + call = rxrpc_new_client_call(rx, &cp, srx, &p->call, GFP_KERNEL, + atomic_inc_return(&rxrpc_debug_id)); + /* The socket is now unlocked */ + + _leave(" = %p\n", call); + return call; +} + +/* + * send a message forming part of a client call through an RxRPC socket + * - caller holds the socket locked + * - the socket may be either a client socket or a server socket + */ +int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len) + __releases(&rx->sk.sk_lock.slock) +{ + struct rxrpc_call *call; + unsigned long now, j; + bool dropped_lock = false; + int ret; + + struct rxrpc_send_params p = { + .call.tx_total_len = -1, + .call.user_call_ID = 0, + .call.nr_timeouts = 0, + .call.interruptibility = RXRPC_INTERRUPTIBLE, + .abort_code = 0, + .command = RXRPC_CMD_SEND_DATA, + .exclusive = false, + .upgrade = false, + }; + + _enter(""); + + ret = rxrpc_sendmsg_cmsg(msg, &p); + if (ret < 0) + goto error_release_sock; + + if (p.command == RXRPC_CMD_CHARGE_ACCEPT) { + ret = -EINVAL; + if (rx->sk.sk_state != RXRPC_SERVER_LISTENING) + goto error_release_sock; + ret = rxrpc_user_charge_accept(rx, p.call.user_call_ID); + goto error_release_sock; + } + + call = rxrpc_find_call_by_user_ID(rx, p.call.user_call_ID); + if (!call) { + ret = -EBADSLT; + if (p.command != RXRPC_CMD_SEND_DATA) + goto error_release_sock; + call = rxrpc_new_client_call_for_sendmsg(rx, msg, &p); + /* The socket is now unlocked... */ + if (IS_ERR(call)) + return PTR_ERR(call); + /* ... and we have the call lock. */ + ret = 0; + if (rxrpc_call_is_complete(call)) + goto out_put_unlock; + } else { + switch (rxrpc_call_state(call)) { + case RXRPC_CALL_UNINITIALISED: + case RXRPC_CALL_CLIENT_AWAIT_CONN: + case RXRPC_CALL_SERVER_PREALLOC: + case RXRPC_CALL_SERVER_SECURING: + rxrpc_put_call(call, rxrpc_call_put_sendmsg); + ret = -EBUSY; + goto error_release_sock; + default: + break; + } + + ret = mutex_lock_interruptible(&call->user_mutex); + release_sock(&rx->sk); + if (ret < 0) { + ret = -ERESTARTSYS; + goto error_put; + } + + if (p.call.tx_total_len != -1) { + ret = -EINVAL; + if (call->tx_total_len != -1 || + call->tx_pending || + call->tx_top != 0) + goto out_put_unlock; + call->tx_total_len = p.call.tx_total_len; + } + } + + switch (p.call.nr_timeouts) { + case 3: + j = msecs_to_jiffies(p.call.timeouts.normal); + if (p.call.timeouts.normal > 0 && j == 0) + j = 1; + WRITE_ONCE(call->next_rx_timo, j); + fallthrough; + case 2: + j = msecs_to_jiffies(p.call.timeouts.idle); + if (p.call.timeouts.idle > 0 && j == 0) + j = 1; + WRITE_ONCE(call->next_req_timo, j); + fallthrough; + case 1: + if (p.call.timeouts.hard > 0) { + j = msecs_to_jiffies(p.call.timeouts.hard); + now = jiffies; + j += now; + WRITE_ONCE(call->expect_term_by, j); + rxrpc_reduce_call_timer(call, j, now, + rxrpc_timer_set_for_hard); + } + break; + } + + if (rxrpc_call_is_complete(call)) { + /* it's too late for this call */ + ret = -ESHUTDOWN; + } else if (p.command == RXRPC_CMD_SEND_ABORT) { + rxrpc_propose_abort(call, p.abort_code, -ECONNABORTED, + rxrpc_abort_call_sendmsg); + ret = 0; + } else if (p.command != RXRPC_CMD_SEND_DATA) { + ret = -EINVAL; + } else { + ret = rxrpc_send_data(rx, call, msg, len, NULL, &dropped_lock); + } + +out_put_unlock: + if (!dropped_lock) + mutex_unlock(&call->user_mutex); +error_put: + rxrpc_put_call(call, rxrpc_call_put_sendmsg); + _leave(" = %d", ret); + return ret; + +error_release_sock: + release_sock(&rx->sk); + return ret; +} + +/** + * rxrpc_kernel_send_data - Allow a kernel service to send data on a call + * @sock: The socket the call is on + * @call: The call to send data through + * @msg: The data to send + * @len: The amount of data to send + * @notify_end_tx: Notification that the last packet is queued. + * + * Allow a kernel service to send data on a call. The call must be in an state + * appropriate to sending data. No control data should be supplied in @msg, + * nor should an address be supplied. MSG_MORE should be flagged if there's + * more data to come, otherwise this data will end the transmission phase. + */ +int rxrpc_kernel_send_data(struct socket *sock, struct rxrpc_call *call, + struct msghdr *msg, size_t len, + rxrpc_notify_end_tx_t notify_end_tx) +{ + bool dropped_lock = false; + int ret; + + _enter("{%d},", call->debug_id); + + ASSERTCMP(msg->msg_name, ==, NULL); + ASSERTCMP(msg->msg_control, ==, NULL); + + mutex_lock(&call->user_mutex); + + ret = rxrpc_send_data(rxrpc_sk(sock->sk), call, msg, len, + notify_end_tx, &dropped_lock); + if (ret == -ESHUTDOWN) + ret = call->error; + + if (!dropped_lock) + mutex_unlock(&call->user_mutex); + _leave(" = %d", ret); + return ret; +} +EXPORT_SYMBOL(rxrpc_kernel_send_data); + +/** + * rxrpc_kernel_abort_call - Allow a kernel service to abort a call + * @sock: The socket the call is on + * @call: The call to be aborted + * @abort_code: The abort code to stick into the ABORT packet + * @error: Local error value + * @why: Indication as to why. + * + * Allow a kernel service to abort a call, if it's still in an abortable state + * and return true if the call was aborted, false if it was already complete. + */ +bool rxrpc_kernel_abort_call(struct socket *sock, struct rxrpc_call *call, + u32 abort_code, int error, enum rxrpc_abort_reason why) +{ + bool aborted; + + _enter("{%d},%d,%d,%u", call->debug_id, abort_code, error, why); + + mutex_lock(&call->user_mutex); + aborted = rxrpc_propose_abort(call, abort_code, error, why); + mutex_unlock(&call->user_mutex); + return aborted; +} +EXPORT_SYMBOL(rxrpc_kernel_abort_call); + +/** + * rxrpc_kernel_set_tx_length - Set the total Tx length on a call + * @sock: The socket the call is on + * @call: The call to be informed + * @tx_total_len: The amount of data to be transmitted for this call + * + * Allow a kernel service to set the total transmit length on a call. This + * allows buffer-to-packet encrypt-and-copy to be performed. + * + * This function is primarily for use for setting the reply length since the + * request length can be set when beginning the call. + */ +void rxrpc_kernel_set_tx_length(struct socket *sock, struct rxrpc_call *call, + s64 tx_total_len) +{ + WARN_ON(call->tx_total_len != -1); + call->tx_total_len = tx_total_len; +} +EXPORT_SYMBOL(rxrpc_kernel_set_tx_length); |