author	Linus Torvalds <torvalds@linux-foundation.org>	2023-02-21 18:24:12 -0800
committer	Linus Torvalds <torvalds@linux-foundation.org>	2023-02-21 18:24:12 -0800
commit	5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch)
tree	cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /net/sctp/output.c
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:

 "Core:

   - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use.
   - Introduce sysctl to set default RPS configuration for new netdevs.
   - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers.
   - Expose all net/core sysctls inside netns.
   - Remove 4s sleep in netpoll if carrier is instantly detected on boot.
   - Add configurable limit of MDB entries per port, and port-vlan.
   - Continue populating drop reasons throughout the stack.
   - Retire a handful of legacy Qdiscs and classifiers.

  Protocols:

   - Support IPv4 big TCP (TSO frames larger than 64kB).
   - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis.
   - Track and report in procfs number of MPTCP sockets used.
   - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager.
   - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4).
   - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
   - ICMP: add per-rate limit counters.
   - Add support for user scanning requests in ieee802154.
   - Remove static WEP support.
   - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting.
   - WiFi 7 EHT channel puncturing support (client & AP).

  BPF:

   - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type.
   - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata.
   - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata.
   - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
   - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers.
   - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case.
   - Significantly reduce the search time for module symbols by livepatch and BPF.
   - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals.
   - Add support for BPF trampoline on s390x and riscv64.
   - Add capability to export the XDP features supported by the NIC.
   - Add __bpf_kfunc tag for marking kernel functions as kfuncs.
   - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments.

  Netfilter:

   - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target.
   - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist.

  Driver API:

   - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms.
   - Separate C22 and C45 MDIO bus transactions more clearly.
   - Introduce new DCB table to control DSCP rewrite on egress.
   - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019), a modern version of shared medium Ethernet.
   - Support for MAC Merge layer (IEEE 802.3-2018 clause 99), allowing preemption of low priority frames by high priority frames.
   - Add support for controlling MACSec offload using netlink SET.
   - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling.
   - Add TX frame aggregation parameters (for USB drivers).
   - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug.
   - Allow offloading of UDP NEW connections via act_ct.
   - Add support for per action HW stats in TC.
   - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain).
   - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using the nl80211 interface instead.
   - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance.

  New hardware / drivers:

   - Ethernet:
      - nVidia BlueField-3 support (control traffic driver)
      - Ethernet support for imx93 SoCs
      - Motorcomm yt8531 gigabit Ethernet PHY
      - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
      - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
      - Amlogic gxl MDIO mux

   - WiFi:
      - RealTek RTL8188EU (rtl8xxxu)
      - Qualcomm Wi-Fi 7 devices (ath12k)

   - CAN:
      - Renesas R-Car V4H

  Drivers:

   - Bluetooth:
      - Set Per Platform Antenna Gain (PPAG) for Intel controllers.

   - Ethernet NICs:
      - Intel (1G, igc):
         - support TSN / Qbv / packet scheduling features of i226 model
      - Intel (100G, ice):
         - use GNSS subsystem instead of TTY
         - multi-buffer XDP support
         - extend support for GPIO pins to E823 devices
      - nVidia/Mellanox:
         - update the shared buffer configuration on PFC commands
         - implement PTP adjphase function for HW offset control
         - TC support for Geneve and GRE with VF tunnel offload
         - more efficient crypto key management method
         - multi-port eswitch support
      - Netronome/Corigine:
         - add DCB IEEE support
         - support IPsec offloading for NFP3800
      - Freescale/NXP (enetc):
         - support XDP_REDIRECT for XDP non-linear buffers
         - improve reconfig, avoid link flap and waiting for idle
         - support MAC Merge layer
      - Other NICs:
         - sfc/ef100: add basic devlink support for ef100
         - ionic: rx_push mode operation (writing descriptors via MMIO)
         - bnxt: use the auxiliary bus abstraction for RDMA
         - r8169: disable ASPM and reset bus in case of tx timeout
         - cpsw: support QSGMII mode for J721e CPSW9G
         - cpts: support pulse-per-second output
         - ngbe: add an mdio bus driver
         - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
         - r8152: handle devices with FW with NCM support
         - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
         - virtio-net: support multi buffer XDP
         - virtio/vsock: replace virtio_vsock_pkt with sk_buff
         - tsnep: XDP support

   - Ethernet high-speed switches:
      - nVidia/Mellanox (mlxsw):
         - add support for latency TLV (in FW control messages)
      - Microchip (sparx5):
         - separate explicit and implicit traffic forwarding rules, make the implicit rules always active
         - add support for egress DSCP rewrite
         - IS0 VCAP support (Ingress Classification)
         - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.)
         - ES2 VCAP support (Egress Access Control)
         - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1)

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - add MAB (port auth) offload support
         - enable PTP receive for mv88e6390
      - NXP (ocelot):
         - support MAC Merge layer
         - support for the vsc7512 internal copper phys
      - Microchip:
         - lan9303: convert to PHYLINK
         - lan966x: support TC flower filter statistics
         - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
         - lan937x: support Credit Based Shaper configuration
         - ksz9477: support Energy Efficient Ethernet
      - other:
         - qca8k: convert to regmap read/write API, use bulk operations
         - rswitch: Improve TX timestamp accuracy

   - Intel WiFi (iwlwifi):
      - EHT (Wi-Fi 7) rate reporting
      - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware.

   - Qualcomm 802.11ax WiFi (ath11k):
      - IPQ5018 support
      - Fine Timing Measurement (FTM) responder role support
      - channel 177 support

   - MediaTek WiFi (mt76):
      - per-PHY LED support
      - mt7996: EHT (Wi-Fi 7) support
      - Wireless Ethernet Dispatch (WED) reset support
      - switch to using page pool allocator

   - RealTek WiFi (rtw89):
      - support new version of Bluetooth co-existence

   - Mobile:
      - rmnet: support TX aggregation"

* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
  page_pool: add a comment explaining the fragment counter usage
  net: ethtool: fix __ethtool_dev_mm_supported() implementation
  ethtool: pse-pd: Fix double word in comments
  xsk: add linux/vmalloc.h to xsk.c
  sefltests: netdevsim: wait for devlink instance after netns removal
  selftest: fib_tests: Always cleanup before exit
  net/mlx5e: Align IPsec ASO result memory to be as required by hardware
  net/mlx5e: TC, Set CT miss to the specific ct action instance
  net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  net/mlx5: Refactor tc miss handling to a single function
  net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  net/sched: flower: Support hardware miss to tc action
  net/sched: flower: Move filter handle initialization earlier
  net/sched: cls_api: Support hardware miss to tc action
  net/sched: Rename user cookie and act cookie
  sfc: fix builds without CONFIG_RTC_LIB
  sfc: clean up some inconsistent indentings
  net/mlx4_en: Introduce flexible array to silence overflow warning
  net: lan966x: Fix possible deadlock inside PTP
  net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
  ...
Diffstat (limited to 'net/sctp/output.c')
-rw-r--r--	net/sctp/output.c	865
1 file changed, 865 insertions, 0 deletions
diff --git a/net/sctp/output.c b/net/sctp/output.c
new file mode 100644
index 000000000..a63df055a
--- /dev/null
+++ b/net/sctp/output.c
@@ -0,0 +1,865 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* SCTP kernel implementation
+ * (C) Copyright IBM Corp. 2001, 2004
+ * Copyright (c) 1999-2000 Cisco, Inc.
+ * Copyright (c) 1999-2001 Motorola, Inc.
+ *
+ * This file is part of the SCTP kernel implementation
+ *
+ * These functions handle output processing.
+ *
+ * Please send any bug reports or fixes you make to the
+ * email address(es):
+ * lksctp developers <linux-sctp@vger.kernel.org>
+ *
+ * Written or modified by:
+ * La Monte H.P. Yarroll <piggy@acm.org>
+ * Karl Knutson <karl@athena.chicago.il.us>
+ * Jon Grimm <jgrimm@austin.ibm.com>
+ * Sridhar Samudrala <sri@us.ibm.com>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/wait.h>
+#include <linux/time.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <net/inet_ecn.h>
+#include <net/ip.h>
+#include <net/icmp.h>
+#include <net/net_namespace.h>
+
+#include <linux/socket.h> /* for sa_family_t */
+#include <net/sock.h>
+
+#include <net/sctp/sctp.h>
+#include <net/sctp/sm.h>
+#include <net/sctp/checksum.h>
+
+/* Forward declarations for private helpers. */
+static enum sctp_xmit __sctp_packet_append_chunk(struct sctp_packet *packet,
+ struct sctp_chunk *chunk);
+static enum sctp_xmit sctp_packet_can_append_data(struct sctp_packet *packet,
+ struct sctp_chunk *chunk);
+static void sctp_packet_append_data(struct sctp_packet *packet,
+ struct sctp_chunk *chunk);
+static enum sctp_xmit sctp_packet_will_fit(struct sctp_packet *packet,
+ struct sctp_chunk *chunk,
+ u16 chunk_len);
+
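+/* Rough call flow, for orientation (the real callers live in outqueue.c
+ * and transport.c):
+ *
+ *	sctp_packet_init()		- once, when the transport is set up
+ *	sctp_packet_config()		- at the start of each flush
+ *	sctp_packet_append_chunk()	- bundle chunks (AUTH/SACK/PAD included)
+ *	sctp_packet_transmit()		- hand the assembled packet(s) to IP
+ */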
+static void sctp_packet_reset(struct sctp_packet *packet)
+{
+ /* sctp_packet_transmit() relies on this to reset size to the
+ * current overhead after sending packets.
+ */
+ packet->size = packet->overhead;
+
+ packet->has_cookie_echo = 0;
+ packet->has_sack = 0;
+ packet->has_data = 0;
+ packet->has_auth = 0;
+ packet->ipfragok = 0;
+ packet->auth = NULL;
+}
+
+/* Configure a packet.
+ * This is a follow-up set of initializations, done once per flush.
+ */
+void sctp_packet_config(struct sctp_packet *packet, __u32 vtag,
+ int ecn_capable)
+{
+ struct sctp_transport *tp = packet->transport;
+ struct sctp_association *asoc = tp->asoc;
+ struct sctp_sock *sp = NULL;
+ struct sock *sk;
+
+ pr_debug("%s: packet:%p vtag:0x%x\n", __func__, packet, vtag);
+ packet->vtag = vtag;
+
+ /* do the following jobs only once for a flush schedule */
+ if (!sctp_packet_empty(packet))
+ return;
+
+ /* set packet max_size with pathmtu, then calculate overhead */
+ packet->max_size = tp->pathmtu;
+
+ if (asoc) {
+ sk = asoc->base.sk;
+ sp = sctp_sk(sk);
+ }
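+	/* With a zero mtu argument, sctp_mtu_payload() returns just the
+	 * header overhead (network header plus sctphdr; IPv6 is assumed
+	 * as the worst case when there is no socket).
+	 */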
+ packet->overhead = sctp_mtu_payload(sp, 0, 0);
+ packet->size = packet->overhead;
+
+ if (!asoc)
+ return;
+
+ /* update dst or transport pathmtu if in need */
+ if (!sctp_transport_dst_check(tp)) {
+ sctp_transport_route(tp, NULL, sp);
+ if (asoc->param_flags & SPP_PMTUD_ENABLE)
+ sctp_assoc_sync_pmtu(asoc);
+ } else if (!sctp_transport_pl_enabled(tp) &&
+ asoc->param_flags & SPP_PMTUD_ENABLE) {
+ if (!sctp_transport_pmtu_check(tp))
+ sctp_assoc_sync_pmtu(asoc);
+ }
+
+ if (asoc->pmtu_pending) {
+ if (asoc->param_flags & SPP_PMTUD_ENABLE)
+ sctp_assoc_sync_pmtu(asoc);
+ asoc->pmtu_pending = 0;
+ }
+
+	/* If there is a prepend chunk, stick it on the list before
+	 * any other chunks get appended.
+	 */
+ if (ecn_capable) {
+ struct sctp_chunk *chunk = sctp_get_ecne_prepend(asoc);
+
+ if (chunk)
+ sctp_packet_append_chunk(packet, chunk);
+ }
+
+ if (!tp->dst)
+ return;
+
+	/* set packet max_size with gso_max_size if gso is enabled */
+ rcu_read_lock();
+ if (__sk_dst_get(sk) != tp->dst) {
+ dst_hold(tp->dst);
+ sk_setup_caps(sk, tp->dst);
+ }
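+	/* With GSO the packet may grow up to the device's gso_max_size,
+	 * capped at the legacy 64KB limit; without GSO, stay within the
+	 * association's path MTU.
+	 */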
+ packet->max_size = sk_can_gso(sk) ? min(READ_ONCE(tp->dst->dev->gso_max_size),
+ GSO_LEGACY_MAX_SIZE)
+ : asoc->pathmtu;
+ rcu_read_unlock();
+}
+
+/* Initialize the packet structure. */
+void sctp_packet_init(struct sctp_packet *packet,
+ struct sctp_transport *transport,
+ __u16 sport, __u16 dport)
+{
+ pr_debug("%s: packet:%p transport:%p\n", __func__, packet, transport);
+
+ packet->transport = transport;
+ packet->source_port = sport;
+ packet->destination_port = dport;
+ INIT_LIST_HEAD(&packet->chunk_list);
+ /* The overhead will be calculated by sctp_packet_config() */
+ packet->overhead = 0;
+ sctp_packet_reset(packet);
+ packet->vtag = 0;
+}
+
+/* Free a packet. */
+void sctp_packet_free(struct sctp_packet *packet)
+{
+ struct sctp_chunk *chunk, *tmp;
+
+ pr_debug("%s: packet:%p\n", __func__, packet);
+
+ list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) {
+ list_del_init(&chunk->list);
+ sctp_chunk_free(chunk);
+ }
+}
+
+/* This routine tries to append the chunk to the offered packet. If adding
+ * the chunk causes the packet to exceed the path MTU and a COOKIE_ECHO
+ * chunk is not present in the packet, it transmits the input packet.
+ * Data can be bundled with a packet containing a COOKIE_ECHO chunk as long
+ * as it can fit in the packet, but any more data that does not fit in this
+ * packet can be sent only after receiving the COOKIE_ACK.
+ */
+enum sctp_xmit sctp_packet_transmit_chunk(struct sctp_packet *packet,
+ struct sctp_chunk *chunk,
+ int one_packet, gfp_t gfp)
+{
+ enum sctp_xmit retval;
+
+ pr_debug("%s: packet:%p size:%zu chunk:%p size:%d\n", __func__,
+ packet, packet->size, chunk, chunk->skb ? chunk->skb->len : -1);
+
+ switch ((retval = (sctp_packet_append_chunk(packet, chunk)))) {
+ case SCTP_XMIT_PMTU_FULL:
+ if (!packet->has_cookie_echo) {
+ int error = 0;
+
+ error = sctp_packet_transmit(packet, gfp);
+ if (error < 0)
+ chunk->skb->sk->sk_err = -error;
+
+ /* If we have an empty packet, then we can NOT ever
+ * return PMTU_FULL.
+ */
+ if (!one_packet)
+ retval = sctp_packet_append_chunk(packet,
+ chunk);
+ }
+ break;
+
+ case SCTP_XMIT_RWND_FULL:
+ case SCTP_XMIT_OK:
+ case SCTP_XMIT_DELAY:
+ break;
+ }
+
+ return retval;
+}
+
+/* Try to bundle a pad chunk into a packet with a heartbeat chunk for PLPMTUD probe */
+static enum sctp_xmit sctp_packet_bundle_pad(struct sctp_packet *pkt, struct sctp_chunk *chunk)
+{
+ struct sctp_transport *t = pkt->transport;
+ struct sctp_chunk *pad;
+ int overhead = 0;
+
+ if (!chunk->pmtu_probe)
+ return SCTP_XMIT_OK;
+
+ /* calculate the Padding Data size for the pad chunk */
+ overhead += sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr);
+ overhead += sizeof(struct sctp_sender_hb_info) + sizeof(struct sctp_pad_chunk);
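+	/* t->pl.probe_size counts from the SCTP common header onward, so
+	 * the pad data must fill whatever the sctphdr, the HEARTBEAT chunk
+	 * (header plus hb info) and the PAD chunk header don't cover.
+	 */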
+ pad = sctp_make_pad(t->asoc, t->pl.probe_size - overhead);
+ if (!pad)
+ return SCTP_XMIT_DELAY;
+
+ list_add_tail(&pad->list, &pkt->chunk_list);
+ pkt->size += SCTP_PAD4(ntohs(pad->chunk_hdr->length));
+ chunk->transport = t;
+
+ return SCTP_XMIT_OK;
+}
+
+/* Try to bundle an auth chunk into the packet. */
+static enum sctp_xmit sctp_packet_bundle_auth(struct sctp_packet *pkt,
+ struct sctp_chunk *chunk)
+{
+ struct sctp_association *asoc = pkt->transport->asoc;
+ enum sctp_xmit retval = SCTP_XMIT_OK;
+ struct sctp_chunk *auth;
+
+ /* if we don't have an association, we can't do authentication */
+ if (!asoc)
+ return retval;
+
+ /* See if this is an auth chunk we are bundling or if
+ * auth is already bundled.
+ */
+ if (chunk->chunk_hdr->type == SCTP_CID_AUTH || pkt->has_auth)
+ return retval;
+
+ /* if the peer did not request this chunk to be authenticated,
+ * don't do it
+ */
+ if (!chunk->auth)
+ return retval;
+
+ auth = sctp_make_auth(asoc, chunk->shkey->key_id);
+ if (!auth)
+ return retval;
+
+ auth->shkey = chunk->shkey;
+ sctp_auth_shkey_hold(auth->shkey);
+
+ retval = __sctp_packet_append_chunk(pkt, auth);
+
+ if (retval != SCTP_XMIT_OK)
+ sctp_chunk_free(auth);
+
+ return retval;
+}
+
+/* Try to bundle a SACK with the packet. */
+static enum sctp_xmit sctp_packet_bundle_sack(struct sctp_packet *pkt,
+ struct sctp_chunk *chunk)
+{
+ enum sctp_xmit retval = SCTP_XMIT_OK;
+
+	/* If sending DATA and haven't already bundled a SACK, try to
+ * bundle one in to the packet.
+ */
+ if (sctp_chunk_is_data(chunk) && !pkt->has_sack &&
+ !pkt->has_cookie_echo) {
+ struct sctp_association *asoc;
+ struct timer_list *timer;
+ asoc = pkt->transport->asoc;
+ timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK];
+
+ /* If the SACK timer is running, we have a pending SACK */
+ if (timer_pending(timer)) {
+ struct sctp_chunk *sack;
+
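+			/* Only bundle a SACK if this transport's view of
+			 * the SACK state is current (generations match);
+			 * otherwise skip it rather than emit stale info.
+			 */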
+ if (pkt->transport->sack_generation !=
+ pkt->transport->asoc->peer.sack_generation)
+ return retval;
+
+ asoc->a_rwnd = asoc->rwnd;
+ sack = sctp_make_sack(asoc);
+ if (sack) {
+ retval = __sctp_packet_append_chunk(pkt, sack);
+ if (retval != SCTP_XMIT_OK) {
+ sctp_chunk_free(sack);
+ goto out;
+ }
+ SCTP_INC_STATS(asoc->base.net,
+ SCTP_MIB_OUTCTRLCHUNKS);
+ asoc->stats.octrlchunks++;
+ asoc->peer.sack_needed = 0;
+ if (del_timer(timer))
+ sctp_association_put(asoc);
+ }
+ }
+ }
+out:
+ return retval;
+}
+
+/* Append a chunk to the offered packet reporting back any inability to do
+ * so.
+ */
+static enum sctp_xmit __sctp_packet_append_chunk(struct sctp_packet *packet,
+ struct sctp_chunk *chunk)
+{
+ __u16 chunk_len = SCTP_PAD4(ntohs(chunk->chunk_hdr->length));
+ enum sctp_xmit retval = SCTP_XMIT_OK;
+
+ /* Check to see if this chunk will fit into the packet */
+ retval = sctp_packet_will_fit(packet, chunk, chunk_len);
+ if (retval != SCTP_XMIT_OK)
+ goto finish;
+
+ /* We believe that this chunk is OK to add to the packet */
+ switch (chunk->chunk_hdr->type) {
+ case SCTP_CID_DATA:
+ case SCTP_CID_I_DATA:
+ /* Account for the data being in the packet */
+ sctp_packet_append_data(packet, chunk);
+ /* Disallow SACK bundling after DATA. */
+ packet->has_sack = 1;
+ /* Disallow AUTH bundling after DATA */
+ packet->has_auth = 1;
+		/* Let it be known that the packet has DATA in it */
+ packet->has_data = 1;
+ /* timestamp the chunk for rtx purposes */
+ chunk->sent_at = jiffies;
+ /* Mainly used for prsctp RTX policy */
+ chunk->sent_count++;
+ break;
+ case SCTP_CID_COOKIE_ECHO:
+ packet->has_cookie_echo = 1;
+ break;
+
+ case SCTP_CID_SACK:
+ packet->has_sack = 1;
+ if (chunk->asoc)
+ chunk->asoc->stats.osacks++;
+ break;
+
+ case SCTP_CID_AUTH:
+ packet->has_auth = 1;
+ packet->auth = chunk;
+ break;
+ }
+
+ /* It is OK to send this chunk. */
+ list_add_tail(&chunk->list, &packet->chunk_list);
+ packet->size += chunk_len;
+ chunk->transport = packet->transport;
+finish:
+ return retval;
+}
+
+/* Append a chunk to the offered packet reporting back any inability to do
+ * so.
+ */
+enum sctp_xmit sctp_packet_append_chunk(struct sctp_packet *packet,
+ struct sctp_chunk *chunk)
+{
+ enum sctp_xmit retval = SCTP_XMIT_OK;
+
+ pr_debug("%s: packet:%p chunk:%p\n", __func__, packet, chunk);
+
+ /* Data chunks are special. Before seeing what else we can
+ * bundle into this packet, check to see if we are allowed to
+ * send this DATA.
+ */
+ if (sctp_chunk_is_data(chunk)) {
+ retval = sctp_packet_can_append_data(packet, chunk);
+ if (retval != SCTP_XMIT_OK)
+ goto finish;
+ }
+
+ /* Try to bundle AUTH chunk */
+ retval = sctp_packet_bundle_auth(packet, chunk);
+ if (retval != SCTP_XMIT_OK)
+ goto finish;
+
+ /* Try to bundle SACK chunk */
+ retval = sctp_packet_bundle_sack(packet, chunk);
+ if (retval != SCTP_XMIT_OK)
+ goto finish;
+
+ retval = __sctp_packet_append_chunk(packet, chunk);
+ if (retval != SCTP_XMIT_OK)
+ goto finish;
+
+ retval = sctp_packet_bundle_pad(packet, chunk);
+
+finish:
+ return retval;
+}
+
+static void sctp_packet_gso_append(struct sk_buff *head, struct sk_buff *skb)
+{
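+	/* Segments are chained on head's frag_list; the ->last pointer in
+	 * the CB caches the tail of that chain so each append stays O(1).
+	 */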
+ if (SCTP_OUTPUT_CB(head)->last == head)
+ skb_shinfo(head)->frag_list = skb;
+ else
+ SCTP_OUTPUT_CB(head)->last->next = skb;
+ SCTP_OUTPUT_CB(head)->last = skb;
+
+ head->truesize += skb->truesize;
+ head->data_len += skb->len;
+ head->len += skb->len;
+ refcount_add(skb->truesize, &head->sk->sk_wmem_alloc);
+
+ __skb_header_release(skb);
+}
+
+static int sctp_packet_pack(struct sctp_packet *packet,
+ struct sk_buff *head, int gso, gfp_t gfp)
+{
+ struct sctp_transport *tp = packet->transport;
+ struct sctp_auth_chunk *auth = NULL;
+ struct sctp_chunk *chunk, *tmp;
+ int pkt_count = 0, pkt_size;
+ struct sock *sk = head->sk;
+ struct sk_buff *nskb;
+ int auth_len = 0;
+
+ if (gso) {
+ skb_shinfo(head)->gso_type = sk->sk_gso_type;
+ SCTP_OUTPUT_CB(head)->last = head;
+ } else {
+ nskb = head;
+ pkt_size = packet->size;
+ goto merge;
+ }
+
+ do {
+ /* calculate the pkt_size and alloc nskb */
+ pkt_size = packet->overhead;
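+		/* Size this segment: the AUTH chunk (if any) is re-queued
+		 * after each segment is built, so its padded length is
+		 * tracked separately in auth_len; bail out entirely if even
+		 * AUTH plus one more chunk cannot fit within the path MTU.
+		 */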
+ list_for_each_entry_safe(chunk, tmp, &packet->chunk_list,
+ list) {
+ int padded = SCTP_PAD4(chunk->skb->len);
+
+ if (chunk == packet->auth)
+ auth_len = padded;
+ else if (auth_len + padded + packet->overhead >
+ tp->pathmtu)
+ return 0;
+ else if (pkt_size + padded > tp->pathmtu)
+ break;
+ pkt_size += padded;
+ }
+ nskb = alloc_skb(pkt_size + MAX_HEADER, gfp);
+ if (!nskb)
+ return 0;
+ skb_reserve(nskb, packet->overhead + MAX_HEADER);
+
+merge:
+ /* merge chunks into nskb and append nskb into head list */
+ pkt_size -= packet->overhead;
+ list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) {
+ int padding;
+
+ list_del_init(&chunk->list);
+ if (sctp_chunk_is_data(chunk)) {
+ if (!sctp_chunk_retransmitted(chunk) &&
+ !tp->rto_pending) {
+ chunk->rtt_in_progress = 1;
+ tp->rto_pending = 1;
+ }
+ }
+
+ padding = SCTP_PAD4(chunk->skb->len) - chunk->skb->len;
+ if (padding)
+ skb_put_zero(chunk->skb, padding);
+
+ if (chunk == packet->auth)
+ auth = (struct sctp_auth_chunk *)
+ skb_tail_pointer(nskb);
+
+ skb_put_data(nskb, chunk->skb->data, chunk->skb->len);
+
+ pr_debug("*** Chunk:%p[%s] %s 0x%x, length:%d, chunk->skb->len:%d, rtt_in_progress:%d\n",
+ chunk,
+ sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)),
+ chunk->has_tsn ? "TSN" : "No TSN",
+ chunk->has_tsn ? ntohl(chunk->subh.data_hdr->tsn) : 0,
+ ntohs(chunk->chunk_hdr->length), chunk->skb->len,
+ chunk->rtt_in_progress);
+
+ pkt_size -= SCTP_PAD4(chunk->skb->len);
+
+ if (!sctp_chunk_is_data(chunk) && chunk != packet->auth)
+ sctp_chunk_free(chunk);
+
+ if (!pkt_size)
+ break;
+ }
+
+ if (auth) {
+ sctp_auth_calculate_hmac(tp->asoc, nskb, auth,
+ packet->auth->shkey, gfp);
+ /* free auth if no more chunks, or add it back */
+ if (list_empty(&packet->chunk_list))
+ sctp_chunk_free(packet->auth);
+ else
+ list_add(&packet->auth->list,
+ &packet->chunk_list);
+ }
+
+ if (gso)
+ sctp_packet_gso_append(head, nskb);
+
+ pkt_count++;
+ } while (!list_empty(&packet->chunk_list));
+
+ if (gso) {
+ memset(head->cb, 0, max(sizeof(struct inet_skb_parm),
+ sizeof(struct inet6_skb_parm)));
+ skb_shinfo(head)->gso_segs = pkt_count;
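+		/* GSO_BY_FRAGS tells the segmentation code to split the
+		 * packet at the frag_list boundaries built above, rather
+		 * than at a fixed gso_size.
+		 */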
+ skb_shinfo(head)->gso_size = GSO_BY_FRAGS;
+ goto chksum;
+ }
+
+ if (sctp_checksum_disable)
+ return 1;
+
+ if (!(tp->dst->dev->features & NETIF_F_SCTP_CRC) ||
+ dst_xfrm(tp->dst) || packet->ipfragok || tp->encap_port) {
+ struct sctphdr *sh =
+ (struct sctphdr *)skb_transport_header(head);
+
+ sh->checksum = sctp_compute_cksum(head, 0);
+ } else {
+chksum:
+ head->ip_summed = CHECKSUM_PARTIAL;
+ head->csum_not_inet = 1;
+ head->csum_start = skb_transport_header(head) - head->head;
+ head->csum_offset = offsetof(struct sctphdr, checksum);
+ }
+
+ return pkt_count;
+}
+
+/* All packets are sent to the network through this function from
+ * sctp_outq_tail().
+ *
+ * The return value is always 0 for now.
+ */
+int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
+{
+ struct sctp_transport *tp = packet->transport;
+ struct sctp_association *asoc = tp->asoc;
+ struct sctp_chunk *chunk, *tmp;
+ int pkt_count, gso = 0;
+ struct sk_buff *head;
+ struct sctphdr *sh;
+ struct sock *sk;
+
+ pr_debug("%s: packet:%p\n", __func__, packet);
+ if (list_empty(&packet->chunk_list))
+ return 0;
+ chunk = list_entry(packet->chunk_list.next, struct sctp_chunk, list);
+ sk = chunk->skb->sk;
+
+ if (packet->size > tp->pathmtu && !packet->ipfragok && !chunk->pmtu_probe) {
+ if (tp->pl.state == SCTP_PL_ERROR) { /* do IP fragmentation if in Error state */
+ packet->ipfragok = 1;
+ } else {
+ if (!sk_can_gso(sk)) { /* check gso */
+ pr_err_once("Trying to GSO but underlying device doesn't support it.");
+ goto out;
+ }
+ gso = 1;
+ }
+ }
+
+ /* alloc head skb */
+ head = alloc_skb((gso ? packet->overhead : packet->size) +
+ MAX_HEADER, gfp);
+ if (!head)
+ goto out;
+ skb_reserve(head, packet->overhead + MAX_HEADER);
+ skb_set_owner_w(head, sk);
+
+ /* set sctp header */
+ sh = skb_push(head, sizeof(struct sctphdr));
+ skb_reset_transport_header(head);
+ sh->source = htons(packet->source_port);
+ sh->dest = htons(packet->destination_port);
+ sh->vtag = htonl(packet->vtag);
+ sh->checksum = 0;
+
+ /* drop packet if no dst */
+ if (!tp->dst) {
+ IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
+ kfree_skb(head);
+ goto out;
+ }
+
+ /* pack up chunks */
+ pkt_count = sctp_packet_pack(packet, head, gso, gfp);
+ if (!pkt_count) {
+ kfree_skb(head);
+ goto out;
+ }
+ pr_debug("***sctp_transmit_packet*** skb->len:%d\n", head->len);
+
+ /* start autoclose timer */
+ if (packet->has_data && sctp_state(asoc, ESTABLISHED) &&
+ asoc->timeouts[SCTP_EVENT_TIMEOUT_AUTOCLOSE]) {
+ struct timer_list *timer =
+ &asoc->timers[SCTP_EVENT_TIMEOUT_AUTOCLOSE];
+ unsigned long timeout =
+ asoc->timeouts[SCTP_EVENT_TIMEOUT_AUTOCLOSE];
+
+ if (!mod_timer(timer, jiffies + timeout))
+ sctp_association_hold(asoc);
+ }
+
+ /* sctp xmit */
+ tp->af_specific->ecn_capable(sk);
+ if (asoc) {
+ asoc->stats.opackets += pkt_count;
+ if (asoc->peer.last_sent_to != tp)
+ asoc->peer.last_sent_to = tp;
+ }
+ head->ignore_df = packet->ipfragok;
+ if (tp->dst_pending_confirm)
+ skb_set_dst_pending_confirm(head, 1);
+ /* neighbour should be confirmed on successful transmission or
+ * positive error
+ */
+ if (tp->af_specific->sctp_xmit(head, tp) >= 0 &&
+ tp->dst_pending_confirm)
+ tp->dst_pending_confirm = 0;
+
+out:
+ list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) {
+ list_del_init(&chunk->list);
+ if (!sctp_chunk_is_data(chunk))
+ sctp_chunk_free(chunk);
+ }
+ sctp_packet_reset(packet);
+ return 0;
+}
+
+/********************************************************************
+ * 2nd Level Abstractions
+ ********************************************************************/
+
+/* This private function checks whether a DATA chunk can be added to a packet */
+static enum sctp_xmit sctp_packet_can_append_data(struct sctp_packet *packet,
+ struct sctp_chunk *chunk)
+{
+ size_t datasize, rwnd, inflight, flight_size;
+ struct sctp_transport *transport = packet->transport;
+ struct sctp_association *asoc = transport->asoc;
+ struct sctp_outq *q = &asoc->outqueue;
+
+ /* RFC 2960 6.1 Transmission of DATA Chunks
+ *
+ * A) At any given time, the data sender MUST NOT transmit new data to
+ * any destination transport address if its peer's rwnd indicates
+ * that the peer has no buffer space (i.e. rwnd is 0, see Section
+ * 6.2.1). However, regardless of the value of rwnd (including if it
+ * is 0), the data sender can always have one DATA chunk in flight to
+ * the receiver if allowed by cwnd (see rule B below). This rule
+ * allows the sender to probe for a change in rwnd that the sender
+ * missed due to the SACK having been lost in transit from the data
+ * receiver to the data sender.
+ */
+
+ rwnd = asoc->peer.rwnd;
+ inflight = q->outstanding_bytes;
+ flight_size = transport->flight_size;
+
+ datasize = sctp_data_size(chunk);
+
+ if (datasize > rwnd && inflight > 0)
+ /* We have (at least) one data chunk in flight,
+ * so we can't fall back to rule 6.1 B).
+ */
+ return SCTP_XMIT_RWND_FULL;
+
+ /* RFC 2960 6.1 Transmission of DATA Chunks
+ *
+ * B) At any given time, the sender MUST NOT transmit new data
+ * to a given transport address if it has cwnd or more bytes
+ * of data outstanding to that transport address.
+ */
+	/* RFC 2960 7.2.4 & the Implementers Guide 2.8.
+ *
+ * 3) ...
+ * When a Fast Retransmit is being performed the sender SHOULD
+ * ignore the value of cwnd and SHOULD NOT delay retransmission.
+ */
+ if (chunk->fast_retransmit != SCTP_NEED_FRTX &&
+ flight_size >= transport->cwnd)
+ return SCTP_XMIT_RWND_FULL;
+
+ /* Nagle's algorithm to solve small-packet problem:
+ * Inhibit the sending of new chunks when new outgoing data arrives
+ * if any previously transmitted data on the connection remains
+ * unacknowledged.
+ */
+
+ if ((sctp_sk(asoc->base.sk)->nodelay || inflight == 0) &&
+ !asoc->force_delay)
+ /* Nothing unacked */
+ return SCTP_XMIT_OK;
+
+ if (!sctp_packet_empty(packet))
+ /* Append to packet */
+ return SCTP_XMIT_OK;
+
+ if (!sctp_state(asoc, ESTABLISHED))
+ return SCTP_XMIT_OK;
+
+ /* Check whether this chunk and all the rest of pending data will fit
+ * or delay in hopes of bundling a full sized packet.
+ */
+ if (chunk->skb->len + q->out_qlen > transport->pathmtu -
+ packet->overhead - sctp_datachk_len(&chunk->asoc->stream) - 4)
+ /* Enough data queued to fill a packet */
+ return SCTP_XMIT_OK;
+
+ /* Don't delay large message writes that may have been fragmented */
+ if (!chunk->msg->can_delay)
+ return SCTP_XMIT_OK;
+
+ /* Defer until all data acked or packet full */
+ return SCTP_XMIT_DELAY;
+}
+
+/* This private function handles the bookkeeping when adding a DATA chunk */
+static void sctp_packet_append_data(struct sctp_packet *packet,
+ struct sctp_chunk *chunk)
+{
+ struct sctp_transport *transport = packet->transport;
+ size_t datasize = sctp_data_size(chunk);
+ struct sctp_association *asoc = transport->asoc;
+ u32 rwnd = asoc->peer.rwnd;
+
+ /* Keep track of how many bytes are in flight over this transport. */
+ transport->flight_size += datasize;
+
+ /* Keep track of how many bytes are in flight to the receiver. */
+ asoc->outqueue.outstanding_bytes += datasize;
+
+ /* Update our view of the receiver's rwnd. */
+ if (datasize < rwnd)
+ rwnd -= datasize;
+ else
+ rwnd = 0;
+
+ asoc->peer.rwnd = rwnd;
+ sctp_chunk_assign_tsn(chunk);
+ asoc->stream.si->assign_number(chunk);
+}
+
+static enum sctp_xmit sctp_packet_will_fit(struct sctp_packet *packet,
+ struct sctp_chunk *chunk,
+ u16 chunk_len)
+{
+ enum sctp_xmit retval = SCTP_XMIT_OK;
+ size_t psize, pmtu, maxsize;
+
+ /* Don't bundle in this packet if this chunk's auth key doesn't
+ * match other chunks already enqueued on this packet. Also,
+ * don't bundle the chunk with auth key if other chunks in this
+ * packet don't have auth key.
+ */
+ if ((packet->auth && chunk->shkey != packet->auth->shkey) ||
+ (!packet->auth && chunk->shkey &&
+ chunk->chunk_hdr->type != SCTP_CID_AUTH))
+ return SCTP_XMIT_PMTU_FULL;
+
+ psize = packet->size;
+ if (packet->transport->asoc)
+ pmtu = packet->transport->asoc->pathmtu;
+ else
+ pmtu = packet->transport->pathmtu;
+
+ /* Decide if we need to fragment or resubmit later. */
+ if (psize + chunk_len > pmtu) {
+ /* It's OK to fragment at IP level if any one of the following
+ * is true:
+		 * 1. The packet is empty (meaning this chunk is greater
+		 *    than the MTU)
+ * 2. The packet doesn't have any data in it yet and data
+ * requires authentication.
+ */
+ if (sctp_packet_empty(packet) ||
+ (!packet->has_data && chunk->auth)) {
+ /* We no longer do re-fragmentation.
+ * Just fragment at the IP layer, if we
+ * actually hit this condition
+ */
+ packet->ipfragok = 1;
+ goto out;
+ }
+
+ /* Similarly, if this chunk was built before a PMTU
+ * reduction, we have to fragment it at IP level now. So
+ * if the packet already contains something, we need to
+ * flush.
+ */
+ maxsize = pmtu - packet->overhead;
+ if (packet->auth)
+ maxsize -= SCTP_PAD4(packet->auth->skb->len);
+ if (chunk_len > maxsize)
+ retval = SCTP_XMIT_PMTU_FULL;
+
+		/* It is also okay to fragment if the chunk we are
+		 * adding is a control chunk, but only if the current packet
+		 * is not a GSO one; otherwise it causes fragmentation of
+		 * a large frame. So in this case we allow the
+		 * fragmentation by forcing it to be in a new packet.
+		 */
+ if (!sctp_chunk_is_data(chunk) && packet->has_data)
+ retval = SCTP_XMIT_PMTU_FULL;
+
+ if (psize + chunk_len > packet->max_size)
+ /* Hit GSO/PMTU limit, gotta flush */
+ retval = SCTP_XMIT_PMTU_FULL;
+
+ if (!packet->transport->burst_limited &&
+ psize + chunk_len > (packet->transport->cwnd >> 1))
+ /* Do not allow a single GSO packet to use more
+ * than half of cwnd.
+ */
+ retval = SCTP_XMIT_PMTU_FULL;
+
+ if (packet->transport->burst_limited &&
+ psize + chunk_len > (packet->transport->burst_limited >> 1))
+ /* Do not allow a single GSO packet to use more
+ * than half of original cwnd.
+ */
+ retval = SCTP_XMIT_PMTU_FULL;
+ /* Otherwise it will fit in the GSO packet */
+ }
+
+out:
+ return retval;
+}