Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted

Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2023-02-21 18:24:12 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2023-02-21 18:24:12 -0800
commit: 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch)
tree: cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /drivers/connector
download: linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz
linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip
5 files changed, 862 insertions, 0 deletions
diff --git a/drivers/connector/Kconfig b/drivers/connector/Kconfig
new file mode 100644
index 000000000..0c2d2aa82
--- /dev/null
+++ b/drivers/connector/Kconfig
@@ -0,0 +1,23 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+menuconfig CONNECTOR
+	tristate "Connector - unified userspace <-> kernelspace linker"
+	depends on NET
+	help
+	  This is unified userspace <-> kernelspace connector working on top
+	  of the netlink socket protocol.
+
+	  Connector support can also be built as a module.  If so, the module
+	  will be called cn.
+
+if CONNECTOR
+
+config PROC_EVENTS
+	bool "Report process events to userspace"
+	depends on CONNECTOR=y
+	default y
+	help
+	  Provide a connector that reports process events to userspace. Send
+	  events such as fork, exec, id change (uid, gid, suid, etc), and exit.
+
+endif # CONNECTOR
diff --git a/drivers/connector/Makefile b/drivers/connector/Makefile
new file mode 100644
index 000000000..1bf67d3df
--- /dev/null
+++ b/drivers/connector/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_CONNECTOR)		+= cn.o
+obj-$(CONFIG_PROC_EVENTS)	+= cn_proc.o
+
+cn-y				+= cn_queue.o connector.o
diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
new file mode 100644
index 000000000..ccac1c453
--- /dev/null
+++ b/drivers/connector/cn_proc.c
@@ -0,0 +1,403 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * cn_proc.c - process events connector
+ *
+ * Copyright (C) Matt Helsley, IBM Corp. 2005
+ * Based on cn_fork.c by Guillaume Thouvenin <guillaume.thouvenin@bull.net>
+ * Original copyright notice follows:
+ * Copyright (C) 2005 BULL SA.
+ */
+
+#include <linux/kernel.h>
+#include <linux/ktime.h>
+#include <linux/init.h>
+#include <linux/connector.h>
+#include <linux/gfp.h>
+#include <linux/ptrace.h>
+#include <linux/atomic.h>
+#include <linux/pid_namespace.h>
+
+#include <linux/cn_proc.h>
+#include <linux/local_lock.h>
+
+/*
+ * Size of a cn_msg followed by a proc_event structure.  Since the
+ * sizeof struct cn_msg is a multiple of 4 bytes, but not 8 bytes, we
+ * add one 4-byte word to the size here, and then start the actual
+ * cn_msg structure 4 bytes into the stack buffer.  The result is that
+ * the immediately following proc_event structure is aligned to 8 bytes.
+ */
+#define CN_PROC_MSG_SIZE (sizeof(struct cn_msg) + sizeof(struct proc_event) + 4)
+
+/* See comment above; we test our assumption about sizeof struct cn_msg here. */
+static inline struct cn_msg *buffer_to_cn_msg(__u8 *buffer)
+{
+	BUILD_BUG_ON(sizeof(struct cn_msg) != 20);
+	return (struct cn_msg *)(buffer + 4);
+}
+
+static atomic_t proc_event_num_listeners = ATOMIC_INIT(0);
+static struct cb_id cn_proc_event_id = { CN_IDX_PROC, CN_VAL_PROC };
+
+/* local_event.count is used as the sequence number of the netlink message */
+struct local_event {
+	local_lock_t lock;
+	__u32 count;
+};
+static DEFINE_PER_CPU(struct local_event, local_event) = {
+	.lock = INIT_LOCAL_LOCK(lock),
+};
+
+static inline void send_msg(struct cn_msg *msg)
+{
+	local_lock(&local_event.lock);
+
+	msg->seq = __this_cpu_inc_return(local_event.count) - 1;
+	((struct proc_event *)msg->data)->cpu = smp_processor_id();
+
+	/*
+	 * local_lock() disables preemption during send to ensure the messages
+	 * are ordered according to their sequence numbers.
+	 *
+	 * If cn_netlink_send() fails, the data is not sent.
+	 */
+	cn_netlink_send(msg, 0, CN_IDX_PROC, GFP_NOWAIT);
+
+	local_unlock(&local_event.lock);
+}
+
+void proc_fork_connector(struct task_struct *task)
+{
+	struct cn_msg *msg;
+	struct proc_event *ev;
+	__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+	struct task_struct *parent;
+
+	if (atomic_read(&proc_event_num_listeners) < 1)
+		return;
+
+	msg = buffer_to_cn_msg(buffer);
+	ev = (struct proc_event *)msg->data;
+	memset(&ev->event_data, 0, sizeof(ev->event_data));
+	ev->timestamp_ns = ktime_get_ns();
+	ev->what = PROC_EVENT_FORK;
+	rcu_read_lock();
+	parent = rcu_dereference(task->real_parent);
+	ev->event_data.fork.parent_pid = parent->pid;
+	ev->event_data.fork.parent_tgid = parent->tgid;
+	rcu_read_unlock();
+	ev->event_data.fork.child_pid = task->pid;
+	ev->event_data.fork.child_tgid = task->tgid;
+
+	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
+	msg->ack = 0; /* not used */
+	msg->len = sizeof(*ev);
+	msg->flags = 0; /* not used */
+	send_msg(msg);
+}
+
+void proc_exec_connector(struct task_struct *task)
+{
+	struct cn_msg *msg;
+	struct proc_event *ev;
+	__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+
+	if (atomic_read(&proc_event_num_listeners) < 1)
+		return;
+
+	msg = buffer_to_cn_msg(buffer);
+	ev = (struct proc_event *)msg->data;
+	memset(&ev->event_data, 0, sizeof(ev->event_data));
+	ev->timestamp_ns = ktime_get_ns();
+	ev->what = PROC_EVENT_EXEC;
+	ev->event_data.exec.process_pid = task->pid;
+	ev->event_data.exec.process_tgid = task->tgid;
+
+	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
+	msg->ack = 0; /* not used */
+	msg->len = sizeof(*ev);
+	msg->flags = 0; /* not used */
+	send_msg(msg);
+}
+
+void proc_id_connector(struct task_struct *task, int which_id)
+{
+	struct cn_msg *msg;
+	struct proc_event *ev;
+	__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+	const struct cred *cred;
+
+	if (atomic_read(&proc_event_num_listeners) < 1)
+		return;
+
+	msg = buffer_to_cn_msg(buffer);
+	ev = (struct proc_event *)msg->data;
+	memset(&ev->event_data, 0, sizeof(ev->event_data));
+	ev->what = which_id;
+	ev->event_data.id.process_pid = task->pid;
+	ev->event_data.id.process_tgid = task->tgid;
+	rcu_read_lock();
+	cred = __task_cred(task);
+	if (which_id == PROC_EVENT_UID) {
+		ev->event_data.id.r.ruid = from_kuid_munged(&init_user_ns, cred->uid);
+		ev->event_data.id.e.euid = from_kuid_munged(&init_user_ns, cred->euid);
+	} else if (which_id == PROC_EVENT_GID) {
+		ev->event_data.id.r.rgid = from_kgid_munged(&init_user_ns, cred->gid);
+		ev->event_data.id.e.egid = from_kgid_munged(&init_user_ns, cred->egid);
+	} else {
+		rcu_read_unlock();
+		return;
+	}
+	rcu_read_unlock();
+	ev->timestamp_ns = ktime_get_ns();
+
+	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
+	msg->ack = 0; /* not used */
+	msg->len = sizeof(*ev);
+	msg->flags = 0; /* not used */
+	send_msg(msg);
+}
+
+void proc_sid_connector(struct task_struct *task)
+{
+	struct cn_msg *msg;
+	struct proc_event *ev;
+	__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+
+	if (atomic_read(&proc_event_num_listeners) < 1)
+		return;
+
+	msg = buffer_to_cn_msg(buffer);
+	ev = (struct proc_event *)msg->data;
+	memset(&ev->event_data, 0, sizeof(ev->event_data));
+	ev->timestamp_ns = ktime_get_ns();
+	ev->what = PROC_EVENT_SID;
+	ev->event_data.sid.process_pid = task->pid;
+	ev->event_data.sid.process_tgid = task->tgid;
+
+	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
+	msg->ack = 0; /* not used */
+	msg->len = sizeof(*ev);
+	msg->flags = 0; /* not used */
+	send_msg(msg);
+}
+
+void proc_ptrace_connector(struct task_struct *task, int ptrace_id)
+{
+	struct cn_msg *msg;
+	struct proc_event *ev;
+	__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+
+	if (atomic_read(&proc_event_num_listeners) < 1)
+		return;
+
+	msg = buffer_to_cn_msg(buffer);
+	ev = (struct proc_event *)msg->data;
+	memset(&ev->event_data, 0, sizeof(ev->event_data));
+	ev->timestamp_ns = ktime_get_ns();
+	ev->what = PROC_EVENT_PTRACE;
+	ev->event_data.ptrace.process_pid  = task->pid;
+	ev->event_data.ptrace.process_tgid = task->tgid;
+	if (ptrace_id == PTRACE_ATTACH) {
+		ev->event_data.ptrace.tracer_pid  = current->pid;
+		ev->event_data.ptrace.tracer_tgid = current->tgid;
+	} else if (ptrace_id == PTRACE_DETACH) {
+		ev->event_data.ptrace.tracer_pid  = 0;
+		ev->event_data.ptrace.tracer_tgid = 0;
+	} else
+		return;
+
+	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
+	msg->ack = 0; /* not used */
+	msg->len = sizeof(*ev);
+	msg->flags = 0; /* not used */
+	send_msg(msg);
+}
+
+void proc_comm_connector(struct task_struct *task)
+{
+	struct cn_msg *msg;
+	struct proc_event *ev;
+	__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+
+	if (atomic_read(&proc_event_num_listeners) < 1)
+		return;
+
+	msg = buffer_to_cn_msg(buffer);
+	ev = (struct proc_event *)msg->data;
+	memset(&ev->event_data, 0, sizeof(ev->event_data));
+	ev->timestamp_ns = ktime_get_ns();
+	ev->what = PROC_EVENT_COMM;
+	ev->event_data.comm.process_pid  = task->pid;
+	ev->event_data.comm.process_tgid = task->tgid;
+	get_task_comm(ev->event_data.comm.comm, task);
+
+	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
+	msg->ack = 0; /* not used */
+	msg->len = sizeof(*ev);
+	msg->flags = 0; /* not used */
+	send_msg(msg);
+}
+
+void proc_coredump_connector(struct task_struct *task)
+{
+	struct cn_msg *msg;
+	struct proc_event *ev;
+	struct task_struct *parent;
+	__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+
+	if (atomic_read(&proc_event_num_listeners) < 1)
+		return;
+
+	msg = buffer_to_cn_msg(buffer);
+	ev = (struct proc_event *)msg->data;
+	memset(&ev->event_data, 0, sizeof(ev->event_data));
+	ev->timestamp_ns = ktime_get_ns();
+	ev->what = PROC_EVENT_COREDUMP;
+	ev->event_data.coredump.process_pid = task->pid;
+	ev->event_data.coredump.process_tgid = task->tgid;
+
+	rcu_read_lock();
+	if (pid_alive(task)) {
+		parent = rcu_dereference(task->real_parent);
+		ev->event_data.coredump.parent_pid = parent->pid;
+		ev->event_data.coredump.parent_tgid = parent->tgid;
+	}
+	rcu_read_unlock();
+
+	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
+	msg->ack = 0; /* not used */
+	msg->len = sizeof(*ev);
+	msg->flags = 0; /* not used */
+	send_msg(msg);
+}
+
+void proc_exit_connector(struct task_struct *task)
+{
+	struct cn_msg *msg;
+	struct proc_event *ev;
+	struct task_struct *parent;
+	__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+
+	if (atomic_read(&proc_event_num_listeners) < 1)
+		return;
+
+	msg = buffer_to_cn_msg(buffer);
+	ev = (struct proc_event *)msg->data;
+	memset(&ev->event_data, 0, sizeof(ev->event_data));
+	ev->timestamp_ns = ktime_get_ns();
+	ev->what = PROC_EVENT_EXIT;
+	ev->event_data.exit.process_pid = task->pid;
+	ev->event_data.exit.process_tgid = task->tgid;
+	ev->event_data.exit.exit_code = task->exit_code;
+	ev->event_data.exit.exit_signal = task->exit_signal;
+
+	rcu_read_lock();
+	if (pid_alive(task)) {
+		parent = rcu_dereference(task->real_parent);
+		ev->event_data.exit.parent_pid = parent->pid;
+		ev->event_data.exit.parent_tgid = parent->tgid;
+	}
+	rcu_read_unlock();
+
+	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
+	msg->ack = 0; /* not used */
+	msg->len = sizeof(*ev);
+	msg->flags = 0; /* not used */
+	send_msg(msg);
+}
+
+/*
+ * Send an acknowledgement message to userspace
+ *
+ * Use 0 for success, EFOO otherwise.
+ * Note: this is the negative of conventional kernel error
+ * values because it's not being returned via syscall return
+ * mechanisms.
+ */
+static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack)
+{
+	struct cn_msg *msg;
+	struct proc_event *ev;
+	__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+
+	if (atomic_read(&proc_event_num_listeners) < 1)
+		return;
+
+	msg = buffer_to_cn_msg(buffer);
+	ev = (struct proc_event *)msg->data;
+	memset(&ev->event_data, 0, sizeof(ev->event_data));
+	msg->seq = rcvd_seq;
+	ev->timestamp_ns = ktime_get_ns();
+	ev->cpu = -1;
+	ev->what = PROC_EVENT_NONE;
+	ev->event_data.ack.err = err;
+	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
+	msg->ack = rcvd_ack + 1;
+	msg->len = sizeof(*ev);
+	msg->flags = 0; /* not used */
+	send_msg(msg);
+}
+
+/**
+ * cn_proc_mcast_ctl
+ * @data: message sent from userspace via the connector
+ */
+static void cn_proc_mcast_ctl(struct cn_msg *msg,
+			      struct netlink_skb_parms *nsp)
+{
+	enum proc_cn_mcast_op *mc_op = NULL;
+	int err = 0;
+
+	if (msg->len != sizeof(*mc_op))
+		return;
+
+	/* 
+	 * Events are reported with respect to the initial pid
+	 * and user namespaces so ignore requestors from
+	 * other namespaces.
+	 */
+	if ((current_user_ns() != &init_user_ns) ||
+	    !task_is_in_init_pid_ns(current))
+		return;
+
+	/* Can only change if privileged. */
+	if (!__netlink_ns_capable(nsp, &init_user_ns, CAP_NET_ADMIN)) {
+		err = EPERM;
+		goto out;
+	}
+
+	mc_op = (enum proc_cn_mcast_op *)msg->data;
+	switch (*mc_op) {
+	case PROC_CN_MCAST_LISTEN:
+		atomic_inc(&proc_event_num_listeners);
+		break;
+	case PROC_CN_MCAST_IGNORE:
+		atomic_dec(&proc_event_num_listeners);
+		break;
+	default:
+		err = EINVAL;
+		break;
+	}
+
+out:
+	cn_proc_ack(err, msg->seq, msg->ack);
+}
+
+/*
+ * cn_proc_init - initialization entry point
+ *
+ * Adds the connector callback to the connector driver.
+ */
+static int __init cn_proc_init(void)
+{
+	int err = cn_add_callback(&cn_proc_event_id,
+				  "cn_proc",
+				  &cn_proc_mcast_ctl);
+	if (err) {
+		pr_warn("cn_proc failed to register\n");
+		return err;
+	}
+	return 0;
+}
+device_initcall(cn_proc_init);
diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
new file mode 100644
index 000000000..996f025eb
--- /dev/null
+++ b/drivers/connector/cn_queue.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ *	cn_queue.c
+ *
+ * 2004+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net>
+ * All rights reserved.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include <linux/slab.h>
+#include <linux/skbuff.h>
+#include <linux/suspend.h>
+#include <linux/connector.h>
+#include <linux/delay.h>
+
+static struct cn_callback_entry *
+cn_queue_alloc_callback_entry(struct cn_queue_dev *dev, const char *name,
+			      const struct cb_id *id,
+			      void (*callback)(struct cn_msg *,
+					       struct netlink_skb_parms *))
+{
+	struct cn_callback_entry *cbq;
+
+	cbq = kzalloc(sizeof(*cbq), GFP_KERNEL);
+	if (!cbq) {
+		pr_err("Failed to create new callback queue.\n");
+		return NULL;
+	}
+
+	refcount_set(&cbq->refcnt, 1);
+
+	atomic_inc(&dev->refcnt);
+	cbq->pdev = dev;
+
+	snprintf(cbq->id.name, sizeof(cbq->id.name), "%s", name);
+	memcpy(&cbq->id.id, id, sizeof(struct cb_id));
+	cbq->callback = callback;
+	return cbq;
+}
+
+void cn_queue_release_callback(struct cn_callback_entry *cbq)
+{
+	if (!refcount_dec_and_test(&cbq->refcnt))
+		return;
+
+	atomic_dec(&cbq->pdev->refcnt);
+	kfree(cbq);
+}
+
+int cn_cb_equal(const struct cb_id *i1, const struct cb_id *i2)
+{
+	return ((i1->idx == i2->idx) && (i1->val == i2->val));
+}
+
+int cn_queue_add_callback(struct cn_queue_dev *dev, const char *name,
+			  const struct cb_id *id,
+			  void (*callback)(struct cn_msg *,
+					   struct netlink_skb_parms *))
+{
+	struct cn_callback_entry *cbq, *__cbq;
+	int found = 0;
+
+	cbq = cn_queue_alloc_callback_entry(dev, name, id, callback);
+	if (!cbq)
+		return -ENOMEM;
+
+	spin_lock_bh(&dev->queue_lock);
+	list_for_each_entry(__cbq, &dev->queue_list, callback_entry) {
+		if (cn_cb_equal(&__cbq->id.id, id)) {
+			found = 1;
+			break;
+		}
+	}
+	if (!found)
+		list_add_tail(&cbq->callback_entry, &dev->queue_list);
+	spin_unlock_bh(&dev->queue_lock);
+
+	if (found) {
+		cn_queue_release_callback(cbq);
+		return -EINVAL;
+	}
+
+	cbq->seq = 0;
+	cbq->group = cbq->id.id.idx;
+
+	return 0;
+}
+
+void cn_queue_del_callback(struct cn_queue_dev *dev, const struct cb_id *id)
+{
+	struct cn_callback_entry *cbq, *n;
+	int found = 0;
+
+	spin_lock_bh(&dev->queue_lock);
+	list_for_each_entry_safe(cbq, n, &dev->queue_list, callback_entry) {
+		if (cn_cb_equal(&cbq->id.id, id)) {
+			list_del(&cbq->callback_entry);
+			found = 1;
+			break;
+		}
+	}
+	spin_unlock_bh(&dev->queue_lock);
+
+	if (found)
+		cn_queue_release_callback(cbq);
+}
+
+struct cn_queue_dev *cn_queue_alloc_dev(const char *name, struct sock *nls)
+{
+	struct cn_queue_dev *dev;
+
+	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return NULL;
+
+	snprintf(dev->name, sizeof(dev->name), "%s", name);
+	atomic_set(&dev->refcnt, 0);
+	INIT_LIST_HEAD(&dev->queue_list);
+	spin_lock_init(&dev->queue_lock);
+
+	dev->nls = nls;
+
+	return dev;
+}
+
+void cn_queue_free_dev(struct cn_queue_dev *dev)
+{
+	struct cn_callback_entry *cbq, *n;
+
+	spin_lock_bh(&dev->queue_lock);
+	list_for_each_entry_safe(cbq, n, &dev->queue_list, callback_entry)
+		list_del(&cbq->callback_entry);
+	spin_unlock_bh(&dev->queue_lock);
+
+	while (atomic_read(&dev->refcnt)) {
+		pr_info("Waiting for %s to become free: refcnt=%d.\n",
+		       dev->name, atomic_read(&dev->refcnt));
+		msleep(1000);
+	}
+
+	kfree(dev);
+	dev = NULL;
+}
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
new file mode 100644
index 000000000..48ec7ce6e
--- /dev/null
+++ b/drivers/connector/connector.c
@@ -0,0 +1,284 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ *	connector.c
+ *
+ * 2004+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net>
+ * All rights reserved.
+ */
+
+#include <linux/compiler.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/skbuff.h>
+#include <net/netlink.h>
+#include <linux/moduleparam.h>
+#include <linux/connector.h>
+#include <linux/slab.h>
+#include <linux/mutex.h>
+#include <linux/proc_fs.h>
+#include <linux/spinlock.h>
+
+#include <net/sock.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Evgeniy Polyakov <zbr@ioremap.net>");
+MODULE_DESCRIPTION("Generic userspace <-> kernelspace connector.");
+MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_CONNECTOR);
+
+static struct cn_dev cdev;
+
+static int cn_already_initialized;
+
+/*
+ * Sends mult (multiple) cn_msg at a time.
+ *
+ * msg->seq and msg->ack are used to determine message genealogy.
+ * When someone sends message it puts there locally unique sequence
+ * and random acknowledge numbers.  Sequence number may be copied into
+ * nlmsghdr->nlmsg_seq too.
+ *
+ * Sequence number is incremented with each message to be sent.
+ *
+ * If we expect a reply to our message then the sequence number in
+ * received message MUST be the same as in original message, and
+ * acknowledge number MUST be the same + 1.
+ *
+ * If we receive a message and its sequence number is not equal to the
+ * one we are expecting then it is a new message.
+ *
+ * If we receive a message and its sequence number is the same as one
+ * we are expecting but it's acknowledgement number is not equal to
+ * the acknowledgement number in the original message + 1, then it is
+ * a new message.
+ *
+ * If msg->len != len, then additional cn_msg messages are expected following
+ * the first msg.
+ *
+ * The message is sent to, the portid if given, the group if given, both if
+ * both, or if both are zero then the group is looked up and sent there.
+ */
+int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 __group,
+	gfp_t gfp_mask)
+{
+	struct cn_callback_entry *__cbq;
+	unsigned int size;
+	struct sk_buff *skb;
+	struct nlmsghdr *nlh;
+	struct cn_msg *data;
+	struct cn_dev *dev = &cdev;
+	u32 group = 0;
+	int found = 0;
+
+	if (portid || __group) {
+		group = __group;
+	} else {
+		spin_lock_bh(&dev->cbdev->queue_lock);
+		list_for_each_entry(__cbq, &dev->cbdev->queue_list,
+				    callback_entry) {
+			if (cn_cb_equal(&__cbq->id.id, &msg->id)) {
+				found = 1;
+				group = __cbq->group;
+				break;
+			}
+		}
+		spin_unlock_bh(&dev->cbdev->queue_lock);
+
+		if (!found)
+			return -ENODEV;
+	}
+
+	if (!portid && !netlink_has_listeners(dev->nls, group))
+		return -ESRCH;
+
+	size = sizeof(*msg) + len;
+
+	skb = nlmsg_new(size, gfp_mask);
+	if (!skb)
+		return -ENOMEM;
+
+	nlh = nlmsg_put(skb, 0, msg->seq, NLMSG_DONE, size, 0);
+	if (!nlh) {
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	data = nlmsg_data(nlh);
+
+	memcpy(data, msg, size);
+
+	NETLINK_CB(skb).dst_group = group;
+
+	if (group)
+		return netlink_broadcast(dev->nls, skb, portid, group,
+					 gfp_mask);
+	return netlink_unicast(dev->nls, skb, portid,
+			!gfpflags_allow_blocking(gfp_mask));
+}
+EXPORT_SYMBOL_GPL(cn_netlink_send_mult);
+
+/* same as cn_netlink_send_mult except msg->len is used for len */
+int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group,
+	gfp_t gfp_mask)
+{
+	return cn_netlink_send_mult(msg, msg->len, portid, __group, gfp_mask);
+}
+EXPORT_SYMBOL_GPL(cn_netlink_send);
+
+/*
+ * Callback helper - queues work and setup destructor for given data.
+ */
+static int cn_call_callback(struct sk_buff *skb)
+{
+	struct nlmsghdr *nlh;
+	struct cn_callback_entry *i, *cbq = NULL;
+	struct cn_dev *dev = &cdev;
+	struct cn_msg *msg = nlmsg_data(nlmsg_hdr(skb));
+	struct netlink_skb_parms *nsp = &NETLINK_CB(skb);
+	int err = -ENODEV;
+
+	/* verify msg->len is within skb */
+	nlh = nlmsg_hdr(skb);
+	if (nlh->nlmsg_len < NLMSG_HDRLEN + sizeof(struct cn_msg) + msg->len)
+		return -EINVAL;
+
+	spin_lock_bh(&dev->cbdev->queue_lock);
+	list_for_each_entry(i, &dev->cbdev->queue_list, callback_entry) {
+		if (cn_cb_equal(&i->id.id, &msg->id)) {
+			refcount_inc(&i->refcnt);
+			cbq = i;
+			break;
+		}
+	}
+	spin_unlock_bh(&dev->cbdev->queue_lock);
+
+	if (cbq != NULL) {
+		cbq->callback(msg, nsp);
+		kfree_skb(skb);
+		cn_queue_release_callback(cbq);
+		err = 0;
+	}
+
+	return err;
+}
+
+/*
+ * Main netlink receiving function.
+ *
+ * It checks skb, netlink header and msg sizes, and calls callback helper.
+ */
+static void cn_rx_skb(struct sk_buff *skb)
+{
+	struct nlmsghdr *nlh;
+	int len, err;
+
+	if (skb->len >= NLMSG_HDRLEN) {
+		nlh = nlmsg_hdr(skb);
+		len = nlmsg_len(nlh);
+
+		if (len < (int)sizeof(struct cn_msg) ||
+		    skb->len < nlh->nlmsg_len ||
+		    len > CONNECTOR_MAX_MSG_SIZE)
+			return;
+
+		err = cn_call_callback(skb_get(skb));
+		if (err < 0)
+			kfree_skb(skb);
+	}
+}
+
+/*
+ * Callback add routing - adds callback with given ID and name.
+ * If there is registered callback with the same ID it will not be added.
+ *
+ * May sleep.
+ */
+int cn_add_callback(const struct cb_id *id, const char *name,
+		    void (*callback)(struct cn_msg *,
+				     struct netlink_skb_parms *))
+{
+	struct cn_dev *dev = &cdev;
+
+	if (!cn_already_initialized)
+		return -EAGAIN;
+
+	return cn_queue_add_callback(dev->cbdev, name, id, callback);
+}
+EXPORT_SYMBOL_GPL(cn_add_callback);
+
+/*
+ * Callback remove routing - removes callback
+ * with given ID.
+ * If there is no registered callback with given
+ * ID nothing happens.
+ *
+ * May sleep while waiting for reference counter to become zero.
+ */
+void cn_del_callback(const struct cb_id *id)
+{
+	struct cn_dev *dev = &cdev;
+
+	cn_queue_del_callback(dev->cbdev, id);
+}
+EXPORT_SYMBOL_GPL(cn_del_callback);
+
+static int __maybe_unused cn_proc_show(struct seq_file *m, void *v)
+{
+	struct cn_queue_dev *dev = cdev.cbdev;
+	struct cn_callback_entry *cbq;
+
+	seq_printf(m, "Name            ID\n");
+
+	spin_lock_bh(&dev->queue_lock);
+
+	list_for_each_entry(cbq, &dev->queue_list, callback_entry) {
+		seq_printf(m, "%-15s %u:%u\n",
+			   cbq->id.name,
+			   cbq->id.id.idx,
+			   cbq->id.id.val);
+	}
+
+	spin_unlock_bh(&dev->queue_lock);
+
+	return 0;
+}
+
+static int cn_init(void)
+{
+	struct cn_dev *dev = &cdev;
+	struct netlink_kernel_cfg cfg = {
+		.groups	= CN_NETLINK_USERS + 0xf,
+		.input	= cn_rx_skb,
+	};
+
+	dev->nls = netlink_kernel_create(&init_net, NETLINK_CONNECTOR, &cfg);
+	if (!dev->nls)
+		return -EIO;
+
+	dev->cbdev = cn_queue_alloc_dev("cqueue", dev->nls);
+	if (!dev->cbdev) {
+		netlink_kernel_release(dev->nls);
+		return -EINVAL;
+	}
+
+	cn_already_initialized = 1;
+
+	proc_create_single("connector", S_IRUGO, init_net.proc_net, cn_proc_show);
+
+	return 0;
+}
+
+static void cn_fini(void)
+{
+	struct cn_dev *dev = &cdev;
+
+	cn_already_initialized = 0;
+
+	remove_proc_entry("connector", init_net.proc_net);
+
+	cn_queue_free_dev(dev->cbdev);
+	netlink_kernel_release(dev->nls);
+}
+
+subsys_initcall(cn_init);
+module_exit(cn_fini);
author	Linus Torvalds <torvalds@linux-foundation.org>	2023-02-21 18:24:12 -0800
committer	Linus Torvalds <torvalds@linux-foundation.org>	2023-02-21 18:24:12 -0800
commit	5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch)
tree	cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /drivers/connector
download	linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip