From 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 21 Feb 2023 18:24:12 -0800 Subject: Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ... --- arch/um/kernel/um_arch.c | 536 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 536 insertions(+) create mode 100644 arch/um/kernel/um_arch.c (limited to 'arch/um/kernel/um_arch.c') diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c new file mode 100644 index 000000000..786b44dc2 --- /dev/null +++ b/arch/um/kernel/um_arch.c @@ -0,0 +1,536 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2000 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com) + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "um_arch.h" + +#define DEFAULT_COMMAND_LINE_ROOT "root=98:0" +#define DEFAULT_COMMAND_LINE_CONSOLE "console=tty0" + +/* Changed in add_arg and setup_arch, which run before SMP is started */ +static char __initdata command_line[COMMAND_LINE_SIZE] = { 0 }; + +static void __init add_arg(char *arg) +{ + if (strlen(command_line) + strlen(arg) + 1 > COMMAND_LINE_SIZE) { + os_warn("add_arg: Too many command line arguments!\n"); + exit(1); + } + if (strlen(command_line) > 0) + strcat(command_line, " "); + strcat(command_line, arg); +} + +/* + * These fields are initialized at boot time and not changed. + * XXX This structure is used only in the non-SMP case. Maybe this + * should be moved to smp.c. + */ +struct cpuinfo_um boot_cpu_data = { + .loops_per_jiffy = 0, + .ipi_pipe = { -1, -1 }, + .cache_alignment = L1_CACHE_BYTES, + .x86_capability = { 0 } +}; + +EXPORT_SYMBOL(boot_cpu_data); + +union thread_union cpu0_irqstack + __section(".data..init_irqstack") = + { .thread_info = INIT_THREAD_INFO(init_task) }; + +/* Changed in setup_arch, which is called in early boot */ +static char host_info[(__NEW_UTS_LEN + 1) * 5]; + +static int show_cpuinfo(struct seq_file *m, void *v) +{ + int i = 0; + + seq_printf(m, "processor\t: %d\n", i); + seq_printf(m, "vendor_id\t: User Mode Linux\n"); + seq_printf(m, "model name\t: UML\n"); + seq_printf(m, "mode\t\t: skas\n"); + seq_printf(m, "host\t\t: %s\n", host_info); + seq_printf(m, "fpu\t\t: %s\n", cpu_has(&boot_cpu_data, X86_FEATURE_FPU) ? "yes" : "no"); + seq_printf(m, "flags\t\t:"); + for (i = 0; i < 32*NCAPINTS; i++) + if (cpu_has(&boot_cpu_data, i) && (x86_cap_flags[i] != NULL)) + seq_printf(m, " %s", x86_cap_flags[i]); + seq_printf(m, "\n"); + seq_printf(m, "cache_alignment\t: %d\n", boot_cpu_data.cache_alignment); + seq_printf(m, "bogomips\t: %lu.%02lu\n", + loops_per_jiffy/(500000/HZ), + (loops_per_jiffy/(5000/HZ)) % 100); + + + return 0; +} + +static void *c_start(struct seq_file *m, loff_t *pos) +{ + return *pos < nr_cpu_ids ? cpu_data + *pos : NULL; +} + +static void *c_next(struct seq_file *m, void *v, loff_t *pos) +{ + ++*pos; + return c_start(m, pos); +} + +static void c_stop(struct seq_file *m, void *v) +{ +} + +const struct seq_operations cpuinfo_op = { + .start = c_start, + .next = c_next, + .stop = c_stop, + .show = show_cpuinfo, +}; + +/* Set in linux_main */ +unsigned long uml_physmem; +EXPORT_SYMBOL(uml_physmem); + +unsigned long uml_reserved; /* Also modified in mem_init */ +unsigned long start_vm; +unsigned long end_vm; + +/* Set in uml_ncpus_setup */ +int ncpus = 1; + +/* Set in early boot */ +static int have_root __initdata; +static int have_console __initdata; + +/* Set in uml_mem_setup and modified in linux_main */ +long long physmem_size = 64 * 1024 * 1024; +EXPORT_SYMBOL(physmem_size); + +static const char *usage_string = +"User Mode Linux v%s\n" +" available at http://user-mode-linux.sourceforge.net/\n\n"; + +static int __init uml_version_setup(char *line, int *add) +{ + /* Explicitly use printf() to show version in stdout */ + printf("%s\n", init_utsname()->release); + exit(0); + + return 0; +} + +__uml_setup("--version", uml_version_setup, +"--version\n" +" Prints the version number of the kernel.\n\n" +); + +static int __init uml_root_setup(char *line, int *add) +{ + have_root = 1; + return 0; +} + +__uml_setup("root=", uml_root_setup, +"root=\n" +" This is actually used by the generic kernel in exactly the same\n" +" way as in any other kernel. If you configure a number of block\n" +" devices and want to boot off something other than ubd0, you \n" +" would use something like:\n" +" root=/dev/ubd5\n\n" +); + +static int __init no_skas_debug_setup(char *line, int *add) +{ + os_warn("'debug' is not necessary to gdb UML in skas mode - run\n"); + os_warn("'gdb linux'\n"); + + return 0; +} + +__uml_setup("debug", no_skas_debug_setup, +"debug\n" +" this flag is not needed to run gdb on UML in skas mode\n\n" +); + +static int __init uml_console_setup(char *line, int *add) +{ + have_console = 1; + return 0; +} + +__uml_setup("console=", uml_console_setup, +"console=\n" +" Specify the preferred console output driver\n\n" +); + +static int __init Usage(char *line, int *add) +{ + const char **p; + + printf(usage_string, init_utsname()->release); + p = &__uml_help_start; + /* Explicitly use printf() to show help in stdout */ + while (p < &__uml_help_end) { + printf("%s", *p); + p++; + } + exit(0); + return 0; +} + +__uml_setup("--help", Usage, +"--help\n" +" Prints this message.\n\n" +); + +static void __init uml_checksetup(char *line, int *add) +{ + struct uml_param *p; + + p = &__uml_setup_start; + while (p < &__uml_setup_end) { + size_t n; + + n = strlen(p->str); + if (!strncmp(line, p->str, n) && p->setup_func(line + n, add)) + return; + p++; + } +} + +static void __init uml_postsetup(void) +{ + initcall_t *p; + + p = &__uml_postsetup_start; + while (p < &__uml_postsetup_end) { + (*p)(); + p++; + } + return; +} + +static int panic_exit(struct notifier_block *self, unsigned long unused1, + void *unused2) +{ + kmsg_dump(KMSG_DUMP_PANIC); + bust_spinlocks(1); + bust_spinlocks(0); + uml_exitcode = 1; + os_dump_core(); + + return NOTIFY_DONE; +} + +static struct notifier_block panic_exit_notifier = { + .notifier_call = panic_exit, + .priority = INT_MAX - 1, /* run as 2nd notifier, won't return */ +}; + +void uml_finishsetup(void) +{ + atomic_notifier_chain_register(&panic_notifier_list, + &panic_exit_notifier); + + uml_postsetup(); + + new_thread_handler(); +} + +/* Set during early boot */ +unsigned long stub_start; +unsigned long task_size; +EXPORT_SYMBOL(task_size); + +unsigned long host_task_size; + +unsigned long brk_start; +unsigned long end_iomem; +EXPORT_SYMBOL(end_iomem); + +#define MIN_VMALLOC (32 * 1024 * 1024) + +static void parse_host_cpu_flags(char *line) +{ + int i; + for (i = 0; i < 32*NCAPINTS; i++) { + if ((x86_cap_flags[i] != NULL) && strstr(line, x86_cap_flags[i])) + set_cpu_cap(&boot_cpu_data, i); + } +} +static void parse_cache_line(char *line) +{ + long res; + char *to_parse = strstr(line, ":"); + if (to_parse) { + to_parse++; + while (*to_parse != 0 && isspace(*to_parse)) { + to_parse++; + } + if (kstrtoul(to_parse, 10, &res) == 0 && is_power_of_2(res)) + boot_cpu_data.cache_alignment = res; + else + boot_cpu_data.cache_alignment = L1_CACHE_BYTES; + } +} + +int __init linux_main(int argc, char **argv) +{ + unsigned long avail, diff; + unsigned long virtmem_size, max_physmem; + unsigned long stack; + unsigned int i; + int add; + + for (i = 1; i < argc; i++) { + if ((i == 1) && (argv[i][0] == ' ')) + continue; + add = 1; + uml_checksetup(argv[i], &add); + if (add) + add_arg(argv[i]); + } + if (have_root == 0) + add_arg(DEFAULT_COMMAND_LINE_ROOT); + + if (have_console == 0) + add_arg(DEFAULT_COMMAND_LINE_CONSOLE); + + host_task_size = os_get_top_address(); + /* reserve two pages for the stubs */ + host_task_size -= 2 * PAGE_SIZE; + stub_start = host_task_size; + + /* + * TASK_SIZE needs to be PGDIR_SIZE aligned or else exit_mmap craps + * out + */ + task_size = host_task_size & PGDIR_MASK; + + /* OS sanity checks that need to happen before the kernel runs */ + os_early_checks(); + + get_host_cpu_features(parse_host_cpu_flags, parse_cache_line); + + brk_start = (unsigned long) sbrk(0); + + /* + * Increase physical memory size for exec-shield users + * so they actually get what they asked for. This should + * add zero for non-exec shield users + */ + + diff = UML_ROUND_UP(brk_start) - UML_ROUND_UP(&_end); + if (diff > 1024 * 1024) { + os_info("Adding %ld bytes to physical memory to account for " + "exec-shield gap\n", diff); + physmem_size += UML_ROUND_UP(brk_start) - UML_ROUND_UP(&_end); + } + + uml_physmem = (unsigned long) __binary_start & PAGE_MASK; + + /* Reserve up to 4M after the current brk */ + uml_reserved = ROUND_4M(brk_start) + (1 << 22); + + setup_machinename(init_utsname()->machine); + + highmem = 0; + iomem_size = (iomem_size + PAGE_SIZE - 1) & PAGE_MASK; + max_physmem = TASK_SIZE - uml_physmem - iomem_size - MIN_VMALLOC; + + /* + * Zones have to begin on a 1 << MAX_ORDER page boundary, + * so this makes sure that's true for highmem + */ + max_physmem &= ~((1 << (PAGE_SHIFT + MAX_ORDER)) - 1); + if (physmem_size + iomem_size > max_physmem) { + highmem = physmem_size + iomem_size - max_physmem; + physmem_size -= highmem; + } + + high_physmem = uml_physmem + physmem_size; + end_iomem = high_physmem + iomem_size; + high_memory = (void *) end_iomem; + + start_vm = VMALLOC_START; + + virtmem_size = physmem_size; + stack = (unsigned long) argv; + stack &= ~(1024 * 1024 - 1); + avail = stack - start_vm; + if (physmem_size > avail) + virtmem_size = avail; + end_vm = start_vm + virtmem_size; + + if (virtmem_size < physmem_size) + os_info("Kernel virtual memory size shrunk to %lu bytes\n", + virtmem_size); + + os_flush_stdout(); + + return start_uml(); +} + +int __init __weak read_initrd(void) +{ + return 0; +} + +void __init setup_arch(char **cmdline_p) +{ + u8 rng_seed[32]; + + stack_protections((unsigned long) &init_thread_info); + setup_physmem(uml_physmem, uml_reserved, physmem_size, highmem); + mem_total_pages(physmem_size, iomem_size, highmem); + uml_dtb_init(); + read_initrd(); + + paging_init(); + strscpy(boot_command_line, command_line, COMMAND_LINE_SIZE); + *cmdline_p = command_line; + setup_hostinfo(host_info, sizeof host_info); + + if (os_getrandom(rng_seed, sizeof(rng_seed), 0) == sizeof(rng_seed)) { + add_bootloader_randomness(rng_seed, sizeof(rng_seed)); + memzero_explicit(rng_seed, sizeof(rng_seed)); + } +} + +void __init check_bugs(void) +{ + arch_check_bugs(); + os_check_bugs(); +} + +void apply_ibt_endbr(s32 *start, s32 *end) +{ +} + +void apply_retpolines(s32 *start, s32 *end) +{ +} + +void apply_returns(s32 *start, s32 *end) +{ +} + +void apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, + s32 *start_cfi, s32 *end_cfi) +{ +} + +void apply_alternatives(struct alt_instr *start, struct alt_instr *end) +{ +} + +void *text_poke(void *addr, const void *opcode, size_t len) +{ + /* + * In UML, the only reference to this function is in + * apply_relocate_add(), which shouldn't ever actually call this + * because UML doesn't have live patching. + */ + WARN_ON(1); + + return memcpy(addr, opcode, len); +} + +void text_poke_sync(void) +{ +} + +void uml_pm_wake(void) +{ + pm_system_wakeup(); +} + +#ifdef CONFIG_PM_SLEEP +static int um_suspend_valid(suspend_state_t state) +{ + return state == PM_SUSPEND_MEM; +} + +static int um_suspend_prepare(void) +{ + um_irqs_suspend(); + return 0; +} + +static int um_suspend_enter(suspend_state_t state) +{ + if (WARN_ON(state != PM_SUSPEND_MEM)) + return -EINVAL; + + /* + * This is identical to the idle sleep, but we've just + * (during suspend) turned off all interrupt sources + * except for the ones we want, so now we can only wake + * up on something we actually want to wake up on. All + * timing has also been suspended. + */ + um_idle_sleep(); + return 0; +} + +static void um_suspend_finish(void) +{ + um_irqs_resume(); +} + +const struct platform_suspend_ops um_suspend_ops = { + .valid = um_suspend_valid, + .prepare = um_suspend_prepare, + .enter = um_suspend_enter, + .finish = um_suspend_finish, +}; + +static int init_pm_wake_signal(void) +{ + /* + * In external time-travel mode we can't use signals to wake up + * since that would mess with the scheduling. We'll have to do + * some additional work to support wakeup on virtio devices or + * similar, perhaps implementing a fake RTC controller that can + * trigger wakeup (and request the appropriate scheduling from + * the external scheduler when going to suspend.) + */ + if (time_travel_mode != TT_MODE_EXTERNAL) + register_pm_wake_signal(); + + suspend_set_ops(&um_suspend_ops); + + return 0; +} + +late_initcall(init_pm_wake_signal); +#endif -- cgit v1.2.3