From 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 21 Feb 2023 18:24:12 -0800 Subject: Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ... --- arch/um/kernel/skas/Makefile | 17 ++ arch/um/kernel/skas/clone.c | 47 ++++++ arch/um/kernel/skas/mmu.c | 79 +++++++++ arch/um/kernel/skas/process.c | 55 +++++++ arch/um/kernel/skas/syscall.c | 50 ++++++ arch/um/kernel/skas/uaccess.c | 366 ++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 614 insertions(+) create mode 100644 arch/um/kernel/skas/Makefile create mode 100644 arch/um/kernel/skas/clone.c create mode 100644 arch/um/kernel/skas/mmu.c create mode 100644 arch/um/kernel/skas/process.c create mode 100644 arch/um/kernel/skas/syscall.c create mode 100644 arch/um/kernel/skas/uaccess.c (limited to 'arch/um/kernel/skas') diff --git a/arch/um/kernel/skas/Makefile b/arch/um/kernel/skas/Makefile new file mode 100644 index 000000000..f3d494a4f --- /dev/null +++ b/arch/um/kernel/skas/Makefile @@ -0,0 +1,17 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (C) 2002 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com) +# + +obj-y := clone.o mmu.o process.o syscall.o uaccess.o + +# clone.o is in the stub, so it can't be built with profiling +# GCC hardened also auto-enables -fpic, but we need %ebx so it can't work -> +# disable it + +CFLAGS_clone.o := $(CFLAGS_NO_HARDENING) +UNPROFILE_OBJS := clone.o + +KCOV_INSTRUMENT := n + +include arch/um/scripts/Makefile.rules diff --git a/arch/um/kernel/skas/clone.c b/arch/um/kernel/skas/clone.c new file mode 100644 index 000000000..ff5061f29 --- /dev/null +++ b/arch/um/kernel/skas/clone.c @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015 Thomas Meyer (thomas@m3y3r.de) + * Copyright (C) 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com) + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * This is in a separate file because it needs to be compiled with any + * extraneous gcc flags (-pg, -fprofile-arcs, -ftest-coverage) disabled + * + * Use UM_KERN_PAGE_SIZE instead of PAGE_SIZE because that calls getpagesize + * on some systems. + */ + +void __attribute__ ((__section__ (".__syscall_stub"))) +stub_clone_handler(void) +{ + struct stub_data *data = get_stub_page(); + long err; + + err = stub_syscall2(__NR_clone, CLONE_PARENT | CLONE_FILES | SIGCHLD, + (unsigned long)data + UM_KERN_PAGE_SIZE / 2); + if (err) { + data->parent_err = err; + goto done; + } + + err = stub_syscall4(__NR_ptrace, PTRACE_TRACEME, 0, 0, 0); + if (err) { + data->child_err = err; + goto done; + } + + remap_stack_and_trap(); + + done: + trap_myself(); +} diff --git a/arch/um/kernel/skas/mmu.c b/arch/um/kernel/skas/mmu.c new file mode 100644 index 000000000..125df465e --- /dev/null +++ b/arch/um/kernel/skas/mmu.c @@ -0,0 +1,79 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015 Thomas Meyer (thomas@m3y3r.de) + * Copyright (C) 2002 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com) + */ + +#include +#include +#include + +#include +#include +#include +#include +#include + +int init_new_context(struct task_struct *task, struct mm_struct *mm) +{ + struct mm_context *from_mm = NULL; + struct mm_context *to_mm = &mm->context; + unsigned long stack = 0; + int ret = -ENOMEM; + + stack = get_zeroed_page(GFP_KERNEL); + if (stack == 0) + goto out; + + to_mm->id.stack = stack; + if (current->mm != NULL && current->mm != &init_mm) + from_mm = ¤t->mm->context; + + block_signals_trace(); + if (from_mm) + to_mm->id.u.pid = copy_context_skas0(stack, + from_mm->id.u.pid); + else to_mm->id.u.pid = start_userspace(stack); + unblock_signals_trace(); + + if (to_mm->id.u.pid < 0) { + ret = to_mm->id.u.pid; + goto out_free; + } + + ret = init_new_ldt(to_mm, from_mm); + if (ret < 0) { + printk(KERN_ERR "init_new_context_skas - init_ldt" + " failed, errno = %d\n", ret); + goto out_free; + } + + return 0; + + out_free: + if (to_mm->id.stack != 0) + free_page(to_mm->id.stack); + out: + return ret; +} + +void destroy_context(struct mm_struct *mm) +{ + struct mm_context *mmu = &mm->context; + + /* + * If init_new_context wasn't called, this will be + * zero, resulting in a kill(0), which will result in the + * whole UML suddenly dying. Also, cover negative and + * 1 cases, since they shouldn't happen either. + */ + if (mmu->id.u.pid < 2) { + printk(KERN_ERR "corrupt mm_context - pid = %d\n", + mmu->id.u.pid); + return; + } + os_kill_ptraced_process(mmu->id.u.pid, 1); + + free_page(mmu->id.stack); + free_ldt(mmu); +} diff --git a/arch/um/kernel/skas/process.c b/arch/um/kernel/skas/process.c new file mode 100644 index 000000000..f2ac134c9 --- /dev/null +++ b/arch/um/kernel/skas/process.c @@ -0,0 +1,55 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2002 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com) + */ + +#include +#include +#include +#include + +#include +#include +#include +#include + +extern void start_kernel(void); + +static int __init start_kernel_proc(void *unused) +{ + int pid; + + block_signals_trace(); + pid = os_getpid(); + + cpu_tasks[0].pid = pid; + cpu_tasks[0].task = current; + + start_kernel(); + return 0; +} + +extern int userspace_pid[]; + +extern char cpu0_irqstack[]; + +int __init start_uml(void) +{ + stack_protections((unsigned long) &cpu0_irqstack); + set_sigstack(cpu0_irqstack, THREAD_SIZE); + + init_new_thread_signals(); + + init_task.thread.request.u.thread.proc = start_kernel_proc; + init_task.thread.request.u.thread.arg = NULL; + return start_idle_thread(task_stack_page(&init_task), + &init_task.thread.switch_buf); +} + +unsigned long current_stub_stack(void) +{ + if (current->mm == NULL) + return 0; + + return current->mm->context.id.stack; +} diff --git a/arch/um/kernel/skas/syscall.c b/arch/um/kernel/skas/syscall.c new file mode 100644 index 000000000..9ee19e566 --- /dev/null +++ b/arch/um/kernel/skas/syscall.c @@ -0,0 +1,50 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2002 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com) + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +void handle_syscall(struct uml_pt_regs *r) +{ + struct pt_regs *regs = container_of(r, struct pt_regs, regs); + int syscall; + + /* + * If we have infinite CPU resources, then make every syscall also a + * preemption point, since we don't have any other preemption in this + * case, and kernel threads would basically never run until userspace + * went to sleep, even if said userspace interacts with the kernel in + * various ways. + */ + if (time_travel_mode == TT_MODE_INFCPU || + time_travel_mode == TT_MODE_EXTERNAL) + schedule(); + + /* Initialize the syscall number and default return value. */ + UPT_SYSCALL_NR(r) = PT_SYSCALL_NR(r->gp); + PT_REGS_SET_SYSCALL_RETURN(regs, -ENOSYS); + + if (syscall_trace_enter(regs)) + goto out; + + /* Do the seccomp check after ptrace; failures should be fast. */ + if (secure_computing() == -1) + goto out; + + syscall = UPT_SYSCALL_NR(r); + if (syscall >= 0 && syscall < __NR_syscalls) + PT_REGS_SET_SYSCALL_RETURN(regs, + EXECUTE_SYSCALL(syscall, regs)); + +out: + syscall_trace_leave(regs); +} diff --git a/arch/um/kernel/skas/uaccess.c b/arch/um/kernel/skas/uaccess.c new file mode 100644 index 000000000..aaee96f07 --- /dev/null +++ b/arch/um/kernel/skas/uaccess.c @@ -0,0 +1,366 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2002 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com) + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +pte_t *virt_to_pte(struct mm_struct *mm, unsigned long addr) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + + if (mm == NULL) + return NULL; + + pgd = pgd_offset(mm, addr); + if (!pgd_present(*pgd)) + return NULL; + + p4d = p4d_offset(pgd, addr); + if (!p4d_present(*p4d)) + return NULL; + + pud = pud_offset(p4d, addr); + if (!pud_present(*pud)) + return NULL; + + pmd = pmd_offset(pud, addr); + if (!pmd_present(*pmd)) + return NULL; + + return pte_offset_kernel(pmd, addr); +} + +static pte_t *maybe_map(unsigned long virt, int is_write) +{ + pte_t *pte = virt_to_pte(current->mm, virt); + int err, dummy_code; + + if ((pte == NULL) || !pte_present(*pte) || + (is_write && !pte_write(*pte))) { + err = handle_page_fault(virt, 0, is_write, 1, &dummy_code); + if (err) + return NULL; + pte = virt_to_pte(current->mm, virt); + } + if (!pte_present(*pte)) + pte = NULL; + + return pte; +} + +static int do_op_one_page(unsigned long addr, int len, int is_write, + int (*op)(unsigned long addr, int len, void *arg), void *arg) +{ + struct page *page; + pte_t *pte; + int n; + + pte = maybe_map(addr, is_write); + if (pte == NULL) + return -1; + + page = pte_page(*pte); +#ifdef CONFIG_64BIT + pagefault_disable(); + addr = (unsigned long) page_address(page) + + (addr & ~PAGE_MASK); +#else + addr = (unsigned long) kmap_atomic(page) + + (addr & ~PAGE_MASK); +#endif + n = (*op)(addr, len, arg); + +#ifdef CONFIG_64BIT + pagefault_enable(); +#else + kunmap_atomic((void *)addr); +#endif + + return n; +} + +static long buffer_op(unsigned long addr, int len, int is_write, + int (*op)(unsigned long, int, void *), void *arg) +{ + long size, remain, n; + + size = min(PAGE_ALIGN(addr) - addr, (unsigned long) len); + remain = len; + + n = do_op_one_page(addr, size, is_write, op, arg); + if (n != 0) { + remain = (n < 0 ? remain : 0); + goto out; + } + + addr += size; + remain -= size; + if (remain == 0) + goto out; + + while (addr < ((addr + remain) & PAGE_MASK)) { + n = do_op_one_page(addr, PAGE_SIZE, is_write, op, arg); + if (n != 0) { + remain = (n < 0 ? remain : 0); + goto out; + } + + addr += PAGE_SIZE; + remain -= PAGE_SIZE; + } + if (remain == 0) + goto out; + + n = do_op_one_page(addr, remain, is_write, op, arg); + if (n != 0) { + remain = (n < 0 ? remain : 0); + goto out; + } + + return 0; + out: + return remain; +} + +static int copy_chunk_from_user(unsigned long from, int len, void *arg) +{ + unsigned long *to_ptr = arg, to = *to_ptr; + + memcpy((void *) to, (void *) from, len); + *to_ptr += len; + return 0; +} + +unsigned long raw_copy_from_user(void *to, const void __user *from, unsigned long n) +{ + return buffer_op((unsigned long) from, n, 0, copy_chunk_from_user, &to); +} +EXPORT_SYMBOL(raw_copy_from_user); + +static int copy_chunk_to_user(unsigned long to, int len, void *arg) +{ + unsigned long *from_ptr = arg, from = *from_ptr; + + memcpy((void *) to, (void *) from, len); + *from_ptr += len; + return 0; +} + +unsigned long raw_copy_to_user(void __user *to, const void *from, unsigned long n) +{ + return buffer_op((unsigned long) to, n, 1, copy_chunk_to_user, &from); +} +EXPORT_SYMBOL(raw_copy_to_user); + +static int strncpy_chunk_from_user(unsigned long from, int len, void *arg) +{ + char **to_ptr = arg, *to = *to_ptr; + int n; + + strncpy(to, (void *) from, len); + n = strnlen(to, len); + *to_ptr += n; + + if (n < len) + return 1; + return 0; +} + +long strncpy_from_user(char *dst, const char __user *src, long count) +{ + long n; + char *ptr = dst; + + if (!access_ok(src, 1)) + return -EFAULT; + n = buffer_op((unsigned long) src, count, 0, strncpy_chunk_from_user, + &ptr); + if (n != 0) + return -EFAULT; + return strnlen(dst, count); +} +EXPORT_SYMBOL(strncpy_from_user); + +static int clear_chunk(unsigned long addr, int len, void *unused) +{ + memset((void *) addr, 0, len); + return 0; +} + +unsigned long __clear_user(void __user *mem, unsigned long len) +{ + return buffer_op((unsigned long) mem, len, 1, clear_chunk, NULL); +} +EXPORT_SYMBOL(__clear_user); + +static int strnlen_chunk(unsigned long str, int len, void *arg) +{ + int *len_ptr = arg, n; + + n = strnlen((void *) str, len); + *len_ptr += n; + + if (n < len) + return 1; + return 0; +} + +long strnlen_user(const char __user *str, long len) +{ + int count = 0, n; + + if (!access_ok(str, 1)) + return -EFAULT; + n = buffer_op((unsigned long) str, len, 0, strnlen_chunk, &count); + if (n == 0) + return count + 1; + return 0; +} +EXPORT_SYMBOL(strnlen_user); + +/** + * arch_futex_atomic_op_inuser() - Atomic arithmetic operation with constant + * argument and comparison of the previous + * futex value with another constant. + * + * @encoded_op: encoded operation to execute + * @uaddr: pointer to user space address + * + * Return: + * 0 - On success + * -EFAULT - User access resulted in a page fault + * -EAGAIN - Atomic operation was unable to complete due to contention + * -ENOSYS - Operation not supported + */ + +int arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval, u32 __user *uaddr) +{ + int oldval, ret; + struct page *page; + unsigned long addr = (unsigned long) uaddr; + pte_t *pte; + + ret = -EFAULT; + if (!access_ok(uaddr, sizeof(*uaddr))) + return -EFAULT; + preempt_disable(); + pte = maybe_map(addr, 1); + if (pte == NULL) + goto out_inuser; + + page = pte_page(*pte); +#ifdef CONFIG_64BIT + pagefault_disable(); + addr = (unsigned long) page_address(page) + + (((unsigned long) addr) & ~PAGE_MASK); +#else + addr = (unsigned long) kmap_atomic(page) + + ((unsigned long) addr & ~PAGE_MASK); +#endif + uaddr = (u32 *) addr; + oldval = *uaddr; + + ret = 0; + + switch (op) { + case FUTEX_OP_SET: + *uaddr = oparg; + break; + case FUTEX_OP_ADD: + *uaddr += oparg; + break; + case FUTEX_OP_OR: + *uaddr |= oparg; + break; + case FUTEX_OP_ANDN: + *uaddr &= ~oparg; + break; + case FUTEX_OP_XOR: + *uaddr ^= oparg; + break; + default: + ret = -ENOSYS; + } +#ifdef CONFIG_64BIT + pagefault_enable(); +#else + kunmap_atomic((void *)addr); +#endif + +out_inuser: + preempt_enable(); + + if (ret == 0) + *oval = oldval; + + return ret; +} +EXPORT_SYMBOL(arch_futex_atomic_op_inuser); + +/** + * futex_atomic_cmpxchg_inatomic() - Compare and exchange the content of the + * uaddr with newval if the current value is + * oldval. + * @uval: pointer to store content of @uaddr + * @uaddr: pointer to user space address + * @oldval: old value + * @newval: new value to store to @uaddr + * + * Return: + * 0 - On success + * -EFAULT - User access resulted in a page fault + * -EAGAIN - Atomic operation was unable to complete due to contention + */ + +int futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, + u32 oldval, u32 newval) +{ + struct page *page; + pte_t *pte; + int ret = -EFAULT; + + if (!access_ok(uaddr, sizeof(*uaddr))) + return -EFAULT; + + preempt_disable(); + pte = maybe_map((unsigned long) uaddr, 1); + if (pte == NULL) + goto out_inatomic; + + page = pte_page(*pte); +#ifdef CONFIG_64BIT + pagefault_disable(); + uaddr = page_address(page) + (((unsigned long) uaddr) & ~PAGE_MASK); +#else + uaddr = kmap_atomic(page) + ((unsigned long) uaddr & ~PAGE_MASK); +#endif + + *uval = *uaddr; + + ret = cmpxchg(uaddr, oldval, newval); + +#ifdef CONFIG_64BIT + pagefault_enable(); +#else + kunmap_atomic(uaddr); +#endif + ret = 0; + +out_inatomic: + preempt_enable(); + return ret; +} +EXPORT_SYMBOL(futex_atomic_cmpxchg_inatomic); -- cgit v1.2.3