diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /arch/mips/lib/csum_partial.S | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'arch/mips/lib/csum_partial.S')
-rw-r--r-- | arch/mips/lib/csum_partial.S | 754 |
1 files changed, 754 insertions, 0 deletions
diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S new file mode 100644 index 000000000..7767137c3 --- /dev/null +++ b/arch/mips/lib/csum_partial.S @@ -0,0 +1,754 @@ +/* + * This file is subject to the terms and conditions of the GNU General Public + * License. See the file "COPYING" in the main directory of this archive + * for more details. + * + * Quick'n'dirty IP checksum ... + * + * Copyright (C) 1998, 1999 Ralf Baechle + * Copyright (C) 1999 Silicon Graphics, Inc. + * Copyright (C) 2007 Maciej W. Rozycki + * Copyright (C) 2014 Imagination Technologies Ltd. + */ +#include <linux/errno.h> +#include <asm/asm.h> +#include <asm/asm-offsets.h> +#include <asm/export.h> +#include <asm/regdef.h> + +#ifdef CONFIG_64BIT +/* + * As we are sharing code base with the mips32 tree (which use the o32 ABI + * register definitions). We need to redefine the register definitions from + * the n64 ABI register naming to the o32 ABI register naming. + */ +#undef t0 +#undef t1 +#undef t2 +#undef t3 +#define t0 $8 +#define t1 $9 +#define t2 $10 +#define t3 $11 +#define t4 $12 +#define t5 $13 +#define t6 $14 +#define t7 $15 + +#define USE_DOUBLE +#endif + +#ifdef USE_DOUBLE + +#define LOAD ld +#define LOAD32 lwu +#define ADD daddu +#define NBYTES 8 + +#else + +#define LOAD lw +#define LOAD32 lw +#define ADD addu +#define NBYTES 4 + +#endif /* USE_DOUBLE */ + +#define UNIT(unit) ((unit)*NBYTES) + +#define ADDC(sum,reg) \ + .set push; \ + .set noat; \ + ADD sum, reg; \ + sltu v1, sum, reg; \ + ADD sum, v1; \ + .set pop + +#define ADDC32(sum,reg) \ + .set push; \ + .set noat; \ + addu sum, reg; \ + sltu v1, sum, reg; \ + addu sum, v1; \ + .set pop + +#define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3) \ + LOAD _t0, (offset + UNIT(0))(src); \ + LOAD _t1, (offset + UNIT(1))(src); \ + LOAD _t2, (offset + UNIT(2))(src); \ + LOAD _t3, (offset + UNIT(3))(src); \ + ADDC(_t0, _t1); \ + ADDC(_t2, _t3); \ + ADDC(sum, _t0); \ + ADDC(sum, _t2) + +#ifdef USE_DOUBLE +#define CSUM_BIGCHUNK(src, offset, sum, _t0, _t1, _t2, _t3) \ + CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3) +#else +#define CSUM_BIGCHUNK(src, offset, sum, _t0, _t1, _t2, _t3) \ + CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3); \ + CSUM_BIGCHUNK1(src, offset + 0x10, sum, _t0, _t1, _t2, _t3) +#endif + +/* + * a0: source address + * a1: length of the area to checksum + * a2: partial checksum + */ + +#define src a0 +#define sum v0 + + .text + .set noreorder + .align 5 +LEAF(csum_partial) +EXPORT_SYMBOL(csum_partial) + move sum, zero + move t7, zero + + sltiu t8, a1, 0x8 + bnez t8, .Lsmall_csumcpy /* < 8 bytes to copy */ + move t2, a1 + + andi t7, src, 0x1 /* odd buffer? */ + +.Lhword_align: + beqz t7, .Lword_align + andi t8, src, 0x2 + + lbu t0, (src) + LONG_SUBU a1, a1, 0x1 +#ifdef __MIPSEL__ + sll t0, t0, 8 +#endif + ADDC(sum, t0) + PTR_ADDU src, src, 0x1 + andi t8, src, 0x2 + +.Lword_align: + beqz t8, .Ldword_align + sltiu t8, a1, 56 + + lhu t0, (src) + LONG_SUBU a1, a1, 0x2 + ADDC(sum, t0) + sltiu t8, a1, 56 + PTR_ADDU src, src, 0x2 + +.Ldword_align: + bnez t8, .Ldo_end_words + move t8, a1 + + andi t8, src, 0x4 + beqz t8, .Lqword_align + andi t8, src, 0x8 + + LOAD32 t0, 0x00(src) + LONG_SUBU a1, a1, 0x4 + ADDC(sum, t0) + PTR_ADDU src, src, 0x4 + andi t8, src, 0x8 + +.Lqword_align: + beqz t8, .Loword_align + andi t8, src, 0x10 + +#ifdef USE_DOUBLE + ld t0, 0x00(src) + LONG_SUBU a1, a1, 0x8 + ADDC(sum, t0) +#else + lw t0, 0x00(src) + lw t1, 0x04(src) + LONG_SUBU a1, a1, 0x8 + ADDC(sum, t0) + ADDC(sum, t1) +#endif + PTR_ADDU src, src, 0x8 + andi t8, src, 0x10 + +.Loword_align: + beqz t8, .Lbegin_movement + LONG_SRL t8, a1, 0x7 + +#ifdef USE_DOUBLE + ld t0, 0x00(src) + ld t1, 0x08(src) + ADDC(sum, t0) + ADDC(sum, t1) +#else + CSUM_BIGCHUNK1(src, 0x00, sum, t0, t1, t3, t4) +#endif + LONG_SUBU a1, a1, 0x10 + PTR_ADDU src, src, 0x10 + LONG_SRL t8, a1, 0x7 + +.Lbegin_movement: + beqz t8, 1f + andi t2, a1, 0x40 + +.Lmove_128bytes: + CSUM_BIGCHUNK(src, 0x00, sum, t0, t1, t3, t4) + CSUM_BIGCHUNK(src, 0x20, sum, t0, t1, t3, t4) + CSUM_BIGCHUNK(src, 0x40, sum, t0, t1, t3, t4) + CSUM_BIGCHUNK(src, 0x60, sum, t0, t1, t3, t4) + LONG_SUBU t8, t8, 0x01 + .set reorder /* DADDI_WAR */ + PTR_ADDU src, src, 0x80 + bnez t8, .Lmove_128bytes + .set noreorder + +1: + beqz t2, 1f + andi t2, a1, 0x20 + +.Lmove_64bytes: + CSUM_BIGCHUNK(src, 0x00, sum, t0, t1, t3, t4) + CSUM_BIGCHUNK(src, 0x20, sum, t0, t1, t3, t4) + PTR_ADDU src, src, 0x40 + +1: + beqz t2, .Ldo_end_words + andi t8, a1, 0x1c + +.Lmove_32bytes: + CSUM_BIGCHUNK(src, 0x00, sum, t0, t1, t3, t4) + andi t8, a1, 0x1c + PTR_ADDU src, src, 0x20 + +.Ldo_end_words: + beqz t8, .Lsmall_csumcpy + andi t2, a1, 0x3 + LONG_SRL t8, t8, 0x2 + +.Lend_words: + LOAD32 t0, (src) + LONG_SUBU t8, t8, 0x1 + ADDC(sum, t0) + .set reorder /* DADDI_WAR */ + PTR_ADDU src, src, 0x4 + bnez t8, .Lend_words + .set noreorder + +/* unknown src alignment and < 8 bytes to go */ +.Lsmall_csumcpy: + move a1, t2 + + andi t0, a1, 4 + beqz t0, 1f + andi t0, a1, 2 + + /* Still a full word to go */ + ulw t1, (src) + PTR_ADDIU src, 4 +#ifdef USE_DOUBLE + dsll t1, t1, 32 /* clear lower 32bit */ +#endif + ADDC(sum, t1) + +1: move t1, zero + beqz t0, 1f + andi t0, a1, 1 + + /* Still a halfword to go */ + ulhu t1, (src) + PTR_ADDIU src, 2 + +1: beqz t0, 1f + sll t1, t1, 16 + + lbu t2, (src) + nop + +#ifdef __MIPSEB__ + sll t2, t2, 8 +#endif + or t1, t2 + +1: ADDC(sum, t1) + + /* fold checksum */ +#ifdef USE_DOUBLE + dsll32 v1, sum, 0 + daddu sum, v1 + sltu v1, sum, v1 + dsra32 sum, sum, 0 + addu sum, v1 +#endif + + /* odd buffer alignment? */ +#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR5) || \ + defined(CONFIG_CPU_LOONGSON64) + .set push + .set arch=mips32r2 + wsbh v1, sum + movn sum, v1, t7 + .set pop +#else + beqz t7, 1f /* odd buffer alignment? */ + lui v1, 0x00ff + addu v1, 0x00ff + and t0, sum, v1 + sll t0, t0, 8 + srl sum, sum, 8 + and sum, sum, v1 + or sum, sum, t0 +1: +#endif + .set reorder + /* Add the passed partial csum. */ + ADDC32(sum, a2) + jr ra + .set noreorder + END(csum_partial) + + +/* + * checksum and copy routines based on memcpy.S + * + * csum_partial_copy_nocheck(src, dst, len) + * __csum_partial_copy_kernel(src, dst, len) + * + * See "Spec" in memcpy.S for details. Unlike __copy_user, all + * function in this file use the standard calling convention. + */ + +#define src a0 +#define dst a1 +#define len a2 +#define sum v0 +#define odd t8 + +/* + * All exception handlers simply return 0. + */ + +/* Instruction type */ +#define LD_INSN 1 +#define ST_INSN 2 +#define LEGACY_MODE 1 +#define EVA_MODE 2 +#define USEROP 1 +#define KERNELOP 2 + +/* + * Wrapper to add an entry in the exception table + * in case the insn causes a memory exception. + * Arguments: + * insn : Load/store instruction + * type : Instruction type + * reg : Register + * addr : Address + * handler : Exception handler + */ +#define EXC(insn, type, reg, addr) \ + .if \mode == LEGACY_MODE; \ +9: insn reg, addr; \ + .section __ex_table,"a"; \ + PTR_WD 9b, .L_exc; \ + .previous; \ + /* This is enabled in EVA mode */ \ + .else; \ + /* If loading from user or storing to user */ \ + .if ((\from == USEROP) && (type == LD_INSN)) || \ + ((\to == USEROP) && (type == ST_INSN)); \ +9: __BUILD_EVA_INSN(insn##e, reg, addr); \ + .section __ex_table,"a"; \ + PTR_WD 9b, .L_exc; \ + .previous; \ + .else; \ + /* EVA without exception */ \ + insn reg, addr; \ + .endif; \ + .endif + +#undef LOAD + +#ifdef USE_DOUBLE + +#define LOADK ld /* No exception */ +#define LOAD(reg, addr) EXC(ld, LD_INSN, reg, addr) +#define LOADBU(reg, addr) EXC(lbu, LD_INSN, reg, addr) +#define LOADL(reg, addr) EXC(ldl, LD_INSN, reg, addr) +#define LOADR(reg, addr) EXC(ldr, LD_INSN, reg, addr) +#define STOREB(reg, addr) EXC(sb, ST_INSN, reg, addr) +#define STOREL(reg, addr) EXC(sdl, ST_INSN, reg, addr) +#define STORER(reg, addr) EXC(sdr, ST_INSN, reg, addr) +#define STORE(reg, addr) EXC(sd, ST_INSN, reg, addr) +#define ADD daddu +#define SUB dsubu +#define SRL dsrl +#define SLL dsll +#define SLLV dsllv +#define SRLV dsrlv +#define NBYTES 8 +#define LOG_NBYTES 3 + +#else + +#define LOADK lw /* No exception */ +#define LOAD(reg, addr) EXC(lw, LD_INSN, reg, addr) +#define LOADBU(reg, addr) EXC(lbu, LD_INSN, reg, addr) +#define LOADL(reg, addr) EXC(lwl, LD_INSN, reg, addr) +#define LOADR(reg, addr) EXC(lwr, LD_INSN, reg, addr) +#define STOREB(reg, addr) EXC(sb, ST_INSN, reg, addr) +#define STOREL(reg, addr) EXC(swl, ST_INSN, reg, addr) +#define STORER(reg, addr) EXC(swr, ST_INSN, reg, addr) +#define STORE(reg, addr) EXC(sw, ST_INSN, reg, addr) +#define ADD addu +#define SUB subu +#define SRL srl +#define SLL sll +#define SLLV sllv +#define SRLV srlv +#define NBYTES 4 +#define LOG_NBYTES 2 + +#endif /* USE_DOUBLE */ + +#ifdef CONFIG_CPU_LITTLE_ENDIAN +#define LDFIRST LOADR +#define LDREST LOADL +#define STFIRST STORER +#define STREST STOREL +#define SHIFT_DISCARD SLLV +#define SHIFT_DISCARD_REVERT SRLV +#else +#define LDFIRST LOADL +#define LDREST LOADR +#define STFIRST STOREL +#define STREST STORER +#define SHIFT_DISCARD SRLV +#define SHIFT_DISCARD_REVERT SLLV +#endif + +#define FIRST(unit) ((unit)*NBYTES) +#define REST(unit) (FIRST(unit)+NBYTES-1) + +#define ADDRMASK (NBYTES-1) + +#ifndef CONFIG_CPU_DADDI_WORKAROUNDS + .set noat +#else + .set at=v1 +#endif + + .macro __BUILD_CSUM_PARTIAL_COPY_USER mode, from, to + + li sum, -1 + move odd, zero + /* + * Note: dst & src may be unaligned, len may be 0 + * Temps + */ + /* + * The "issue break"s below are very approximate. + * Issue delays for dcache fills will perturb the schedule, as will + * load queue full replay traps, etc. + * + * If len < NBYTES use byte operations. + */ + sltu t2, len, NBYTES + and t1, dst, ADDRMASK + bnez t2, .Lcopy_bytes_checklen\@ + and t0, src, ADDRMASK + andi odd, dst, 0x1 /* odd buffer? */ + bnez t1, .Ldst_unaligned\@ + nop + bnez t0, .Lsrc_unaligned_dst_aligned\@ + /* + * use delay slot for fall-through + * src and dst are aligned; need to compute rem + */ +.Lboth_aligned\@: + SRL t0, len, LOG_NBYTES+3 # +3 for 8 units/iter + beqz t0, .Lcleanup_both_aligned\@ # len < 8*NBYTES + nop + SUB len, 8*NBYTES # subtract here for bgez loop + .align 4 +1: + LOAD(t0, UNIT(0)(src)) + LOAD(t1, UNIT(1)(src)) + LOAD(t2, UNIT(2)(src)) + LOAD(t3, UNIT(3)(src)) + LOAD(t4, UNIT(4)(src)) + LOAD(t5, UNIT(5)(src)) + LOAD(t6, UNIT(6)(src)) + LOAD(t7, UNIT(7)(src)) + SUB len, len, 8*NBYTES + ADD src, src, 8*NBYTES + STORE(t0, UNIT(0)(dst)) + ADDC(t0, t1) + STORE(t1, UNIT(1)(dst)) + ADDC(sum, t0) + STORE(t2, UNIT(2)(dst)) + ADDC(t2, t3) + STORE(t3, UNIT(3)(dst)) + ADDC(sum, t2) + STORE(t4, UNIT(4)(dst)) + ADDC(t4, t5) + STORE(t5, UNIT(5)(dst)) + ADDC(sum, t4) + STORE(t6, UNIT(6)(dst)) + ADDC(t6, t7) + STORE(t7, UNIT(7)(dst)) + ADDC(sum, t6) + .set reorder /* DADDI_WAR */ + ADD dst, dst, 8*NBYTES + bgez len, 1b + .set noreorder + ADD len, 8*NBYTES # revert len (see above) + + /* + * len == the number of bytes left to copy < 8*NBYTES + */ +.Lcleanup_both_aligned\@: +#define rem t7 + beqz len, .Ldone\@ + sltu t0, len, 4*NBYTES + bnez t0, .Lless_than_4units\@ + and rem, len, (NBYTES-1) # rem = len % NBYTES + /* + * len >= 4*NBYTES + */ + LOAD(t0, UNIT(0)(src)) + LOAD(t1, UNIT(1)(src)) + LOAD(t2, UNIT(2)(src)) + LOAD(t3, UNIT(3)(src)) + SUB len, len, 4*NBYTES + ADD src, src, 4*NBYTES + STORE(t0, UNIT(0)(dst)) + ADDC(t0, t1) + STORE(t1, UNIT(1)(dst)) + ADDC(sum, t0) + STORE(t2, UNIT(2)(dst)) + ADDC(t2, t3) + STORE(t3, UNIT(3)(dst)) + ADDC(sum, t2) + .set reorder /* DADDI_WAR */ + ADD dst, dst, 4*NBYTES + beqz len, .Ldone\@ + .set noreorder +.Lless_than_4units\@: + /* + * rem = len % NBYTES + */ + beq rem, len, .Lcopy_bytes\@ + nop +1: + LOAD(t0, 0(src)) + ADD src, src, NBYTES + SUB len, len, NBYTES + STORE(t0, 0(dst)) + ADDC(sum, t0) + .set reorder /* DADDI_WAR */ + ADD dst, dst, NBYTES + bne rem, len, 1b + .set noreorder + + /* + * src and dst are aligned, need to copy rem bytes (rem < NBYTES) + * A loop would do only a byte at a time with possible branch + * mispredicts. Can't do an explicit LOAD dst,mask,or,STORE + * because can't assume read-access to dst. Instead, use + * STREST dst, which doesn't require read access to dst. + * + * This code should perform better than a simple loop on modern, + * wide-issue mips processors because the code has fewer branches and + * more instruction-level parallelism. + */ +#define bits t2 + beqz len, .Ldone\@ + ADD t1, dst, len # t1 is just past last byte of dst + li bits, 8*NBYTES + SLL rem, len, 3 # rem = number of bits to keep + LOAD(t0, 0(src)) + SUB bits, bits, rem # bits = number of bits to discard + SHIFT_DISCARD t0, t0, bits + STREST(t0, -1(t1)) + SHIFT_DISCARD_REVERT t0, t0, bits + .set reorder + ADDC(sum, t0) + b .Ldone\@ + .set noreorder +.Ldst_unaligned\@: + /* + * dst is unaligned + * t0 = src & ADDRMASK + * t1 = dst & ADDRMASK; T1 > 0 + * len >= NBYTES + * + * Copy enough bytes to align dst + * Set match = (src and dst have same alignment) + */ +#define match rem + LDFIRST(t3, FIRST(0)(src)) + ADD t2, zero, NBYTES + LDREST(t3, REST(0)(src)) + SUB t2, t2, t1 # t2 = number of bytes copied + xor match, t0, t1 + STFIRST(t3, FIRST(0)(dst)) + SLL t4, t1, 3 # t4 = number of bits to discard + SHIFT_DISCARD t3, t3, t4 + /* no SHIFT_DISCARD_REVERT to handle odd buffer properly */ + ADDC(sum, t3) + beq len, t2, .Ldone\@ + SUB len, len, t2 + ADD dst, dst, t2 + beqz match, .Lboth_aligned\@ + ADD src, src, t2 + +.Lsrc_unaligned_dst_aligned\@: + SRL t0, len, LOG_NBYTES+2 # +2 for 4 units/iter + beqz t0, .Lcleanup_src_unaligned\@ + and rem, len, (4*NBYTES-1) # rem = len % 4*NBYTES +1: +/* + * Avoid consecutive LD*'s to the same register since some mips + * implementations can't issue them in the same cycle. + * It's OK to load FIRST(N+1) before REST(N) because the two addresses + * are to the same unit (unless src is aligned, but it's not). + */ + LDFIRST(t0, FIRST(0)(src)) + LDFIRST(t1, FIRST(1)(src)) + SUB len, len, 4*NBYTES + LDREST(t0, REST(0)(src)) + LDREST(t1, REST(1)(src)) + LDFIRST(t2, FIRST(2)(src)) + LDFIRST(t3, FIRST(3)(src)) + LDREST(t2, REST(2)(src)) + LDREST(t3, REST(3)(src)) + ADD src, src, 4*NBYTES +#ifdef CONFIG_CPU_SB1 + nop # improves slotting +#endif + STORE(t0, UNIT(0)(dst)) + ADDC(t0, t1) + STORE(t1, UNIT(1)(dst)) + ADDC(sum, t0) + STORE(t2, UNIT(2)(dst)) + ADDC(t2, t3) + STORE(t3, UNIT(3)(dst)) + ADDC(sum, t2) + .set reorder /* DADDI_WAR */ + ADD dst, dst, 4*NBYTES + bne len, rem, 1b + .set noreorder + +.Lcleanup_src_unaligned\@: + beqz len, .Ldone\@ + and rem, len, NBYTES-1 # rem = len % NBYTES + beq rem, len, .Lcopy_bytes\@ + nop +1: + LDFIRST(t0, FIRST(0)(src)) + LDREST(t0, REST(0)(src)) + ADD src, src, NBYTES + SUB len, len, NBYTES + STORE(t0, 0(dst)) + ADDC(sum, t0) + .set reorder /* DADDI_WAR */ + ADD dst, dst, NBYTES + bne len, rem, 1b + .set noreorder + +.Lcopy_bytes_checklen\@: + beqz len, .Ldone\@ + nop +.Lcopy_bytes\@: + /* 0 < len < NBYTES */ +#ifdef CONFIG_CPU_LITTLE_ENDIAN +#define SHIFT_START 0 +#define SHIFT_INC 8 +#else +#define SHIFT_START 8*(NBYTES-1) +#define SHIFT_INC -8 +#endif + move t2, zero # partial word + li t3, SHIFT_START # shift +#define COPY_BYTE(N) \ + LOADBU(t0, N(src)); \ + SUB len, len, 1; \ + STOREB(t0, N(dst)); \ + SLLV t0, t0, t3; \ + addu t3, SHIFT_INC; \ + beqz len, .Lcopy_bytes_done\@; \ + or t2, t0 + + COPY_BYTE(0) + COPY_BYTE(1) +#ifdef USE_DOUBLE + COPY_BYTE(2) + COPY_BYTE(3) + COPY_BYTE(4) + COPY_BYTE(5) +#endif + LOADBU(t0, NBYTES-2(src)) + SUB len, len, 1 + STOREB(t0, NBYTES-2(dst)) + SLLV t0, t0, t3 + or t2, t0 +.Lcopy_bytes_done\@: + ADDC(sum, t2) +.Ldone\@: + /* fold checksum */ + .set push + .set noat +#ifdef USE_DOUBLE + dsll32 v1, sum, 0 + daddu sum, v1 + sltu v1, sum, v1 + dsra32 sum, sum, 0 + addu sum, v1 +#endif + +#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR5) || \ + defined(CONFIG_CPU_LOONGSON64) + .set push + .set arch=mips32r2 + wsbh v1, sum + movn sum, v1, odd + .set pop +#else + beqz odd, 1f /* odd buffer alignment? */ + lui v1, 0x00ff + addu v1, 0x00ff + and t0, sum, v1 + sll t0, t0, 8 + srl sum, sum, 8 + and sum, sum, v1 + or sum, sum, t0 +1: +#endif + .set pop + .set reorder + jr ra + .set noreorder + .endm + + .set noreorder +.L_exc: + jr ra + li v0, 0 + +FEXPORT(__csum_partial_copy_nocheck) +EXPORT_SYMBOL(__csum_partial_copy_nocheck) +#ifndef CONFIG_EVA +FEXPORT(__csum_partial_copy_to_user) +EXPORT_SYMBOL(__csum_partial_copy_to_user) +FEXPORT(__csum_partial_copy_from_user) +EXPORT_SYMBOL(__csum_partial_copy_from_user) +#endif +__BUILD_CSUM_PARTIAL_COPY_USER LEGACY_MODE USEROP USEROP + +#ifdef CONFIG_EVA +LEAF(__csum_partial_copy_to_user) +__BUILD_CSUM_PARTIAL_COPY_USER EVA_MODE KERNELOP USEROP +END(__csum_partial_copy_to_user) + +LEAF(__csum_partial_copy_from_user) +__BUILD_CSUM_PARTIAL_COPY_USER EVA_MODE USEROP KERNELOP +END(__csum_partial_copy_from_user) +#endif |