diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /include/crypto/gf128mul.h | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'include/crypto/gf128mul.h')
-rw-r--r-- | include/crypto/gf128mul.h | 252 |
1 files changed, 252 insertions, 0 deletions
diff --git a/include/crypto/gf128mul.h b/include/crypto/gf128mul.h new file mode 100644 index 000000000..81330c644 --- /dev/null +++ b/include/crypto/gf128mul.h @@ -0,0 +1,252 @@ +/* gf128mul.h - GF(2^128) multiplication functions + * + * Copyright (c) 2003, Dr Brian Gladman, Worcester, UK. + * Copyright (c) 2006 Rik Snel <rsnel@cube.dyndns.org> + * + * Based on Dr Brian Gladman's (GPL'd) work published at + * http://fp.gladman.plus.com/cryptography_technology/index.htm + * See the original copyright notice below. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + */ +/* + --------------------------------------------------------------------------- + Copyright (c) 2003, Dr Brian Gladman, Worcester, UK. All rights reserved. + + LICENSE TERMS + + The free distribution and use of this software in both source and binary + form is allowed (with or without changes) provided that: + + 1. distributions of this source code include the above copyright + notice, this list of conditions and the following disclaimer; + + 2. distributions in binary form include the above copyright + notice, this list of conditions and the following disclaimer + in the documentation and/or other associated materials; + + 3. the copyright holder's name is not used to endorse products + built using this software without specific written permission. + + ALTERNATIVELY, provided that this notice is retained in full, this product + may be distributed under the terms of the GNU General Public License (GPL), + in which case the provisions of the GPL apply INSTEAD OF those given above. + + DISCLAIMER + + This software is provided 'as is' with no explicit or implied warranties + in respect of its properties, including, but not limited to, correctness + and/or fitness for purpose. + --------------------------------------------------------------------------- + Issue Date: 31/01/2006 + + An implementation of field multiplication in Galois Field GF(2^128) +*/ + +#ifndef _CRYPTO_GF128MUL_H +#define _CRYPTO_GF128MUL_H + +#include <asm/byteorder.h> +#include <crypto/b128ops.h> +#include <linux/slab.h> + +/* Comment by Rik: + * + * For some background on GF(2^128) see for example: + * http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/gcm/gcm-revised-spec.pdf + * + * The elements of GF(2^128) := GF(2)[X]/(X^128-X^7-X^2-X^1-1) can + * be mapped to computer memory in a variety of ways. Let's examine + * three common cases. + * + * Take a look at the 16 binary octets below in memory order. The msb's + * are left and the lsb's are right. char b[16] is an array and b[0] is + * the first octet. + * + * 10000000 00000000 00000000 00000000 .... 00000000 00000000 00000000 + * b[0] b[1] b[2] b[3] b[13] b[14] b[15] + * + * Every bit is a coefficient of some power of X. We can store the bits + * in every byte in little-endian order and the bytes themselves also in + * little endian order. I will call this lle (little-little-endian). + * The above buffer represents the polynomial 1, and X^7+X^2+X^1+1 looks + * like 11100001 00000000 .... 00000000 = { 0xE1, 0x00, }. + * This format was originally implemented in gf128mul and is used + * in GCM (Galois/Counter mode) and in ABL (Arbitrary Block Length). + * + * Another convention says: store the bits in bigendian order and the + * bytes also. This is bbe (big-big-endian). Now the buffer above + * represents X^127. X^7+X^2+X^1+1 looks like 00000000 .... 10000111, + * b[15] = 0x87 and the rest is 0. LRW uses this convention and bbe + * is partly implemented. + * + * Both of the above formats are easy to implement on big-endian + * machines. + * + * XTS and EME (the latter of which is patent encumbered) use the ble + * format (bits are stored in big endian order and the bytes in little + * endian). The above buffer represents X^7 in this case and the + * primitive polynomial is b[0] = 0x87. + * + * The common machine word-size is smaller than 128 bits, so to make + * an efficient implementation we must split into machine word sizes. + * This implementation uses 64-bit words for the moment. Machine + * endianness comes into play. The lle format in relation to machine + * endianness is discussed below by the original author of gf128mul Dr + * Brian Gladman. + * + * Let's look at the bbe and ble format on a little endian machine. + * + * bbe on a little endian machine u32 x[4]: + * + * MS x[0] LS MS x[1] LS + * ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls + * 103..96 111.104 119.112 127.120 71...64 79...72 87...80 95...88 + * + * MS x[2] LS MS x[3] LS + * ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls + * 39...32 47...40 55...48 63...56 07...00 15...08 23...16 31...24 + * + * ble on a little endian machine + * + * MS x[0] LS MS x[1] LS + * ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls + * 31...24 23...16 15...08 07...00 63...56 55...48 47...40 39...32 + * + * MS x[2] LS MS x[3] LS + * ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls + * 95...88 87...80 79...72 71...64 127.120 199.112 111.104 103..96 + * + * Multiplications in GF(2^128) are mostly bit-shifts, so you see why + * ble (and lbe also) are easier to implement on a little-endian + * machine than on a big-endian machine. The converse holds for bbe + * and lle. + * + * Note: to have good alignment, it seems to me that it is sufficient + * to keep elements of GF(2^128) in type u64[2]. On 32-bit wordsize + * machines this will automatically aligned to wordsize and on a 64-bit + * machine also. + */ +/* Multiply a GF(2^128) field element by x. Field elements are + held in arrays of bytes in which field bits 8n..8n + 7 are held in + byte[n], with lower indexed bits placed in the more numerically + significant bit positions within bytes. + + On little endian machines the bit indexes translate into the bit + positions within four 32-bit words in the following way + + MS x[0] LS MS x[1] LS + ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls + 24...31 16...23 08...15 00...07 56...63 48...55 40...47 32...39 + + MS x[2] LS MS x[3] LS + ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls + 88...95 80...87 72...79 64...71 120.127 112.119 104.111 96..103 + + On big endian machines the bit indexes translate into the bit + positions within four 32-bit words in the following way + + MS x[0] LS MS x[1] LS + ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls + 00...07 08...15 16...23 24...31 32...39 40...47 48...55 56...63 + + MS x[2] LS MS x[3] LS + ms ls ms ls ms ls ms ls ms ls ms ls ms ls ms ls + 64...71 72...79 80...87 88...95 96..103 104.111 112.119 120.127 +*/ + +/* A slow generic version of gf_mul, implemented for lle and bbe + * It multiplies a and b and puts the result in a */ +void gf128mul_lle(be128 *a, const be128 *b); + +void gf128mul_bbe(be128 *a, const be128 *b); + +/* + * The following functions multiply a field element by x in + * the polynomial field representation. They use 64-bit word operations + * to gain speed but compensate for machine endianness and hence work + * correctly on both styles of machine. + * + * They are defined here for performance. + */ + +static inline u64 gf128mul_mask_from_bit(u64 x, int which) +{ + /* a constant-time version of 'x & ((u64)1 << which) ? (u64)-1 : 0' */ + return ((s64)(x << (63 - which)) >> 63); +} + +static inline void gf128mul_x_lle(be128 *r, const be128 *x) +{ + u64 a = be64_to_cpu(x->a); + u64 b = be64_to_cpu(x->b); + + /* equivalent to gf128mul_table_le[(b << 7) & 0xff] << 48 + * (see crypto/gf128mul.c): */ + u64 _tt = gf128mul_mask_from_bit(b, 0) & ((u64)0xe1 << 56); + + r->b = cpu_to_be64((b >> 1) | (a << 63)); + r->a = cpu_to_be64((a >> 1) ^ _tt); +} + +static inline void gf128mul_x_bbe(be128 *r, const be128 *x) +{ + u64 a = be64_to_cpu(x->a); + u64 b = be64_to_cpu(x->b); + + /* equivalent to gf128mul_table_be[a >> 63] (see crypto/gf128mul.c): */ + u64 _tt = gf128mul_mask_from_bit(a, 63) & 0x87; + + r->a = cpu_to_be64((a << 1) | (b >> 63)); + r->b = cpu_to_be64((b << 1) ^ _tt); +} + +/* needed by XTS */ +static inline void gf128mul_x_ble(le128 *r, const le128 *x) +{ + u64 a = le64_to_cpu(x->a); + u64 b = le64_to_cpu(x->b); + + /* equivalent to gf128mul_table_be[b >> 63] (see crypto/gf128mul.c): */ + u64 _tt = gf128mul_mask_from_bit(a, 63) & 0x87; + + r->a = cpu_to_le64((a << 1) | (b >> 63)); + r->b = cpu_to_le64((b << 1) ^ _tt); +} + +/* 4k table optimization */ + +struct gf128mul_4k { + be128 t[256]; +}; + +struct gf128mul_4k *gf128mul_init_4k_lle(const be128 *g); +struct gf128mul_4k *gf128mul_init_4k_bbe(const be128 *g); +void gf128mul_4k_lle(be128 *a, const struct gf128mul_4k *t); +void gf128mul_4k_bbe(be128 *a, const struct gf128mul_4k *t); +void gf128mul_x8_ble(le128 *r, const le128 *x); +static inline void gf128mul_free_4k(struct gf128mul_4k *t) +{ + kfree_sensitive(t); +} + + +/* 64k table optimization, implemented for bbe */ + +struct gf128mul_64k { + struct gf128mul_4k *t[16]; +}; + +/* First initialize with the constant factor with which you + * want to multiply and then call gf128mul_64k_bbe with the other + * factor in the first argument, and the table in the second. + * Afterwards, the result is stored in *a. + */ +struct gf128mul_64k *gf128mul_init_64k_bbe(const be128 *g); +void gf128mul_free_64k(struct gf128mul_64k *t); +void gf128mul_64k_bbe(be128 *a, const struct gf128mul_64k *t); + +#endif /* _CRYPTO_GF128MUL_H */ |