diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /Documentation/process/deprecated.rst | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'Documentation/process/deprecated.rst')
-rw-r--r-- | Documentation/process/deprecated.rst | 348 |
1 files changed, 348 insertions, 0 deletions
diff --git a/Documentation/process/deprecated.rst b/Documentation/process/deprecated.rst new file mode 100644 index 000000000..c8fd53a11 --- /dev/null +++ b/Documentation/process/deprecated.rst @@ -0,0 +1,348 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _deprecated: + +===================================================================== +Deprecated Interfaces, Language Features, Attributes, and Conventions +===================================================================== + +In a perfect world, it would be possible to convert all instances of +some deprecated API into the new API and entirely remove the old API in +a single development cycle. However, due to the size of the kernel, the +maintainership hierarchy, and timing, it's not always feasible to do these +kinds of conversions at once. This means that new instances may sneak into +the kernel while old ones are being removed, only making the amount of +work to remove the API grow. In order to educate developers about what +has been deprecated and why, this list has been created as a place to +point when uses of deprecated things are proposed for inclusion in the +kernel. + +__deprecated +------------ +While this attribute does visually mark an interface as deprecated, +it `does not produce warnings during builds any more +<https://git.kernel.org/linus/771c035372a036f83353eef46dbb829780330234>`_ +because one of the standing goals of the kernel is to build without +warnings and no one was actually doing anything to remove these deprecated +interfaces. While using `__deprecated` is nice to note an old API in +a header file, it isn't the full solution. Such interfaces must either +be fully removed from the kernel, or added to this file to discourage +others from using them in the future. + +BUG() and BUG_ON() +------------------ +Use WARN() and WARN_ON() instead, and handle the "impossible" +error condition as gracefully as possible. While the BUG()-family +of APIs were originally designed to act as an "impossible situation" +assert and to kill a kernel thread "safely", they turn out to just be +too risky. (e.g. "In what order do locks need to be released? Have +various states been restored?") Very commonly, using BUG() will +destabilize a system or entirely break it, which makes it impossible +to debug or even get viable crash reports. Linus has `very strong +<https://lore.kernel.org/lkml/CA+55aFy6jNLsywVYdGp83AMrXBo_P-pkjkphPGrO=82SPKCpLQ@mail.gmail.com/>`_ +feelings `about this +<https://lore.kernel.org/lkml/CAHk-=whDHsbK3HTOpTF=ue_o04onRwTEaK_ZoJp_fjbqq4+=Jw@mail.gmail.com/>`_. + +Note that the WARN()-family should only be used for "expected to +be unreachable" situations. If you want to warn about "reachable +but undesirable" situations, please use the pr_warn()-family of +functions. System owners may have set the *panic_on_warn* sysctl, +to make sure their systems do not continue running in the face of +"unreachable" conditions. (For example, see commits like `this one +<https://git.kernel.org/linus/d4689846881d160a4d12a514e991a740bcb5d65a>`_.) + +open-coded arithmetic in allocator arguments +-------------------------------------------- +Dynamic size calculations (especially multiplication) should not be +performed in memory allocator (or similar) function arguments due to the +risk of them overflowing. This could lead to values wrapping around and a +smaller allocation being made than the caller was expecting. Using those +allocations could lead to linear overflows of heap memory and other +misbehaviors. (One exception to this is literal values where the compiler +can warn if they might overflow. However, the preferred way in these +cases is to refactor the code as suggested below to avoid the open-coded +arithmetic.) + +For example, do not use ``count * size`` as an argument, as in:: + + foo = kmalloc(count * size, GFP_KERNEL); + +Instead, the 2-factor form of the allocator should be used:: + + foo = kmalloc_array(count, size, GFP_KERNEL); + +Specifically, kmalloc() can be replaced with kmalloc_array(), and +kzalloc() can be replaced with kcalloc(). + +If no 2-factor form is available, the saturate-on-overflow helpers should +be used:: + + bar = vmalloc(array_size(count, size)); + +Another common case to avoid is calculating the size of a structure with +a trailing array of others structures, as in:: + + header = kzalloc(sizeof(*header) + count * sizeof(*header->item), + GFP_KERNEL); + +Instead, use the helper:: + + header = kzalloc(struct_size(header, item, count), GFP_KERNEL); + +.. note:: If you are using struct_size() on a structure containing a zero-length + or a one-element array as a trailing array member, please refactor such + array usage and switch to a `flexible array member + <#zero-length-and-one-element-arrays>`_ instead. + +For other calculations, please compose the use of the size_mul(), +size_add(), and size_sub() helpers. For example, in the case of:: + + foo = krealloc(current_size + chunk_size * (count - 3), GFP_KERNEL); + +Instead, use the helpers:: + + foo = krealloc(size_add(current_size, + size_mul(chunk_size, + size_sub(count, 3))), GFP_KERNEL); + +For more details, also see array3_size() and flex_array_size(), +as well as the related check_mul_overflow(), check_add_overflow(), +check_sub_overflow(), and check_shl_overflow() family of functions. + +simple_strtol(), simple_strtoll(), simple_strtoul(), simple_strtoull() +---------------------------------------------------------------------- +The simple_strtol(), simple_strtoll(), +simple_strtoul(), and simple_strtoull() functions +explicitly ignore overflows, which may lead to unexpected results +in callers. The respective kstrtol(), kstrtoll(), +kstrtoul(), and kstrtoull() functions tend to be the +correct replacements, though note that those require the string to be +NUL or newline terminated. + +strcpy() +-------- +strcpy() performs no bounds checking on the destination buffer. This +could result in linear overflows beyond the end of the buffer, leading to +all kinds of misbehaviors. While `CONFIG_FORTIFY_SOURCE=y` and various +compiler flags help reduce the risk of using this function, there is +no good reason to add new uses of this function. The safe replacement +is strscpy(), though care must be given to any cases where the return +value of strcpy() was used, since strscpy() does not return a pointer to +the destination, but rather a count of non-NUL bytes copied (or negative +errno when it truncates). + +strncpy() on NUL-terminated strings +----------------------------------- +Use of strncpy() does not guarantee that the destination buffer will +be NUL terminated. This can lead to various linear read overflows and +other misbehavior due to the missing termination. It also NUL-pads +the destination buffer if the source contents are shorter than the +destination buffer size, which may be a needless performance penalty +for callers using only NUL-terminated strings. + +When the destination is required to be NUL-terminated, the replacement is +strscpy(), though care must be given to any cases where the return value +of strncpy() was used, since strscpy() does not return a pointer to the +destination, but rather a count of non-NUL bytes copied (or negative +errno when it truncates). Any cases still needing NUL-padding should +instead use strscpy_pad(). + +If a caller is using non-NUL-terminated strings, strtomem() should be +used, and the destinations should be marked with the `__nonstring +<https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html>`_ +attribute to avoid future compiler warnings. For cases still needing +NUL-padding, strtomem_pad() can be used. + +strlcpy() +--------- +strlcpy() reads the entire source buffer first (since the return value +is meant to match that of strlen()). This read may exceed the destination +size limit. This is both inefficient and can lead to linear read overflows +if a source string is not NUL-terminated. The safe replacement is strscpy(), +though care must be given to any cases where the return value of strlcpy() +is used, since strscpy() will return negative errno values when it truncates. + +%p format specifier +------------------- +Traditionally, using "%p" in format strings would lead to regular address +exposure flaws in dmesg, proc, sysfs, etc. Instead of leaving these to +be exploitable, all "%p" uses in the kernel are being printed as a hashed +value, rendering them unusable for addressing. New uses of "%p" should not +be added to the kernel. For text addresses, using "%pS" is likely better, +as it produces the more useful symbol name instead. For nearly everything +else, just do not add "%p" at all. + +Paraphrasing Linus's current `guidance <https://lore.kernel.org/lkml/CA+55aFwQEd_d40g4mUCSsVRZzrFPUJt74vc6PPpb675hYNXcKw@mail.gmail.com/>`_: + +- If the hashed "%p" value is pointless, ask yourself whether the pointer + itself is important. Maybe it should be removed entirely? +- If you really think the true pointer value is important, why is some + system state or user privilege level considered "special"? If you think + you can justify it (in comments and commit log) well enough to stand + up to Linus's scrutiny, maybe you can use "%px", along with making sure + you have sensible permissions. + +If you are debugging something where "%p" hashing is causing problems, +you can temporarily boot with the debug flag "`no_hash_pointers +<https://git.kernel.org/linus/5ead723a20e0447bc7db33dc3070b420e5f80aa6>`_". + +Variable Length Arrays (VLAs) +----------------------------- +Using stack VLAs produces much worse machine code than statically +sized stack arrays. While these non-trivial `performance issues +<https://git.kernel.org/linus/02361bc77888>`_ are reason enough to +eliminate VLAs, they are also a security risk. Dynamic growth of a stack +array may exceed the remaining memory in the stack segment. This could +lead to a crash, possible overwriting sensitive contents at the end of the +stack (when built without `CONFIG_THREAD_INFO_IN_TASK=y`), or overwriting +memory adjacent to the stack (when built without `CONFIG_VMAP_STACK=y`) + +Implicit switch case fall-through +--------------------------------- +The C language allows switch cases to fall through to the next case +when a "break" statement is missing at the end of a case. This, however, +introduces ambiguity in the code, as it's not always clear if the missing +break is intentional or a bug. For example, it's not obvious just from +looking at the code if `STATE_ONE` is intentionally designed to fall +through into `STATE_TWO`:: + + switch (value) { + case STATE_ONE: + do_something(); + case STATE_TWO: + do_other(); + break; + default: + WARN("unknown state"); + } + +As there have been a long list of flaws `due to missing "break" statements +<https://cwe.mitre.org/data/definitions/484.html>`_, we no longer allow +implicit fall-through. In order to identify intentional fall-through +cases, we have adopted a pseudo-keyword macro "fallthrough" which +expands to gcc's extension `__attribute__((__fallthrough__)) +<https://gcc.gnu.org/onlinedocs/gcc/Statement-Attributes.html>`_. +(When the C17/C18 `[[fallthrough]]` syntax is more commonly supported by +C compilers, static analyzers, and IDEs, we can switch to using that syntax +for the macro pseudo-keyword.) + +All switch/case blocks must end in one of: + +* break; +* fallthrough; +* continue; +* goto <label>; +* return [expression]; + +Zero-length and one-element arrays +---------------------------------- +There is a regular need in the kernel to provide a way to declare having +a dynamically sized set of trailing elements in a structure. Kernel code +should always use `"flexible array members" <https://en.wikipedia.org/wiki/Flexible_array_member>`_ +for these cases. The older style of one-element or zero-length arrays should +no longer be used. + +In older C code, dynamically sized trailing elements were done by specifying +a one-element array at the end of a structure:: + + struct something { + size_t count; + struct foo items[1]; + }; + +This led to fragile size calculations via sizeof() (which would need to +remove the size of the single trailing element to get a correct size of +the "header"). A `GNU C extension <https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html>`_ +was introduced to allow for zero-length arrays, to avoid these kinds of +size problems:: + + struct something { + size_t count; + struct foo items[0]; + }; + +But this led to other problems, and didn't solve some problems shared by +both styles, like not being able to detect when such an array is accidentally +being used _not_ at the end of a structure (which could happen directly, or +when such a struct was in unions, structs of structs, etc). + +C99 introduced "flexible array members", which lacks a numeric size for +the array declaration entirely:: + + struct something { + size_t count; + struct foo items[]; + }; + +This is the way the kernel expects dynamically sized trailing elements +to be declared. It allows the compiler to generate errors when the +flexible array does not occur last in the structure, which helps to prevent +some kind of `undefined behavior +<https://git.kernel.org/linus/76497732932f15e7323dc805e8ea8dc11bb587cf>`_ +bugs from being inadvertently introduced to the codebase. It also allows +the compiler to correctly analyze array sizes (via sizeof(), +`CONFIG_FORTIFY_SOURCE`, and `CONFIG_UBSAN_BOUNDS`). For instance, +there is no mechanism that warns us that the following application of the +sizeof() operator to a zero-length array always results in zero:: + + struct something { + size_t count; + struct foo items[0]; + }; + + struct something *instance; + + instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL); + instance->count = count; + + size = sizeof(instance->items) * instance->count; + memcpy(instance->items, source, size); + +At the last line of code above, ``size`` turns out to be ``zero``, when one might +have thought it represents the total size in bytes of the dynamic memory recently +allocated for the trailing array ``items``. Here are a couple examples of this +issue: `link 1 +<https://git.kernel.org/linus/f2cd32a443da694ac4e28fbf4ac6f9d5cc63a539>`_, +`link 2 +<https://git.kernel.org/linus/ab91c2a89f86be2898cee208d492816ec238b2cf>`_. +Instead, `flexible array members have incomplete type, and so the sizeof() +operator may not be applied <https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html>`_, +so any misuse of such operators will be immediately noticed at build time. + +With respect to one-element arrays, one has to be acutely aware that `such arrays +occupy at least as much space as a single object of the type +<https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html>`_, +hence they contribute to the size of the enclosing structure. This is prone +to error every time people want to calculate the total size of dynamic memory +to allocate for a structure containing an array of this kind as a member:: + + struct something { + size_t count; + struct foo items[1]; + }; + + struct something *instance; + + instance = kmalloc(struct_size(instance, items, count - 1), GFP_KERNEL); + instance->count = count; + + size = sizeof(instance->items) * instance->count; + memcpy(instance->items, source, size); + +In the example above, we had to remember to calculate ``count - 1`` when using +the struct_size() helper, otherwise we would have --unintentionally-- allocated +memory for one too many ``items`` objects. The cleanest and least error-prone way +to implement this is through the use of a `flexible array member`, together with +struct_size() and flex_array_size() helpers:: + + struct something { + size_t count; + struct foo items[]; + }; + + struct something *instance; + + instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL); + instance->count = count; + + memcpy(instance->items, source, flex_array_size(instance, items, instance->count)); |