author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /Documentation/dev-tools/kasan.rst | |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existence
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'Documentation/dev-tools/kasan.rst')
-rw-r--r-- | Documentation/dev-tools/kasan.rst | 529 |
1 files changed, 529 insertions, 0 deletions
diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
new file mode 100644
index 000000000..5c93ab915
--- /dev/null
+++ b/Documentation/dev-tools/kasan.rst
@@ -0,0 +1,529 @@
+The Kernel Address Sanitizer (KASAN)
+====================================
+
+Overview
+--------
+
+Kernel Address Sanitizer (KASAN) is a dynamic memory safety error detector
+designed to find out-of-bounds and use-after-free bugs.
+
+KASAN has three modes:
+
+1. Generic KASAN
+2. Software Tag-Based KASAN
+3. Hardware Tag-Based KASAN
+
+Generic KASAN, enabled with CONFIG_KASAN_GENERIC, is the mode intended for
+debugging, similar to userspace ASan. This mode is supported on many CPU
+architectures, but it has significant performance and memory overheads.
+
+Software Tag-Based KASAN or SW_TAGS KASAN, enabled with CONFIG_KASAN_SW_TAGS,
+can be used for both debugging and dogfood testing, similar to userspace HWASan.
+This mode is only supported for arm64, but its moderate memory overhead allows
+using it for testing on memory-restricted devices with real workloads.
+
+Hardware Tag-Based KASAN or HW_TAGS KASAN, enabled with CONFIG_KASAN_HW_TAGS,
+is the mode intended to be used as an in-field memory bug detector or as a
+security mitigation. This mode only works on arm64 CPUs that support MTE
+(Memory Tagging Extension), but it has low memory and performance overheads and
+thus can be used in production.
+
+For details about the memory and performance impact of each KASAN mode, see the
+descriptions of the corresponding Kconfig options.
+
+The Generic and the Software Tag-Based modes are commonly referred to as the
+software modes. The Software Tag-Based and the Hardware Tag-Based modes are
+referred to as the tag-based modes.
+
+Support
+-------
+
+Architectures
+~~~~~~~~~~~~~
+
+Generic KASAN is supported on x86_64, arm, arm64, powerpc, riscv, s390, and
+xtensa, and the tag-based KASAN modes are supported only on arm64.
+
+Compilers
+~~~~~~~~~
+
+Software KASAN modes use compile-time instrumentation to insert validity checks
+before every memory access and thus require a compiler version that provides
+support for that. The Hardware Tag-Based mode relies on hardware to perform
+these checks but still requires a compiler version that supports the memory
+tagging instructions.
+
+Generic KASAN requires GCC version 8.3.0 or later
+or any Clang version supported by the kernel.
+
+Software Tag-Based KASAN requires GCC 11+
+or any Clang version supported by the kernel.
+
+Hardware Tag-Based KASAN requires GCC 10+ or Clang 12+.
+
+Memory types
+~~~~~~~~~~~~
+
+Generic KASAN supports finding bugs in all of slab, page_alloc, vmap, vmalloc,
+stack, and global memory.
+
+Software Tag-Based KASAN supports slab, page_alloc, vmalloc, and stack memory.
+
+Hardware Tag-Based KASAN supports slab, page_alloc, and non-executable vmalloc
+memory.
+
+For slab, both software KASAN modes support SLUB and SLAB allocators, while
+Hardware Tag-Based KASAN only supports SLUB.
+
+Usage
+-----
+
+To enable KASAN, configure the kernel with::
+
+    CONFIG_KASAN=y
+
+and choose between ``CONFIG_KASAN_GENERIC`` (to enable Generic KASAN),
+``CONFIG_KASAN_SW_TAGS`` (to enable Software Tag-Based KASAN), and
+``CONFIG_KASAN_HW_TAGS`` (to enable Hardware Tag-Based KASAN).
+
+For the software modes, also choose between ``CONFIG_KASAN_OUTLINE`` and
+``CONFIG_KASAN_INLINE``. Outline and inline are compiler instrumentation types.
+The former produces a smaller binary while the latter is up to 2 times faster.
+
+To include alloc and free stack traces of affected slab objects into reports,
+enable ``CONFIG_STACKTRACE``. To include alloc and free stack traces of affected
+physical pages, enable ``CONFIG_PAGE_OWNER`` and boot with ``page_owner=on``.
+
+Boot parameters
+~~~~~~~~~~~~~~~
+
+KASAN is affected by the generic ``panic_on_warn`` command line parameter.
+When it is enabled, KASAN panics the kernel after printing a bug report.
+
+By default, KASAN prints a bug report only for the first invalid memory access.
+With ``kasan_multi_shot``, KASAN prints a report on every invalid access. This
+effectively disables ``panic_on_warn`` for KASAN reports.
+
+Alternatively, independent of ``panic_on_warn``, the ``kasan.fault=`` boot
+parameter can be used to control panic and reporting behaviour:
+
+- ``kasan.fault=report`` or ``=panic`` controls whether to only print a KASAN
+  report or also panic the kernel (default: ``report``). The panic happens even
+  if ``kasan_multi_shot`` is enabled.
+
+Software and Hardware Tag-Based KASAN modes (see the section about various
+modes below) support altering stack trace collection behavior:
+
+- ``kasan.stacktrace=off`` or ``=on`` disables or enables alloc and free stack
+  traces collection (default: ``on``).
+- ``kasan.stack_ring_size=<number of entries>`` specifies the number of entries
+  in the stack ring (default: ``32768``).
+
+Hardware Tag-Based KASAN mode is intended for use in production as a security
+mitigation. Therefore, it supports additional boot parameters that allow
+disabling KASAN altogether or controlling its features:
+
+- ``kasan=off`` or ``=on`` controls whether KASAN is enabled (default: ``on``).
+
+- ``kasan.mode=sync``, ``=async`` or ``=asymm`` controls whether KASAN
+  is configured in synchronous, asynchronous or asymmetric mode of
+  execution (default: ``sync``).
+  Synchronous mode: a bad access is detected immediately when a tag
+  check fault occurs.
+  Asynchronous mode: a bad access detection is delayed. When a tag check
+  fault occurs, the information is stored in hardware (in the TFSR_EL1
+  register for arm64). The kernel periodically checks the hardware and
+  only reports tag faults during these checks.
+  Asymmetric mode: a bad access is detected synchronously on reads and
+  asynchronously on writes.
+
+- ``kasan.vmalloc=off`` or ``=on`` disables or enables tagging of vmalloc
+  allocations (default: ``on``).
+
+Error reports
+~~~~~~~~~~~~~
+
+A typical KASAN report looks like this::
+
+    ==================================================================
+    BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [test_kasan]
+    Write of size 1 at addr ffff8801f44ec37b by task insmod/2760
+
+    CPU: 1 PID: 2760 Comm: insmod Not tainted 4.19.0-rc3+ #698
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
+    Call Trace:
+     dump_stack+0x94/0xd8
+     print_address_description+0x73/0x280
+     kasan_report+0x144/0x187
+     __asan_report_store1_noabort+0x17/0x20
+     kmalloc_oob_right+0xa8/0xbc [test_kasan]
+     kmalloc_tests_init+0x16/0x700 [test_kasan]
+     do_one_initcall+0xa5/0x3ae
+     do_init_module+0x1b6/0x547
+     load_module+0x75df/0x8070
+     __do_sys_init_module+0x1c6/0x200
+     __x64_sys_init_module+0x6e/0xb0
+     do_syscall_64+0x9f/0x2c0
+     entry_SYSCALL_64_after_hwframe+0x44/0xa9
+    RIP: 0033:0x7f96443109da
+    RSP: 002b:00007ffcf0b51b08 EFLAGS: 00000202 ORIG_RAX: 00000000000000af
+    RAX: ffffffffffffffda RBX: 000055dc3ee521a0 RCX: 00007f96443109da
+    RDX: 00007f96445cff88 RSI: 0000000000057a50 RDI: 00007f9644992000
+    RBP: 000055dc3ee510b0 R08: 0000000000000003 R09: 0000000000000000
+    R10: 00007f964430cd0a R11: 0000000000000202 R12: 00007f96445cff88
+    R13: 000055dc3ee51090 R14: 0000000000000000 R15: 0000000000000000
+
+    Allocated by task 2760:
+     save_stack+0x43/0xd0
+     kasan_kmalloc+0xa7/0xd0
+     kmem_cache_alloc_trace+0xe1/0x1b0
+     kmalloc_oob_right+0x56/0xbc [test_kasan]
+     kmalloc_tests_init+0x16/0x700 [test_kasan]
+     do_one_initcall+0xa5/0x3ae
+     do_init_module+0x1b6/0x547
+     load_module+0x75df/0x8070
+     __do_sys_init_module+0x1c6/0x200
+     __x64_sys_init_module+0x6e/0xb0
+     do_syscall_64+0x9f/0x2c0
+     entry_SYSCALL_64_after_hwframe+0x44/0xa9
+
+    Freed by task 815:
+     save_stack+0x43/0xd0
+     __kasan_slab_free+0x135/0x190
+     kasan_slab_free+0xe/0x10
+     kfree+0x93/0x1a0
+     umh_complete+0x6a/0xa0
+     call_usermodehelper_exec_async+0x4c3/0x640
+     ret_from_fork+0x35/0x40
+
+    The buggy address belongs to the object at ffff8801f44ec300
+     which belongs to the cache kmalloc-128 of size 128
+    The buggy address is located 123 bytes inside of
+     128-byte region [ffff8801f44ec300, ffff8801f44ec380)
+    The buggy address belongs to the page:
+    page:ffffea0007d13b00 count:1 mapcount:0 mapping:ffff8801f7001640 index:0x0
+    flags: 0x200000000000100(slab)
+    raw: 0200000000000100 ffffea0007d11dc0 0000001a0000001a ffff8801f7001640
+    raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000
+    page dumped because: kasan: bad access detected
+
+    Memory state around the buggy address:
+     ffff8801f44ec200: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
+     ffff8801f44ec280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
+    >ffff8801f44ec300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03
+                                                                    ^
+     ffff8801f44ec380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
+     ffff8801f44ec400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
+    ==================================================================
+
+The report header summarizes what kind of bug happened and what kind of access
+caused it. It is followed by a stack trace of the bad access, a stack trace of
+where the accessed memory was allocated (in case a slab object was accessed),
+and a stack trace of where the object was freed (in case of a use-after-free
+bug report). Next comes a description of the accessed slab object and the
+information about the accessed memory page.
+
+In the end, the report shows the memory state around the accessed address.
+Internally, KASAN tracks memory state separately for each memory granule, which
+is either 8 or 16 aligned bytes depending on KASAN mode. Each number in the
+memory state section of the report shows the state of one of the memory
+granules that surround the accessed address.
+
+For Generic KASAN, the size of each memory granule is 8. The state of each
+granule is encoded in one shadow byte. Those 8 bytes can be accessible,
+partially accessible, freed, or be a part of a redzone. KASAN uses the following
+encoding for each shadow byte: 00 means that all 8 bytes of the corresponding
+memory region are accessible; number N (1 <= N <= 7) means that the first N
+bytes are accessible, and other (8 - N) bytes are not; any negative value
+indicates that the entire 8-byte word is inaccessible. KASAN uses different
+negative values to distinguish between different kinds of inaccessible memory
+like redzones or freed memory (see mm/kasan/kasan.h).
+
+In the report above, the arrow points to the shadow byte ``03``, which means
+that the accessed address is partially accessible.
+
+For tag-based KASAN modes, this last report section shows the memory tags around
+the accessed address (see the `Implementation details`_ section).
+
+Note that KASAN bug titles (like ``slab-out-of-bounds`` or ``use-after-free``)
+are best-effort: KASAN prints the most probable bug type based on the limited
+information it has. The actual type of the bug might be different.
+
+Generic KASAN also reports up to two auxiliary call stack traces. These stack
+traces point to places in code that interacted with the object but that are not
+directly present in the bad access stack trace. Currently, this includes
+call_rcu() and workqueue queuing.
+
+Implementation details
+----------------------
+
+Generic KASAN
+~~~~~~~~~~~~~
+
+Software KASAN modes use shadow memory to record whether each byte of memory is
+safe to access and use compile-time instrumentation to insert shadow memory
+checks before each memory access.
+
+Generic KASAN dedicates 1/8th of kernel memory to its shadow memory (16TB
+to cover 128TB on x86_64) and uses direct mapping with a scale and offset to
+translate a memory address to its corresponding shadow address.
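Editorial illustration (not part of the patch): the shadow byte encoding and the shift-by-3 scaling described above can be sketched in user-space C. This is only a rough model under stated assumptions. The names `access_ok` and `shadow_index` are hypothetical, the shadow array is indexed from the start of a tracked region instead of adding `KASAN_SHADOW_OFFSET`, and accesses are assumed not to cross a granule boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SCALE_SHIFT 3                       /* one shadow byte per 8 bytes */
#define GRANULE_SIZE (1UL << SHADOW_SCALE_SHIFT)

/* Hypothetical helper: shadow index for a byte offset inside a tracked region. */
static size_t shadow_index(uintptr_t offset)
{
    return offset >> SHADOW_SCALE_SHIFT;
}

/*
 * Hypothetical check modelled on the Generic KASAN encoding:
 *   0      -> all 8 bytes of the granule are accessible
 *   1..7   -> only the first N bytes are accessible
 *   < 0    -> the whole granule is inaccessible (redzone, freed, ...)
 * The access must fit inside a single granule.
 */
static bool access_ok(const int8_t *shadow, uintptr_t offset, size_t size)
{
    uintptr_t in_granule = offset & (GRANULE_SIZE - 1);

    assert(size >= 1 && in_granule + size <= GRANULE_SIZE);

    int8_t s = shadow[shadow_index(offset)];
    if (s == 0)
        return true;
    if (s < 0)
        return false;
    /* Partially accessible granule: the last accessed byte must be below N. */
    return in_granule + size - 1 < (uintptr_t)s;
}
```

With the shadow row from the report above (fifteen `00` bytes followed by `03`), a 1-byte access at offset 122 passes this check, while the reported access at offset 123 (address `...c37b` in the object starting at `...c300`) does not.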
+
+Here is the function which translates an address to its corresponding shadow
+address::
+
+    static inline void *kasan_mem_to_shadow(const void *addr)
+    {
+        return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
+                + KASAN_SHADOW_OFFSET;
+    }
+
+where ``KASAN_SHADOW_SCALE_SHIFT = 3``.
+
+Compile-time instrumentation is used to insert memory access checks. Compiler
+inserts function calls (``__asan_load*(addr)``, ``__asan_store*(addr)``) before
+each memory access of size 1, 2, 4, 8, or 16. These functions check whether
+memory accesses are valid or not by checking corresponding shadow memory.
+
+With inline instrumentation, instead of making function calls, the compiler
+directly inserts the code to check shadow memory. This option significantly
+enlarges the kernel, but it gives an x1.1-x2 performance boost over the
+outline-instrumented kernel.
+
+Generic KASAN is the only mode that delays the reuse of freed objects via
+quarantine (see mm/kasan/quarantine.c for implementation).
+
+Software Tag-Based KASAN
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Software Tag-Based KASAN uses a software memory tagging approach to checking
+access validity. It is currently only implemented for the arm64 architecture.
+
+Software Tag-Based KASAN uses the Top Byte Ignore (TBI) feature of arm64 CPUs
+to store a pointer tag in the top byte of kernel pointers. It uses shadow memory
+to store memory tags associated with each 16-byte memory cell (therefore, it
+dedicates 1/16th of the kernel memory for shadow memory).
+
+On each memory allocation, Software Tag-Based KASAN generates a random tag, tags
+the allocated memory with this tag, and embeds the same tag into the returned
+pointer.
+
+Software Tag-Based KASAN uses compile-time instrumentation to insert checks
+before each memory access. These checks make sure that the tag of the memory
+that is being accessed is equal to the tag of the pointer that is used to access
+this memory. In case of a tag mismatch, Software Tag-Based KASAN prints a bug
+report.
+
+Software Tag-Based KASAN also has two instrumentation modes (outline, which
+emits callbacks to check memory accesses; and inline, which performs the shadow
+memory checks inline). With outline instrumentation mode, a bug report is
+printed from the function that performs the access check. With inline
+instrumentation, a ``brk`` instruction is emitted by the compiler, and a
+dedicated ``brk`` handler is used to print bug reports.
+
+Software Tag-Based KASAN uses 0xFF as a match-all pointer tag (accesses through
+pointers with the 0xFF pointer tag are not checked). The value 0xFE is currently
+reserved to tag freed memory regions.
+
+Hardware Tag-Based KASAN
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Hardware Tag-Based KASAN is similar to the software mode in concept but uses
+hardware memory tagging support instead of compiler instrumentation and
+shadow memory.
+
+Hardware Tag-Based KASAN is currently only implemented for arm64 architecture
+and based on both arm64 Memory Tagging Extension (MTE) introduced in ARMv8.5
+Instruction Set Architecture and Top Byte Ignore (TBI).
+
+Special arm64 instructions are used to assign memory tags for each allocation.
+Same tags are assigned to pointers to those allocations. On every memory
+access, hardware makes sure that the tag of the memory that is being accessed is
+equal to the tag of the pointer that is used to access this memory. In case of a
+tag mismatch, a fault is generated, and a report is printed.
+
+Hardware Tag-Based KASAN uses 0xFF as a match-all pointer tag (accesses through
+pointers with the 0xFF pointer tag are not checked). The value 0xFE is currently
+reserved to tag freed memory regions.
+
+If the hardware does not support MTE (pre ARMv8.5), Hardware Tag-Based KASAN
+will not be enabled. In this case, all KASAN boot parameters are ignored.
+
+Note that enabling CONFIG_KASAN_HW_TAGS always results in in-kernel TBI being
+enabled. Even when ``kasan.mode=off`` is provided or when the hardware does not
+support MTE (but supports TBI).
+
+Hardware Tag-Based KASAN only reports the first found bug. After that, MTE tag
+checking gets disabled.
+
+Shadow memory
+-------------
+
+The contents of this section are only applicable to software KASAN modes.
+
+The kernel maps memory in several different parts of the address space.
+The range of kernel virtual addresses is large: there is not enough real
+memory to support a real shadow region for every address that could be
+accessed by the kernel. Therefore, KASAN only maps real shadow for certain
+parts of the address space.
+
+Default behaviour
+~~~~~~~~~~~~~~~~~
+
+By default, architectures only map real memory over the shadow region
+for the linear mapping (and potentially other small areas). For all
+other areas - such as vmalloc and vmemmap space - a single read-only
+page is mapped over the shadow area. This read-only shadow page
+declares all memory accesses as permitted.
+
+This presents a problem for modules: they do not live in the linear
+mapping but in a dedicated module space. By hooking into the module
+allocator, KASAN temporarily maps real shadow memory to cover them.
+This allows detection of invalid accesses to module globals, for example.
+
+This also creates an incompatibility with ``VMAP_STACK``: if the stack
+lives in vmalloc space, it will be shadowed by the read-only page, and
+the kernel will fault when trying to set up the shadow data for stack
+variables.
+
+CONFIG_KASAN_VMALLOC
+~~~~~~~~~~~~~~~~~~~~
+
+With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
+cost of greater memory usage. Currently, this is supported on x86,
+arm64, riscv, s390, and powerpc.
+
+This works by hooking into vmalloc and vmap and dynamically
+allocating real shadow memory to back the mappings.
+
+Most mappings in vmalloc space are small, requiring less than a full
+page of shadow space. Allocating a full shadow page per mapping would
+therefore be wasteful. Furthermore, to ensure that different mappings
+use different shadow pages, mappings would have to be aligned to
+``KASAN_GRANULE_SIZE * PAGE_SIZE``.
+
+Instead, KASAN shares backing space across multiple mappings. It allocates
+a backing page when a mapping in vmalloc space uses a particular page
+of the shadow region. This page can be shared by other vmalloc
+mappings later on.
+
+KASAN hooks into the vmap infrastructure to lazily clean up unused shadow
+memory.
+
+To avoid the difficulties around swapping mappings around, KASAN expects
+that the part of the shadow region that covers the vmalloc space will
+not be covered by the early shadow page but will be left unmapped.
+This will require changes in arch-specific code.
+
+This allows ``VMAP_STACK`` support on x86 and can simplify support of
+architectures that do not have a fixed module region.
+
+For developers
+--------------
+
+Ignoring accesses
+~~~~~~~~~~~~~~~~~
+
+Software KASAN modes use compiler instrumentation to insert validity checks.
+Such instrumentation might be incompatible with some parts of the kernel, and
+therefore needs to be disabled.
+
+Other parts of the kernel might access metadata for allocated objects.
+Normally, KASAN detects and reports such accesses, but in some cases (e.g.,
+in memory allocators), these accesses are valid.
+
+For software KASAN modes, to disable instrumentation for a specific file or
+directory, add a ``KASAN_SANITIZE`` annotation to the respective kernel
+Makefile:
+
+- For a single file (e.g., main.o)::
+
+    KASAN_SANITIZE_main.o := n
+
+- For all files in one directory::
+
+    KASAN_SANITIZE := n
+
+For software KASAN modes, to disable instrumentation on a per-function basis,
+use the KASAN-specific ``__no_sanitize_address`` function attribute or the
+generic ``noinstr`` one.
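Editorial illustration (not part of the patch): per-function opt-out can be tried in user space with the compiler attribute that GCC and Clang provide for AddressSanitizer. The sketch below is an assumption-laden stand-in, not kernel code. `DEMO_NO_SANITIZE_ADDRESS` and `raw_checksum` are hypothetical names, and the kernel's own `__no_sanitize_address` macro is defined in its compiler headers and may expand differently.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's __no_sanitize_address. */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_NO_SANITIZE_ADDRESS __attribute__((no_sanitize_address))
#else
#define DEMO_NO_SANITIZE_ADDRESS
#endif

/*
 * Accesses made directly in this function are not instrumented when the
 * program is built with -fsanitize=address. Calls it makes into
 * instrumented functions would still be checked.
 */
DEMO_NO_SANITIZE_ADDRESS
static unsigned int raw_checksum(const unsigned char *buf, size_t len)
{
    unsigned int sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

The attribute only affects code generation for the annotated function, which matches the limitation on indirect accesses noted in the documentation.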
+
+Note that disabling compiler instrumentation (either on a per-file or a
+per-function basis) makes KASAN ignore the accesses that happen directly in
+that code for software KASAN modes. It does not help when the accesses happen
+indirectly (through calls to instrumented functions) or with Hardware
+Tag-Based KASAN, which does not use compiler instrumentation.
+
+For software KASAN modes, to disable KASAN reports in a part of the kernel code
+for the current task, annotate this part of the code with a
+``kasan_disable_current()``/``kasan_enable_current()`` section. This also
+disables the reports for indirect accesses that happen through function calls.
+
+For tag-based KASAN modes, to disable access checking, use
+``kasan_reset_tag()`` or ``page_kasan_tag_reset()``. Note that temporarily
+disabling access checking via ``page_kasan_tag_reset()`` requires saving and
+restoring the per-page KASAN tag via ``page_kasan_tag``/``page_kasan_tag_set``.
+
+Tests
+~~~~~
+
+There are KASAN tests that allow verifying that KASAN works and can detect
+certain types of memory corruptions. The tests consist of two parts:
+
+1. Tests that are integrated with the KUnit Test Framework. Enabled with
+``CONFIG_KASAN_KUNIT_TEST``. These tests can be run and partially verified
+automatically in a few different ways; see the instructions below.
+
+2. Tests that are currently incompatible with KUnit. Enabled with
+``CONFIG_KASAN_MODULE_TEST`` and can only be run as a module. These tests can
+only be verified manually by loading the kernel module and inspecting the
+kernel log for KASAN reports.
+
+Each KUnit-compatible KASAN test prints one of multiple KASAN reports if an
+error is detected. Then the test prints its number and status.
+
+When a test passes::
+
+        ok 28 - kmalloc_double_kzfree
+
+When a test fails due to a failed ``kmalloc``::
+
+        # kmalloc_large_oob_right: ASSERTION FAILED at lib/test_kasan.c:163
+        Expected ptr is not null, but is
+        not ok 4 - kmalloc_large_oob_right
+
+When a test fails due to a missing KASAN report::
+
+        # kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:974
+        KASAN failure expected in "kfree_sensitive(ptr)", but none occurred
+        not ok 44 - kmalloc_double_kzfree
+
+
+At the end the cumulative status of all KASAN tests is printed. On success::
+
+        ok 1 - kasan
+
+Or, if one of the tests failed::
+
+        not ok 1 - kasan
+
+There are a few ways to run KUnit-compatible KASAN tests.
+
+1. Loadable module
+
+   With ``CONFIG_KUNIT`` enabled, KASAN-KUnit tests can be built as a loadable
+   module and run by loading ``test_kasan.ko`` with ``insmod`` or ``modprobe``.
+
+2. Built-In
+
+   With ``CONFIG_KUNIT`` built-in, KASAN-KUnit tests can be built-in as well.
+   In this case, the tests will run at boot as a late-init call.
+
+3. Using kunit_tool
+
+   With ``CONFIG_KUNIT`` and ``CONFIG_KASAN_KUNIT_TEST`` built-in, it is also
+   possible to use ``kunit_tool`` to see the results of KUnit tests in a more
+   readable way. This will not print the KASAN reports of the tests that passed.
+   See `KUnit documentation <https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html>`_
+   for more up-to-date information on ``kunit_tool``.
+
+.. _KUnit: https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html
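Editorial illustration (not part of the patch): the per-test status lines shown in the Tests section can be checked mechanically when scanning a kernel log. The sketch below is a minimal classifier that assumes only the `ok N - name` and `not ok N - name` shapes quoted above. The names `tap_status` and `classify_result_line` are hypothetical, and this is not how `kunit_tool` itself parses output.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical classification of one log line. */
enum tap_status { TAP_OTHER = -1, TAP_FAIL = 0, TAP_PASS = 1 };

static enum tap_status classify_result_line(const char *line)
{
    /* Skip the leading indentation used in the log output. */
    while (isspace((unsigned char)*line))
        line++;

    /* "not ok" must be tested first, since it also contains "ok ". */
    if (strncmp(line, "not ok ", 7) == 0)
        return TAP_FAIL;
    if (strncmp(line, "ok ", 3) == 0)
        return TAP_PASS;

    /* Diagnostic lines (for example those starting with '#') and anything else. */
    return TAP_OTHER;
}
```

A caller could count `TAP_FAIL` results per line and still consult the final `ok 1 - kasan` or `not ok 1 - kasan` line for the cumulative status.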