From 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 21 Feb 2023 18:24:12 -0800 Subject: Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ... --- Documentation/filesystems/caching/backend-api.rst | 479 ++++++++++++++++++++++ 1 file changed, 479 insertions(+) create mode 100644 Documentation/filesystems/caching/backend-api.rst (limited to 'Documentation/filesystems/caching/backend-api.rst') diff --git a/Documentation/filesystems/caching/backend-api.rst b/Documentation/filesystems/caching/backend-api.rst new file mode 100644 index 000000000..3a199fc50 --- /dev/null +++ b/Documentation/filesystems/caching/backend-api.rst @@ -0,0 +1,479 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================= +Cache Backend API +================= + +The FS-Cache system provides an API by which actual caches can be supplied to +FS-Cache for it to then serve out to network filesystems and other interested +parties. This API is used by:: + + #include . + + +Overview +======== + +Interaction with the API is handled on three levels: cache, volume and data +storage, and each level has its own type of cookie object: + + ======================= ======================= + COOKIE C TYPE + ======================= ======================= + Cache cookie struct fscache_cache + Volume cookie struct fscache_volume + Data storage cookie struct fscache_cookie + ======================= ======================= + +Cookies are used to provide some filesystem data to the cache, manage state and +pin the cache during access in addition to acting as reference points for the +API functions. Each cookie has a debugging ID that is included in trace points +to make it easier to correlate traces. Note, though, that debugging IDs are +simply allocated from incrementing counters and will eventually wrap. + +The cache backend and the network filesystem can both ask for cache cookies - +and if they ask for one of the same name, they'll get the same cookie. Volume +and data cookies, however, are created at the behest of the filesystem only. + + +Cache Cookies +============= + +Caches are represented in the API by cache cookies. These are objects of +type:: + + struct fscache_cache { + void *cache_priv; + unsigned int debug_id; + char *name; + ... + }; + +There are a few fields that the cache backend might be interested in. The +``debug_id`` can be used in tracing to match lines referring to the same cache +and ``name`` is the name the cache was registered with. The ``cache_priv`` +member is private data provided by the cache when it is brought online. The +other fields are for internal use. + + +Registering a Cache +=================== + +When a cache backend wants to bring a cache online, it should first register +the cache name and that will get it a cache cookie. This is done with:: + + struct fscache_cache *fscache_acquire_cache(const char *name); + +This will look up and potentially create a cache cookie. The cache cookie may +have already been created by a network filesystem looking for it, in which case +that cache cookie will be used. If the cache cookie is not in use by another +cache, it will be moved into the preparing state, otherwise it will return +busy. + +If successful, the cache backend can then start setting up the cache. In the +event that the initialisation fails, the cache backend should call:: + + void fscache_relinquish_cache(struct fscache_cache *cache); + +to reset and discard the cookie. + + +Bringing a Cache Online +======================= + +Once the cache is set up, it can be brought online by calling:: + + int fscache_add_cache(struct fscache_cache *cache, + const struct fscache_cache_ops *ops, + void *cache_priv); + +This stores the cache operations table pointer and cache private data into the +cache cookie and moves the cache to the active state, thereby allowing accesses +to take place. + + +Withdrawing a Cache From Service +================================ + +The cache backend can withdraw a cache from service by calling this function:: + + void fscache_withdraw_cache(struct fscache_cache *cache); + +This moves the cache to the withdrawn state to prevent new cache- and +volume-level accesses from starting and then waits for outstanding cache-level +accesses to complete. + +The cache must then go through the data storage objects it has and tell fscache +to withdraw them, calling:: + + void fscache_withdraw_cookie(struct fscache_cookie *cookie); + +on the cookie that each object belongs to. This schedules the specified cookie +for withdrawal. This gets offloaded to a workqueue. The cache backend can +wait for completion by calling:: + + void fscache_wait_for_objects(struct fscache_cache *cache); + +Once all the cookies are withdrawn, a cache backend can withdraw all the +volumes, calling:: + + void fscache_withdraw_volume(struct fscache_volume *volume); + +to tell fscache that a volume has been withdrawn. This waits for all +outstanding accesses on the volume to complete before returning. + +When the cache is completely withdrawn, fscache should be notified by +calling:: + + void fscache_relinquish_cache(struct fscache_cache *cache); + +to clear fields in the cookie and discard the caller's ref on it. + + +Volume Cookies +============== + +Within a cache, the data storage objects are organised into logical volumes. +These are represented in the API as objects of type:: + + struct fscache_volume { + struct fscache_cache *cache; + void *cache_priv; + unsigned int debug_id; + char *key; + unsigned int key_hash; + ... + u8 coherency_len; + u8 coherency[]; + }; + +There are a number of fields here that are of interest to the caching backend: + + * ``cache`` - The parent cache cookie. + + * ``cache_priv`` - A place for the cache to stash private data. + + * ``debug_id`` - A debugging ID for logging in tracepoints. + + * ``key`` - A printable string with no '/' characters in it that represents + the index key for the volume. The key is NUL-terminated and padded out to + a multiple of 4 bytes. + + * ``key_hash`` - A hash of the index key. This should work out the same, no + matter the cpu arch and endianness. + + * ``coherency`` - A piece of coherency data that should be checked when the + volume is bound to in the cache. + + * ``coherency_len`` - The amount of data in the coherency buffer. + + +Data Storage Cookies +==================== + +A volume is a logical group of data storage objects, each of which is +represented to the network filesystem by a cookie. Cookies are represented in +the API as objects of type:: + + struct fscache_cookie { + struct fscache_volume *volume; + void *cache_priv; + unsigned long flags; + unsigned int debug_id; + unsigned int inval_counter; + loff_t object_size; + u8 advice; + u32 key_hash; + u8 key_len; + u8 aux_len; + ... + }; + +The fields in the cookie that are of interest to the cache backend are: + + * ``volume`` - The parent volume cookie. + + * ``cache_priv`` - A place for the cache to stash private data. + + * ``flags`` - A collection of bit flags, including: + + * FSCACHE_COOKIE_NO_DATA_TO_READ - There is no data available in the + cache to be read as the cookie has been created or invalidated. + + * FSCACHE_COOKIE_NEEDS_UPDATE - The coherency data and/or object size has + been changed and needs committing. + + * FSCACHE_COOKIE_LOCAL_WRITE - The netfs's data has been modified + locally, so the cache object may be in an incoherent state with respect + to the server. + + * FSCACHE_COOKIE_HAVE_DATA - The backend should set this if it + successfully stores data into the cache. + + * FSCACHE_COOKIE_RETIRED - The cookie was invalidated when it was + relinquished and the cached data should be discarded. + + * ``debug_id`` - A debugging ID for logging in tracepoints. + + * ``inval_counter`` - The number of invalidations done on the cookie. + + * ``advice`` - Information about how the cookie is to be used. + + * ``key_hash`` - A hash of the index key. This should work out the same, no + matter the cpu arch and endianness. + + * ``key_len`` - The length of the index key. + + * ``aux_len`` - The length of the coherency data buffer. + +Each cookie has an index key, which may be stored inline to the cookie or +elsewhere. A pointer to this can be obtained by calling:: + + void *fscache_get_key(struct fscache_cookie *cookie); + +The index key is a binary blob, the storage for which is padded out to a +multiple of 4 bytes. + +Each cookie also has a buffer for coherency data. This may also be inline or +detached from the cookie and a pointer is obtained by calling:: + + void *fscache_get_aux(struct fscache_cookie *cookie); + + + +Cookie Accounting +================= + +Data storage cookies are counted and this is used to block cache withdrawal +completion until all objects have been destroyed. The following functions are +provided to the cache to deal with that:: + + void fscache_count_object(struct fscache_cache *cache); + void fscache_uncount_object(struct fscache_cache *cache); + void fscache_wait_for_objects(struct fscache_cache *cache); + +The count function records the allocation of an object in a cache and the +uncount function records its destruction. Warning: by the time the uncount +function returns, the cache may have been destroyed. + +The wait function can be used during the withdrawal procedure to wait for +fscache to finish withdrawing all the objects in the cache. When it completes, +there will be no remaining objects referring to the cache object or any volume +objects. + + +Cache Management API +==================== + +The cache backend implements the cache management API by providing a table of +operations that fscache can use to manage various aspects of the cache. These +are held in a structure of type:: + + struct fscache_cache_ops { + const char *name; + ... + }; + +This contains a printable name for the cache backend driver plus a number of +pointers to methods to allow fscache to request management of the cache: + + * Set up a volume cookie [optional]:: + + void (*acquire_volume)(struct fscache_volume *volume); + + This method is called when a volume cookie is being created. The caller + holds a cache-level access pin to prevent the cache from going away for + the duration. This method should set up the resources to access a volume + in the cache and should not return until it has done so. + + If successful, it can set ``cache_priv`` to its own data. + + + * Clean up volume cookie [optional]:: + + void (*free_volume)(struct fscache_volume *volume); + + This method is called when a volume cookie is being released if + ``cache_priv`` is set. + + + * Look up a cookie in the cache [mandatory]:: + + bool (*lookup_cookie)(struct fscache_cookie *cookie); + + This method is called to look up/create the resources needed to access the + data storage for a cookie. It is called from a worker thread with a + volume-level access pin in the cache to prevent it from being withdrawn. + + True should be returned if successful and false otherwise. If false is + returned, the withdraw_cookie op (see below) will be called. + + If lookup fails, but the object could still be created (e.g. it hasn't + been cached before), then:: + + void fscache_cookie_lookup_negative( + struct fscache_cookie *cookie); + + can be called to let the network filesystem proceed and start downloading + stuff whilst the cache backend gets on with the job of creating things. + + If successful, ``cookie->cache_priv`` can be set. + + + * Withdraw an object without any cookie access counts held [mandatory]:: + + void (*withdraw_cookie)(struct fscache_cookie *cookie); + + This method is called to withdraw a cookie from service. It will be + called when the cookie is relinquished by the netfs, withdrawn or culled + by the cache backend or closed after a period of non-use by fscache. + + The caller doesn't hold any access pins, but it is called from a + non-reentrant work item to manage races between the various ways + withdrawal can occur. + + The cookie will have the ``FSCACHE_COOKIE_RETIRED`` flag set on it if the + associated data is to be removed from the cache. + + + * Change the size of a data storage object [mandatory]:: + + void (*resize_cookie)(struct netfs_cache_resources *cres, + loff_t new_size); + + This method is called to inform the cache backend of a change in size of + the netfs file due to local truncation. The cache backend should make all + of the changes it needs to make before returning as this is done under the + netfs inode mutex. + + The caller holds a cookie-level access pin to prevent a race with + withdrawal and the netfs must have the cookie marked in-use to prevent + garbage collection or culling from removing any resources. + + + * Invalidate a data storage object [mandatory]:: + + bool (*invalidate_cookie)(struct fscache_cookie *cookie); + + This is called when the network filesystem detects a third-party + modification or when an O_DIRECT write is made locally. This requests + that the cache backend should throw away all the data in the cache for + this object and start afresh. It should return true if successful and + false otherwise. + + On entry, new I O/operations are blocked. Once the cache is in a position + to accept I/O again, the backend should release the block by calling:: + + void fscache_resume_after_invalidation(struct fscache_cookie *cookie); + + If the method returns false, caching will be withdrawn for this cookie. + + + * Prepare to make local modifications to the cache [mandatory]:: + + void (*prepare_to_write)(struct fscache_cookie *cookie); + + This method is called when the network filesystem finds that it is going + to need to modify the contents of the cache due to local writes or + truncations. This gives the cache a chance to note that a cache object + may be incoherent with respect to the server and may need writing back + later. This may also cause the cached data to be scrapped on later + rebinding if not properly committed. + + + * Begin an operation for the netfs lib [mandatory]:: + + bool (*begin_operation)(struct netfs_cache_resources *cres, + enum fscache_want_state want_state); + + This method is called when an I/O operation is being set up (read, write + or resize). The caller holds an access pin on the cookie and must have + marked the cookie as in-use. + + If it can, the backend should attach any resources it needs to keep around + to the netfs_cache_resources object and return true. + + If it can't complete the setup, it should return false. + + The want_state parameter indicates the state the caller needs the cache + object to be in and what it wants to do during the operation: + + * ``FSCACHE_WANT_PARAMS`` - The caller just wants to access cache + object parameters; it doesn't need to do data I/O yet. + + * ``FSCACHE_WANT_READ`` - The caller wants to read data. + + * ``FSCACHE_WANT_WRITE`` - The caller wants to write to or resize the + cache object. + + Note that there won't necessarily be anything attached to the cookie's + cache_priv yet if the cookie is still being created. + + +Data I/O API +============ + +A cache backend provides a data I/O API by through the netfs library's ``struct +netfs_cache_ops`` attached to a ``struct netfs_cache_resources`` by the +``begin_operation`` method described above. + +See the Documentation/filesystems/netfs_library.rst for a description. + + +Miscellaneous Functions +======================= + +FS-Cache provides some utilities that a cache backend may make use of: + + * Note occurrence of an I/O error in a cache:: + + void fscache_io_error(struct fscache_cache *cache); + + This tells FS-Cache that an I/O error occurred in the cache. This + prevents any new I/O from being started on the cache. + + This does not actually withdraw the cache. That must be done separately. + + * Note cessation of caching on a cookie due to failure:: + + void fscache_caching_failed(struct fscache_cookie *cookie); + + This notes that a the caching that was being done on a cookie failed in + some way, for instance the backing storage failed to be created or + invalidation failed and that no further I/O operations should take place + on it until the cache is reset. + + * Count I/O requests:: + + void fscache_count_read(void); + void fscache_count_write(void); + + These record reads and writes from/to the cache. The numbers are + displayed in /proc/fs/fscache/stats. + + * Count out-of-space errors:: + + void fscache_count_no_write_space(void); + void fscache_count_no_create_space(void); + + These record ENOSPC errors in the cache, divided into failures of data + writes and failures of filesystem object creations (e.g. mkdir). + + * Count objects culled:: + + void fscache_count_culled(void); + + This records the culling of an object. + + * Get the cookie from a set of cache resources:: + + struct fscache_cookie *fscache_cres_cookie(struct netfs_cache_resources *cres) + + Pull a pointer to the cookie from the cache resources. This may return a + NULL cookie if no cookie was set. + + +API Function Reference +====================== + +.. kernel-doc:: include/linux/fscache-cache.h -- cgit v1.2.3