diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /Documentation/filesystems/autofs.rst | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'Documentation/filesystems/autofs.rst')
-rw-r--r-- | Documentation/filesystems/autofs.rst | 580 |
1 files changed, 580 insertions, 0 deletions
diff --git a/Documentation/filesystems/autofs.rst b/Documentation/filesystems/autofs.rst new file mode 100644 index 000000000..4f490278d --- /dev/null +++ b/Documentation/filesystems/autofs.rst @@ -0,0 +1,580 @@ +===================== +autofs - how it works +===================== + +Purpose +======= + +The goal of autofs is to provide on-demand mounting and race free +automatic unmounting of various other filesystems. This provides two +key advantages: + +1. There is no need to delay boot until all filesystems that + might be needed are mounted. Processes that try to access those + slow filesystems might be delayed but other processes can + continue freely. This is particularly important for + network filesystems (e.g. NFS) or filesystems stored on + media with a media-changing robot. + +2. The names and locations of filesystems can be stored in + a remote database and can change at any time. The content + in that data base at the time of access will be used to provide + a target for the access. The interpretation of names in the + filesystem can even be programmatic rather than database-backed, + allowing wildcards for example, and can vary based on the user who + first accessed a name. + +Context +======= + +The "autofs" filesystem module is only one part of an autofs system. +There also needs to be a user-space program which looks up names +and mounts filesystems. This will often be the "automount" program, +though other tools including "systemd" can make use of "autofs". +This document describes only the kernel module and the interactions +required with any user-space program. Subsequent text refers to this +as the "automount daemon" or simply "the daemon". + +"autofs" is a Linux kernel module which provides the "autofs" +filesystem type. Several "autofs" filesystems can be mounted and they +can each be managed separately, or all managed by the same daemon. + +Content +======= + +An autofs filesystem can contain 3 sorts of objects: directories, +symbolic links and mount traps. Mount traps are directories with +extra properties as described in the next section. + +Objects can only be created by the automount daemon: symlinks are +created with a regular `symlink` system call, while directories and +mount traps are created with `mkdir`. The determination of whether a +directory should be a mount trap is based on a master map. This master +map is consulted by autofs to determine which directories are mount +points. Mount points can be *direct*/*indirect*/*offset*. +On most systems, the default master map is located at */etc/auto.master*. + +If neither the *direct* or *offset* mount options are given (so the +mount is considered to be *indirect*), then the root directory is +always a regular directory, otherwise it is a mount trap when it is +empty and a regular directory when not empty. Note that *direct* and +*offset* are treated identically so a concise summary is that the root +directory is a mount trap only if the filesystem is mounted *direct* +and the root is empty. + +Directories created in the root directory are mount traps only if the +filesystem is mounted *indirect* and they are empty. + +Directories further down the tree depend on the *maxproto* mount +option and particularly whether it is less than five or not. +When *maxproto* is five, no directories further down the +tree are ever mount traps, they are always regular directories. When +the *maxproto* is four (or three), these directories are mount traps +precisely when they are empty. + +So: non-empty (i.e. non-leaf) directories are never mount traps. Empty +directories are sometimes mount traps, and sometimes not depending on +where in the tree they are (root, top level, or lower), the *maxproto*, +and whether the mount was *indirect* or not. + +Mount Traps +=========== + +A core element of the implementation of autofs is the Mount Traps +which are provided by the Linux VFS. Any directory provided by a +filesystem can be designated as a trap. This involves two separate +features that work together to allow autofs to do its job. + +**DCACHE_NEED_AUTOMOUNT** + +If a dentry has the DCACHE_NEED_AUTOMOUNT flag set (which gets set if +the inode has S_AUTOMOUNT set, or can be set directly) then it is +(potentially) a mount trap. Any access to this directory beyond a +"`stat`" will (normally) cause the `d_op->d_automount()` dentry operation +to be called. The task of this method is to find the filesystem that +should be mounted on the directory and to return it. The VFS is +responsible for actually mounting the root of this filesystem on the +directory. + +autofs doesn't find the filesystem itself but sends a message to the +automount daemon asking it to find and mount the filesystem. The +autofs `d_automount` method then waits for the daemon to report that +everything is ready. It will then return "`NULL`" indicating that the +mount has already happened. The VFS doesn't try to mount anything but +follows down the mount that is already there. + +This functionality is sufficient for some users of mount traps such +as NFS which creates traps so that mountpoints on the server can be +reflected on the client. However it is not sufficient for autofs. As +mounting onto a directory is considered to be "beyond a `stat`", the +automount daemon would not be able to mount a filesystem on the 'trap' +directory without some way to avoid getting caught in the trap. For +that purpose there is another flag. + +**DCACHE_MANAGE_TRANSIT** + +If a dentry has DCACHE_MANAGE_TRANSIT set then two very different but +related behaviours are invoked, both using the `d_op->d_manage()` +dentry operation. + +Firstly, before checking to see if any filesystem is mounted on the +directory, d_manage() will be called with the `rcu_walk` parameter set +to `false`. It may return one of three things: + +- A return value of zero indicates that there is nothing special + about this dentry and normal checks for mounts and automounts + should proceed. + + autofs normally returns zero, but first waits for any + expiry (automatic unmounting of the mounted filesystem) to + complete. This avoids races. + +- A return value of `-EISDIR` tells the VFS to ignore any mounts + on the directory and to not consider calling `->d_automount()`. + This effectively disables the **DCACHE_NEED_AUTOMOUNT** flag + causing the directory not be a mount trap after all. + + autofs returns this if it detects that the process performing the + lookup is the automount daemon and that the mount has been + requested but has not yet completed. How it determines this is + discussed later. This allows the automount daemon not to get + caught in the mount trap. + + There is a subtlety here. It is possible that a second autofs + filesystem can be mounted below the first and for both of them to + be managed by the same daemon. For the daemon to be able to mount + something on the second it must be able to "walk" down past the + first. This means that d_manage cannot *always* return -EISDIR for + the automount daemon. It must only return it when a mount has + been requested, but has not yet completed. + + `d_manage` also returns `-EISDIR` if the dentry shouldn't be a + mount trap, either because it is a symbolic link or because it is + not empty. + +- Any other negative value is treated as an error and returned + to the caller. + + autofs can return + + - -ENOENT if the automount daemon failed to mount anything, + - -ENOMEM if it ran out of memory, + - -EINTR if a signal arrived while waiting for expiry to + complete + - or any other error sent down by the automount daemon. + + +The second use case only occurs during an "RCU-walk" and so `rcu_walk` +will be set. + +An RCU-walk is a fast and lightweight process for walking down a +filename path (i.e. it is like running on tip-toes). RCU-walk cannot +cope with all situations so when it finds a difficulty it falls back +to "REF-walk", which is slower but more robust. + +RCU-walk will never call `->d_automount`; the filesystems must already +be mounted or RCU-walk cannot handle the path. +To determine if a mount-trap is safe for RCU-walk mode it calls +`->d_manage()` with `rcu_walk` set to `true`. + +In this case `d_manage()` must avoid blocking and should avoid taking +spinlocks if at all possible. Its sole purpose is to determine if it +would be safe to follow down into any mounted directory and the only +reason that it might not be is if an expiry of the mount is +underway. + +In the `rcu_walk` case, `d_manage()` cannot return -EISDIR to tell the +VFS that this is a directory that doesn't require d_automount. If +`rcu_walk` sees a dentry with DCACHE_NEED_AUTOMOUNT set but nothing +mounted, it *will* fall back to REF-walk. `d_manage()` cannot make the +VFS remain in RCU-walk mode, but can only tell it to get out of +RCU-walk mode by returning `-ECHILD`. + +So `d_manage()`, when called with `rcu_walk` set, should either return +-ECHILD if there is any reason to believe it is unsafe to enter the +mounted filesystem, otherwise it should return 0. + +autofs will return `-ECHILD` if an expiry of the filesystem has been +initiated or is being considered, otherwise it returns 0. + + +Mountpoint expiry +================= + +The VFS has a mechanism for automatically expiring unused mounts, +much as it can expire any unused dentry information from the dcache. +This is guided by the MNT_SHRINKABLE flag. This only applies to +mounts that were created by `d_automount()` returning a filesystem to be +mounted. As autofs doesn't return such a filesystem but leaves the +mounting to the automount daemon, it must involve the automount daemon +in unmounting as well. This also means that autofs has more control +over expiry. + +The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to +the `umount` system call. Unmounting with MNT_EXPIRE will fail unless +a previous attempt had been made, and the filesystem has been inactive +and untouched since that previous attempt. autofs does not depend on +this but has its own internal tracking of whether filesystems were +recently used. This allows individual names in the autofs directory +to expire separately. + +With version 4 of the protocol, the automount daemon can try to +unmount any filesystems mounted on the autofs filesystem or remove any +symbolic links or empty directories any time it likes. If the unmount +or removal is successful the filesystem will be returned to the state +it was before the mount or creation, so that any access of the name +will trigger normal auto-mount processing. In particular, `rmdir` and +`unlink` do not leave negative entries in the dcache as a normal +filesystem would, so an attempt to access a recently-removed object is +passed to autofs for handling. + +With version 5, this is not safe except for unmounting from top-level +directories. As lower-level directories are never mount traps, other +processes will see an empty directory as soon as the filesystem is +unmounted. So it is generally safest to use the autofs expiry +protocol described below. + +Normally the daemon only wants to remove entries which haven't been +used for a while. For this purpose autofs maintains a "`last_used`" +time stamp on each directory or symlink. For symlinks it genuinely +does record the last time the symlink was "used" or followed to find +out where it points to. For directories the field is used slightly +differently. The field is updated at mount time and during expire +checks if it is found to be in use (ie. open file descriptor or +process working directory) and during path walks. The update done +during path walks prevents frequent expire and immediate mount of +frequently accessed automounts. But in the case where a GUI continually +access or an application frequently scans an autofs directory tree +there can be an accumulation of mounts that aren't actually being +used. To cater for this case the "`strictexpire`" autofs mount option +can be used to avoid the "`last_used`" update on path walk thereby +preventing this apparent inability to expire mounts that aren't +really in use. + +The daemon is able to ask autofs if anything is due to be expired, +using an `ioctl` as discussed later. For a *direct* mount, autofs +considers if the entire mount-tree can be unmounted or not. For an +*indirect* mount, autofs considers each of the names in the top level +directory to determine if any of those can be unmounted and cleaned +up. + +There is an option with indirect mounts to consider each of the leaves +that has been mounted on instead of considering the top-level names. +This was originally intended for compatibility with version 4 of autofs +and should be considered as deprecated for Sun Format automount maps. +However, it may be used again for amd format mount maps (which are +generally indirect maps) because the amd automounter allows for the +setting of an expire timeout for individual mounts. But there are +some difficulties in making the needed changes for this. + +When autofs considers a directory it checks the `last_used` time and +compares it with the "timeout" value set when the filesystem was +mounted, though this check is ignored in some cases. It also checks if +the directory or anything below it is in use. For symbolic links, +only the `last_used` time is ever considered. + +If both appear to support expiring the directory or symlink, an action +is taken. + +There are two ways to ask autofs to consider expiry. The first is to +use the **AUTOFS_IOC_EXPIRE** ioctl. This only works for indirect +mounts. If it finds something in the root directory to expire it will +return the name of that thing. Once a name has been returned the +automount daemon needs to unmount any filesystems mounted below the +name normally. As described above, this is unsafe for non-toplevel +mounts in a version-5 autofs. For this reason the current `automount(8)` +does not use this ioctl. + +The second mechanism uses either the **AUTOFS_DEV_IOCTL_EXPIRE_CMD** or +the **AUTOFS_IOC_EXPIRE_MULTI** ioctl. This will work for both direct and +indirect mounts. If it selects an object to expire, it will notify +the daemon using the notification mechanism described below. This +will block until the daemon acknowledges the expiry notification. +This implies that the "`EXPIRE`" ioctl must be sent from a different +thread than the one which handles notification. + +While the ioctl is blocking, the entry is marked as "expiring" and +`d_manage` will block until the daemon affirms that the unmount has +completed (together with removing any directories that might have been +necessary), or has been aborted. + +Communicating with autofs: detecting the daemon +=============================================== + +There are several forms of communication between the automount daemon +and the filesystem. As we have already seen, the daemon can create and +remove directories and symlinks using normal filesystem operations. +autofs knows whether a process requesting some operation is the daemon +or not based on its process-group id number (see getpgid(1)). + +When an autofs filesystem is mounted the pgid of the mounting +processes is recorded unless the "pgrp=" option is given, in which +case that number is recorded instead. Any request arriving from a +process in that process group is considered to come from the daemon. +If the daemon ever has to be stopped and restarted a new pgid can be +provided through an ioctl as will be described below. + +Communicating with autofs: the event pipe +========================================= + +When an autofs filesystem is mounted, the 'write' end of a pipe must +be passed using the 'fd=' mount option. autofs will write +notification messages to this pipe for the daemon to respond to. +For version 5, the format of the message is:: + + struct autofs_v5_packet { + struct autofs_packet_hdr hdr; + autofs_wqt_t wait_queue_token; + __u32 dev; + __u64 ino; + __u32 uid; + __u32 gid; + __u32 pid; + __u32 tgid; + __u32 len; + char name[NAME_MAX+1]; + }; + +And the format of the header is:: + + struct autofs_packet_hdr { + int proto_version; /* Protocol version */ + int type; /* Type of packet */ + }; + +where the type is one of :: + + autofs_ptype_missing_indirect + autofs_ptype_expire_indirect + autofs_ptype_missing_direct + autofs_ptype_expire_direct + +so messages can indicate that a name is missing (something tried to +access it but it isn't there) or that it has been selected for expiry. + +The pipe will be set to "packet mode" (equivalent to passing +`O_DIRECT`) to _pipe2(2)_ so that a read from the pipe will return at +most one packet, and any unread portion of a packet will be discarded. + +The `wait_queue_token` is a unique number which can identify a +particular request to be acknowledged. When a message is sent over +the pipe the affected dentry is marked as either "active" or +"expiring" and other accesses to it block until the message is +acknowledged using one of the ioctls below with the relevant +`wait_queue_token`. + +Communicating with autofs: root directory ioctls +================================================ + +The root directory of an autofs filesystem will respond to a number of +ioctls. The process issuing the ioctl must have the CAP_SYS_ADMIN +capability, or must be the automount daemon. + +The available ioctl commands are: + +- **AUTOFS_IOC_READY**: + a notification has been handled. The argument + to the ioctl command is the "wait_queue_token" number + corresponding to the notification being acknowledged. +- **AUTOFS_IOC_FAIL**: + similar to above, but indicates failure with + the error code `ENOENT`. +- **AUTOFS_IOC_CATATONIC**: + Causes the autofs to enter "catatonic" + mode meaning that it stops sending notifications to the daemon. + This mode is also entered if a write to the pipe fails. +- **AUTOFS_IOC_PROTOVER**: + This returns the protocol version in use. +- **AUTOFS_IOC_PROTOSUBVER**: + Returns the protocol sub-version which + is really a version number for the implementation. +- **AUTOFS_IOC_SETTIMEOUT**: + This passes a pointer to an unsigned + long. The value is used to set the timeout for expiry, and + the current timeout value is stored back through the pointer. +- **AUTOFS_IOC_ASKUMOUNT**: + Returns, in the pointed-to `int`, 1 if + the filesystem could be unmounted. This is only a hint as + the situation could change at any instant. This call can be + used to avoid a more expensive full unmount attempt. +- **AUTOFS_IOC_EXPIRE**: + as described above, this asks if there is + anything suitable to expire. A pointer to a packet:: + + struct autofs_packet_expire_multi { + struct autofs_packet_hdr hdr; + autofs_wqt_t wait_queue_token; + int len; + char name[NAME_MAX+1]; + }; + + is required. This is filled in with the name of something + that can be unmounted or removed. If nothing can be expired, + `errno` is set to `EAGAIN`. Even though a `wait_queue_token` + is present in the structure, no "wait queue" is established + and no acknowledgment is needed. +- **AUTOFS_IOC_EXPIRE_MULTI**: + This is similar to + **AUTOFS_IOC_EXPIRE** except that it causes notification to be + sent to the daemon, and it blocks until the daemon acknowledges. + The argument is an integer which can contain two different flags. + + **AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored + and objects are expired if the are not in use. + + **AUTOFS_EXP_FORCED** causes the in use status to be ignored + and objects are expired ieven if they are in use. This assumes + that the daemon has requested this because it is capable of + performing the umount. + + **AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level + name to expire. This is only safe when *maxproto* is 4. + +Communicating with autofs: char-device ioctls +============================================= + +It is not always possible to open the root of an autofs filesystem, +particularly a *direct* mounted filesystem. If the automount daemon +is restarted there is no way for it to regain control of existing +mounts using any of the above communication channels. To address this +need there is a "miscellaneous" character device (major 10, minor 235) +which can be used to communicate directly with the autofs filesystem. +It requires CAP_SYS_ADMIN for access. + +The 'ioctl's that can be used on this device are described in a separate +document `autofs-mount-control.txt`, and are summarised briefly here. +Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure:: + + struct autofs_dev_ioctl { + __u32 ver_major; + __u32 ver_minor; + __u32 size; /* total size of data passed in + * including this struct */ + __s32 ioctlfd; /* automount command fd */ + + /* Command parameters */ + union { + struct args_protover protover; + struct args_protosubver protosubver; + struct args_openmount openmount; + struct args_ready ready; + struct args_fail fail; + struct args_setpipefd setpipefd; + struct args_timeout timeout; + struct args_requester requester; + struct args_expire expire; + struct args_askumount askumount; + struct args_ismountpoint ismountpoint; + }; + + char path[0]; + }; + +For the **OPEN_MOUNT** and **IS_MOUNTPOINT** commands, the target +filesystem is identified by the `path`. All other commands identify +the filesystem by the `ioctlfd` which is a file descriptor open on the +root, and which can be returned by **OPEN_MOUNT**. + +The `ver_major` and `ver_minor` are in/out parameters which check that +the requested version is supported, and report the maximum version +that the kernel module can support. + +Commands are: + +- **AUTOFS_DEV_IOCTL_VERSION_CMD**: + does nothing, except validate and + set version numbers. +- **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**: + return an open file descriptor + on the root of an autofs filesystem. The filesystem is identified + by name and device number, which is stored in `openmount.devid`. + Device numbers for existing filesystems can be found in + `/proc/self/mountinfo`. +- **AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD**: + same as `close(ioctlfd)`. +- **AUTOFS_DEV_IOCTL_SETPIPEFD_CMD**: + if the filesystem is in + catatonic mode, this can provide the write end of a new pipe + in `setpipefd.pipefd` to re-establish communication with a daemon. + The process group of the calling process is used to identify the + daemon. +- **AUTOFS_DEV_IOCTL_REQUESTER_CMD**: + `path` should be a + name within the filesystem that has been auto-mounted on. + On successful return, `requester.uid` and `requester.gid` will be + the UID and GID of the process which triggered that mount. +- **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**: + Check if path is a + mountpoint of a particular type - see separate documentation for + details. + +- **AUTOFS_DEV_IOCTL_PROTOVER_CMD** +- **AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD** +- **AUTOFS_DEV_IOCTL_READY_CMD** +- **AUTOFS_DEV_IOCTL_FAIL_CMD** +- **AUTOFS_DEV_IOCTL_CATATONIC_CMD** +- **AUTOFS_DEV_IOCTL_TIMEOUT_CMD** +- **AUTOFS_DEV_IOCTL_EXPIRE_CMD** +- **AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD** + +These all have the same +function as the similarly named **AUTOFS_IOC** ioctls, except +that **FAIL** can be given an explicit error number in `fail.status` +instead of assuming `ENOENT`, and this **EXPIRE** command +corresponds to **AUTOFS_IOC_EXPIRE_MULTI**. + +Catatonic mode +============== + +As mentioned, an autofs mount can enter "catatonic" mode. This +happens if a write to the notification pipe fails, or if it is +explicitly requested by an `ioctl`. + +When entering catatonic mode, the pipe is closed and any pending +notifications are acknowledged with the error `ENOENT`. + +Once in catatonic mode attempts to access non-existing names will +result in `ENOENT` while attempts to access existing directories will +be treated in the same way as if they came from the daemon, so mount +traps will not fire. + +When the filesystem is mounted a _uid_ and _gid_ can be given which +set the ownership of directories and symbolic links. When the +filesystem is in catatonic mode, any process with a matching UID can +create directories or symlinks in the root directory, but not in other +directories. + +Catatonic mode can only be left via the +**AUTOFS_DEV_IOCTL_OPENMOUNT_CMD** ioctl on the `/dev/autofs`. + +The "ignore" mount option +========================= + +The "ignore" mount option can be used to provide a generic indicator +to applications that the mount entry should be ignored when displaying +mount information. + +In other OSes that provide autofs and that provide a mount list to user +space based on the kernel mount list a no-op mount option ("ignore" is +the one use on the most common OSes) is allowed so that autofs file +system users can optionally use it. + +This is intended to be used by user space programs to exclude autofs +mounts from consideration when reading the mounts list. + +autofs, name spaces, and shared mounts +====================================== + +With bind mounts and name spaces it is possible for an autofs +filesystem to appear at multiple places in one or more filesystem +name spaces. For this to work sensibly, the autofs filesystem should +always be mounted "shared". e.g. :: + + mount --make-shared /autofs/mount/point + +The automount daemon is only able to manage a single mount location for +an autofs filesystem and if mounts on that are not 'shared', other +locations will not behave as expected. In particular access to those +other locations will likely result in the `ELOOP` error :: + + Too many levels of symbolic links |