diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /block/blk-sysfs.c | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to '')
-rw-r--r-- | block/blk-sysfs.c | 916 |
1 files changed, 916 insertions, 0 deletions
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c new file mode 100644 index 000000000..f1fce1c7f --- /dev/null +++ b/block/blk-sysfs.c @@ -0,0 +1,916 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Functions related to sysfs handling + */ +#include <linux/kernel.h> +#include <linux/slab.h> +#include <linux/module.h> +#include <linux/bio.h> +#include <linux/blkdev.h> +#include <linux/backing-dev.h> +#include <linux/blktrace_api.h> +#include <linux/blk-mq.h> +#include <linux/debugfs.h> + +#include "blk.h" +#include "blk-mq.h" +#include "blk-mq-debugfs.h" +#include "blk-mq-sched.h" +#include "blk-rq-qos.h" +#include "blk-wbt.h" +#include "blk-cgroup.h" +#include "blk-throttle.h" + +struct queue_sysfs_entry { + struct attribute attr; + ssize_t (*show)(struct request_queue *, char *); + ssize_t (*store)(struct request_queue *, const char *, size_t); +}; + +static ssize_t +queue_var_show(unsigned long var, char *page) +{ + return sprintf(page, "%lu\n", var); +} + +static ssize_t +queue_var_store(unsigned long *var, const char *page, size_t count) +{ + int err; + unsigned long v; + + err = kstrtoul(page, 10, &v); + if (err || v > UINT_MAX) + return -EINVAL; + + *var = v; + + return count; +} + +static ssize_t queue_var_store64(s64 *var, const char *page) +{ + int err; + s64 v; + + err = kstrtos64(page, 10, &v); + if (err < 0) + return err; + + *var = v; + return 0; +} + +static ssize_t queue_requests_show(struct request_queue *q, char *page) +{ + return queue_var_show(q->nr_requests, page); +} + +static ssize_t +queue_requests_store(struct request_queue *q, const char *page, size_t count) +{ + unsigned long nr; + int ret, err; + + if (!queue_is_mq(q)) + return -EINVAL; + + ret = queue_var_store(&nr, page, count); + if (ret < 0) + return ret; + + if (nr < BLKDEV_MIN_RQ) + nr = BLKDEV_MIN_RQ; + + err = blk_mq_update_nr_requests(q, nr); + if (err) + return err; + + return ret; +} + +static ssize_t queue_ra_show(struct request_queue *q, char *page) +{ + unsigned long ra_kb; + + if (!q->disk) + return -EINVAL; + ra_kb = q->disk->bdi->ra_pages << (PAGE_SHIFT - 10); + return queue_var_show(ra_kb, page); +} + +static ssize_t +queue_ra_store(struct request_queue *q, const char *page, size_t count) +{ + unsigned long ra_kb; + ssize_t ret; + + if (!q->disk) + return -EINVAL; + ret = queue_var_store(&ra_kb, page, count); + if (ret < 0) + return ret; + q->disk->bdi->ra_pages = ra_kb >> (PAGE_SHIFT - 10); + return ret; +} + +static ssize_t queue_max_sectors_show(struct request_queue *q, char *page) +{ + int max_sectors_kb = queue_max_sectors(q) >> 1; + + return queue_var_show(max_sectors_kb, page); +} + +static ssize_t queue_max_segments_show(struct request_queue *q, char *page) +{ + return queue_var_show(queue_max_segments(q), page); +} + +static ssize_t queue_max_discard_segments_show(struct request_queue *q, + char *page) +{ + return queue_var_show(queue_max_discard_segments(q), page); +} + +static ssize_t queue_max_integrity_segments_show(struct request_queue *q, char *page) +{ + return queue_var_show(q->limits.max_integrity_segments, page); +} + +static ssize_t queue_max_segment_size_show(struct request_queue *q, char *page) +{ + return queue_var_show(queue_max_segment_size(q), page); +} + +static ssize_t queue_logical_block_size_show(struct request_queue *q, char *page) +{ + return queue_var_show(queue_logical_block_size(q), page); +} + +static ssize_t queue_physical_block_size_show(struct request_queue *q, char *page) +{ + return queue_var_show(queue_physical_block_size(q), page); +} + +static ssize_t queue_chunk_sectors_show(struct request_queue *q, char *page) +{ + return queue_var_show(q->limits.chunk_sectors, page); +} + +static ssize_t queue_io_min_show(struct request_queue *q, char *page) +{ + return queue_var_show(queue_io_min(q), page); +} + +static ssize_t queue_io_opt_show(struct request_queue *q, char *page) +{ + return queue_var_show(queue_io_opt(q), page); +} + +static ssize_t queue_discard_granularity_show(struct request_queue *q, char *page) +{ + return queue_var_show(q->limits.discard_granularity, page); +} + +static ssize_t queue_discard_max_hw_show(struct request_queue *q, char *page) +{ + + return sprintf(page, "%llu\n", + (unsigned long long)q->limits.max_hw_discard_sectors << 9); +} + +static ssize_t queue_discard_max_show(struct request_queue *q, char *page) +{ + return sprintf(page, "%llu\n", + (unsigned long long)q->limits.max_discard_sectors << 9); +} + +static ssize_t queue_discard_max_store(struct request_queue *q, + const char *page, size_t count) +{ + unsigned long max_discard; + ssize_t ret = queue_var_store(&max_discard, page, count); + + if (ret < 0) + return ret; + + if (max_discard & (q->limits.discard_granularity - 1)) + return -EINVAL; + + max_discard >>= 9; + if (max_discard > UINT_MAX) + return -EINVAL; + + if (max_discard > q->limits.max_hw_discard_sectors) + max_discard = q->limits.max_hw_discard_sectors; + + q->limits.max_discard_sectors = max_discard; + return ret; +} + +static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *page) +{ + return queue_var_show(0, page); +} + +static ssize_t queue_write_same_max_show(struct request_queue *q, char *page) +{ + return queue_var_show(0, page); +} + +static ssize_t queue_write_zeroes_max_show(struct request_queue *q, char *page) +{ + return sprintf(page, "%llu\n", + (unsigned long long)q->limits.max_write_zeroes_sectors << 9); +} + +static ssize_t queue_zone_write_granularity_show(struct request_queue *q, + char *page) +{ + return queue_var_show(queue_zone_write_granularity(q), page); +} + +static ssize_t queue_zone_append_max_show(struct request_queue *q, char *page) +{ + unsigned long long max_sectors = q->limits.max_zone_append_sectors; + + return sprintf(page, "%llu\n", max_sectors << SECTOR_SHIFT); +} + +static ssize_t +queue_max_sectors_store(struct request_queue *q, const char *page, size_t count) +{ + unsigned long var; + unsigned int max_sectors_kb, + max_hw_sectors_kb = queue_max_hw_sectors(q) >> 1, + page_kb = 1 << (PAGE_SHIFT - 10); + ssize_t ret = queue_var_store(&var, page, count); + + if (ret < 0) + return ret; + + max_sectors_kb = (unsigned int)var; + max_hw_sectors_kb = min_not_zero(max_hw_sectors_kb, + q->limits.max_dev_sectors >> 1); + if (max_sectors_kb == 0) { + q->limits.max_user_sectors = 0; + max_sectors_kb = min(max_hw_sectors_kb, + BLK_DEF_MAX_SECTORS >> 1); + } else { + if (max_sectors_kb > max_hw_sectors_kb || + max_sectors_kb < page_kb) + return -EINVAL; + q->limits.max_user_sectors = max_sectors_kb << 1; + } + + spin_lock_irq(&q->queue_lock); + q->limits.max_sectors = max_sectors_kb << 1; + if (q->disk) + q->disk->bdi->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10); + spin_unlock_irq(&q->queue_lock); + + return ret; +} + +static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page) +{ + int max_hw_sectors_kb = queue_max_hw_sectors(q) >> 1; + + return queue_var_show(max_hw_sectors_kb, page); +} + +static ssize_t queue_virt_boundary_mask_show(struct request_queue *q, char *page) +{ + return queue_var_show(q->limits.virt_boundary_mask, page); +} + +static ssize_t queue_dma_alignment_show(struct request_queue *q, char *page) +{ + return queue_var_show(queue_dma_alignment(q), page); +} + +#define QUEUE_SYSFS_BIT_FNS(name, flag, neg) \ +static ssize_t \ +queue_##name##_show(struct request_queue *q, char *page) \ +{ \ + int bit; \ + bit = test_bit(QUEUE_FLAG_##flag, &q->queue_flags); \ + return queue_var_show(neg ? !bit : bit, page); \ +} \ +static ssize_t \ +queue_##name##_store(struct request_queue *q, const char *page, size_t count) \ +{ \ + unsigned long val; \ + ssize_t ret; \ + ret = queue_var_store(&val, page, count); \ + if (ret < 0) \ + return ret; \ + if (neg) \ + val = !val; \ + \ + if (val) \ + blk_queue_flag_set(QUEUE_FLAG_##flag, q); \ + else \ + blk_queue_flag_clear(QUEUE_FLAG_##flag, q); \ + return ret; \ +} + +QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1); +QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0); +QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0); +QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0); +#undef QUEUE_SYSFS_BIT_FNS + +static ssize_t queue_zoned_show(struct request_queue *q, char *page) +{ + switch (blk_queue_zoned_model(q)) { + case BLK_ZONED_HA: + return sprintf(page, "host-aware\n"); + case BLK_ZONED_HM: + return sprintf(page, "host-managed\n"); + default: + return sprintf(page, "none\n"); + } +} + +static ssize_t queue_nr_zones_show(struct request_queue *q, char *page) +{ + return queue_var_show(disk_nr_zones(q->disk), page); +} + +static ssize_t queue_max_open_zones_show(struct request_queue *q, char *page) +{ + return queue_var_show(bdev_max_open_zones(q->disk->part0), page); +} + +static ssize_t queue_max_active_zones_show(struct request_queue *q, char *page) +{ + return queue_var_show(bdev_max_active_zones(q->disk->part0), page); +} + +static ssize_t queue_nomerges_show(struct request_queue *q, char *page) +{ + return queue_var_show((blk_queue_nomerges(q) << 1) | + blk_queue_noxmerges(q), page); +} + +static ssize_t queue_nomerges_store(struct request_queue *q, const char *page, + size_t count) +{ + unsigned long nm; + ssize_t ret = queue_var_store(&nm, page, count); + + if (ret < 0) + return ret; + + blk_queue_flag_clear(QUEUE_FLAG_NOMERGES, q); + blk_queue_flag_clear(QUEUE_FLAG_NOXMERGES, q); + if (nm == 2) + blk_queue_flag_set(QUEUE_FLAG_NOMERGES, q); + else if (nm) + blk_queue_flag_set(QUEUE_FLAG_NOXMERGES, q); + + return ret; +} + +static ssize_t queue_rq_affinity_show(struct request_queue *q, char *page) +{ + bool set = test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags); + bool force = test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags); + + return queue_var_show(set << force, page); +} + +static ssize_t +queue_rq_affinity_store(struct request_queue *q, const char *page, size_t count) +{ + ssize_t ret = -EINVAL; +#ifdef CONFIG_SMP + unsigned long val; + + ret = queue_var_store(&val, page, count); + if (ret < 0) + return ret; + + if (val == 2) { + blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q); + blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, q); + } else if (val == 1) { + blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q); + blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q); + } else if (val == 0) { + blk_queue_flag_clear(QUEUE_FLAG_SAME_COMP, q); + blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q); + } +#endif + return ret; +} + +static ssize_t queue_poll_delay_show(struct request_queue *q, char *page) +{ + int val; + + if (q->poll_nsec == BLK_MQ_POLL_CLASSIC) + val = BLK_MQ_POLL_CLASSIC; + else + val = q->poll_nsec / 1000; + + return sprintf(page, "%d\n", val); +} + +static ssize_t queue_poll_delay_store(struct request_queue *q, const char *page, + size_t count) +{ + int err, val; + + if (!q->mq_ops || !q->mq_ops->poll) + return -EINVAL; + + err = kstrtoint(page, 10, &val); + if (err < 0) + return err; + + if (val == BLK_MQ_POLL_CLASSIC) + q->poll_nsec = BLK_MQ_POLL_CLASSIC; + else if (val >= 0) + q->poll_nsec = val * 1000; + else + return -EINVAL; + + return count; +} + +static ssize_t queue_poll_show(struct request_queue *q, char *page) +{ + return queue_var_show(test_bit(QUEUE_FLAG_POLL, &q->queue_flags), page); +} + +static ssize_t queue_poll_store(struct request_queue *q, const char *page, + size_t count) +{ + if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) + return -EINVAL; + pr_info_ratelimited("writes to the poll attribute are ignored.\n"); + pr_info_ratelimited("please use driver specific parameters instead.\n"); + return count; +} + +static ssize_t queue_io_timeout_show(struct request_queue *q, char *page) +{ + return sprintf(page, "%u\n", jiffies_to_msecs(q->rq_timeout)); +} + +static ssize_t queue_io_timeout_store(struct request_queue *q, const char *page, + size_t count) +{ + unsigned int val; + int err; + + err = kstrtou32(page, 10, &val); + if (err || val == 0) + return -EINVAL; + + blk_queue_rq_timeout(q, msecs_to_jiffies(val)); + + return count; +} + +static ssize_t queue_wb_lat_show(struct request_queue *q, char *page) +{ + if (!wbt_rq_qos(q)) + return -EINVAL; + + if (wbt_disabled(q)) + return sprintf(page, "0\n"); + + return sprintf(page, "%llu\n", div_u64(wbt_get_min_lat(q), 1000)); +} + +static ssize_t queue_wb_lat_store(struct request_queue *q, const char *page, + size_t count) +{ + struct rq_qos *rqos; + ssize_t ret; + s64 val; + + ret = queue_var_store64(&val, page); + if (ret < 0) + return ret; + if (val < -1) + return -EINVAL; + + rqos = wbt_rq_qos(q); + if (!rqos) { + ret = wbt_init(q->disk); + if (ret) + return ret; + } + + if (val == -1) + val = wbt_default_latency_nsec(q); + else if (val >= 0) + val *= 1000ULL; + + if (wbt_get_min_lat(q) == val) + return count; + + /* + * Ensure that the queue is idled, in case the latency update + * ends up either enabling or disabling wbt completely. We can't + * have IO inflight if that happens. + */ + blk_mq_freeze_queue(q); + blk_mq_quiesce_queue(q); + + wbt_set_min_lat(q, val); + + blk_mq_unquiesce_queue(q); + blk_mq_unfreeze_queue(q); + + return count; +} + +static ssize_t queue_wc_show(struct request_queue *q, char *page) +{ + if (test_bit(QUEUE_FLAG_WC, &q->queue_flags)) + return sprintf(page, "write back\n"); + + return sprintf(page, "write through\n"); +} + +static ssize_t queue_wc_store(struct request_queue *q, const char *page, + size_t count) +{ + int set = -1; + + if (!strncmp(page, "write back", 10)) + set = 1; + else if (!strncmp(page, "write through", 13) || + !strncmp(page, "none", 4)) + set = 0; + + if (set == -1) + return -EINVAL; + + if (set) + blk_queue_flag_set(QUEUE_FLAG_WC, q); + else + blk_queue_flag_clear(QUEUE_FLAG_WC, q); + + return count; +} + +static ssize_t queue_fua_show(struct request_queue *q, char *page) +{ + return sprintf(page, "%u\n", test_bit(QUEUE_FLAG_FUA, &q->queue_flags)); +} + +static ssize_t queue_dax_show(struct request_queue *q, char *page) +{ + return queue_var_show(blk_queue_dax(q), page); +} + +#define QUEUE_RO_ENTRY(_prefix, _name) \ +static struct queue_sysfs_entry _prefix##_entry = { \ + .attr = { .name = _name, .mode = 0444 }, \ + .show = _prefix##_show, \ +}; + +#define QUEUE_RW_ENTRY(_prefix, _name) \ +static struct queue_sysfs_entry _prefix##_entry = { \ + .attr = { .name = _name, .mode = 0644 }, \ + .show = _prefix##_show, \ + .store = _prefix##_store, \ +}; + +QUEUE_RW_ENTRY(queue_requests, "nr_requests"); +QUEUE_RW_ENTRY(queue_ra, "read_ahead_kb"); +QUEUE_RW_ENTRY(queue_max_sectors, "max_sectors_kb"); +QUEUE_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb"); +QUEUE_RO_ENTRY(queue_max_segments, "max_segments"); +QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments"); +QUEUE_RO_ENTRY(queue_max_segment_size, "max_segment_size"); +QUEUE_RW_ENTRY(elv_iosched, "scheduler"); + +QUEUE_RO_ENTRY(queue_logical_block_size, "logical_block_size"); +QUEUE_RO_ENTRY(queue_physical_block_size, "physical_block_size"); +QUEUE_RO_ENTRY(queue_chunk_sectors, "chunk_sectors"); +QUEUE_RO_ENTRY(queue_io_min, "minimum_io_size"); +QUEUE_RO_ENTRY(queue_io_opt, "optimal_io_size"); + +QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments"); +QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity"); +QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes"); +QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes"); +QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data"); + +QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes"); +QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes"); +QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes"); +QUEUE_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity"); + +QUEUE_RO_ENTRY(queue_zoned, "zoned"); +QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones"); +QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones"); +QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones"); + +QUEUE_RW_ENTRY(queue_nomerges, "nomerges"); +QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity"); +QUEUE_RW_ENTRY(queue_poll, "io_poll"); +QUEUE_RW_ENTRY(queue_poll_delay, "io_poll_delay"); +QUEUE_RW_ENTRY(queue_wc, "write_cache"); +QUEUE_RO_ENTRY(queue_fua, "fua"); +QUEUE_RO_ENTRY(queue_dax, "dax"); +QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout"); +QUEUE_RW_ENTRY(queue_wb_lat, "wbt_lat_usec"); +QUEUE_RO_ENTRY(queue_virt_boundary_mask, "virt_boundary_mask"); +QUEUE_RO_ENTRY(queue_dma_alignment, "dma_alignment"); + +#ifdef CONFIG_BLK_DEV_THROTTLING_LOW +QUEUE_RW_ENTRY(blk_throtl_sample_time, "throttle_sample_time"); +#endif + +/* legacy alias for logical_block_size: */ +static struct queue_sysfs_entry queue_hw_sector_size_entry = { + .attr = {.name = "hw_sector_size", .mode = 0444 }, + .show = queue_logical_block_size_show, +}; + +QUEUE_RW_ENTRY(queue_nonrot, "rotational"); +QUEUE_RW_ENTRY(queue_iostats, "iostats"); +QUEUE_RW_ENTRY(queue_random, "add_random"); +QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes"); + +static struct attribute *queue_attrs[] = { + &queue_requests_entry.attr, + &queue_ra_entry.attr, + &queue_max_hw_sectors_entry.attr, + &queue_max_sectors_entry.attr, + &queue_max_segments_entry.attr, + &queue_max_discard_segments_entry.attr, + &queue_max_integrity_segments_entry.attr, + &queue_max_segment_size_entry.attr, + &elv_iosched_entry.attr, + &queue_hw_sector_size_entry.attr, + &queue_logical_block_size_entry.attr, + &queue_physical_block_size_entry.attr, + &queue_chunk_sectors_entry.attr, + &queue_io_min_entry.attr, + &queue_io_opt_entry.attr, + &queue_discard_granularity_entry.attr, + &queue_discard_max_entry.attr, + &queue_discard_max_hw_entry.attr, + &queue_discard_zeroes_data_entry.attr, + &queue_write_same_max_entry.attr, + &queue_write_zeroes_max_entry.attr, + &queue_zone_append_max_entry.attr, + &queue_zone_write_granularity_entry.attr, + &queue_nonrot_entry.attr, + &queue_zoned_entry.attr, + &queue_nr_zones_entry.attr, + &queue_max_open_zones_entry.attr, + &queue_max_active_zones_entry.attr, + &queue_nomerges_entry.attr, + &queue_rq_affinity_entry.attr, + &queue_iostats_entry.attr, + &queue_stable_writes_entry.attr, + &queue_random_entry.attr, + &queue_poll_entry.attr, + &queue_wc_entry.attr, + &queue_fua_entry.attr, + &queue_dax_entry.attr, + &queue_wb_lat_entry.attr, + &queue_poll_delay_entry.attr, + &queue_io_timeout_entry.attr, +#ifdef CONFIG_BLK_DEV_THROTTLING_LOW + &blk_throtl_sample_time_entry.attr, +#endif + &queue_virt_boundary_mask_entry.attr, + &queue_dma_alignment_entry.attr, + NULL, +}; + +static umode_t queue_attr_visible(struct kobject *kobj, struct attribute *attr, + int n) +{ + struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj); + struct request_queue *q = disk->queue; + + if (attr == &queue_io_timeout_entry.attr && + (!q->mq_ops || !q->mq_ops->timeout)) + return 0; + + if ((attr == &queue_max_open_zones_entry.attr || + attr == &queue_max_active_zones_entry.attr) && + !blk_queue_is_zoned(q)) + return 0; + + return attr->mode; +} + +static struct attribute_group queue_attr_group = { + .attrs = queue_attrs, + .is_visible = queue_attr_visible, +}; + + +#define to_queue(atr) container_of((atr), struct queue_sysfs_entry, attr) + +static ssize_t +queue_attr_show(struct kobject *kobj, struct attribute *attr, char *page) +{ + struct queue_sysfs_entry *entry = to_queue(attr); + struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj); + struct request_queue *q = disk->queue; + ssize_t res; + + if (!entry->show) + return -EIO; + mutex_lock(&q->sysfs_lock); + res = entry->show(q, page); + mutex_unlock(&q->sysfs_lock); + return res; +} + +static ssize_t +queue_attr_store(struct kobject *kobj, struct attribute *attr, + const char *page, size_t length) +{ + struct queue_sysfs_entry *entry = to_queue(attr); + struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj); + struct request_queue *q = disk->queue; + ssize_t res; + + if (!entry->store) + return -EIO; + + mutex_lock(&q->sysfs_lock); + res = entry->store(q, page, length); + mutex_unlock(&q->sysfs_lock); + return res; +} + +static const struct sysfs_ops queue_sysfs_ops = { + .show = queue_attr_show, + .store = queue_attr_store, +}; + +static const struct attribute_group *blk_queue_attr_groups[] = { + &queue_attr_group, + NULL +}; + +static void blk_queue_release(struct kobject *kobj) +{ + /* nothing to do here, all data is associated with the parent gendisk */ +} + +static const struct kobj_type blk_queue_ktype = { + .default_groups = blk_queue_attr_groups, + .sysfs_ops = &queue_sysfs_ops, + .release = blk_queue_release, +}; + +static void blk_debugfs_remove(struct gendisk *disk) +{ + struct request_queue *q = disk->queue; + + mutex_lock(&q->debugfs_mutex); + blk_trace_shutdown(q); + debugfs_remove_recursive(q->debugfs_dir); + q->debugfs_dir = NULL; + q->sched_debugfs_dir = NULL; + q->rqos_debugfs_dir = NULL; + mutex_unlock(&q->debugfs_mutex); +} + +/** + * blk_register_queue - register a block layer queue with sysfs + * @disk: Disk of which the request queue should be registered with sysfs. + */ +int blk_register_queue(struct gendisk *disk) +{ + struct request_queue *q = disk->queue; + int ret; + + mutex_lock(&q->sysfs_dir_lock); + kobject_init(&disk->queue_kobj, &blk_queue_ktype); + ret = kobject_add(&disk->queue_kobj, &disk_to_dev(disk)->kobj, "queue"); + if (ret < 0) + goto out_put_queue_kobj; + + if (queue_is_mq(q)) { + ret = blk_mq_sysfs_register(disk); + if (ret) + goto out_put_queue_kobj; + } + mutex_lock(&q->sysfs_lock); + + mutex_lock(&q->debugfs_mutex); + q->debugfs_dir = debugfs_create_dir(disk->disk_name, blk_debugfs_root); + if (queue_is_mq(q)) + blk_mq_debugfs_register(q); + mutex_unlock(&q->debugfs_mutex); + + ret = disk_register_independent_access_ranges(disk); + if (ret) + goto out_debugfs_remove; + + if (q->elevator) { + ret = elv_register_queue(q, false); + if (ret) + goto out_unregister_ia_ranges; + } + + ret = blk_crypto_sysfs_register(disk); + if (ret) + goto out_elv_unregister; + + blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q); + wbt_enable_default(disk); + blk_throtl_register(disk); + + /* Now everything is ready and send out KOBJ_ADD uevent */ + kobject_uevent(&disk->queue_kobj, KOBJ_ADD); + if (q->elevator) + kobject_uevent(&q->elevator->kobj, KOBJ_ADD); + mutex_unlock(&q->sysfs_lock); + mutex_unlock(&q->sysfs_dir_lock); + + /* + * SCSI probing may synchronously create and destroy a lot of + * request_queues for non-existent devices. Shutting down a fully + * functional queue takes measureable wallclock time as RCU grace + * periods are involved. To avoid excessive latency in these + * cases, a request_queue starts out in a degraded mode which is + * faster to shut down and is made fully functional here as + * request_queues for non-existent devices never get registered. + */ + if (!blk_queue_init_done(q)) { + blk_queue_flag_set(QUEUE_FLAG_INIT_DONE, q); + percpu_ref_switch_to_percpu(&q->q_usage_counter); + } + + return ret; + +out_elv_unregister: + elv_unregister_queue(q); +out_unregister_ia_ranges: + disk_unregister_independent_access_ranges(disk); +out_debugfs_remove: + blk_debugfs_remove(disk); + mutex_unlock(&q->sysfs_lock); +out_put_queue_kobj: + kobject_put(&disk->queue_kobj); + mutex_unlock(&q->sysfs_dir_lock); + return ret; +} + +/** + * blk_unregister_queue - counterpart of blk_register_queue() + * @disk: Disk of which the request queue should be unregistered from sysfs. + * + * Note: the caller is responsible for guaranteeing that this function is called + * after blk_register_queue() has finished. + */ +void blk_unregister_queue(struct gendisk *disk) +{ + struct request_queue *q = disk->queue; + + if (WARN_ON(!q)) + return; + + /* Return early if disk->queue was never registered. */ + if (!blk_queue_registered(q)) + return; + + /* + * Since sysfs_remove_dir() prevents adding new directory entries + * before removal of existing entries starts, protect against + * concurrent elv_iosched_store() calls. + */ + mutex_lock(&q->sysfs_lock); + blk_queue_flag_clear(QUEUE_FLAG_REGISTERED, q); + mutex_unlock(&q->sysfs_lock); + + mutex_lock(&q->sysfs_dir_lock); + /* + * Remove the sysfs attributes before unregistering the queue data + * structures that can be modified through sysfs. + */ + if (queue_is_mq(q)) + blk_mq_sysfs_unregister(disk); + blk_crypto_sysfs_unregister(disk); + + mutex_lock(&q->sysfs_lock); + elv_unregister_queue(q); + disk_unregister_independent_access_ranges(disk); + mutex_unlock(&q->sysfs_lock); + + /* Now that we've deleted all child objects, we can delete the queue. */ + kobject_uevent(&disk->queue_kobj, KOBJ_REMOVE); + kobject_del(&disk->queue_kobj); + mutex_unlock(&q->sysfs_dir_lock); + + blk_debugfs_remove(disk); +} |