From 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 21 Feb 2023 18:24:12 -0800 Subject: Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ... --- Documentation/driver-api/media/v4l2-videobuf.rst | 403 +++++++++++++++++++++++ 1 file changed, 403 insertions(+) create mode 100644 Documentation/driver-api/media/v4l2-videobuf.rst (limited to 'Documentation/driver-api/media/v4l2-videobuf.rst') diff --git a/Documentation/driver-api/media/v4l2-videobuf.rst b/Documentation/driver-api/media/v4l2-videobuf.rst new file mode 100644 index 000000000..4b1d84eef --- /dev/null +++ b/Documentation/driver-api/media/v4l2-videobuf.rst @@ -0,0 +1,403 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _vb_framework: + +Videobuf Framework +================== + +Author: Jonathan Corbet + +Current as of 2.6.33 + +.. note:: + + The videobuf framework was deprecated in favor of videobuf2. Shouldn't + be used on new drivers. + +Introduction +------------ + +The videobuf layer functions as a sort of glue layer between a V4L2 driver +and user space. It handles the allocation and management of buffers for +the storage of video frames. There is a set of functions which can be used +to implement many of the standard POSIX I/O system calls, including read(), +poll(), and, happily, mmap(). Another set of functions can be used to +implement the bulk of the V4L2 ioctl() calls related to streaming I/O, +including buffer allocation, queueing and dequeueing, and streaming +control. Using videobuf imposes a few design decisions on the driver +author, but the payback comes in the form of reduced code in the driver and +a consistent implementation of the V4L2 user-space API. + +Buffer types +------------ + +Not all video devices use the same kind of buffers. In fact, there are (at +least) three common variations: + + - Buffers which are scattered in both the physical and (kernel) virtual + address spaces. (Almost) all user-space buffers are like this, but it + makes great sense to allocate kernel-space buffers this way as well when + it is possible. Unfortunately, it is not always possible; working with + this kind of buffer normally requires hardware which can do + scatter/gather DMA operations. + + - Buffers which are physically scattered, but which are virtually + contiguous; buffers allocated with vmalloc(), in other words. These + buffers are just as hard to use for DMA operations, but they can be + useful in situations where DMA is not available but virtually-contiguous + buffers are convenient. + + - Buffers which are physically contiguous. Allocation of this kind of + buffer can be unreliable on fragmented systems, but simpler DMA + controllers cannot deal with anything else. + +Videobuf can work with all three types of buffers, but the driver author +must pick one at the outset and design the driver around that decision. + +[It's worth noting that there's a fourth kind of buffer: "overlay" buffers +which are located within the system's video memory. The overlay +functionality is considered to be deprecated for most use, but it still +shows up occasionally in system-on-chip drivers where the performance +benefits merit the use of this technique. Overlay buffers can be handled +as a form of scattered buffer, but there are very few implementations in +the kernel and a description of this technique is currently beyond the +scope of this document.] + +Data structures, callbacks, and initialization +---------------------------------------------- + +Depending on which type of buffers are being used, the driver should +include one of the following files: + +.. code-block:: none + + /* Physically scattered */ + /* vmalloc() buffers */ + /* Physically contiguous */ + +The driver's data structure describing a V4L2 device should include a +struct videobuf_queue instance for the management of the buffer queue, +along with a list_head for the queue of available buffers. There will also +need to be an interrupt-safe spinlock which is used to protect (at least) +the queue. + +The next step is to write four simple callbacks to help videobuf deal with +the management of buffers: + +.. code-block:: none + + struct videobuf_queue_ops { + int (*buf_setup)(struct videobuf_queue *q, + unsigned int *count, unsigned int *size); + int (*buf_prepare)(struct videobuf_queue *q, + struct videobuf_buffer *vb, + enum v4l2_field field); + void (*buf_queue)(struct videobuf_queue *q, + struct videobuf_buffer *vb); + void (*buf_release)(struct videobuf_queue *q, + struct videobuf_buffer *vb); + }; + +buf_setup() is called early in the I/O process, when streaming is being +initiated; its purpose is to tell videobuf about the I/O stream. The count +parameter will be a suggested number of buffers to use; the driver should +check it for rationality and adjust it if need be. As a practical rule, a +minimum of two buffers are needed for proper streaming, and there is +usually a maximum (which cannot exceed 32) which makes sense for each +device. The size parameter should be set to the expected (maximum) size +for each frame of data. + +Each buffer (in the form of a struct videobuf_buffer pointer) will be +passed to buf_prepare(), which should set the buffer's size, width, height, +and field fields properly. If the buffer's state field is +VIDEOBUF_NEEDS_INIT, the driver should pass it to: + +.. code-block:: none + + int videobuf_iolock(struct videobuf_queue* q, struct videobuf_buffer *vb, + struct v4l2_framebuffer *fbuf); + +Among other things, this call will usually allocate memory for the buffer. +Finally, the buf_prepare() function should set the buffer's state to +VIDEOBUF_PREPARED. + +When a buffer is queued for I/O, it is passed to buf_queue(), which should +put it onto the driver's list of available buffers and set its state to +VIDEOBUF_QUEUED. Note that this function is called with the queue spinlock +held; if it tries to acquire it as well things will come to a screeching +halt. Yes, this is the voice of experience. Note also that videobuf may +wait on the first buffer in the queue; placing other buffers in front of it +could again gum up the works. So use list_add_tail() to enqueue buffers. + +Finally, buf_release() is called when a buffer is no longer intended to be +used. The driver should ensure that there is no I/O active on the buffer, +then pass it to the appropriate free routine(s): + +.. code-block:: none + + /* Scatter/gather drivers */ + int videobuf_dma_unmap(struct videobuf_queue *q, + struct videobuf_dmabuf *dma); + int videobuf_dma_free(struct videobuf_dmabuf *dma); + + /* vmalloc drivers */ + void videobuf_vmalloc_free (struct videobuf_buffer *buf); + + /* Contiguous drivers */ + void videobuf_dma_contig_free(struct videobuf_queue *q, + struct videobuf_buffer *buf); + +One way to ensure that a buffer is no longer under I/O is to pass it to: + +.. code-block:: none + + int videobuf_waiton(struct videobuf_buffer *vb, int non_blocking, int intr); + +Here, vb is the buffer, non_blocking indicates whether non-blocking I/O +should be used (it should be zero in the buf_release() case), and intr +controls whether an interruptible wait is used. + +File operations +--------------- + +At this point, much of the work is done; much of the rest is slipping +videobuf calls into the implementation of the other driver callbacks. The +first step is in the open() function, which must initialize the +videobuf queue. The function to use depends on the type of buffer used: + +.. code-block:: none + + void videobuf_queue_sg_init(struct videobuf_queue *q, + struct videobuf_queue_ops *ops, + struct device *dev, + spinlock_t *irqlock, + enum v4l2_buf_type type, + enum v4l2_field field, + unsigned int msize, + void *priv); + + void videobuf_queue_vmalloc_init(struct videobuf_queue *q, + struct videobuf_queue_ops *ops, + struct device *dev, + spinlock_t *irqlock, + enum v4l2_buf_type type, + enum v4l2_field field, + unsigned int msize, + void *priv); + + void videobuf_queue_dma_contig_init(struct videobuf_queue *q, + struct videobuf_queue_ops *ops, + struct device *dev, + spinlock_t *irqlock, + enum v4l2_buf_type type, + enum v4l2_field field, + unsigned int msize, + void *priv); + +In each case, the parameters are the same: q is the queue structure for the +device, ops is the set of callbacks as described above, dev is the device +structure for this video device, irqlock is an interrupt-safe spinlock to +protect access to the data structures, type is the buffer type used by the +device (cameras will use V4L2_BUF_TYPE_VIDEO_CAPTURE, for example), field +describes which field is being captured (often V4L2_FIELD_NONE for +progressive devices), msize is the size of any containing structure used +around struct videobuf_buffer, and priv is a private data pointer which +shows up in the priv_data field of struct videobuf_queue. Note that these +are void functions which, evidently, are immune to failure. + +V4L2 capture drivers can be written to support either of two APIs: the +read() system call and the rather more complicated streaming mechanism. As +a general rule, it is necessary to support both to ensure that all +applications have a chance of working with the device. Videobuf makes it +easy to do that with the same code. To implement read(), the driver need +only make a call to one of: + +.. code-block:: none + + ssize_t videobuf_read_one(struct videobuf_queue *q, + char __user *data, size_t count, + loff_t *ppos, int nonblocking); + + ssize_t videobuf_read_stream(struct videobuf_queue *q, + char __user *data, size_t count, + loff_t *ppos, int vbihack, int nonblocking); + +Either one of these functions will read frame data into data, returning the +amount actually read; the difference is that videobuf_read_one() will only +read a single frame, while videobuf_read_stream() will read multiple frames +if they are needed to satisfy the count requested by the application. A +typical driver read() implementation will start the capture engine, call +one of the above functions, then stop the engine before returning (though a +smarter implementation might leave the engine running for a little while in +anticipation of another read() call happening in the near future). + +The poll() function can usually be implemented with a direct call to: + +.. code-block:: none + + unsigned int videobuf_poll_stream(struct file *file, + struct videobuf_queue *q, + poll_table *wait); + +Note that the actual wait queue eventually used will be the one associated +with the first available buffer. + +When streaming I/O is done to kernel-space buffers, the driver must support +the mmap() system call to enable user space to access the data. In many +V4L2 drivers, the often-complex mmap() implementation simplifies to a +single call to: + +.. code-block:: none + + int videobuf_mmap_mapper(struct videobuf_queue *q, + struct vm_area_struct *vma); + +Everything else is handled by the videobuf code. + +The release() function requires two separate videobuf calls: + +.. code-block:: none + + void videobuf_stop(struct videobuf_queue *q); + int videobuf_mmap_free(struct videobuf_queue *q); + +The call to videobuf_stop() terminates any I/O in progress - though it is +still up to the driver to stop the capture engine. The call to +videobuf_mmap_free() will ensure that all buffers have been unmapped; if +so, they will all be passed to the buf_release() callback. If buffers +remain mapped, videobuf_mmap_free() returns an error code instead. The +purpose is clearly to cause the closing of the file descriptor to fail if +buffers are still mapped, but every driver in the 2.6.32 kernel cheerfully +ignores its return value. + +ioctl() operations +------------------ + +The V4L2 API includes a very long list of driver callbacks to respond to +the many ioctl() commands made available to user space. A number of these +- those associated with streaming I/O - turn almost directly into videobuf +calls. The relevant helper functions are: + +.. code-block:: none + + int videobuf_reqbufs(struct videobuf_queue *q, + struct v4l2_requestbuffers *req); + int videobuf_querybuf(struct videobuf_queue *q, struct v4l2_buffer *b); + int videobuf_qbuf(struct videobuf_queue *q, struct v4l2_buffer *b); + int videobuf_dqbuf(struct videobuf_queue *q, struct v4l2_buffer *b, + int nonblocking); + int videobuf_streamon(struct videobuf_queue *q); + int videobuf_streamoff(struct videobuf_queue *q); + +So, for example, a VIDIOC_REQBUFS call turns into a call to the driver's +vidioc_reqbufs() callback which, in turn, usually only needs to locate the +proper struct videobuf_queue pointer and pass it to videobuf_reqbufs(). +These support functions can replace a great deal of buffer management +boilerplate in a lot of V4L2 drivers. + +The vidioc_streamon() and vidioc_streamoff() functions will be a bit more +complex, of course, since they will also need to deal with starting and +stopping the capture engine. + +Buffer allocation +----------------- + +Thus far, we have talked about buffers, but have not looked at how they are +allocated. The scatter/gather case is the most complex on this front. For +allocation, the driver can leave buffer allocation entirely up to the +videobuf layer; in this case, buffers will be allocated as anonymous +user-space pages and will be very scattered indeed. If the application is +using user-space buffers, no allocation is needed; the videobuf layer will +take care of calling get_user_pages() and filling in the scatterlist array. + +If the driver needs to do its own memory allocation, it should be done in +the vidioc_reqbufs() function, *after* calling videobuf_reqbufs(). The +first step is a call to: + +.. code-block:: none + + struct videobuf_dmabuf *videobuf_to_dma(struct videobuf_buffer *buf); + +The returned videobuf_dmabuf structure (defined in +) includes a couple of relevant fields: + +.. code-block:: none + + struct scatterlist *sglist; + int sglen; + +The driver must allocate an appropriately-sized scatterlist array and +populate it with pointers to the pieces of the allocated buffer; sglen +should be set to the length of the array. + +Drivers using the vmalloc() method need not (and cannot) concern themselves +with buffer allocation at all; videobuf will handle those details. The +same is normally true of contiguous-DMA drivers as well; videobuf will +allocate the buffers (with dma_alloc_coherent()) when it sees fit. That +means that these drivers may be trying to do high-order allocations at any +time, an operation which is not always guaranteed to work. Some drivers +play tricks by allocating DMA space at system boot time; videobuf does not +currently play well with those drivers. + +As of 2.6.31, contiguous-DMA drivers can work with a user-supplied buffer, +as long as that buffer is physically contiguous. Normal user-space +allocations will not meet that criterion, but buffers obtained from other +kernel drivers, or those contained within huge pages, will work with these +drivers. + +Filling the buffers +------------------- + +The final part of a videobuf implementation has no direct callback - it's +the portion of the code which actually puts frame data into the buffers, +usually in response to interrupts from the device. For all types of +drivers, this process works approximately as follows: + + - Obtain the next available buffer and make sure that somebody is actually + waiting for it. + + - Get a pointer to the memory and put video data there. + + - Mark the buffer as done and wake up the process waiting for it. + +Step (1) above is done by looking at the driver-managed list_head structure +- the one which is filled in the buf_queue() callback. Because starting +the engine and enqueueing buffers are done in separate steps, it's possible +for the engine to be running without any buffers available - in the +vmalloc() case especially. So the driver should be prepared for the list +to be empty. It is equally possible that nobody is yet interested in the +buffer; the driver should not remove it from the list or fill it until a +process is waiting on it. That test can be done by examining the buffer's +done field (a wait_queue_head_t structure) with waitqueue_active(). + +A buffer's state should be set to VIDEOBUF_ACTIVE before being mapped for +DMA; that ensures that the videobuf layer will not try to do anything with +it while the device is transferring data. + +For scatter/gather drivers, the needed memory pointers will be found in the +scatterlist structure described above. Drivers using the vmalloc() method +can get a memory pointer with: + +.. code-block:: none + + void *videobuf_to_vmalloc(struct videobuf_buffer *buf); + +For contiguous DMA drivers, the function to use is: + +.. code-block:: none + + dma_addr_t videobuf_to_dma_contig(struct videobuf_buffer *buf); + +The contiguous DMA API goes out of its way to hide the kernel-space address +of the DMA buffer from drivers. + +The final step is to set the size field of the relevant videobuf_buffer +structure to the actual size of the captured image, set state to +VIDEOBUF_DONE, then call wake_up() on the done queue. At this point, the +buffer is owned by the videobuf layer and the driver should not touch it +again. + +Developers who are interested in more information can go into the relevant +header files; there are a few low-level functions declared there which have +not been talked about here. Note also that all of these calls are exported +GPL-only, so they will not be available to non-GPL kernel modules. -- cgit v1.2.3