aboutsummaryrefslogtreecommitdiff
path: root/Documentation/sound/designs/compress-offload.rst
diff options
context:
space:
mode:
authorLibravatar Linus Torvalds <torvalds@linux-foundation.org>2023-02-21 18:24:12 -0800
committerLibravatar Linus Torvalds <torvalds@linux-foundation.org>2023-02-21 18:24:12 -0800
commit5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch)
treecc5c2d0a898769fd59549594fedb3ee6f84e59a0 /Documentation/sound/designs/compress-offload.rst
downloadlinux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz
linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ...
Diffstat (limited to 'Documentation/sound/designs/compress-offload.rst')
-rw-r--r--Documentation/sound/designs/compress-offload.rst328
1 files changed, 328 insertions, 0 deletions
diff --git a/Documentation/sound/designs/compress-offload.rst b/Documentation/sound/designs/compress-offload.rst
new file mode 100644
index 000000000..935f325db
--- /dev/null
+++ b/Documentation/sound/designs/compress-offload.rst
@@ -0,0 +1,328 @@
+=========================
+ALSA Compress-Offload API
+=========================
+
+Pierre-Louis.Bossart <pierre-louis.bossart@linux.intel.com>
+
+Vinod Koul <vinod.koul@linux.intel.com>
+
+
+Overview
+========
+Since its early days, the ALSA API was defined with PCM support or
+constant bitrates payloads such as IEC61937 in mind. Arguments and
+returned values in frames are the norm, making it a challenge to
+extend the existing API to compressed data streams.
+
+In recent years, audio digital signal processors (DSP) were integrated
+in system-on-chip designs, and DSPs are also integrated in audio
+codecs. Processing compressed data on such DSPs results in a dramatic
+reduction of power consumption compared to host-based
+processing. Support for such hardware has not been very good in Linux,
+mostly because of a lack of a generic API available in the mainline
+kernel.
+
+Rather than requiring a compatibility break with an API change of the
+ALSA PCM interface, a new 'Compressed Data' API is introduced to
+provide a control and data-streaming interface for audio DSPs.
+
+The design of this API was inspired by the 2-year experience with the
+Intel Moorestown SOC, with many corrections required to upstream the
+API in the mainline kernel instead of the staging tree and make it
+usable by others.
+
+
+Requirements
+============
+The main requirements are:
+
+- separation between byte counts and time. Compressed formats may have
+ a header per file, per frame, or no header at all. The payload size
+ may vary from frame-to-frame. As a result, it is not possible to
+ estimate reliably the duration of audio buffers when handling
+ compressed data. Dedicated mechanisms are required to allow for
+ reliable audio-video synchronization, which requires precise
+ reporting of the number of samples rendered at any given time.
+
+- Handling of multiple formats. PCM data only requires a specification
+ of the sampling rate, number of channels and bits per sample. In
+ contrast, compressed data comes in a variety of formats. Audio DSPs
+ may also provide support for a limited number of audio encoders and
+ decoders embedded in firmware, or may support more choices through
+ dynamic download of libraries.
+
+- Focus on main formats. This API provides support for the most
+ popular formats used for audio and video capture and playback. It is
+ likely that as audio compression technology advances, new formats
+ will be added.
+
+- Handling of multiple configurations. Even for a given format like
+ AAC, some implementations may support AAC multichannel but HE-AAC
+ stereo. Likewise WMA10 level M3 may require too much memory and cpu
+ cycles. The new API needs to provide a generic way of listing these
+ formats.
+
+- Rendering/Grabbing only. This API does not provide any means of
+ hardware acceleration, where PCM samples are provided back to
+ user-space for additional processing. This API focuses instead on
+ streaming compressed data to a DSP, with the assumption that the
+ decoded samples are routed to a physical output or logical back-end.
+
+- Complexity hiding. Existing user-space multimedia frameworks all
+ have existing enums/structures for each compressed format. This new
+ API assumes the existence of a platform-specific compatibility layer
+ to expose, translate and make use of the capabilities of the audio
+ DSP, eg. Android HAL or PulseAudio sinks. By construction, regular
+ applications are not supposed to make use of this API.
+
+
+Design
+======
+The new API shares a number of concepts with the PCM API for flow
+control. Start, pause, resume, drain and stop commands have the same
+semantics no matter what the content is.
+
+The concept of memory ring buffer divided in a set of fragments is
+borrowed from the ALSA PCM API. However, only sizes in bytes can be
+specified.
+
+Seeks/trick modes are assumed to be handled by the host.
+
+The notion of rewinds/forwards is not supported. Data committed to the
+ring buffer cannot be invalidated, except when dropping all buffers.
+
+The Compressed Data API does not make any assumptions on how the data
+is transmitted to the audio DSP. DMA transfers from main memory to an
+embedded audio cluster or to a SPI interface for external DSPs are
+possible. As in the ALSA PCM case, a core set of routines is exposed;
+each driver implementer will have to write support for a set of
+mandatory routines and possibly make use of optional ones.
+
+The main additions are
+
+get_caps
+ This routine returns the list of audio formats supported. Querying the
+ codecs on a capture stream will return encoders, decoders will be
+ listed for playback streams.
+
+get_codec_caps
+ For each codec, this routine returns a list of
+ capabilities. The intent is to make sure all the capabilities
+ correspond to valid settings, and to minimize the risks of
+ configuration failures. For example, for a complex codec such as AAC,
+ the number of channels supported may depend on a specific profile. If
+ the capabilities were exposed with a single descriptor, it may happen
+ that a specific combination of profiles/channels/formats may not be
+ supported. Likewise, embedded DSPs have limited memory and cpu cycles,
+ it is likely that some implementations make the list of capabilities
+ dynamic and dependent on existing workloads. In addition to codec
+ settings, this routine returns the minimum buffer size handled by the
+ implementation. This information can be a function of the DMA buffer
+ sizes, the number of bytes required to synchronize, etc, and can be
+ used by userspace to define how much needs to be written in the ring
+ buffer before playback can start.
+
+set_params
+ This routine sets the configuration chosen for a specific codec. The
+ most important field in the parameters is the codec type; in most
+ cases decoders will ignore other fields, while encoders will strictly
+ comply to the settings
+
+get_params
+ This routines returns the actual settings used by the DSP. Changes to
+ the settings should remain the exception.
+
+get_timestamp
+ The timestamp becomes a multiple field structure. It lists the number
+ of bytes transferred, the number of samples processed and the number
+ of samples rendered/grabbed. All these values can be used to determine
+ the average bitrate, figure out if the ring buffer needs to be
+ refilled or the delay due to decoding/encoding/io on the DSP.
+
+Note that the list of codecs/profiles/modes was derived from the
+OpenMAX AL specification instead of reinventing the wheel.
+Modifications include:
+- Addition of FLAC and IEC formats
+- Merge of encoder/decoder capabilities
+- Profiles/modes listed as bitmasks to make descriptors more compact
+- Addition of set_params for decoders (missing in OpenMAX AL)
+- Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
+- Addition of format information for WMA
+- Addition of encoding options when required (derived from OpenMAX IL)
+- Addition of rateControlSupported (missing in OpenMAX AL)
+
+State Machine
+=============
+
+The compressed audio stream state machine is described below ::
+
+ +----------+
+ | |
+ | OPEN |
+ | |
+ +----------+
+ |
+ |
+ | compr_set_params()
+ |
+ v
+ compr_free() +----------+
+ +------------------------------------| |
+ | | SETUP |
+ | +-------------------------| |<-------------------------+
+ | | compr_write() +----------+ |
+ | | ^ |
+ | | | compr_drain_notify() |
+ | | | or |
+ | | | compr_stop() |
+ | | | |
+ | | +----------+ |
+ | | | | |
+ | | | DRAIN | |
+ | | | | |
+ | | +----------+ |
+ | | ^ |
+ | | | |
+ | | | compr_drain() |
+ | | | |
+ | v | |
+ | +----------+ +----------+ |
+ | | | compr_start() | | compr_stop() |
+ | | PREPARE |------------------->| RUNNING |--------------------------+
+ | | | | | |
+ | +----------+ +----------+ |
+ | | | ^ |
+ | |compr_free() | | |
+ | | compr_pause() | | compr_resume() |
+ | | | | |
+ | v v | |
+ | +----------+ +----------+ |
+ | | | | | compr_stop() |
+ +--->| FREE | | PAUSE |---------------------------+
+ | | | |
+ +----------+ +----------+
+
+
+Gapless Playback
+================
+When playing thru an album, the decoders have the ability to skip the encoder
+delay and padding and directly move from one track content to another. The end
+user can perceive this as gapless playback as we don't have silence while
+switching from one track to another
+
+Also, there might be low-intensity noises due to encoding. Perfect gapless is
+difficult to reach with all types of compressed data, but works fine with most
+music content. The decoder needs to know the encoder delay and encoder padding.
+So we need to pass this to DSP. This metadata is extracted from ID3/MP4 headers
+and are not present by default in the bitstream, hence the need for a new
+interface to pass this information to the DSP. Also DSP and userspace needs to
+switch from one track to another and start using data for second track.
+
+The main additions are:
+
+set_metadata
+ This routine sets the encoder delay and encoder padding. This can be used by
+ decoder to strip the silence. This needs to be set before the data in the track
+ is written.
+
+set_next_track
+ This routine tells DSP that metadata and write operation sent after this would
+ correspond to subsequent track
+
+partial drain
+ This is called when end of file is reached. The userspace can inform DSP that
+ EOF is reached and now DSP can start skipping padding delay. Also next write
+ data would belong to next track
+
+Sequence flow for gapless would be:
+- Open
+- Get caps / codec caps
+- Set params
+- Set metadata of the first track
+- Fill data of the first track
+- Trigger start
+- User-space finished sending all,
+- Indicate next track data by sending set_next_track
+- Set metadata of the next track
+- then call partial_drain to flush most of buffer in DSP
+- Fill data of the next track
+- DSP switches to second track
+
+(note: order for partial_drain and write for next track can be reversed as well)
+
+Gapless Playback SM
+===================
+
+For Gapless, we move from running state to partial drain and back, along
+with setting of meta_data and signalling for next track ::
+
+
+ +----------+
+ compr_drain_notify() | |
+ +------------------------>| RUNNING |
+ | | |
+ | +----------+
+ | |
+ | |
+ | | compr_next_track()
+ | |
+ | V
+ | +----------+
+ | | |
+ | |NEXT_TRACK|
+ | | |
+ | +----------+
+ | |
+ | |
+ | | compr_partial_drain()
+ | |
+ | V
+ | +----------+
+ | | |
+ +------------------------ | PARTIAL_ |
+ | DRAIN |
+ +----------+
+
+Not supported
+=============
+- Support for VoIP/circuit-switched calls is not the target of this
+ API. Support for dynamic bit-rate changes would require a tight
+ coupling between the DSP and the host stack, limiting power savings.
+
+- Packet-loss concealment is not supported. This would require an
+ additional interface to let the decoder synthesize data when frames
+ are lost during transmission. This may be added in the future.
+
+- Volume control/routing is not handled by this API. Devices exposing a
+ compressed data interface will be considered as regular ALSA devices;
+ volume changes and routing information will be provided with regular
+ ALSA kcontrols.
+
+- Embedded audio effects. Such effects should be enabled in the same
+ manner, no matter if the input was PCM or compressed.
+
+- multichannel IEC encoding. Unclear if this is required.
+
+- Encoding/decoding acceleration is not supported as mentioned
+ above. It is possible to route the output of a decoder to a capture
+ stream, or even implement transcoding capabilities. This routing
+ would be enabled with ALSA kcontrols.
+
+- Audio policy/resource management. This API does not provide any
+ hooks to query the utilization of the audio DSP, nor any preemption
+ mechanisms.
+
+- No notion of underrun/overrun. Since the bytes written are compressed
+ in nature and data written/read doesn't translate directly to
+ rendered output in time, this does not deal with underrun/overrun and
+ maybe dealt in user-library
+
+
+Credits
+=======
+- Mark Brown and Liam Girdwood for discussions on the need for this API
+- Harsha Priya for her work on intel_sst compressed API
+- Rakesh Ughreja for valuable feedback
+- Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for
+ demonstrating and quantifying the benefits of audio offload on a
+ real platform.