diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /Documentation/driver-api/uio-howto.rst | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'Documentation/driver-api/uio-howto.rst')
-rw-r--r-- | Documentation/driver-api/uio-howto.rst | 730 |
1 files changed, 730 insertions, 0 deletions
diff --git a/Documentation/driver-api/uio-howto.rst b/Documentation/driver-api/uio-howto.rst new file mode 100644 index 000000000..907ffa3b3 --- /dev/null +++ b/Documentation/driver-api/uio-howto.rst @@ -0,0 +1,730 @@ +======================= +The Userspace I/O HOWTO +======================= + +:Author: Hans-Jürgen Koch Linux developer, Linutronix +:Date: 2006-12-11 + +About this document +=================== + +Translations +------------ + +If you know of any translations for this document, or you are interested +in translating it, please email me hjk@hansjkoch.de. + +Preface +------- + +For many types of devices, creating a Linux kernel driver is overkill. +All that is really needed is some way to handle an interrupt and provide +access to the memory space of the device. The logic of controlling the +device does not necessarily have to be within the kernel, as the device +does not need to take advantage of any of other resources that the +kernel provides. One such common class of devices that are like this are +for industrial I/O cards. + +To address this situation, the userspace I/O system (UIO) was designed. +For typical industrial I/O cards, only a very small kernel module is +needed. The main part of the driver will run in user space. This +simplifies development and reduces the risk of serious bugs within a +kernel module. + +Please note that UIO is not an universal driver interface. Devices that +are already handled well by other kernel subsystems (like networking or +serial or USB) are no candidates for an UIO driver. Hardware that is +ideally suited for an UIO driver fulfills all of the following: + +- The device has memory that can be mapped. The device can be + controlled completely by writing to this memory. + +- The device usually generates interrupts. + +- The device does not fit into one of the standard kernel subsystems. + +Acknowledgments +--------------- + +I'd like to thank Thomas Gleixner and Benedikt Spranger of Linutronix, +who have not only written most of the UIO code, but also helped greatly +writing this HOWTO by giving me all kinds of background information. + +Feedback +-------- + +Find something wrong with this document? (Or perhaps something right?) I +would love to hear from you. Please email me at hjk@hansjkoch.de. + +About UIO +========= + +If you use UIO for your card's driver, here's what you get: + +- only one small kernel module to write and maintain. + +- develop the main part of your driver in user space, with all the + tools and libraries you're used to. + +- bugs in your driver won't crash the kernel. + +- updates of your driver can take place without recompiling the kernel. + +How UIO works +------------- + +Each UIO device is accessed through a device file and several sysfs +attribute files. The device file will be called ``/dev/uio0`` for the +first device, and ``/dev/uio1``, ``/dev/uio2`` and so on for subsequent +devices. + +``/dev/uioX`` is used to access the address space of the card. Just use +:c:func:`mmap()` to access registers or RAM locations of your card. + +Interrupts are handled by reading from ``/dev/uioX``. A blocking +:c:func:`read()` from ``/dev/uioX`` will return as soon as an +interrupt occurs. You can also use :c:func:`select()` on +``/dev/uioX`` to wait for an interrupt. The integer value read from +``/dev/uioX`` represents the total interrupt count. You can use this +number to figure out if you missed some interrupts. + +For some hardware that has more than one interrupt source internally, +but not separate IRQ mask and status registers, there might be +situations where userspace cannot determine what the interrupt source +was if the kernel handler disables them by writing to the chip's IRQ +register. In such a case, the kernel has to disable the IRQ completely +to leave the chip's register untouched. Now the userspace part can +determine the cause of the interrupt, but it cannot re-enable +interrupts. Another cornercase is chips where re-enabling interrupts is +a read-modify-write operation to a combined IRQ status/acknowledge +register. This would be racy if a new interrupt occurred simultaneously. + +To address these problems, UIO also implements a write() function. It is +normally not used and can be ignored for hardware that has only a single +interrupt source or has separate IRQ mask and status registers. If you +need it, however, a write to ``/dev/uioX`` will call the +:c:func:`irqcontrol()` function implemented by the driver. You have +to write a 32-bit value that is usually either 0 or 1 to disable or +enable interrupts. If a driver does not implement +:c:func:`irqcontrol()`, :c:func:`write()` will return with +``-ENOSYS``. + +To handle interrupts properly, your custom kernel module can provide its +own interrupt handler. It will automatically be called by the built-in +handler. + +For cards that don't generate interrupts but need to be polled, there is +the possibility to set up a timer that triggers the interrupt handler at +configurable time intervals. This interrupt simulation is done by +calling :c:func:`uio_event_notify()` from the timer's event +handler. + +Each driver provides attributes that are used to read or write +variables. These attributes are accessible through sysfs files. A custom +kernel driver module can add its own attributes to the device owned by +the uio driver, but not added to the UIO device itself at this time. +This might change in the future if it would be found to be useful. + +The following standard attributes are provided by the UIO framework: + +- ``name``: The name of your device. It is recommended to use the name + of your kernel module for this. + +- ``version``: A version string defined by your driver. This allows the + user space part of your driver to deal with different versions of the + kernel module. + +- ``event``: The total number of interrupts handled by the driver since + the last time the device node was read. + +These attributes appear under the ``/sys/class/uio/uioX`` directory. +Please note that this directory might be a symlink, and not a real +directory. Any userspace code that accesses it must be able to handle +this. + +Each UIO device can make one or more memory regions available for memory +mapping. This is necessary because some industrial I/O cards require +access to more than one PCI memory region in a driver. + +Each mapping has its own directory in sysfs, the first mapping appears +as ``/sys/class/uio/uioX/maps/map0/``. Subsequent mappings create +directories ``map1/``, ``map2/``, and so on. These directories will only +appear if the size of the mapping is not 0. + +Each ``mapX/`` directory contains four read-only files that show +attributes of the memory: + +- ``name``: A string identifier for this mapping. This is optional, the + string can be empty. Drivers can set this to make it easier for + userspace to find the correct mapping. + +- ``addr``: The address of memory that can be mapped. + +- ``size``: The size, in bytes, of the memory pointed to by addr. + +- ``offset``: The offset, in bytes, that has to be added to the pointer + returned by :c:func:`mmap()` to get to the actual device memory. + This is important if the device's memory is not page aligned. + Remember that pointers returned by :c:func:`mmap()` are always + page aligned, so it is good style to always add this offset. + +From userspace, the different mappings are distinguished by adjusting +the ``offset`` parameter of the :c:func:`mmap()` call. To map the +memory of mapping N, you have to use N times the page size as your +offset:: + + offset = N * getpagesize(); + +Sometimes there is hardware with memory-like regions that can not be +mapped with the technique described here, but there are still ways to +access them from userspace. The most common example are x86 ioports. On +x86 systems, userspace can access these ioports using +:c:func:`ioperm()`, :c:func:`iopl()`, :c:func:`inb()`, +:c:func:`outb()`, and similar functions. + +Since these ioport regions can not be mapped, they will not appear under +``/sys/class/uio/uioX/maps/`` like the normal memory described above. +Without information about the port regions a hardware has to offer, it +becomes difficult for the userspace part of the driver to find out which +ports belong to which UIO device. + +To address this situation, the new directory +``/sys/class/uio/uioX/portio/`` was added. It only exists if the driver +wants to pass information about one or more port regions to userspace. +If that is the case, subdirectories named ``port0``, ``port1``, and so +on, will appear underneath ``/sys/class/uio/uioX/portio/``. + +Each ``portX/`` directory contains four read-only files that show name, +start, size, and type of the port region: + +- ``name``: A string identifier for this port region. The string is + optional and can be empty. Drivers can set it to make it easier for + userspace to find a certain port region. + +- ``start``: The first port of this region. + +- ``size``: The number of ports in this region. + +- ``porttype``: A string describing the type of port. + +Writing your own kernel module +============================== + +Please have a look at ``uio_cif.c`` as an example. The following +paragraphs explain the different sections of this file. + +struct uio_info +--------------- + +This structure tells the framework the details of your driver, Some of +the members are required, others are optional. + +- ``const char *name``: Required. The name of your driver as it will + appear in sysfs. I recommend using the name of your module for this. + +- ``const char *version``: Required. This string appears in + ``/sys/class/uio/uioX/version``. + +- ``struct uio_mem mem[ MAX_UIO_MAPS ]``: Required if you have memory + that can be mapped with :c:func:`mmap()`. For each mapping you + need to fill one of the ``uio_mem`` structures. See the description + below for details. + +- ``struct uio_port port[ MAX_UIO_PORTS_REGIONS ]``: Required if you + want to pass information about ioports to userspace. For each port + region you need to fill one of the ``uio_port`` structures. See the + description below for details. + +- ``long irq``: Required. If your hardware generates an interrupt, it's + your modules task to determine the irq number during initialization. + If you don't have a hardware generated interrupt but want to trigger + the interrupt handler in some other way, set ``irq`` to + ``UIO_IRQ_CUSTOM``. If you had no interrupt at all, you could set + ``irq`` to ``UIO_IRQ_NONE``, though this rarely makes sense. + +- ``unsigned long irq_flags``: Required if you've set ``irq`` to a + hardware interrupt number. The flags given here will be used in the + call to :c:func:`request_irq()`. + +- ``int (*mmap)(struct uio_info *info, struct vm_area_struct *vma)``: + Optional. If you need a special :c:func:`mmap()` + function, you can set it here. If this pointer is not NULL, your + :c:func:`mmap()` will be called instead of the built-in one. + +- ``int (*open)(struct uio_info *info, struct inode *inode)``: + Optional. You might want to have your own :c:func:`open()`, + e.g. to enable interrupts only when your device is actually used. + +- ``int (*release)(struct uio_info *info, struct inode *inode)``: + Optional. If you define your own :c:func:`open()`, you will + probably also want a custom :c:func:`release()` function. + +- ``int (*irqcontrol)(struct uio_info *info, s32 irq_on)``: + Optional. If you need to be able to enable or disable interrupts + from userspace by writing to ``/dev/uioX``, you can implement this + function. The parameter ``irq_on`` will be 0 to disable interrupts + and 1 to enable them. + +Usually, your device will have one or more memory regions that can be +mapped to user space. For each region, you have to set up a +``struct uio_mem`` in the ``mem[]`` array. Here's a description of the +fields of ``struct uio_mem``: + +- ``const char *name``: Optional. Set this to help identify the memory + region, it will show up in the corresponding sysfs node. + +- ``int memtype``: Required if the mapping is used. Set this to + ``UIO_MEM_PHYS`` if you have physical memory on your card to be + mapped. Use ``UIO_MEM_LOGICAL`` for logical memory (e.g. allocated + with :c:func:`__get_free_pages()` but not kmalloc()). There's also + ``UIO_MEM_VIRTUAL`` for virtual memory. + +- ``phys_addr_t addr``: Required if the mapping is used. Fill in the + address of your memory block. This address is the one that appears in + sysfs. + +- ``resource_size_t size``: Fill in the size of the memory block that + ``addr`` points to. If ``size`` is zero, the mapping is considered + unused. Note that you *must* initialize ``size`` with zero for all + unused mappings. + +- ``void *internal_addr``: If you have to access this memory region + from within your kernel module, you will want to map it internally by + using something like :c:func:`ioremap()`. Addresses returned by + this function cannot be mapped to user space, so you must not store + it in ``addr``. Use ``internal_addr`` instead to remember such an + address. + +Please do not touch the ``map`` element of ``struct uio_mem``! It is +used by the UIO framework to set up sysfs files for this mapping. Simply +leave it alone. + +Sometimes, your device can have one or more port regions which can not +be mapped to userspace. But if there are other possibilities for +userspace to access these ports, it makes sense to make information +about the ports available in sysfs. For each region, you have to set up +a ``struct uio_port`` in the ``port[]`` array. Here's a description of +the fields of ``struct uio_port``: + +- ``char *porttype``: Required. Set this to one of the predefined + constants. Use ``UIO_PORT_X86`` for the ioports found in x86 + architectures. + +- ``unsigned long start``: Required if the port region is used. Fill in + the number of the first port of this region. + +- ``unsigned long size``: Fill in the number of ports in this region. + If ``size`` is zero, the region is considered unused. Note that you + *must* initialize ``size`` with zero for all unused regions. + +Please do not touch the ``portio`` element of ``struct uio_port``! It is +used internally by the UIO framework to set up sysfs files for this +region. Simply leave it alone. + +Adding an interrupt handler +--------------------------- + +What you need to do in your interrupt handler depends on your hardware +and on how you want to handle it. You should try to keep the amount of +code in your kernel interrupt handler low. If your hardware requires no +action that you *have* to perform after each interrupt, then your +handler can be empty. + +If, on the other hand, your hardware *needs* some action to be performed +after each interrupt, then you *must* do it in your kernel module. Note +that you cannot rely on the userspace part of your driver. Your +userspace program can terminate at any time, possibly leaving your +hardware in a state where proper interrupt handling is still required. + +There might also be applications where you want to read data from your +hardware at each interrupt and buffer it in a piece of kernel memory +you've allocated for that purpose. With this technique you could avoid +loss of data if your userspace program misses an interrupt. + +A note on shared interrupts: Your driver should support interrupt +sharing whenever this is possible. It is possible if and only if your +driver can detect whether your hardware has triggered the interrupt or +not. This is usually done by looking at an interrupt status register. If +your driver sees that the IRQ bit is actually set, it will perform its +actions, and the handler returns IRQ_HANDLED. If the driver detects +that it was not your hardware that caused the interrupt, it will do +nothing and return IRQ_NONE, allowing the kernel to call the next +possible interrupt handler. + +If you decide not to support shared interrupts, your card won't work in +computers with no free interrupts. As this frequently happens on the PC +platform, you can save yourself a lot of trouble by supporting interrupt +sharing. + +Using uio_pdrv for platform devices +----------------------------------- + +In many cases, UIO drivers for platform devices can be handled in a +generic way. In the same place where you define your +``struct platform_device``, you simply also implement your interrupt +handler and fill your ``struct uio_info``. A pointer to this +``struct uio_info`` is then used as ``platform_data`` for your platform +device. + +You also need to set up an array of ``struct resource`` containing +addresses and sizes of your memory mappings. This information is passed +to the driver using the ``.resource`` and ``.num_resources`` elements of +``struct platform_device``. + +You now have to set the ``.name`` element of ``struct platform_device`` +to ``"uio_pdrv"`` to use the generic UIO platform device driver. This +driver will fill the ``mem[]`` array according to the resources given, +and register the device. + +The advantage of this approach is that you only have to edit a file you +need to edit anyway. You do not have to create an extra driver. + +Using uio_pdrv_genirq for platform devices +------------------------------------------ + +Especially in embedded devices, you frequently find chips where the irq +pin is tied to its own dedicated interrupt line. In such cases, where +you can be really sure the interrupt is not shared, we can take the +concept of ``uio_pdrv`` one step further and use a generic interrupt +handler. That's what ``uio_pdrv_genirq`` does. + +The setup for this driver is the same as described above for +``uio_pdrv``, except that you do not implement an interrupt handler. The +``.handler`` element of ``struct uio_info`` must remain ``NULL``. The +``.irq_flags`` element must not contain ``IRQF_SHARED``. + +You will set the ``.name`` element of ``struct platform_device`` to +``"uio_pdrv_genirq"`` to use this driver. + +The generic interrupt handler of ``uio_pdrv_genirq`` will simply disable +the interrupt line using :c:func:`disable_irq_nosync()`. After +doing its work, userspace can reenable the interrupt by writing +0x00000001 to the UIO device file. The driver already implements an +:c:func:`irq_control()` to make this possible, you must not +implement your own. + +Using ``uio_pdrv_genirq`` not only saves a few lines of interrupt +handler code. You also do not need to know anything about the chip's +internal registers to create the kernel part of the driver. All you need +to know is the irq number of the pin the chip is connected to. + +When used in a device-tree enabled system, the driver needs to be +probed with the ``"of_id"`` module parameter set to the ``"compatible"`` +string of the node the driver is supposed to handle. By default, the +node's name (without the unit address) is exposed as name for the +UIO device in userspace. To set a custom name, a property named +``"linux,uio-name"`` may be specified in the DT node. + +Using uio_dmem_genirq for platform devices +------------------------------------------ + +In addition to statically allocated memory ranges, they may also be a +desire to use dynamically allocated regions in a user space driver. In +particular, being able to access memory made available through the +dma-mapping API, may be particularly useful. The ``uio_dmem_genirq`` +driver provides a way to accomplish this. + +This driver is used in a similar manner to the ``"uio_pdrv_genirq"`` +driver with respect to interrupt configuration and handling. + +Set the ``.name`` element of ``struct platform_device`` to +``"uio_dmem_genirq"`` to use this driver. + +When using this driver, fill in the ``.platform_data`` element of +``struct platform_device``, which is of type +``struct uio_dmem_genirq_pdata`` and which contains the following +elements: + +- ``struct uio_info uioinfo``: The same structure used as the + ``uio_pdrv_genirq`` platform data + +- ``unsigned int *dynamic_region_sizes``: Pointer to list of sizes of + dynamic memory regions to be mapped into user space. + +- ``unsigned int num_dynamic_regions``: Number of elements in + ``dynamic_region_sizes`` array. + +The dynamic regions defined in the platform data will be appended to the +`` mem[] `` array after the platform device resources, which implies +that the total number of static and dynamic memory regions cannot exceed +``MAX_UIO_MAPS``. + +The dynamic memory regions will be allocated when the UIO device file, +``/dev/uioX`` is opened. Similar to static memory resources, the memory +region information for dynamic regions is then visible via sysfs at +``/sys/class/uio/uioX/maps/mapY/*``. The dynamic memory regions will be +freed when the UIO device file is closed. When no processes are holding +the device file open, the address returned to userspace is ~0. + +Writing a driver in userspace +============================= + +Once you have a working kernel module for your hardware, you can write +the userspace part of your driver. You don't need any special libraries, +your driver can be written in any reasonable language, you can use +floating point numbers and so on. In short, you can use all the tools +and libraries you'd normally use for writing a userspace application. + +Getting information about your UIO device +----------------------------------------- + +Information about all UIO devices is available in sysfs. The first thing +you should do in your driver is check ``name`` and ``version`` to make +sure you're talking to the right device and that its kernel driver has +the version you expect. + +You should also make sure that the memory mapping you need exists and +has the size you expect. + +There is a tool called ``lsuio`` that lists UIO devices and their +attributes. It is available here: + +http://www.osadl.org/projects/downloads/UIO/user/ + +With ``lsuio`` you can quickly check if your kernel module is loaded and +which attributes it exports. Have a look at the manpage for details. + +The source code of ``lsuio`` can serve as an example for getting +information about an UIO device. The file ``uio_helper.c`` contains a +lot of functions you could use in your userspace driver code. + +mmap() device memory +-------------------- + +After you made sure you've got the right device with the memory mappings +you need, all you have to do is to call :c:func:`mmap()` to map the +device's memory to userspace. + +The parameter ``offset`` of the :c:func:`mmap()` call has a special +meaning for UIO devices: It is used to select which mapping of your +device you want to map. To map the memory of mapping N, you have to use +N times the page size as your offset:: + + offset = N * getpagesize(); + +N starts from zero, so if you've got only one memory range to map, set +``offset = 0``. A drawback of this technique is that memory is always +mapped beginning with its start address. + +Waiting for interrupts +---------------------- + +After you successfully mapped your devices memory, you can access it +like an ordinary array. Usually, you will perform some initialization. +After that, your hardware starts working and will generate an interrupt +as soon as it's finished, has some data available, or needs your +attention because an error occurred. + +``/dev/uioX`` is a read-only file. A :c:func:`read()` will always +block until an interrupt occurs. There is only one legal value for the +``count`` parameter of :c:func:`read()`, and that is the size of a +signed 32 bit integer (4). Any other value for ``count`` causes +:c:func:`read()` to fail. The signed 32 bit integer read is the +interrupt count of your device. If the value is one more than the value +you read the last time, everything is OK. If the difference is greater +than one, you missed interrupts. + +You can also use :c:func:`select()` on ``/dev/uioX``. + +Generic PCI UIO driver +====================== + +The generic driver is a kernel module named uio_pci_generic. It can +work with any device compliant to PCI 2.3 (circa 2002) and any compliant +PCI Express device. Using this, you only need to write the userspace +driver, removing the need to write a hardware-specific kernel module. + +Making the driver recognize the device +-------------------------------------- + +Since the driver does not declare any device ids, it will not get loaded +automatically and will not automatically bind to any devices, you must +load it and allocate id to the driver yourself. For example:: + + modprobe uio_pci_generic + echo "8086 10f5" > /sys/bus/pci/drivers/uio_pci_generic/new_id + +If there already is a hardware specific kernel driver for your device, +the generic driver still won't bind to it, in this case if you want to +use the generic driver (why would you?) you'll have to manually unbind +the hardware specific driver and bind the generic driver, like this:: + + echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind + echo -n 0000:00:19.0 > /sys/bus/pci/drivers/uio_pci_generic/bind + +You can verify that the device has been bound to the driver by looking +for it in sysfs, for example like the following:: + + ls -l /sys/bus/pci/devices/0000:00:19.0/driver + +Which if successful should print:: + + .../0000:00:19.0/driver -> ../../../bus/pci/drivers/uio_pci_generic + +Note that the generic driver will not bind to old PCI 2.2 devices. If +binding the device failed, run the following command:: + + dmesg + +and look in the output for failure reasons. + +Things to know about uio_pci_generic +------------------------------------ + +Interrupts are handled using the Interrupt Disable bit in the PCI +command register and Interrupt Status bit in the PCI status register. +All devices compliant to PCI 2.3 (circa 2002) and all compliant PCI +Express devices should support these bits. uio_pci_generic detects +this support, and won't bind to devices which do not support the +Interrupt Disable Bit in the command register. + +On each interrupt, uio_pci_generic sets the Interrupt Disable bit. +This prevents the device from generating further interrupts until the +bit is cleared. The userspace driver should clear this bit before +blocking and waiting for more interrupts. + +Writing userspace driver using uio_pci_generic +------------------------------------------------ + +Userspace driver can use pci sysfs interface, or the libpci library that +wraps it, to talk to the device and to re-enable interrupts by writing +to the command register. + +Example code using uio_pci_generic +---------------------------------- + +Here is some sample userspace driver code using uio_pci_generic:: + + #include <stdlib.h> + #include <stdio.h> + #include <unistd.h> + #include <sys/types.h> + #include <sys/stat.h> + #include <fcntl.h> + #include <errno.h> + + int main() + { + int uiofd; + int configfd; + int err; + int i; + unsigned icount; + unsigned char command_high; + + uiofd = open("/dev/uio0", O_RDONLY); + if (uiofd < 0) { + perror("uio open:"); + return errno; + } + configfd = open("/sys/class/uio/uio0/device/config", O_RDWR); + if (configfd < 0) { + perror("config open:"); + return errno; + } + + /* Read and cache command value */ + err = pread(configfd, &command_high, 1, 5); + if (err != 1) { + perror("command config read:"); + return errno; + } + command_high &= ~0x4; + + for(i = 0;; ++i) { + /* Print out a message, for debugging. */ + if (i == 0) + fprintf(stderr, "Started uio test driver.\n"); + else + fprintf(stderr, "Interrupts: %d\n", icount); + + /****************************************/ + /* Here we got an interrupt from the + device. Do something to it. */ + /****************************************/ + + /* Re-enable interrupts. */ + err = pwrite(configfd, &command_high, 1, 5); + if (err != 1) { + perror("config write:"); + break; + } + + /* Wait for next interrupt. */ + err = read(uiofd, &icount, 4); + if (err != 4) { + perror("uio read:"); + break; + } + + } + return errno; + } + +Generic Hyper-V UIO driver +========================== + +The generic driver is a kernel module named uio_hv_generic. It +supports devices on the Hyper-V VMBus similar to uio_pci_generic on +PCI bus. + +Making the driver recognize the device +-------------------------------------- + +Since the driver does not declare any device GUID's, it will not get +loaded automatically and will not automatically bind to any devices, you +must load it and allocate id to the driver yourself. For example, to use +the network device class GUID:: + + modprobe uio_hv_generic + echo "f8615163-df3e-46c5-913f-f2d2f965ed0e" > /sys/bus/vmbus/drivers/uio_hv_generic/new_id + +If there already is a hardware specific kernel driver for the device, +the generic driver still won't bind to it, in this case if you want to +use the generic driver for a userspace library you'll have to manually unbind +the hardware specific driver and bind the generic driver, using the device specific GUID +like this:: + + echo -n ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/hv_netvsc/unbind + echo -n ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/uio_hv_generic/bind + +You can verify that the device has been bound to the driver by looking +for it in sysfs, for example like the following:: + + ls -l /sys/bus/vmbus/devices/ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver + +Which if successful should print:: + + .../ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver -> ../../../bus/vmbus/drivers/uio_hv_generic + +Things to know about uio_hv_generic +----------------------------------- + +On each interrupt, uio_hv_generic sets the Interrupt Disable bit. This +prevents the device from generating further interrupts until the bit is +cleared. The userspace driver should clear this bit before blocking and +waiting for more interrupts. + +When host rescinds a device, the interrupt file descriptor is marked down +and any reads of the interrupt file descriptor will return -EIO. Similar +to a closed socket or disconnected serial device. + +The vmbus device regions are mapped into uio device resources: + 0) Channel ring buffers: guest to host and host to guest + 1) Guest to host interrupt signalling pages + 2) Guest to host monitor page + 3) Network receive buffer region + 4) Network send buffer region + +If a subchannel is created by a request to host, then the uio_hv_generic +device driver will create a sysfs binary file for the per-channel ring buffer. +For example:: + + /sys/bus/vmbus/devices/3811fe4d-0fa0-4b62-981a-74fc1084c757/channels/21/ring + +Further information +=================== + +- `OSADL homepage. <http://www.osadl.org>`_ + +- `Linutronix homepage. <http://www.linutronix.de>`_ |