diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /drivers/gpu/drm/v3d/v3d_drv.h | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'drivers/gpu/drm/v3d/v3d_drv.h')
-rw-r--r-- | drivers/gpu/drm/v3d/v3d_drv.h | 420 |
1 files changed, 420 insertions, 0 deletions
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h new file mode 100644 index 000000000..b74b1351b --- /dev/null +++ b/drivers/gpu/drm/v3d/v3d_drv.h @@ -0,0 +1,420 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* Copyright (C) 2015-2018 Broadcom */ + +#include <linux/delay.h> +#include <linux/mutex.h> +#include <linux/spinlock_types.h> +#include <linux/workqueue.h> + +#include <drm/drm_encoder.h> +#include <drm/drm_gem.h> +#include <drm/drm_gem_shmem_helper.h> +#include <drm/gpu_scheduler.h> + +#include "uapi/drm/v3d_drm.h" + +struct clk; +struct platform_device; +struct reset_control; + +#define GMP_GRANULARITY (128 * 1024) + +#define V3D_MAX_QUEUES (V3D_CACHE_CLEAN + 1) + +struct v3d_queue_state { + struct drm_gpu_scheduler sched; + + u64 fence_context; + u64 emit_seqno; +}; + +/* Performance monitor object. The perform lifetime is controlled by userspace + * using perfmon related ioctls. A perfmon can be attached to a submit_cl + * request, and when this is the case, HW perf counters will be activated just + * before the submit_cl is submitted to the GPU and disabled when the job is + * done. This way, only events related to a specific job will be counted. + */ +struct v3d_perfmon { + /* Tracks the number of users of the perfmon, when this counter reaches + * zero the perfmon is destroyed. + */ + refcount_t refcnt; + + /* Protects perfmon stop, as it can be invoked from multiple places. */ + struct mutex lock; + + /* Number of counters activated in this perfmon instance + * (should be less than DRM_V3D_MAX_PERF_COUNTERS). + */ + u8 ncounters; + + /* Events counted by the HW perf counters. */ + u8 counters[DRM_V3D_MAX_PERF_COUNTERS]; + + /* Storage for counter values. Counters are incremented by the + * HW perf counter values every time the perfmon is attached + * to a GPU job. This way, perfmon users don't have to + * retrieve the results after each job if they want to track + * events covering several submissions. Note that counter + * values can't be reset, but you can fake a reset by + * destroying the perfmon and creating a new one. + */ + u64 values[]; +}; + +struct v3d_dev { + struct drm_device drm; + + /* Short representation (e.g. 33, 41) of the V3D tech version + * and revision. + */ + int ver; + bool single_irq_line; + + void __iomem *hub_regs; + void __iomem *core_regs[3]; + void __iomem *bridge_regs; + void __iomem *gca_regs; + struct clk *clk; + struct reset_control *reset; + + /* Virtual and DMA addresses of the single shared page table. */ + volatile u32 *pt; + dma_addr_t pt_paddr; + + /* Virtual and DMA addresses of the MMU's scratch page. When + * a read or write is invalid in the MMU, it will be + * redirected here. + */ + void *mmu_scratch; + dma_addr_t mmu_scratch_paddr; + /* virtual address bits from V3D to the MMU. */ + int va_width; + + /* Number of V3D cores. */ + u32 cores; + + /* Allocator managing the address space. All units are in + * number of pages. + */ + struct drm_mm mm; + spinlock_t mm_lock; + + struct work_struct overflow_mem_work; + + struct v3d_bin_job *bin_job; + struct v3d_render_job *render_job; + struct v3d_tfu_job *tfu_job; + struct v3d_csd_job *csd_job; + + struct v3d_queue_state queue[V3D_MAX_QUEUES]; + + /* Spinlock used to synchronize the overflow memory + * management against bin job submission. + */ + spinlock_t job_lock; + + /* Used to track the active perfmon if any. */ + struct v3d_perfmon *active_perfmon; + + /* Protects bo_stats */ + struct mutex bo_lock; + + /* Lock taken when resetting the GPU, to keep multiple + * processes from trying to park the scheduler threads and + * reset at once. + */ + struct mutex reset_lock; + + /* Lock taken when creating and pushing the GPU scheduler + * jobs, to keep the sched-fence seqnos in order. + */ + struct mutex sched_lock; + + /* Lock taken during a cache clean and when initiating an L2 + * flush, to keep L2 flushes from interfering with the + * synchronous L2 cleans. + */ + struct mutex cache_clean_lock; + + struct { + u32 num_allocated; + u32 pages_allocated; + } bo_stats; +}; + +static inline struct v3d_dev * +to_v3d_dev(struct drm_device *dev) +{ + return container_of(dev, struct v3d_dev, drm); +} + +static inline bool +v3d_has_csd(struct v3d_dev *v3d) +{ + return v3d->ver >= 41; +} + +#define v3d_to_pdev(v3d) to_platform_device((v3d)->drm.dev) + +/* The per-fd struct, which tracks the MMU mappings. */ +struct v3d_file_priv { + struct v3d_dev *v3d; + + struct { + struct idr idr; + struct mutex lock; + } perfmon; + + struct drm_sched_entity sched_entity[V3D_MAX_QUEUES]; +}; + +struct v3d_bo { + struct drm_gem_shmem_object base; + + struct drm_mm_node node; + + /* List entry for the BO's position in + * v3d_render_job->unref_list + */ + struct list_head unref_head; +}; + +static inline struct v3d_bo * +to_v3d_bo(struct drm_gem_object *bo) +{ + return (struct v3d_bo *)bo; +} + +struct v3d_fence { + struct dma_fence base; + struct drm_device *dev; + /* v3d seqno for signaled() test */ + u64 seqno; + enum v3d_queue queue; +}; + +static inline struct v3d_fence * +to_v3d_fence(struct dma_fence *fence) +{ + return (struct v3d_fence *)fence; +} + +#define V3D_READ(offset) readl(v3d->hub_regs + offset) +#define V3D_WRITE(offset, val) writel(val, v3d->hub_regs + offset) + +#define V3D_BRIDGE_READ(offset) readl(v3d->bridge_regs + offset) +#define V3D_BRIDGE_WRITE(offset, val) writel(val, v3d->bridge_regs + offset) + +#define V3D_GCA_READ(offset) readl(v3d->gca_regs + offset) +#define V3D_GCA_WRITE(offset, val) writel(val, v3d->gca_regs + offset) + +#define V3D_CORE_READ(core, offset) readl(v3d->core_regs[core] + offset) +#define V3D_CORE_WRITE(core, offset, val) writel(val, v3d->core_regs[core] + offset) + +struct v3d_job { + struct drm_sched_job base; + + struct kref refcount; + + struct v3d_dev *v3d; + + /* This is the array of BOs that were looked up at the start + * of submission. + */ + struct drm_gem_object **bo; + u32 bo_count; + + /* v3d fence to be signaled by IRQ handler when the job is complete. */ + struct dma_fence *irq_fence; + + /* scheduler fence for when the job is considered complete and + * the BO reservations can be released. + */ + struct dma_fence *done_fence; + + /* Pointer to a performance monitor object if the user requested it, + * NULL otherwise. + */ + struct v3d_perfmon *perfmon; + + /* Callback for the freeing of the job on refcount going to 0. */ + void (*free)(struct kref *ref); +}; + +struct v3d_bin_job { + struct v3d_job base; + + /* GPU virtual addresses of the start/end of the CL job. */ + u32 start, end; + + u32 timedout_ctca, timedout_ctra; + + /* Corresponding render job, for attaching our overflow memory. */ + struct v3d_render_job *render; + + /* Submitted tile memory allocation start/size, tile state. */ + u32 qma, qms, qts; +}; + +struct v3d_render_job { + struct v3d_job base; + + /* GPU virtual addresses of the start/end of the CL job. */ + u32 start, end; + + u32 timedout_ctca, timedout_ctra; + + /* List of overflow BOs used in the job that need to be + * released once the job is complete. + */ + struct list_head unref_list; +}; + +struct v3d_tfu_job { + struct v3d_job base; + + struct drm_v3d_submit_tfu args; +}; + +struct v3d_csd_job { + struct v3d_job base; + + u32 timedout_batches; + + struct drm_v3d_submit_csd args; +}; + +struct v3d_submit_outsync { + struct drm_syncobj *syncobj; +}; + +struct v3d_submit_ext { + u32 flags; + u32 wait_stage; + + u32 in_sync_count; + u64 in_syncs; + + u32 out_sync_count; + struct v3d_submit_outsync *out_syncs; +}; + +/** + * __wait_for - magic wait macro + * + * Macro to help avoid open coding check/wait/timeout patterns. Note that it's + * important that we check the condition again after having timed out, since the + * timeout could be due to preemption or similar and we've never had a chance to + * check the condition before the timeout. + */ +#define __wait_for(OP, COND, US, Wmin, Wmax) ({ \ + const ktime_t end__ = ktime_add_ns(ktime_get_raw(), 1000ll * (US)); \ + long wait__ = (Wmin); /* recommended min for usleep is 10 us */ \ + int ret__; \ + might_sleep(); \ + for (;;) { \ + const bool expired__ = ktime_after(ktime_get_raw(), end__); \ + OP; \ + /* Guarantee COND check prior to timeout */ \ + barrier(); \ + if (COND) { \ + ret__ = 0; \ + break; \ + } \ + if (expired__) { \ + ret__ = -ETIMEDOUT; \ + break; \ + } \ + usleep_range(wait__, wait__ * 2); \ + if (wait__ < (Wmax)) \ + wait__ <<= 1; \ + } \ + ret__; \ +}) + +#define _wait_for(COND, US, Wmin, Wmax) __wait_for(, (COND), (US), (Wmin), \ + (Wmax)) +#define wait_for(COND, MS) _wait_for((COND), (MS) * 1000, 10, 1000) + +static inline unsigned long nsecs_to_jiffies_timeout(const u64 n) +{ + /* nsecs_to_jiffies64() does not guard against overflow */ + if (NSEC_PER_SEC % HZ && + div_u64(n, NSEC_PER_SEC) >= MAX_JIFFY_OFFSET / HZ) + return MAX_JIFFY_OFFSET; + + return min_t(u64, MAX_JIFFY_OFFSET, nsecs_to_jiffies64(n) + 1); +} + +/* v3d_bo.c */ +struct drm_gem_object *v3d_create_object(struct drm_device *dev, size_t size); +void v3d_free_object(struct drm_gem_object *gem_obj); +struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv, + size_t size); +int v3d_create_bo_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); +int v3d_mmap_bo_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); +int v3d_get_bo_offset_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); +struct drm_gem_object *v3d_prime_import_sg_table(struct drm_device *dev, + struct dma_buf_attachment *attach, + struct sg_table *sgt); + +/* v3d_debugfs.c */ +void v3d_debugfs_init(struct drm_minor *minor); + +/* v3d_fence.c */ +extern const struct dma_fence_ops v3d_fence_ops; +struct dma_fence *v3d_fence_create(struct v3d_dev *v3d, enum v3d_queue queue); + +/* v3d_gem.c */ +int v3d_gem_init(struct drm_device *dev); +void v3d_gem_destroy(struct drm_device *dev); +int v3d_submit_cl_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); +int v3d_submit_tfu_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); +int v3d_submit_csd_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); +int v3d_wait_bo_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); +void v3d_job_cleanup(struct v3d_job *job); +void v3d_job_put(struct v3d_job *job); +void v3d_reset(struct v3d_dev *v3d); +void v3d_invalidate_caches(struct v3d_dev *v3d); +void v3d_clean_caches(struct v3d_dev *v3d); + +/* v3d_irq.c */ +int v3d_irq_init(struct v3d_dev *v3d); +void v3d_irq_enable(struct v3d_dev *v3d); +void v3d_irq_disable(struct v3d_dev *v3d); +void v3d_irq_reset(struct v3d_dev *v3d); + +/* v3d_mmu.c */ +int v3d_mmu_get_offset(struct drm_file *file_priv, struct v3d_bo *bo, + u32 *offset); +int v3d_mmu_set_page_table(struct v3d_dev *v3d); +void v3d_mmu_insert_ptes(struct v3d_bo *bo); +void v3d_mmu_remove_ptes(struct v3d_bo *bo); + +/* v3d_sched.c */ +int v3d_sched_init(struct v3d_dev *v3d); +void v3d_sched_fini(struct v3d_dev *v3d); + +/* v3d_perfmon.c */ +void v3d_perfmon_get(struct v3d_perfmon *perfmon); +void v3d_perfmon_put(struct v3d_perfmon *perfmon); +void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon); +void v3d_perfmon_stop(struct v3d_dev *v3d, struct v3d_perfmon *perfmon, + bool capture); +struct v3d_perfmon *v3d_perfmon_find(struct v3d_file_priv *v3d_priv, int id); +void v3d_perfmon_open_file(struct v3d_file_priv *v3d_priv); +void v3d_perfmon_close_file(struct v3d_file_priv *v3d_priv); +int v3d_perfmon_create_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); +int v3d_perfmon_destroy_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); +int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); |