author Linus Torvalds <torvalds@linux-foundation.org> 2023-02-21 18:24:12 -0800
committer Linus Torvalds <torvalds@linux-foundation.org> 2023-02-21 18:24:12 -0800
commit 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2
tree cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /drivers/gpu/drm/i915/gt/intel_gt_mcr.c
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
 "Core:

   - Add dedicated kmem_cache for typical/small skb->head, avoid having
     to access struct page at kfree time, and improve memory use.

   - Introduce sysctl to set default RPS configuration for new netdevs.

   - Define Netlink protocol specification format which can be used to
     describe messages used by each family and auto-generate parsers.
     Add tools for generating kernel data structures and uAPI headers.

   - Expose all net/core sysctls inside netns.

   - Remove 4s sleep in netpoll if carrier is instantly detected on
     boot.

   - Add configurable limit of MDB entries per port, and port-vlan.

   - Continue populating drop reasons throughout the stack.

   - Retire a handful of legacy Qdiscs and classifiers.

  Protocols:

   - Support IPv4 big TCP (TSO frames larger than 64kB).

   - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
     on socket by socket basis.

   - Track and report in procfs number of MPTCP sockets used.

   - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
     manager.

   - IPv6: don't check net.ipv6.route.max_size and rely on garbage
     collection to free memory (similarly to IPv4).

   - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).

   - ICMP: add per-rate limit counters.

   - Add support for user scanning requests in ieee802154.

   - Remove static WEP support.

   - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
     reporting.

   - WiFi 7 EHT channel puncturing support (client & AP).

  BPF:

   - Add a rbtree data structure following the "next-gen data
     structure" precedent set by recently added linked list, that is,
     by using kfunc + kptr instead of adding a new BPF map type.

   - Expose XDP hints via kfuncs with initial support for RX hash and
     timestamp metadata.

   - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
     better support decap on GRE tunnel devices not operating in
     collect metadata.

   - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.

   - Remove the need for trace_printk_lock for bpf_trace_printk and
     bpf_trace_vprintk helpers.

   - Extend libbpf's bpf_tracing.h support for tracing arguments of
     kprobes/uprobes and syscall as a special case.

   - Significantly reduce the search time for module symbols by
     livepatch and BPF.

   - Enable cpumasks to be used as kptrs, which is useful for tracing
     programs tracking which tasks end up running on which CPUs in
     different time intervals.

   - Add support for BPF trampoline on s390x and riscv64.

   - Add capability to export the XDP features supported by the NIC.

   - Add __bpf_kfunc tag for marking kernel functions as kfuncs.

   - Add cgroup.memory=nobpf kernel parameter option to disable BPF
     memory accounting for container environments.

  Netfilter:

   - Remove the CLUSTERIP target. It has been marked as obsolete for
     years, and we still have WARN splats wrt races of the out-of-band
     /proc interface installed by this target.

   - Add 'destroy' commands to nf_tables. They are identical to the
     existing 'delete' commands, but do not return an error if the
     referenced object (set, chain, rule...) did not exist.

  Driver API:

   - Improve cpumask_local_spread() locality to help NICs set the right
     IRQ affinity on AMD platforms.

   - Separate C22 and C45 MDIO bus transactions more clearly.

   - Introduce new DCB table to control DSCP rewrite on egress.

   - Support configuration of Physical Layer Collision Avoidance (PLCA)
     Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
     shared medium Ethernet.

   - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
     preemption of low priority frames by high priority frames.

   - Add support for controlling MACSec offload using netlink SET.

   - Rework devlink instance refcounts to allow registration and
     de-registration under the instance lock. Split the code into
     multiple files, drop some of the unnecessarily granular locks and
     factor out common parts of netlink operation handling.

   - Add TX frame aggregation parameters (for USB drivers).

   - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
     messages with notifications for debug.

   - Allow offloading of UDP NEW connections via act_ct.

   - Add support for per action HW stats in TC.

   - Support hardware miss to TC action (continue processing in SW from
     a specific point in the action chain).

   - Warn if old Wireless Extension user space interface is used with
     modern cfg80211/mac80211 drivers. Do not support Wireless
     Extensions for Wi-Fi 7 devices at all. Everyone should switch to
     using nl80211 interface instead.

   - Improve the CAN bit timing configuration. Use extack to return
     error messages directly to user space, update the SJW handling,
     including the definition of a new default value that will benefit
     CAN-FD controllers, by increasing their oscillator tolerance.

  New hardware / drivers:

   - Ethernet:
      - nVidia BlueField-3 support (control traffic driver)
      - Ethernet support for imx93 SoCs
      - Motorcomm yt8531 gigabit Ethernet PHY
      - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
      - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
      - Amlogic gxl MDIO mux

   - WiFi:
      - RealTek RTL8188EU (rtl8xxxu)
      - Qualcomm Wi-Fi 7 devices (ath12k)

   - CAN:
      - Renesas R-Car V4H

  Drivers:

   - Bluetooth:
      - Set Per Platform Antenna Gain (PPAG) for Intel controllers.

   - Ethernet NICs:
      - Intel (1G, igc):
         - support TSN / Qbv / packet scheduling features of i226 model
      - Intel (100G, ice):
         - use GNSS subsystem instead of TTY
         - multi-buffer XDP support
         - extend support for GPIO pins to E823 devices
      - nVidia/Mellanox:
         - update the shared buffer configuration on PFC commands
         - implement PTP adjphase function for HW offset control
         - TC support for Geneve and GRE with VF tunnel offload
         - more efficient crypto key management method
         - multi-port eswitch support
      - Netronome/Corigine:
         - add DCB IEEE support
         - support IPsec offloading for NFP3800
      - Freescale/NXP (enetc):
         - support XDP_REDIRECT for XDP non-linear buffers
         - improve reconfig, avoid link flap and waiting for idle
         - support MAC Merge layer
      - Other NICs:
         - sfc/ef100: add basic devlink support for ef100
         - ionic: rx_push mode operation (writing descriptors via MMIO)
         - bnxt: use the auxiliary bus abstraction for RDMA
         - r8169: disable ASPM and reset bus in case of tx timeout
         - cpsw: support QSGMII mode for J721e CPSW9G
         - cpts: support pulse-per-second output
         - ngbe: add an mdio bus driver
         - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
         - r8152: handle devices with FW with NCM support
         - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
         - virtio-net: support multi buffer XDP
         - virtio/vsock: replace virtio_vsock_pkt with sk_buff
         - tsnep: XDP support

   - Ethernet high-speed switches:
      - nVidia/Mellanox (mlxsw):
         - add support for latency TLV (in FW control messages)
      - Microchip (sparx5):
         - separate explicit and implicit traffic forwarding rules,
           make the implicit rules always active
         - add support for egress DSCP rewrite
         - IS0 VCAP support (Ingress Classification)
         - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
           etc.)
         - ES2 VCAP support (Egress Access Control)
         - support for Per-Stream Filtering and Policing (802.1Q,
           8.6.5.1)

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - add MAB (port auth) offload support
         - enable PTP receive for mv88e6390
      - NXP (ocelot):
         - support MAC Merge layer
         - support for the the vsc7512 internal copper phys
      - Microchip:
         - lan9303: convert to PHYLINK
         - lan966x: support TC flower filter statistics
         - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
         - lan937x: support Credit Based Shaper configuration
         - ksz9477: support Energy Efficient Ethernet
      - other:
         - qca8k: convert to regmap read/write API, use bulk operations
         - rswitch: Improve TX timestamp accuracy

   - Intel WiFi (iwlwifi):
      - EHT (Wi-Fi 7) rate reporting
      - STEP equalizer support: transfer some STEP (connection to radio
        on platforms with integrated wifi) related parameters from the
        BIOS to the firmware.

   - Qualcomm 802.11ax WiFi (ath11k):
      - IPQ5018 support
      - Fine Timing Measurement (FTM) responder role support
      - channel 177 support

   - MediaTek WiFi (mt76):
      - per-PHY LED support
      - mt7996: EHT (Wi-Fi 7) support
      - Wireless Ethernet Dispatch (WED) reset support
      - switch to using page pool allocator

   - RealTek WiFi (rtw89):
      - support new version of Bluetooth co-existance

   - Mobile:
      - rmnet: support TX aggregation"

* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
  page_pool: add a comment explaining the fragment counter usage
  net: ethtool: fix __ethtool_dev_mm_supported() implementation
  ethtool: pse-pd: Fix double word in comments
  xsk: add linux/vmalloc.h to xsk.c
  sefltests: netdevsim: wait for devlink instance after netns removal
  selftest: fib_tests: Always cleanup before exit
  net/mlx5e: Align IPsec ASO result memory to be as required by hardware
  net/mlx5e: TC, Set CT miss to the specific ct action instance
  net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  net/mlx5: Refactor tc miss handling to a single function
  net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  net/sched: flower: Support hardware miss to tc action
  net/sched: flower: Move filter handle initialization earlier
  net/sched: cls_api: Support hardware miss to tc action
  net/sched: Rename user cookie and act cookie
  sfc: fix builds without CONFIG_RTC_LIB
  sfc: clean up some inconsistent indentings
  net/mlx4_en: Introduce flexible array to silence overflow warning
  net: lan966x: Fix possible deadlock inside PTP
  net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
  ...
Diffstat (limited to 'drivers/gpu/drm/i915/gt/intel_gt_mcr.c')
-rw-r--r-- drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 759
1 files changed, 759 insertions, 0 deletions
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
new file mode 100644
index 000000000..ea86c1ab5
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
@@ -0,0 +1,759 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include "i915_drv.h"
+
+#include "intel_gt_mcr.h"
+#include "intel_gt_regs.h"
+
+/**
+ * DOC: GT Multicast/Replicated (MCR) Register Support
+ *
+ * Some GT registers are designed as "multicast" or "replicated" registers:
+ * multiple instances of the same register share a single MMIO offset. MCR
+ * registers are generally used when the hardware needs to potentially track
+ * independent values of a register per hardware unit (e.g., per-subslice,
+ * per-L3bank, etc.). The specific types of replication that exist vary
+ * per-platform.
+ *
+ * MMIO accesses to MCR registers are controlled according to the settings
+ * programmed in the platform's MCR_SELECTOR register(s). MMIO writes to MCR
+ * registers can be either multicast (i.e., a single write updates all
+ * instances of the register to the same value) or unicast (a write updates only
+ * one specific instance). Reads of MCR registers always operate in a unicast
+ * manner regardless of how the multicast/unicast bit is set in MCR_SELECTOR.
+ * Selection of a specific MCR instance for unicast operations is referred to
+ * as "steering."
+ *
+ * If MCR register operations are steered toward a hardware unit that is
+ * fused off or currently powered down due to power gating, the MMIO operation
+ * is "terminated" by the hardware. Terminated read operations will return a
+ * value of zero and terminated unicast write operations will be silently
+ * ignored.
+ */
+
+#define HAS_MSLICE_STEERING(dev_priv) (INTEL_INFO(dev_priv)->has_mslice_steering)
+
+static const char * const intel_steering_types[] = {
+ "L3BANK",
+ "MSLICE",
+ "LNCF",
+ "GAM",
+ "DSS",
+ "OADDRM",
+ "INSTANCE 0",
+};
+
+static const struct intel_mmio_range icl_l3bank_steering_table[] = {
+ { 0x00B100, 0x00B3FF },
+ {},
+};
+
+/*
+ * Although the bspec lists more "MSLICE" ranges than shown here, some of those
+ * are of a "GAM" subclass that has special rules. Thus we use a separate
+ * GAM table farther down for those.
+ */
+static const struct intel_mmio_range xehpsdv_mslice_steering_table[] = {
+ { 0x00DD00, 0x00DDFF },
+ { 0x00E900, 0x00FFFF }, /* 0xEA00 - 0xEFFF is unused */
+ {},
+};
+
+static const struct intel_mmio_range xehpsdv_gam_steering_table[] = {
+ { 0x004000, 0x004AFF },
+ { 0x00C800, 0x00CFFF },
+ {},
+};
+
+static const struct intel_mmio_range xehpsdv_lncf_steering_table[] = {
+ { 0x00B000, 0x00B0FF },
+ { 0x00D800, 0x00D8FF },
+ {},
+};
+
+static const struct intel_mmio_range dg2_lncf_steering_table[] = {
+ { 0x00B000, 0x00B0FF },
+ { 0x00D880, 0x00D8FF },
+ {},
+};
+
+/*
+ * We have several types of MCR registers on PVC where steering to (0,0)
+ * will always provide us with a non-terminated value. We'll stick them
+ * all in the same table for simplicity.
+ */
+static const struct intel_mmio_range pvc_instance0_steering_table[] = {
+ { 0x004000, 0x004AFF }, /* HALF-BSLICE */
+ { 0x008800, 0x00887F }, /* CC */
+ { 0x008A80, 0x008AFF }, /* TILEPSMI */
+ { 0x00B000, 0x00B0FF }, /* HALF-BSLICE */
+ { 0x00B100, 0x00B3FF }, /* L3BANK */
+ { 0x00C800, 0x00CFFF }, /* HALF-BSLICE */
+ { 0x00D800, 0x00D8FF }, /* HALF-BSLICE */
+ { 0x00DD00, 0x00DDFF }, /* BSLICE */
+ { 0x00E900, 0x00E9FF }, /* HALF-BSLICE */
+ { 0x00EC00, 0x00EEFF }, /* HALF-BSLICE */
+ { 0x00F000, 0x00FFFF }, /* HALF-BSLICE */
+ { 0x024180, 0x0241FF }, /* HALF-BSLICE */
+ {},
+};
+
+static const struct intel_mmio_range xelpg_instance0_steering_table[] = {
+ { 0x000B00, 0x000BFF }, /* SQIDI */
+ { 0x001000, 0x001FFF }, /* SQIDI */
+ { 0x004000, 0x0048FF }, /* GAM */
+ { 0x008700, 0x0087FF }, /* SQIDI */
+ { 0x00B000, 0x00B0FF }, /* NODE */
+ { 0x00C800, 0x00CFFF }, /* GAM */
+ { 0x00D880, 0x00D8FF }, /* NODE */
+ { 0x00DD00, 0x00DDFF }, /* OAAL2 */
+ {},
+};
+
+static const struct intel_mmio_range xelpg_l3bank_steering_table[] = {
+ { 0x00B100, 0x00B3FF },
+ {},
+};
+
+/* DSS steering is used for SLICE ranges as well */
+static const struct intel_mmio_range xelpg_dss_steering_table[] = {
+ { 0x005200, 0x0052FF }, /* SLICE */
+ { 0x005500, 0x007FFF }, /* SLICE */
+ { 0x008140, 0x00815F }, /* SLICE (0x8140-0x814F), DSS (0x8150-0x815F) */
+ { 0x0094D0, 0x00955F }, /* SLICE (0x94D0-0x951F), DSS (0x9520-0x955F) */
+ { 0x009680, 0x0096FF }, /* DSS */
+ { 0x00D800, 0x00D87F }, /* SLICE */
+ { 0x00DC00, 0x00DCFF }, /* SLICE */
+ { 0x00DE80, 0x00E8FF }, /* DSS (0xE000-0xE0FF reserved) */
+ {},
+};
+
+static const struct intel_mmio_range xelpmp_oaddrm_steering_table[] = {
+ { 0x393200, 0x39323F },
+ { 0x393400, 0x3934FF },
+ {},
+};
+
+void intel_gt_mcr_init(struct intel_gt *gt)
+{
+ struct drm_i915_private *i915 = gt->i915;
+ unsigned long fuse;
+ int i;
+
+ /*
+ * An mslice is unavailable only if both the meml3 for the slice is
+ * disabled *and* all of the DSS in the slice (quadrant) are disabled.
+ */
+ if (HAS_MSLICE_STEERING(i915)) {
+ gt->info.mslice_mask =
+ intel_slicemask_from_xehp_dssmask(gt->info.sseu.subslice_mask,
+ GEN_DSS_PER_MSLICE);
+ gt->info.mslice_mask |=
+ (intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3) &
+ GEN12_MEML3_EN_MASK);
+
+ if (!gt->info.mslice_mask) /* should be impossible! */
+ drm_warn(&i915->drm, "mslice mask all zero!\n");
+ }
+
+ if (MEDIA_VER(i915) >= 13 && gt->type == GT_MEDIA) {
+ gt->steering_table[OADDRM] = xelpmp_oaddrm_steering_table;
+ } else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)) {
+ fuse = REG_FIELD_GET(GT_L3_EXC_MASK,
+ intel_uncore_read(gt->uncore, XEHP_FUSE4));
+
+ /*
+ * Despite the register field being named "exclude mask" the
+ * bits actually represent enabled banks (two banks per bit).
+ */
+ for_each_set_bit(i, &fuse, 3)
+ gt->info.l3bank_mask |= 0x3 << 2 * i;
+
+ gt->steering_table[INSTANCE0] = xelpg_instance0_steering_table;
+ gt->steering_table[L3BANK] = xelpg_l3bank_steering_table;
+ gt->steering_table[DSS] = xelpg_dss_steering_table;
+ } else if (IS_PONTEVECCHIO(i915)) {
+ gt->steering_table[INSTANCE0] = pvc_instance0_steering_table;
+ } else if (IS_DG2(i915)) {
+ gt->steering_table[MSLICE] = xehpsdv_mslice_steering_table;
+ gt->steering_table[LNCF] = dg2_lncf_steering_table;
+ /*
+ * No need to hook up the GAM table since it has a dedicated
+ * steering control register on DG2 and can use implicit
+ * steering.
+ */
+ } else if (IS_XEHPSDV(i915)) {
+ gt->steering_table[MSLICE] = xehpsdv_mslice_steering_table;
+ gt->steering_table[LNCF] = xehpsdv_lncf_steering_table;
+ gt->steering_table[GAM] = xehpsdv_gam_steering_table;
+ } else if (GRAPHICS_VER(i915) >= 11 &&
+ GRAPHICS_VER_FULL(i915) < IP_VER(12, 50)) {
+ gt->steering_table[L3BANK] = icl_l3bank_steering_table;
+ gt->info.l3bank_mask =
+ ~intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3) &
+ GEN10_L3BANK_MASK;
+ if (!gt->info.l3bank_mask) /* should be impossible! */
+ drm_warn(&i915->drm, "L3 bank mask is all zero!\n");
+ } else if (GRAPHICS_VER(i915) >= 11) {
+ /*
+ * We expect all modern platforms to have at least some
+ * type of steering that needs to be initialized.
+ */
+ MISSING_CASE(INTEL_INFO(i915)->platform);
+ }
+}
+
+/*
+ * Although the rest of the driver should use MCR-specific functions to
+ * read/write MCR registers, we still use the regular intel_uncore_* functions
+ * internally to implement those, so we need a way for the functions in this
+ * file to "cast" an i915_mcr_reg_t into an i915_reg_t.
+ */
+static i915_reg_t mcr_reg_cast(const i915_mcr_reg_t mcr)
+{
+ i915_reg_t r = { .reg = mcr.reg };
+
+ return r;
+}
+
+/*
+ * rw_with_mcr_steering_fw - Access a register with specific MCR steering
+ * @uncore: pointer to struct intel_uncore
+ * @reg: register being accessed
+ * @rw_flag: FW_REG_READ for read access or FW_REG_WRITE for write access
+ * @group: group number (documented as "sliceid" on older platforms)
+ * @instance: instance number (documented as "subsliceid" on older platforms)
+ * @value: register value to be written (ignored for read)
+ *
+ * Return: 0 for write access, or the register value for read access.
+ *
+ * Caller needs to make sure the relevant forcewake wells are up.
+ */
+static u32 rw_with_mcr_steering_fw(struct intel_uncore *uncore,
+ i915_mcr_reg_t reg, u8 rw_flag,
+ int group, int instance, u32 value)
+{
+ u32 mcr_mask, mcr_ss, mcr, old_mcr, val = 0;
+
+ lockdep_assert_held(&uncore->lock);
+
+ if (GRAPHICS_VER_FULL(uncore->i915) >= IP_VER(12, 70)) {
+ /*
+ * Always leave the hardware in multicast mode when doing reads
+ * (see comment about Wa_22013088509 below) and only change it
+ * to unicast mode when doing writes of a specific instance.
+ *
+ * No need to save old steering reg value.
+ */
+ intel_uncore_write_fw(uncore, MTL_MCR_SELECTOR,
+ REG_FIELD_PREP(MTL_MCR_GROUPID, group) |
+ REG_FIELD_PREP(MTL_MCR_INSTANCEID, instance) |
+ (rw_flag == FW_REG_READ ? GEN11_MCR_MULTICAST : 0));
+ } else if (GRAPHICS_VER(uncore->i915) >= 11) {
+ mcr_mask = GEN11_MCR_SLICE_MASK | GEN11_MCR_SUBSLICE_MASK;
+ mcr_ss = GEN11_MCR_SLICE(group) | GEN11_MCR_SUBSLICE(instance);
+
+ /*
+ * Wa_22013088509
+ *
+ * The setting of the multicast/unicast bit usually wouldn't
+ * matter for read operations (which always return the value
+ * from a single register instance regardless of how that bit
+ * is set), but some platforms have a workaround requiring us
+ * to remain in multicast mode for reads. There's no real
+ * downside to this, so we'll just go ahead and do so on all
+ * platforms; we'll only clear the multicast bit from the mask
+ * when explicitly doing a write operation.
+ */
+ if (rw_flag == FW_REG_WRITE)
+ mcr_mask |= GEN11_MCR_MULTICAST;
+
+ mcr = intel_uncore_read_fw(uncore, GEN8_MCR_SELECTOR);
+ old_mcr = mcr;
+
+ mcr &= ~mcr_mask;
+ mcr |= mcr_ss;
+ intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr);
+ } else {
+ mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK;
+ mcr_ss = GEN8_MCR_SLICE(group) | GEN8_MCR_SUBSLICE(instance);
+
+ mcr = intel_uncore_read_fw(uncore, GEN8_MCR_SELECTOR);
+ old_mcr = mcr;
+
+ mcr &= ~mcr_mask;
+ mcr |= mcr_ss;
+ intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr);
+ }
+
+ if (rw_flag == FW_REG_READ)
+ val = intel_uncore_read_fw(uncore, mcr_reg_cast(reg));
+ else
+ intel_uncore_write_fw(uncore, mcr_reg_cast(reg), value);
+
+ /*
+ * For pre-MTL platforms, we need to restore the old value of the
+ * steering control register to ensure that implicit steering continues
+ * to behave as expected. For MTL and beyond, we need only reinstate
+ * the 'multicast' bit (and only if we did a write that cleared it).
+ */
+ if (GRAPHICS_VER_FULL(uncore->i915) >= IP_VER(12, 70) && rw_flag == FW_REG_WRITE)
+ intel_uncore_write_fw(uncore, MTL_MCR_SELECTOR, GEN11_MCR_MULTICAST);
+ else if (GRAPHICS_VER_FULL(uncore->i915) < IP_VER(12, 70))
+ intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, old_mcr);
+
+ return val;
+}
+
+static u32 rw_with_mcr_steering(struct intel_uncore *uncore,
+ i915_mcr_reg_t reg, u8 rw_flag,
+ int group, int instance,
+ u32 value)
+{
+ enum forcewake_domains fw_domains;
+ u32 val;
+
+ fw_domains = intel_uncore_forcewake_for_reg(uncore, mcr_reg_cast(reg),
+ rw_flag);
+ fw_domains |= intel_uncore_forcewake_for_reg(uncore,
+ GEN8_MCR_SELECTOR,
+ FW_REG_READ | FW_REG_WRITE);
+
+ spin_lock_irq(&uncore->lock);
+ intel_uncore_forcewake_get__locked(uncore, fw_domains);
+
+ val = rw_with_mcr_steering_fw(uncore, reg, rw_flag, group, instance, value);
+
+ intel_uncore_forcewake_put__locked(uncore, fw_domains);
+ spin_unlock_irq(&uncore->lock);
+
+ return val;
+}
+
+/**
+ * intel_gt_mcr_read - read a specific instance of an MCR register
+ * @gt: GT structure
+ * @reg: the MCR register to read
+ * @group: the MCR group
+ * @instance: the MCR instance
+ *
+ * Returns the value read from an MCR register after steering toward a specific
+ * group/instance.
+ */
+u32 intel_gt_mcr_read(struct intel_gt *gt,
+ i915_mcr_reg_t reg,
+ int group, int instance)
+{
+ return rw_with_mcr_steering(gt->uncore, reg, FW_REG_READ, group, instance, 0);
+}
+
+/**
+ * intel_gt_mcr_unicast_write - write a specific instance of an MCR register
+ * @gt: GT structure
+ * @reg: the MCR register to write
+ * @value: value to write
+ * @group: the MCR group
+ * @instance: the MCR instance
+ *
+ * Write an MCR register in unicast mode after steering toward a specific
+ * group/instance.
+ */
+void intel_gt_mcr_unicast_write(struct intel_gt *gt, i915_mcr_reg_t reg, u32 value,
+ int group, int instance)
+{
+ rw_with_mcr_steering(gt->uncore, reg, FW_REG_WRITE, group, instance, value);
+}
+
+/**
+ * intel_gt_mcr_multicast_write - write a value to all instances of an MCR register
+ * @gt: GT structure
+ * @reg: the MCR register to write
+ * @value: value to write
+ *
+ * Write an MCR register in multicast mode to update all instances.
+ */
+void intel_gt_mcr_multicast_write(struct intel_gt *gt,
+ i915_mcr_reg_t reg, u32 value)
+{
+ /*
+ * Ensure we have multicast behavior, just in case some non-i915 agent
+ * left the hardware in unicast mode.
+ */
+ if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
+ intel_uncore_write_fw(gt->uncore, MTL_MCR_SELECTOR, GEN11_MCR_MULTICAST);
+
+ intel_uncore_write(gt->uncore, mcr_reg_cast(reg), value);
+}
+
+/**
+ * intel_gt_mcr_multicast_write_fw - write a value to all instances of an MCR register
+ * @gt: GT structure
+ * @reg: the MCR register to write
+ * @value: value to write
+ *
+ * Write an MCR register in multicast mode to update all instances. This
+ * function assumes the caller is already holding any necessary forcewake
+ * domains; use intel_gt_mcr_multicast_write() in cases where forcewake should
+ * be obtained automatically.
+ */
+void intel_gt_mcr_multicast_write_fw(struct intel_gt *gt, i915_mcr_reg_t reg, u32 value)
+{
+ /*
+ * Ensure we have multicast behavior, just in case some non-i915 agent
+ * left the hardware in unicast mode.
+ */
+ if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
+ intel_uncore_write_fw(gt->uncore, MTL_MCR_SELECTOR, GEN11_MCR_MULTICAST);
+
+ intel_uncore_write_fw(gt->uncore, mcr_reg_cast(reg), value);
+}
+
+/**
+ * intel_gt_mcr_multicast_rmw - perform a multicast RMW operation
+ * @gt: GT structure
+ * @reg: the MCR register to read and write
+ * @clear: bits to clear during RMW
+ * @set: bits to set during RMW
+ *
+ * Performs a read-modify-write on an MCR register in a multicast manner.
+ * This operation only makes sense on MCR registers where all instances are
+ * expected to have the same value. The read will target any non-terminated
+ * instance and the write will be applied to all instances.
+ *
+ * Forcewake is obtained automatically by the underlying helpers,
+ * intel_gt_mcr_read_any() and intel_gt_mcr_multicast_write(), so the caller
+ * does not need to hold any forcewake domains beforehand.
+ *
+ * Returns the old (unmodified) value read.
+ */
+u32 intel_gt_mcr_multicast_rmw(struct intel_gt *gt, i915_mcr_reg_t reg,
+ u32 clear, u32 set)
+{
+ u32 val = intel_gt_mcr_read_any(gt, reg);
+
+ intel_gt_mcr_multicast_write(gt, reg, (val & ~clear) | set);
+
+ return val;
+}
+
+/*
+ * reg_needs_read_steering - determine whether a register read requires
+ * explicit steering
+ * @gt: GT structure
+ * @reg: the register to check steering requirements for
+ * @type: type of multicast steering to check
+ *
+ * Determines whether @reg needs explicit steering of a specific type for
+ * reads.
+ *
+ * Returns false if @reg does not belong to a register range of the given
+ * steering type, or if the default (subslice-based) steering IDs are suitable
+ * for @type steering too.
+ */
+static bool reg_needs_read_steering(struct intel_gt *gt,
+ i915_mcr_reg_t reg,
+ enum intel_steering_type type)
+{
+ const u32 offset = i915_mmio_reg_offset(reg);
+ const struct intel_mmio_range *entry;
+
+ if (likely(!gt->steering_table[type]))
+ return false;
+
+ for (entry = gt->steering_table[type]; entry->end; entry++) {
+ if (offset >= entry->start && offset <= entry->end)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_nonterminated_steering - determines valid IDs for a class of MCR steering
+ * @gt: GT structure
+ * @type: multicast register type
+ * @group: Group ID returned
+ * @instance: Instance ID returned
+ *
+ * Determines group and instance values that will steer reads of the specified
+ * MCR class to a non-terminated instance.
+ */
+static void get_nonterminated_steering(struct intel_gt *gt,
+ enum intel_steering_type type,
+ u8 *group, u8 *instance)
+{
+ u32 dss;
+
+ switch (type) {
+ case L3BANK:
+ *group = 0; /* unused */
+ *instance = __ffs(gt->info.l3bank_mask);
+ break;
+ case MSLICE:
+ GEM_WARN_ON(!HAS_MSLICE_STEERING(gt->i915));
+ *group = __ffs(gt->info.mslice_mask);
+ *instance = 0; /* unused */
+ break;
+ case LNCF:
+ /*
+ * An LNCF is always present if its mslice is present, so we
+ * can safely just steer to LNCF 0 in all cases.
+ */
+ GEM_WARN_ON(!HAS_MSLICE_STEERING(gt->i915));
+ *group = __ffs(gt->info.mslice_mask) << 1;
+ *instance = 0; /* unused */
+ break;
+ case GAM:
+ *group = IS_DG2(gt->i915) ? 1 : 0;
+ *instance = 0;
+ break;
+ case DSS:
+ dss = intel_sseu_find_first_xehp_dss(&gt->info.sseu, 0, 0);
+ *group = dss / GEN_DSS_PER_GSLICE;
+ *instance = dss % GEN_DSS_PER_GSLICE;
+ break;
+ case INSTANCE0:
+ /*
+ * There are a lot of MCR types for which instance (0, 0)
+ * will always provide a non-terminated value.
+ */
+ *group = 0;
+ *instance = 0;
+ break;
+ case OADDRM:
+ if ((VDBOX_MASK(gt) | VEBOX_MASK(gt) | gt->info.sfc_mask) & BIT(0))
+ *group = 0;
+ else
+ *group = 1;
+ *instance = 0;
+ break;
+ default:
+ MISSING_CASE(type);
+ *group = 0;
+ *instance = 0;
+ }
+}
+
+/**
+ * intel_gt_mcr_get_nonterminated_steering - find group/instance values that
+ * will steer a register to a non-terminated instance
+ * @gt: GT structure
+ * @reg: register for which the steering is required
+ * @group: return variable for group steering
+ * @instance: return variable for instance steering
+ *
+ * This function returns a group/instance pair that is guaranteed to work for
+ * read steering of the given register. Note that a value will be returned even
+ * if the register is not replicated and therefore does not actually require
+ * steering.
+ */
+void intel_gt_mcr_get_nonterminated_steering(struct intel_gt *gt,
+ i915_mcr_reg_t reg,
+ u8 *group, u8 *instance)
+{
+ int type;
+
+ for (type = 0; type < NUM_STEERING_TYPES; type++) {
+ if (reg_needs_read_steering(gt, reg, type)) {
+ get_nonterminated_steering(gt, type, group, instance);
+ return;
+ }
+ }
+
+ *group = gt->default_steering.groupid;
+ *instance = gt->default_steering.instanceid;
+}
+
+/**
+ * intel_gt_mcr_read_any_fw - reads one instance of an MCR register
+ * @gt: GT structure
+ * @reg: register to read
+ *
+ * Reads a GT MCR register. The read will be steered to a non-terminated
+ * instance (i.e., one that isn't fused off or powered down by power gating).
+ * This function assumes the caller is already holding any necessary forcewake
+ * domains; use intel_gt_mcr_read_any() in cases where forcewake should be
+ * obtained automatically.
+ *
+ * Returns the value from a non-terminated instance of @reg.
+ */
+u32 intel_gt_mcr_read_any_fw(struct intel_gt *gt, i915_mcr_reg_t reg)
+{
+ int type;
+ u8 group, instance;
+
+ for (type = 0; type < NUM_STEERING_TYPES; type++) {
+ if (reg_needs_read_steering(gt, reg, type)) {
+ get_nonterminated_steering(gt, type, &group, &instance);
+ return rw_with_mcr_steering_fw(gt->uncore, reg,
+ FW_REG_READ,
+ group, instance, 0);
+ }
+ }
+
+ return intel_uncore_read_fw(gt->uncore, mcr_reg_cast(reg));
+}
+
+/**
+ * intel_gt_mcr_read_any - reads one instance of an MCR register
+ * @gt: GT structure
+ * @reg: register to read
+ *
+ * Reads a GT MCR register. The read will be steered to a non-terminated
+ * instance (i.e., one that isn't fused off or powered down by power gating).
+ *
+ * Returns the value from a non-terminated instance of @reg.
+ */
+u32 intel_gt_mcr_read_any(struct intel_gt *gt, i915_mcr_reg_t reg)
+{
+ int type;
+ u8 group, instance;
+
+ for (type = 0; type < NUM_STEERING_TYPES; type++) {
+ if (reg_needs_read_steering(gt, reg, type)) {
+ get_nonterminated_steering(gt, type, &group, &instance);
+ return rw_with_mcr_steering(gt->uncore, reg,
+ FW_REG_READ,
+ group, instance, 0);
+ }
+ }
+
+ return intel_uncore_read(gt->uncore, mcr_reg_cast(reg));
+}
+
+static void report_steering_type(struct drm_printer *p,
+ struct intel_gt *gt,
+ enum intel_steering_type type,
+ bool dump_table)
+{
+ const struct intel_mmio_range *entry;
+ u8 group, instance;
+
+ BUILD_BUG_ON(ARRAY_SIZE(intel_steering_types) != NUM_STEERING_TYPES);
+
+ if (!gt->steering_table[type]) {
+ drm_printf(p, "%s steering: uses default steering\n",
+ intel_steering_types[type]);
+ return;
+ }
+
+ get_nonterminated_steering(gt, type, &group, &instance);
+ drm_printf(p, "%s steering: group=0x%x, instance=0x%x\n",
+ intel_steering_types[type], group, instance);
+
+ if (!dump_table)
+ return;
+
+ for (entry = gt->steering_table[type]; entry->end; entry++)
+ drm_printf(p, "\t0x%06x - 0x%06x\n", entry->start, entry->end);
+}
+
+void intel_gt_mcr_report_steering(struct drm_printer *p, struct intel_gt *gt,
+ bool dump_table)
+{
+ /*
+ * Starting with MTL we no longer have default steering;
+ * all ranges are explicitly steered.
+ */
+ if (GRAPHICS_VER_FULL(gt->i915) < IP_VER(12, 70))
+ drm_printf(p, "Default steering: group=0x%x, instance=0x%x\n",
+ gt->default_steering.groupid,
+ gt->default_steering.instanceid);
+
+ if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70)) {
+ for (int i = 0; i < NUM_STEERING_TYPES; i++)
+ if (gt->steering_table[i])
+ report_steering_type(p, gt, i, dump_table);
+ } else if (IS_PONTEVECCHIO(gt->i915)) {
+ report_steering_type(p, gt, INSTANCE0, dump_table);
+ } else if (HAS_MSLICE_STEERING(gt->i915)) {
+ report_steering_type(p, gt, MSLICE, dump_table);
+ report_steering_type(p, gt, LNCF, dump_table);
+ }
+}
+
+/**
+ * intel_gt_mcr_get_ss_steering - returns the group/instance steering for a SS
+ * @gt: GT structure
+ * @dss: DSS ID to obtain steering for
+ * @group: pointer to storage for steering group ID
+ * @instance: pointer to storage for steering instance ID
+ *
+ * Returns the steering IDs (via the @group and @instance parameters) that
+ * correspond to a specific subslice/DSS ID.
+ */
+void intel_gt_mcr_get_ss_steering(struct intel_gt *gt, unsigned int dss,
+ unsigned int *group, unsigned int *instance)
+{
+ if (IS_PONTEVECCHIO(gt->i915)) {
+ *group = dss / GEN_DSS_PER_CSLICE;
+ *instance = dss % GEN_DSS_PER_CSLICE;
+ } else if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 50)) {
+ *group = dss / GEN_DSS_PER_GSLICE;
+ *instance = dss % GEN_DSS_PER_GSLICE;
+ } else {
+ *group = dss / GEN_MAX_SS_PER_HSW_SLICE;
+ *instance = dss % GEN_MAX_SS_PER_HSW_SLICE;
+ return;
+ }
+}
+
+/**
+ * intel_gt_mcr_wait_for_reg - wait until MCR register matches expected state
+ * @gt: GT structure
+ * @reg: the register to read
+ * @mask: mask to apply to register value
+ * @value: value to wait for
+ * @fast_timeout_us: fast timeout in microseconds for atomic/tight wait
+ * @slow_timeout_ms: slow timeout in milliseconds
+ *
+ * This routine waits until the target register @reg contains the expected
+ * @value after applying the @mask, i.e. it waits until ::
+ *
+ * (intel_gt_mcr_read_any(gt, reg) & mask) == value
+ *
+ * Otherwise, the wait will time out after @slow_timeout_ms milliseconds.
+ * For atomic context @slow_timeout_ms must be zero and @fast_timeout_us
+ * must be no larger than 20,000 microseconds.
+ *
+ * This function is basically an MCR-friendly version of
+ * __intel_wait_for_register_fw(). Generally this function will only be used
+ * on GAM registers which are a bit special --- although they're MCR registers,
+ * reads (e.g., waiting for status updates) are always directed to the primary
+ * instance.
+ *
+ * Note that this routine assumes the caller holds forcewake asserted; it is
+ * not suitable for very long waits.
+ *
+ * Return: 0 if the register matches the desired condition, or -ETIMEDOUT.
+ */
+int intel_gt_mcr_wait_for_reg(struct intel_gt *gt,
+ i915_mcr_reg_t reg,
+ u32 mask,
+ u32 value,
+ unsigned int fast_timeout_us,
+ unsigned int slow_timeout_ms)
+{
+ int ret;
+
+ lockdep_assert_not_held(&gt->uncore->lock);
+
+#define done ((intel_gt_mcr_read_any(gt, reg) & mask) == value)
+
+ /* Catch any overuse of this function */
+ might_sleep_if(slow_timeout_ms);
+ GEM_BUG_ON(fast_timeout_us > 20000);
+ GEM_BUG_ON(!fast_timeout_us && !slow_timeout_ms);
+
+ ret = -ETIMEDOUT;
+ if (fast_timeout_us && fast_timeout_us <= 20000)
+ ret = _wait_for_atomic(done, fast_timeout_us, 0);
+ if (ret && slow_timeout_ms)
+ ret = wait_for(done, slow_timeout_ms);
+
+ return ret;
+#undef done
+}
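
Illustrative note (not part of the patch): two small mechanisms in the file above can be tried outside the kernel. The sketch below is a user-space approximation, assuming plain C99 and no driver headers. It mirrors the sentinel-terminated range walk in reg_needs_read_steering() and the "two banks per fuse bit" expansion done by the for_each_set_bit() loop in intel_gt_mcr_init(). Every name prefixed with sketch_ is hypothetical and exists only for this example; the range values are copied from xelpg_l3bank_steering_table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's struct intel_mmio_range. */
struct sketch_mmio_range {
	uint32_t start;
	uint32_t end;
};

/*
 * Same range as xelpg_l3bank_steering_table in the patch. The all-zero
 * entry is the sentinel that ends the walk (its .end field is 0).
 */
static const struct sketch_mmio_range sketch_l3bank_table[] = {
	{ 0x00B100, 0x00B3FF },
	{ 0 },
};

/*
 * Return true if @offset falls inside any range of @table. A NULL table
 * means "this steering type is not used on this platform".
 */
static bool sketch_reg_in_table(const struct sketch_mmio_range *table,
				uint32_t offset)
{
	const struct sketch_mmio_range *entry;

	if (!table)
		return false;

	for (entry = table; entry->end; entry++) {
		if (offset >= entry->start && offset <= entry->end)
			return true;
	}

	return false;
}

/*
 * Expand a fuse field in which each set bit stands for two adjacent
 * enabled L3 banks: bit i of @fuse becomes bits 2i and 2i+1 of the
 * result. Only the low @nbits bits of @fuse are examined.
 */
static uint32_t sketch_expand_l3bank_mask(uint32_t fuse, unsigned int nbits)
{
	uint32_t mask = 0;
	unsigned int i;

	assert(nbits <= 16); /* keeps 2 * i + 1 within 32 bits */

	for (i = 0; i < nbits; i++) {
		if (fuse & (1u << i))
			mask |= 0x3u << (2 * i);
	}

	return mask;
}
```

For instance, sketch_reg_in_table(sketch_l3bank_table, 0x00B200) reports a hit, while an offset of 0x00B400 lies just past the range and does not. A fuse value of 0x5 examined over 3 bits expands to a bank mask of 0x33. The zero-terminated table avoids carrying a separate length, at the cost that a range ending at offset 0 could never be expressed.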