From 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 21 Feb 2023 18:24:12 -0800 Subject: Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ... --- arch/s390/kvm/guestdbg.c | 626 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 626 insertions(+) create mode 100644 arch/s390/kvm/guestdbg.c (limited to 'arch/s390/kvm/guestdbg.c') diff --git a/arch/s390/kvm/guestdbg.c b/arch/s390/kvm/guestdbg.c new file mode 100644 index 000000000..3765c4223 --- /dev/null +++ b/arch/s390/kvm/guestdbg.c @@ -0,0 +1,626 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * kvm guest debug support + * + * Copyright IBM Corp. 2014 + * + * Author(s): David Hildenbrand + */ +#include +#include +#include "kvm-s390.h" +#include "gaccess.h" + +/* + * Extends the address range given by *start and *stop to include the address + * range starting with estart and the length len. Takes care of overflowing + * intervals and tries to minimize the overall interval size. + */ +static void extend_address_range(u64 *start, u64 *stop, u64 estart, int len) +{ + u64 estop; + + if (len > 0) + len--; + else + len = 0; + + estop = estart + len; + + /* 0-0 range represents "not set" */ + if ((*start == 0) && (*stop == 0)) { + *start = estart; + *stop = estop; + } else if (*start <= *stop) { + /* increase the existing range */ + if (estart < *start) + *start = estart; + if (estop > *stop) + *stop = estop; + } else { + /* "overflowing" interval, whereby *stop > *start */ + if (estart <= *stop) { + if (estop > *stop) + *stop = estop; + } else if (estop > *start) { + if (estart < *start) + *start = estart; + } + /* minimize the range */ + else if ((estop - *stop) < (*start - estart)) + *stop = estop; + else + *start = estart; + } +} + +#define MAX_INST_SIZE 6 + +static void enable_all_hw_bp(struct kvm_vcpu *vcpu) +{ + unsigned long start, len; + u64 *cr9 = &vcpu->arch.sie_block->gcr[9]; + u64 *cr10 = &vcpu->arch.sie_block->gcr[10]; + u64 *cr11 = &vcpu->arch.sie_block->gcr[11]; + int i; + + if (vcpu->arch.guestdbg.nr_hw_bp <= 0 || + vcpu->arch.guestdbg.hw_bp_info == NULL) + return; + + /* + * If the guest is not interested in branching events, we can safely + * limit them to the PER address range. + */ + if (!(*cr9 & PER_EVENT_BRANCH)) + *cr9 |= PER_CONTROL_BRANCH_ADDRESS; + *cr9 |= PER_EVENT_IFETCH | PER_EVENT_BRANCH; + + for (i = 0; i < vcpu->arch.guestdbg.nr_hw_bp; i++) { + start = vcpu->arch.guestdbg.hw_bp_info[i].addr; + len = vcpu->arch.guestdbg.hw_bp_info[i].len; + + /* + * The instruction in front of the desired bp has to + * report instruction-fetching events + */ + if (start < MAX_INST_SIZE) { + len += start; + start = 0; + } else { + start -= MAX_INST_SIZE; + len += MAX_INST_SIZE; + } + + extend_address_range(cr10, cr11, start, len); + } +} + +static void enable_all_hw_wp(struct kvm_vcpu *vcpu) +{ + unsigned long start, len; + u64 *cr9 = &vcpu->arch.sie_block->gcr[9]; + u64 *cr10 = &vcpu->arch.sie_block->gcr[10]; + u64 *cr11 = &vcpu->arch.sie_block->gcr[11]; + int i; + + if (vcpu->arch.guestdbg.nr_hw_wp <= 0 || + vcpu->arch.guestdbg.hw_wp_info == NULL) + return; + + /* if host uses storage alternation for special address + * spaces, enable all events and give all to the guest */ + if (*cr9 & PER_EVENT_STORE && *cr9 & PER_CONTROL_ALTERATION) { + *cr9 &= ~PER_CONTROL_ALTERATION; + *cr10 = 0; + *cr11 = -1UL; + } else { + *cr9 &= ~PER_CONTROL_ALTERATION; + *cr9 |= PER_EVENT_STORE; + + for (i = 0; i < vcpu->arch.guestdbg.nr_hw_wp; i++) { + start = vcpu->arch.guestdbg.hw_wp_info[i].addr; + len = vcpu->arch.guestdbg.hw_wp_info[i].len; + + extend_address_range(cr10, cr11, start, len); + } + } +} + +void kvm_s390_backup_guest_per_regs(struct kvm_vcpu *vcpu) +{ + vcpu->arch.guestdbg.cr0 = vcpu->arch.sie_block->gcr[0]; + vcpu->arch.guestdbg.cr9 = vcpu->arch.sie_block->gcr[9]; + vcpu->arch.guestdbg.cr10 = vcpu->arch.sie_block->gcr[10]; + vcpu->arch.guestdbg.cr11 = vcpu->arch.sie_block->gcr[11]; +} + +void kvm_s390_restore_guest_per_regs(struct kvm_vcpu *vcpu) +{ + vcpu->arch.sie_block->gcr[0] = vcpu->arch.guestdbg.cr0; + vcpu->arch.sie_block->gcr[9] = vcpu->arch.guestdbg.cr9; + vcpu->arch.sie_block->gcr[10] = vcpu->arch.guestdbg.cr10; + vcpu->arch.sie_block->gcr[11] = vcpu->arch.guestdbg.cr11; +} + +void kvm_s390_patch_guest_per_regs(struct kvm_vcpu *vcpu) +{ + /* + * TODO: if guest psw has per enabled, otherwise 0s! + * This reduces the amount of reported events. + * Need to intercept all psw changes! + */ + + if (guestdbg_sstep_enabled(vcpu)) { + /* disable timer (clock-comparator) interrupts */ + vcpu->arch.sie_block->gcr[0] &= ~CR0_CLOCK_COMPARATOR_SUBMASK; + vcpu->arch.sie_block->gcr[9] |= PER_EVENT_IFETCH; + vcpu->arch.sie_block->gcr[10] = 0; + vcpu->arch.sie_block->gcr[11] = -1UL; + } + + if (guestdbg_hw_bp_enabled(vcpu)) { + enable_all_hw_bp(vcpu); + enable_all_hw_wp(vcpu); + } + + /* TODO: Instruction-fetching-nullification not allowed for now */ + if (vcpu->arch.sie_block->gcr[9] & PER_EVENT_NULLIFICATION) + vcpu->arch.sie_block->gcr[9] &= ~PER_EVENT_NULLIFICATION; +} + +#define MAX_WP_SIZE 100 + +static int __import_wp_info(struct kvm_vcpu *vcpu, + struct kvm_hw_breakpoint *bp_data, + struct kvm_hw_wp_info_arch *wp_info) +{ + int ret = 0; + wp_info->len = bp_data->len; + wp_info->addr = bp_data->addr; + wp_info->phys_addr = bp_data->phys_addr; + wp_info->old_data = NULL; + + if (wp_info->len < 0 || wp_info->len > MAX_WP_SIZE) + return -EINVAL; + + wp_info->old_data = kmalloc(bp_data->len, GFP_KERNEL_ACCOUNT); + if (!wp_info->old_data) + return -ENOMEM; + /* try to backup the original value */ + ret = read_guest_abs(vcpu, wp_info->phys_addr, wp_info->old_data, + wp_info->len); + if (ret) { + kfree(wp_info->old_data); + wp_info->old_data = NULL; + } + + return ret; +} + +#define MAX_BP_COUNT 50 + +int kvm_s390_import_bp_data(struct kvm_vcpu *vcpu, + struct kvm_guest_debug *dbg) +{ + int ret = 0, nr_wp = 0, nr_bp = 0, i; + struct kvm_hw_breakpoint *bp_data = NULL; + struct kvm_hw_wp_info_arch *wp_info = NULL; + struct kvm_hw_bp_info_arch *bp_info = NULL; + + if (dbg->arch.nr_hw_bp <= 0 || !dbg->arch.hw_bp) + return 0; + else if (dbg->arch.nr_hw_bp > MAX_BP_COUNT) + return -EINVAL; + + bp_data = memdup_user(dbg->arch.hw_bp, + sizeof(*bp_data) * dbg->arch.nr_hw_bp); + if (IS_ERR(bp_data)) + return PTR_ERR(bp_data); + + for (i = 0; i < dbg->arch.nr_hw_bp; i++) { + switch (bp_data[i].type) { + case KVM_HW_WP_WRITE: + nr_wp++; + break; + case KVM_HW_BP: + nr_bp++; + break; + default: + break; + } + } + + if (nr_wp > 0) { + wp_info = kmalloc_array(nr_wp, + sizeof(*wp_info), + GFP_KERNEL_ACCOUNT); + if (!wp_info) { + ret = -ENOMEM; + goto error; + } + } + if (nr_bp > 0) { + bp_info = kmalloc_array(nr_bp, + sizeof(*bp_info), + GFP_KERNEL_ACCOUNT); + if (!bp_info) { + ret = -ENOMEM; + goto error; + } + } + + for (nr_wp = 0, nr_bp = 0, i = 0; i < dbg->arch.nr_hw_bp; i++) { + switch (bp_data[i].type) { + case KVM_HW_WP_WRITE: + ret = __import_wp_info(vcpu, &bp_data[i], + &wp_info[nr_wp]); + if (ret) + goto error; + nr_wp++; + break; + case KVM_HW_BP: + bp_info[nr_bp].len = bp_data[i].len; + bp_info[nr_bp].addr = bp_data[i].addr; + nr_bp++; + break; + } + } + + vcpu->arch.guestdbg.nr_hw_bp = nr_bp; + vcpu->arch.guestdbg.hw_bp_info = bp_info; + vcpu->arch.guestdbg.nr_hw_wp = nr_wp; + vcpu->arch.guestdbg.hw_wp_info = wp_info; + return 0; +error: + kfree(bp_data); + kfree(wp_info); + kfree(bp_info); + return ret; +} + +void kvm_s390_clear_bp_data(struct kvm_vcpu *vcpu) +{ + int i; + struct kvm_hw_wp_info_arch *hw_wp_info = NULL; + + for (i = 0; i < vcpu->arch.guestdbg.nr_hw_wp; i++) { + hw_wp_info = &vcpu->arch.guestdbg.hw_wp_info[i]; + kfree(hw_wp_info->old_data); + hw_wp_info->old_data = NULL; + } + kfree(vcpu->arch.guestdbg.hw_wp_info); + vcpu->arch.guestdbg.hw_wp_info = NULL; + + kfree(vcpu->arch.guestdbg.hw_bp_info); + vcpu->arch.guestdbg.hw_bp_info = NULL; + + vcpu->arch.guestdbg.nr_hw_wp = 0; + vcpu->arch.guestdbg.nr_hw_bp = 0; +} + +static inline int in_addr_range(u64 addr, u64 a, u64 b) +{ + if (a <= b) + return (addr >= a) && (addr <= b); + else + /* "overflowing" interval */ + return (addr >= a) || (addr <= b); +} + +#define end_of_range(bp_info) (bp_info->addr + bp_info->len - 1) + +static struct kvm_hw_bp_info_arch *find_hw_bp(struct kvm_vcpu *vcpu, + unsigned long addr) +{ + struct kvm_hw_bp_info_arch *bp_info = vcpu->arch.guestdbg.hw_bp_info; + int i; + + if (vcpu->arch.guestdbg.nr_hw_bp == 0) + return NULL; + + for (i = 0; i < vcpu->arch.guestdbg.nr_hw_bp; i++) { + /* addr is directly the start or in the range of a bp */ + if (addr == bp_info->addr) + goto found; + if (bp_info->len > 0 && + in_addr_range(addr, bp_info->addr, end_of_range(bp_info))) + goto found; + + bp_info++; + } + + return NULL; +found: + return bp_info; +} + +static struct kvm_hw_wp_info_arch *any_wp_changed(struct kvm_vcpu *vcpu) +{ + int i; + struct kvm_hw_wp_info_arch *wp_info = NULL; + void *temp = NULL; + + if (vcpu->arch.guestdbg.nr_hw_wp == 0) + return NULL; + + for (i = 0; i < vcpu->arch.guestdbg.nr_hw_wp; i++) { + wp_info = &vcpu->arch.guestdbg.hw_wp_info[i]; + if (!wp_info || !wp_info->old_data || wp_info->len <= 0) + continue; + + temp = kmalloc(wp_info->len, GFP_KERNEL_ACCOUNT); + if (!temp) + continue; + + /* refetch the wp data and compare it to the old value */ + if (!read_guest_abs(vcpu, wp_info->phys_addr, temp, + wp_info->len)) { + if (memcmp(temp, wp_info->old_data, wp_info->len)) { + kfree(temp); + return wp_info; + } + } + kfree(temp); + temp = NULL; + } + + return NULL; +} + +void kvm_s390_prepare_debug_exit(struct kvm_vcpu *vcpu) +{ + vcpu->run->exit_reason = KVM_EXIT_DEBUG; + vcpu->guest_debug &= ~KVM_GUESTDBG_EXIT_PENDING; +} + +#define PER_CODE_MASK (PER_EVENT_MASK >> 24) +#define PER_CODE_BRANCH (PER_EVENT_BRANCH >> 24) +#define PER_CODE_IFETCH (PER_EVENT_IFETCH >> 24) +#define PER_CODE_STORE (PER_EVENT_STORE >> 24) +#define PER_CODE_STORE_REAL (PER_EVENT_STORE_REAL >> 24) + +#define per_bp_event(code) \ + (code & (PER_CODE_IFETCH | PER_CODE_BRANCH)) +#define per_write_wp_event(code) \ + (code & (PER_CODE_STORE | PER_CODE_STORE_REAL)) + +static int debug_exit_required(struct kvm_vcpu *vcpu, u8 perc, + unsigned long peraddr) +{ + struct kvm_debug_exit_arch *debug_exit = &vcpu->run->debug.arch; + struct kvm_hw_wp_info_arch *wp_info = NULL; + struct kvm_hw_bp_info_arch *bp_info = NULL; + unsigned long addr = vcpu->arch.sie_block->gpsw.addr; + + if (guestdbg_hw_bp_enabled(vcpu)) { + if (per_write_wp_event(perc) && + vcpu->arch.guestdbg.nr_hw_wp > 0) { + wp_info = any_wp_changed(vcpu); + if (wp_info) { + debug_exit->addr = wp_info->addr; + debug_exit->type = KVM_HW_WP_WRITE; + goto exit_required; + } + } + if (per_bp_event(perc) && + vcpu->arch.guestdbg.nr_hw_bp > 0) { + bp_info = find_hw_bp(vcpu, addr); + /* remove duplicate events if PC==PER address */ + if (bp_info && (addr != peraddr)) { + debug_exit->addr = addr; + debug_exit->type = KVM_HW_BP; + vcpu->arch.guestdbg.last_bp = addr; + goto exit_required; + } + /* breakpoint missed */ + bp_info = find_hw_bp(vcpu, peraddr); + if (bp_info && vcpu->arch.guestdbg.last_bp != peraddr) { + debug_exit->addr = peraddr; + debug_exit->type = KVM_HW_BP; + goto exit_required; + } + } + } + if (guestdbg_sstep_enabled(vcpu) && per_bp_event(perc)) { + debug_exit->addr = addr; + debug_exit->type = KVM_SINGLESTEP; + goto exit_required; + } + + return 0; +exit_required: + return 1; +} + +static int per_fetched_addr(struct kvm_vcpu *vcpu, unsigned long *addr) +{ + u8 exec_ilen = 0; + u16 opcode[3]; + int rc; + + if (vcpu->arch.sie_block->icptcode == ICPT_PROGI) { + /* PER address references the fetched or the execute instr */ + *addr = vcpu->arch.sie_block->peraddr; + /* + * Manually detect if we have an EXECUTE instruction. As + * instructions are always 2 byte aligned we can read the + * first two bytes unconditionally + */ + rc = read_guest_instr(vcpu, *addr, &opcode, 2); + if (rc) + return rc; + if (opcode[0] >> 8 == 0x44) + exec_ilen = 4; + if ((opcode[0] & 0xff0f) == 0xc600) + exec_ilen = 6; + } else { + /* instr was suppressed, calculate the responsible instr */ + *addr = __rewind_psw(vcpu->arch.sie_block->gpsw, + kvm_s390_get_ilen(vcpu)); + if (vcpu->arch.sie_block->icptstatus & 0x01) { + exec_ilen = (vcpu->arch.sie_block->icptstatus & 0x60) >> 4; + if (!exec_ilen) + exec_ilen = 4; + } + } + + if (exec_ilen) { + /* read the complete EXECUTE instr to detect the fetched addr */ + rc = read_guest_instr(vcpu, *addr, &opcode, exec_ilen); + if (rc) + return rc; + if (exec_ilen == 6) { + /* EXECUTE RELATIVE LONG - RIL-b format */ + s32 rl = *((s32 *) (opcode + 1)); + + /* rl is a _signed_ 32 bit value specifying halfwords */ + *addr += (u64)(s64) rl * 2; + } else { + /* EXECUTE - RX-a format */ + u32 base = (opcode[1] & 0xf000) >> 12; + u32 disp = opcode[1] & 0x0fff; + u32 index = opcode[0] & 0x000f; + + *addr = base ? vcpu->run->s.regs.gprs[base] : 0; + *addr += index ? vcpu->run->s.regs.gprs[index] : 0; + *addr += disp; + } + *addr = kvm_s390_logical_to_effective(vcpu, *addr); + } + return 0; +} + +#define guest_per_enabled(vcpu) \ + (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PER) + +int kvm_s390_handle_per_ifetch_icpt(struct kvm_vcpu *vcpu) +{ + const u64 cr10 = vcpu->arch.sie_block->gcr[10]; + const u64 cr11 = vcpu->arch.sie_block->gcr[11]; + const u8 ilen = kvm_s390_get_ilen(vcpu); + struct kvm_s390_pgm_info pgm_info = { + .code = PGM_PER, + .per_code = PER_CODE_IFETCH, + .per_address = __rewind_psw(vcpu->arch.sie_block->gpsw, ilen), + }; + unsigned long fetched_addr; + int rc; + + /* + * The PSW points to the next instruction, therefore the intercepted + * instruction generated a PER i-fetch event. PER address therefore + * points at the previous PSW address (could be an EXECUTE function). + */ + if (!guestdbg_enabled(vcpu)) + return kvm_s390_inject_prog_irq(vcpu, &pgm_info); + + if (debug_exit_required(vcpu, pgm_info.per_code, pgm_info.per_address)) + vcpu->guest_debug |= KVM_GUESTDBG_EXIT_PENDING; + + if (!guest_per_enabled(vcpu) || + !(vcpu->arch.sie_block->gcr[9] & PER_EVENT_IFETCH)) + return 0; + + rc = per_fetched_addr(vcpu, &fetched_addr); + if (rc < 0) + return rc; + if (rc) + /* instruction-fetching exceptions */ + return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); + + if (in_addr_range(fetched_addr, cr10, cr11)) + return kvm_s390_inject_prog_irq(vcpu, &pgm_info); + return 0; +} + +static int filter_guest_per_event(struct kvm_vcpu *vcpu) +{ + const u8 perc = vcpu->arch.sie_block->perc; + u64 addr = vcpu->arch.sie_block->gpsw.addr; + u64 cr9 = vcpu->arch.sie_block->gcr[9]; + u64 cr10 = vcpu->arch.sie_block->gcr[10]; + u64 cr11 = vcpu->arch.sie_block->gcr[11]; + /* filter all events, demanded by the guest */ + u8 guest_perc = perc & (cr9 >> 24) & PER_CODE_MASK; + unsigned long fetched_addr; + int rc; + + if (!guest_per_enabled(vcpu)) + guest_perc = 0; + + /* filter "successful-branching" events */ + if (guest_perc & PER_CODE_BRANCH && + cr9 & PER_CONTROL_BRANCH_ADDRESS && + !in_addr_range(addr, cr10, cr11)) + guest_perc &= ~PER_CODE_BRANCH; + + /* filter "instruction-fetching" events */ + if (guest_perc & PER_CODE_IFETCH) { + rc = per_fetched_addr(vcpu, &fetched_addr); + if (rc < 0) + return rc; + /* + * Don't inject an irq on exceptions. This would make handling + * on icpt code 8 very complex (as PSW was already rewound). + */ + if (rc || !in_addr_range(fetched_addr, cr10, cr11)) + guest_perc &= ~PER_CODE_IFETCH; + } + + /* All other PER events will be given to the guest */ + /* TODO: Check altered address/address space */ + + vcpu->arch.sie_block->perc = guest_perc; + + if (!guest_perc) + vcpu->arch.sie_block->iprcc &= ~PGM_PER; + return 0; +} + +#define pssec(vcpu) (vcpu->arch.sie_block->gcr[1] & _ASCE_SPACE_SWITCH) +#define hssec(vcpu) (vcpu->arch.sie_block->gcr[13] & _ASCE_SPACE_SWITCH) +#define old_ssec(vcpu) ((vcpu->arch.sie_block->tecmc >> 31) & 0x1) +#define old_as_is_home(vcpu) !(vcpu->arch.sie_block->tecmc & 0xffff) + +int kvm_s390_handle_per_event(struct kvm_vcpu *vcpu) +{ + int rc, new_as; + + if (debug_exit_required(vcpu, vcpu->arch.sie_block->perc, + vcpu->arch.sie_block->peraddr)) + vcpu->guest_debug |= KVM_GUESTDBG_EXIT_PENDING; + + rc = filter_guest_per_event(vcpu); + if (rc) + return rc; + + /* + * Only RP, SAC, SACF, PT, PTI, PR, PC instructions can trigger + * a space-switch event. PER events enforce space-switch events + * for these instructions. So if no PER event for the guest is left, + * we might have to filter the space-switch element out, too. + */ + if (vcpu->arch.sie_block->iprcc == PGM_SPACE_SWITCH) { + vcpu->arch.sie_block->iprcc = 0; + new_as = psw_bits(vcpu->arch.sie_block->gpsw).as; + + /* + * If the AS changed from / to home, we had RP, SAC or SACF + * instruction. Check primary and home space-switch-event + * controls. (theoretically home -> home produced no event) + */ + if (((new_as == PSW_BITS_AS_HOME) ^ old_as_is_home(vcpu)) && + (pssec(vcpu) || hssec(vcpu))) + vcpu->arch.sie_block->iprcc = PGM_SPACE_SWITCH; + + /* + * PT, PTI, PR, PC instruction operate on primary AS only. Check + * if the primary-space-switch-event control was or got set. + */ + if (new_as == PSW_BITS_AS_PRIMARY && !old_as_is_home(vcpu) && + (pssec(vcpu) || old_ssec(vcpu))) + vcpu->arch.sie_block->iprcc = PGM_SPACE_SWITCH; + } + return 0; +} -- cgit v1.2.3