author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /arch/arm/xen | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth coexistence
- Mobile:
- rmnet: support TX aggregation"
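As an illustration of the IP_LOCAL_PORT_RANGE item in the Protocols list above, here is a minimal userspace sketch in C. It assumes the option takes a 32-bit value with the lower bound in the low 16 bits and the upper bound in the high 16 bits (both inclusive), as described in ip(7), and that the option number is 51 when older headers do not define it. The helper names pack_port_range() and set_local_port_range() are hypothetical and are not part of this merge.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Assumption: fallback value for headers that predate Linux 6.3. */
#ifndef IP_LOCAL_PORT_RANGE
#define IP_LOCAL_PORT_RANGE 51
#endif

/* Hypothetical helper: lower bound in bits 0-15, upper bound in bits 16-31. */
static uint32_t pack_port_range(uint16_t lo, uint16_t hi)
{
	return (uint32_t)lo | ((uint32_t)hi << 16);
}

/*
 * Hypothetical helper: restrict the local ports the kernel may pick for
 * this socket. Returns the setsockopt() result (0 on success, -1 with
 * errno set on failure).
 */
static int set_local_port_range(int fd, uint16_t lo, uint16_t hi)
{
	uint32_t range = pack_port_range(lo, hi);

	return setsockopt(fd, IPPROTO_IP, IP_LOCAL_PORT_RANGE,
			  &range, sizeof(range));
}
```

A caller would typically invoke set_local_port_range(fd, 40000, 40100) before bind() or connect() with port 0. On kernels that predate this merge the call is expected to fail (typically with ENOPROTOOPT), so the return value should be checked.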
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'arch/arm/xen')
-rw-r--r-- | arch/arm/xen/Makefile | 2 | ||||
-rw-r--r-- | arch/arm/xen/enlighten.c | 578 | ||||
-rw-r--r-- | arch/arm/xen/grant-table.c | 58 | ||||
-rw-r--r-- | arch/arm/xen/hypercall.S | 121 | ||||
-rw-r--r-- | arch/arm/xen/mm.c | 143 | ||||
-rw-r--r-- | arch/arm/xen/p2m.c | 210 |
6 files changed, 1112 insertions, 0 deletions
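The dma_cache_maint() function in the arch/arm/xen/mm.c hunk below issues one cache-flush operation per Xen page, because buffers in highmem or foreign pages cannot cross page boundaries. The following standalone C sketch shows only the splitting arithmetic. It is a simplified illustration and not the kernel code: the 4 KiB page size, the callback type, and the name for_each_page_chunk() are assumptions, and unlike the kernel's do/while loop it does nothing for a zero-length buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, standing in for XEN_PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical callback invoked once per page-bounded chunk. */
typedef void (*chunk_fn)(uint64_t page_addr, uint32_t offset,
			 uint32_t length, void *ctx);

/*
 * Split [addr, addr + size) into chunks that never cross a page boundary.
 * Only the first chunk can start at a non-zero offset within its page.
 * Returns the number of chunks produced; fn may be NULL to count only.
 */
static unsigned int for_each_page_chunk(uint64_t addr, size_t size,
					chunk_fn fn, void *ctx)
{
	uint32_t offset = (uint32_t)(addr & (SKETCH_PAGE_SIZE - 1));
	uint64_t page = addr & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
	unsigned int chunks = 0;

	while (size) {
		/* Room left in the current page, capped at the remaining size. */
		uint32_t length = SKETCH_PAGE_SIZE - offset;

		if (length > size)
			length = (uint32_t)size;

		if (fn)
			fn(page, offset, length, ctx);
		chunks++;

		page += SKETCH_PAGE_SIZE;
		offset = 0;
		size -= length;
	}

	return chunks;
}
```

For example, a 32-byte buffer starting 16 bytes before a page boundary yields two chunks of 16 bytes each, which corresponds to two separate flush operations in the kernel function.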
diff --git a/arch/arm/xen/Makefile b/arch/arm/xen/Makefile new file mode 100644 index 000000000..c32d04713 --- /dev/null +++ b/arch/arm/xen/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +obj-y := enlighten.o hypercall.o grant-table.o p2m.o mm.o diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c new file mode 100644 index 000000000..7d59765ae --- /dev/null +++ b/arch/arm/xen/enlighten.c @@ -0,0 +1,578 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include <xen/xen.h> +#include <xen/events.h> +#include <xen/grant_table.h> +#include <xen/hvm.h> +#include <xen/interface/vcpu.h> +#include <xen/interface/xen.h> +#include <xen/interface/memory.h> +#include <xen/interface/hvm/params.h> +#include <xen/features.h> +#include <xen/platform_pci.h> +#include <xen/xenbus.h> +#include <xen/page.h> +#include <xen/interface/sched.h> +#include <xen/xen-ops.h> +#include <asm/xen/hypervisor.h> +#include <asm/xen/hypercall.h> +#include <asm/system_misc.h> +#include <asm/efi.h> +#include <linux/interrupt.h> +#include <linux/irqreturn.h> +#include <linux/module.h> +#include <linux/of.h> +#include <linux/of_fdt.h> +#include <linux/of_irq.h> +#include <linux/of_address.h> +#include <linux/cpuidle.h> +#include <linux/cpufreq.h> +#include <linux/cpu.h> +#include <linux/console.h> +#include <linux/pvclock_gtod.h> +#include <linux/reboot.h> +#include <linux/time64.h> +#include <linux/timekeeping.h> +#include <linux/timekeeper_internal.h> +#include <linux/acpi.h> +#include <linux/virtio_anchor.h> + +#include <linux/mm.h> + +static struct start_info _xen_start_info; +struct start_info *xen_start_info = &_xen_start_info; +EXPORT_SYMBOL(xen_start_info); + +enum xen_domain_type xen_domain_type = XEN_NATIVE; +EXPORT_SYMBOL(xen_domain_type); + +struct shared_info xen_dummy_shared_info; +struct shared_info *HYPERVISOR_shared_info = (void *)&xen_dummy_shared_info; + +DEFINE_PER_CPU(struct vcpu_info *, xen_vcpu); +static struct vcpu_info __percpu *xen_vcpu_info; + +/* 
Linux <-> Xen vCPU id mapping */ +DEFINE_PER_CPU(uint32_t, xen_vcpu_id); +EXPORT_PER_CPU_SYMBOL(xen_vcpu_id); + +/* These are unused until we support booting "pre-ballooned" */ +unsigned long xen_released_pages; +struct xen_memory_region xen_extra_mem[XEN_EXTRA_MEM_MAX_REGIONS] __initdata; + +static __read_mostly unsigned int xen_events_irq; +static __read_mostly phys_addr_t xen_grant_frames; + +#define GRANT_TABLE_INDEX 0 +#define EXT_REGION_INDEX 1 + +uint32_t xen_start_flags; +EXPORT_SYMBOL(xen_start_flags); + +int xen_unmap_domain_gfn_range(struct vm_area_struct *vma, + int nr, struct page **pages) +{ + return xen_xlate_unmap_gfn_range(vma, nr, pages); +} +EXPORT_SYMBOL_GPL(xen_unmap_domain_gfn_range); + +static void xen_read_wallclock(struct timespec64 *ts) +{ + u32 version; + struct timespec64 now, ts_monotonic; + struct shared_info *s = HYPERVISOR_shared_info; + struct pvclock_wall_clock *wall_clock = &(s->wc); + + /* get wallclock at system boot */ + do { + version = wall_clock->version; + rmb(); /* fetch version before time */ + now.tv_sec = ((uint64_t)wall_clock->sec_hi << 32) | wall_clock->sec; + now.tv_nsec = wall_clock->nsec; + rmb(); /* fetch time before checking version */ + } while ((wall_clock->version & 1) || (version != wall_clock->version)); + + /* time since system boot */ + ktime_get_ts64(&ts_monotonic); + *ts = timespec64_add(now, ts_monotonic); +} + +static int xen_pvclock_gtod_notify(struct notifier_block *nb, + unsigned long was_set, void *priv) +{ + /* Protected by the calling core code serialization */ + static struct timespec64 next_sync; + + struct xen_platform_op op; + struct timespec64 now, system_time; + struct timekeeper *tk = priv; + + now.tv_sec = tk->xtime_sec; + now.tv_nsec = (long)(tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift); + system_time = timespec64_add(now, tk->wall_to_monotonic); + + /* + * We only take the expensive HV call when the clock was set + * or when the 11 minutes RTC synchronization time elapsed. 
+ */ + if (!was_set && timespec64_compare(&now, &next_sync) < 0) + return NOTIFY_OK; + + op.cmd = XENPF_settime64; + op.u.settime64.mbz = 0; + op.u.settime64.secs = now.tv_sec; + op.u.settime64.nsecs = now.tv_nsec; + op.u.settime64.system_time = timespec64_to_ns(&system_time); + (void)HYPERVISOR_platform_op(&op); + + /* + * Move the next drift compensation time 11 minutes + * ahead. That's emulating the sync_cmos_clock() update for + * the hardware RTC. + */ + next_sync = now; + next_sync.tv_sec += 11 * 60; + + return NOTIFY_OK; +} + +static struct notifier_block xen_pvclock_gtod_notifier = { + .notifier_call = xen_pvclock_gtod_notify, +}; + +static int xen_starting_cpu(unsigned int cpu) +{ + struct vcpu_register_vcpu_info info; + struct vcpu_info *vcpup; + int err; + + /* + * VCPUOP_register_vcpu_info cannot be called twice for the same + * vcpu, so if vcpu_info is already registered, just get out. This + * can happen with cpu-hotplug. + */ + if (per_cpu(xen_vcpu, cpu) != NULL) + goto after_register_vcpu_info; + + pr_info("Xen: initializing cpu%d\n", cpu); + vcpup = per_cpu_ptr(xen_vcpu_info, cpu); + + info.mfn = percpu_to_gfn(vcpup); + info.offset = xen_offset_in_page(vcpup); + + err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, xen_vcpu_nr(cpu), + &info); + BUG_ON(err); + per_cpu(xen_vcpu, cpu) = vcpup; + + if (!xen_kernel_unmapped_at_usr()) + xen_setup_runstate_info(cpu); + +after_register_vcpu_info: + enable_percpu_irq(xen_events_irq, 0); + return 0; +} + +static int xen_dying_cpu(unsigned int cpu) +{ + disable_percpu_irq(xen_events_irq); + return 0; +} + +void xen_reboot(int reason) +{ + struct sched_shutdown r = { .reason = reason }; + int rc; + + rc = HYPERVISOR_sched_op(SCHEDOP_shutdown, &r); + BUG_ON(rc); +} + +static int xen_restart(struct notifier_block *nb, unsigned long action, + void *data) +{ + xen_reboot(SHUTDOWN_reboot); + + return NOTIFY_DONE; +} + +static struct notifier_block xen_restart_nb = { + .notifier_call = xen_restart, + .priority = 
192, +}; + +static void xen_power_off(void) +{ + xen_reboot(SHUTDOWN_poweroff); +} + +static irqreturn_t xen_arm_callback(int irq, void *arg) +{ + xen_hvm_evtchn_do_upcall(); + return IRQ_HANDLED; +} + +static __initdata struct { + const char *compat; + const char *prefix; + const char *version; + bool found; +} hyper_node = {"xen,xen", "xen,xen-", NULL, false}; + +static int __init fdt_find_hyper_node(unsigned long node, const char *uname, + int depth, void *data) +{ + const void *s = NULL; + int len; + + if (depth != 1 || strcmp(uname, "hypervisor") != 0) + return 0; + + if (of_flat_dt_is_compatible(node, hyper_node.compat)) + hyper_node.found = true; + + s = of_get_flat_dt_prop(node, "compatible", &len); + if (strlen(hyper_node.prefix) + 3 < len && + !strncmp(hyper_node.prefix, s, strlen(hyper_node.prefix))) + hyper_node.version = s + strlen(hyper_node.prefix); + + /* + * Check if Xen supports EFI by checking whether there is the + * "/hypervisor/uefi" node in DT. If so, runtime services are available + * through proxy functions (e.g. in case of Xen dom0 EFI implementation + * they call special hypercall which executes relevant EFI functions) + * and that is why they are always enabled. + */ + if (IS_ENABLED(CONFIG_XEN_EFI)) { + if ((of_get_flat_dt_subnode_by_name(node, "uefi") > 0) && + !efi_runtime_disabled()) + set_bit(EFI_RUNTIME_SERVICES, &efi.flags); + } + + return 0; +} + +/* + * see Documentation/devicetree/bindings/arm/xen.txt for the + * documentation of the Xen Device Tree format. 
+ */ +void __init xen_early_init(void) +{ + of_scan_flat_dt(fdt_find_hyper_node, NULL); + if (!hyper_node.found) { + pr_debug("No Xen support\n"); + return; + } + + if (hyper_node.version == NULL) { + pr_debug("Xen version not found\n"); + return; + } + + pr_info("Xen %s support found\n", hyper_node.version); + + xen_domain_type = XEN_HVM_DOMAIN; + + xen_setup_features(); + + if (xen_feature(XENFEAT_dom0)) + xen_start_flags |= SIF_INITDOMAIN|SIF_PRIVILEGED; + + if (!console_set_on_cmdline && !xen_initial_domain()) + add_preferred_console("hvc", 0, NULL); +} + +static void __init xen_acpi_guest_init(void) +{ +#ifdef CONFIG_ACPI + struct xen_hvm_param a; + int interrupt, trigger, polarity; + + a.domid = DOMID_SELF; + a.index = HVM_PARAM_CALLBACK_IRQ; + + if (HYPERVISOR_hvm_op(HVMOP_get_param, &a) + || (a.value >> 56) != HVM_PARAM_CALLBACK_TYPE_PPI) { + xen_events_irq = 0; + return; + } + + interrupt = a.value & 0xff; + trigger = ((a.value >> 8) & 0x1) ? ACPI_EDGE_SENSITIVE + : ACPI_LEVEL_SENSITIVE; + polarity = ((a.value >> 8) & 0x2) ? ACPI_ACTIVE_LOW + : ACPI_ACTIVE_HIGH; + xen_events_irq = acpi_register_gsi(NULL, interrupt, trigger, polarity); +#endif +} + +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC +/* + * A type-less specific Xen resource which contains extended regions + * (unused regions of guest physical address space provided by the hypervisor). 
+ */ +static struct resource xen_resource = { + .name = "Xen unused space", +}; + +int __init arch_xen_unpopulated_init(struct resource **res) +{ + struct device_node *np; + struct resource *regs, *tmp_res; + uint64_t min_gpaddr = -1, max_gpaddr = 0; + unsigned int i, nr_reg = 0; + int rc; + + if (!xen_domain()) + return -ENODEV; + + if (!acpi_disabled) + return -ENODEV; + + np = of_find_compatible_node(NULL, NULL, "xen,xen"); + if (WARN_ON(!np)) + return -ENODEV; + + /* Skip region 0 which is reserved for grant table space */ + while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL)) + nr_reg++; + + if (!nr_reg) { + pr_err("No extended regions are found\n"); + of_node_put(np); + return -EINVAL; + } + + regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL); + if (!regs) { + of_node_put(np); + return -ENOMEM; + } + + /* + * Create resource from extended regions provided by the hypervisor to be + * used as unused address space for Xen scratch pages. + */ + for (i = 0; i < nr_reg; i++) { + rc = of_address_to_resource(np, i + EXT_REGION_INDEX, &regs[i]); + if (rc) + goto err; + + if (max_gpaddr < regs[i].end) + max_gpaddr = regs[i].end; + if (min_gpaddr > regs[i].start) + min_gpaddr = regs[i].start; + } + + xen_resource.start = min_gpaddr; + xen_resource.end = max_gpaddr; + + /* + * Mark holes between extended regions as unavailable. The rest of that + * address space will be available for the allocation. 
+ */ + for (i = 1; i < nr_reg; i++) { + resource_size_t start, end; + + /* There is an overlap between regions */ + if (regs[i - 1].end + 1 > regs[i].start) { + rc = -EINVAL; + goto err; + } + + /* There is no hole between regions */ + if (regs[i - 1].end + 1 == regs[i].start) + continue; + + start = regs[i - 1].end + 1; + end = regs[i].start - 1; + + tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL); + if (!tmp_res) { + rc = -ENOMEM; + goto err; + } + + tmp_res->name = "Unavailable space"; + tmp_res->start = start; + tmp_res->end = end; + + rc = insert_resource(&xen_resource, tmp_res); + if (rc) { + pr_err("Cannot insert resource %pR (%d)\n", tmp_res, rc); + kfree(tmp_res); + goto err; + } + } + + *res = &xen_resource; + +err: + of_node_put(np); + kfree(regs); + return rc; +} +#endif + +static void __init xen_dt_guest_init(void) +{ + struct device_node *xen_node; + struct resource res; + + xen_node = of_find_compatible_node(NULL, NULL, "xen,xen"); + if (!xen_node) { + pr_err("Xen support was detected before, but it has disappeared\n"); + return; + } + + xen_events_irq = irq_of_parse_and_map(xen_node, 0); + + if (of_address_to_resource(xen_node, GRANT_TABLE_INDEX, &res)) { + pr_err("Xen grant table region is not found\n"); + of_node_put(xen_node); + return; + } + of_node_put(xen_node); + xen_grant_frames = res.start; +} + +static int __init xen_guest_init(void) +{ + struct xen_add_to_physmap xatp; + struct shared_info *shared_info_page = NULL; + int rc, cpu; + + if (!xen_domain()) + return 0; + + if (IS_ENABLED(CONFIG_XEN_VIRTIO)) + virtio_set_mem_acc_cb(xen_virtio_restricted_mem_acc); + + if (!acpi_disabled) + xen_acpi_guest_init(); + else + xen_dt_guest_init(); + + if (!xen_events_irq) { + pr_err("Xen event channel interrupt not found\n"); + return -ENODEV; + } + + /* + * The fdt parsing codes have set EFI_RUNTIME_SERVICES if Xen EFI + * parameters are found. Force enable runtime services. 
+ */ + if (efi_enabled(EFI_RUNTIME_SERVICES)) + xen_efi_runtime_setup(); + + shared_info_page = (struct shared_info *)get_zeroed_page(GFP_KERNEL); + + if (!shared_info_page) { + pr_err("not enough memory\n"); + return -ENOMEM; + } + xatp.domid = DOMID_SELF; + xatp.idx = 0; + xatp.space = XENMAPSPACE_shared_info; + xatp.gpfn = virt_to_gfn(shared_info_page); + if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp)) + BUG(); + + HYPERVISOR_shared_info = (struct shared_info *)shared_info_page; + + /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info + * page, we use it in the event channel upcall and in some pvclock + * related functions. + * The shared info contains exactly 1 CPU (the boot CPU). The guest + * is required to use VCPUOP_register_vcpu_info to place vcpu info + * for secondary CPUs as they are brought up. + * For uniformity we use VCPUOP_register_vcpu_info even on cpu0. + */ + xen_vcpu_info = alloc_percpu(struct vcpu_info); + if (xen_vcpu_info == NULL) + return -ENOMEM; + + /* Direct vCPU id mapping for ARM guests. */ + for_each_possible_cpu(cpu) + per_cpu(xen_vcpu_id, cpu) = cpu; + + if (!xen_grant_frames) { + xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames(); + rc = xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn, + &xen_auto_xlat_grant_frames.vaddr, + xen_auto_xlat_grant_frames.count); + } else + rc = gnttab_setup_auto_xlat_frames(xen_grant_frames); + if (rc) { + free_percpu(xen_vcpu_info); + return rc; + } + gnttab_init(); + + /* + * Making sure board specific code will not set up ops for + * cpu idle and cpu freq. 
+ */ + disable_cpuidle(); + disable_cpufreq(); + + xen_init_IRQ(); + + if (request_percpu_irq(xen_events_irq, xen_arm_callback, + "events", &xen_vcpu)) { + pr_err("Error request IRQ %d\n", xen_events_irq); + return -EINVAL; + } + + if (!xen_kernel_unmapped_at_usr()) + xen_time_setup_guest(); + + if (xen_initial_domain()) + pvclock_gtod_register_notifier(&xen_pvclock_gtod_notifier); + + return cpuhp_setup_state(CPUHP_AP_ARM_XEN_STARTING, + "arm/xen:starting", xen_starting_cpu, + xen_dying_cpu); +} +early_initcall(xen_guest_init); + +static int __init xen_pm_init(void) +{ + if (!xen_domain()) + return -ENODEV; + + pm_power_off = xen_power_off; + register_restart_handler(&xen_restart_nb); + if (!xen_initial_domain()) { + struct timespec64 ts; + xen_read_wallclock(&ts); + do_settimeofday64(&ts); + } + + return 0; +} +late_initcall(xen_pm_init); + + +/* empty stubs */ +void xen_arch_pre_suspend(void) { } +void xen_arch_post_suspend(int suspend_cancelled) { } +void xen_timer_resume(void) { } +void xen_arch_resume(void) { } +void xen_arch_suspend(void) { } + + +/* In the hypercall.S file. 
*/ +EXPORT_SYMBOL_GPL(HYPERVISOR_event_channel_op); +EXPORT_SYMBOL_GPL(HYPERVISOR_grant_table_op); +EXPORT_SYMBOL_GPL(HYPERVISOR_xen_version); +EXPORT_SYMBOL_GPL(HYPERVISOR_console_io); +EXPORT_SYMBOL_GPL(HYPERVISOR_sched_op); +EXPORT_SYMBOL_GPL(HYPERVISOR_hvm_op); +EXPORT_SYMBOL_GPL(HYPERVISOR_memory_op); +EXPORT_SYMBOL_GPL(HYPERVISOR_physdev_op); +EXPORT_SYMBOL_GPL(HYPERVISOR_vcpu_op); +EXPORT_SYMBOL_GPL(HYPERVISOR_platform_op_raw); +EXPORT_SYMBOL_GPL(HYPERVISOR_multicall); +EXPORT_SYMBOL_GPL(HYPERVISOR_vm_assist); +EXPORT_SYMBOL_GPL(HYPERVISOR_dm_op); +EXPORT_SYMBOL_GPL(privcmd_call); diff --git a/arch/arm/xen/grant-table.c b/arch/arm/xen/grant-table.c new file mode 100644 index 000000000..91cf08ba1 --- /dev/null +++ b/arch/arm/xen/grant-table.c @@ -0,0 +1,58 @@ +/****************************************************************************** + * grant_table.c + * ARM specific part + * + * Granting foreign access to our memory reservation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation; or, when distributed + * separately from the Linux kernel or incorporated into other + * software packages, subject to the following license: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this source file (the "Software"), to deal in the Software without + * restriction, including without limitation the rights to use, copy, modify, + * merge, publish, distribute, sublicense, and/or sell copies of the Software, + * and to permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include <xen/interface/xen.h> +#include <xen/page.h> +#include <xen/grant_table.h> + +int arch_gnttab_map_shared(xen_pfn_t *frames, unsigned long nr_gframes, + unsigned long max_nr_gframes, + void **__shared) +{ + return -ENOSYS; +} + +void arch_gnttab_unmap(void *shared, unsigned long nr_gframes) +{ + return; +} + +int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes, + unsigned long max_nr_gframes, + grant_status_t **__shared) +{ + return -ENOSYS; +} + +int arch_gnttab_init(unsigned long nr_shared, unsigned long nr_status) +{ + return 0; +} diff --git a/arch/arm/xen/hypercall.S b/arch/arm/xen/hypercall.S new file mode 100644 index 000000000..f794dac98 --- /dev/null +++ b/arch/arm/xen/hypercall.S @@ -0,0 +1,121 @@ +/****************************************************************************** + * hypercall.S + * + * Xen hypercall wrappers + * + * Stefano Stabellini <stefano.stabellini@eu.citrix.com>, Citrix, 2012 + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation; or, when distributed + * separately from the Linux kernel or incorporated into other + * software packages, subject to the following license: + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this source file (the "Software"), to deal in the Software without + * restriction, including without limitation the rights 
to use, copy, modify, + * merge, publish, distribute, sublicense, and/or sell copies of the Software, + * and to permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +/* + * The Xen hypercall calling convention is very similar to the ARM + * procedure calling convention: the first paramter is passed in r0, the + * second in r1, the third in r2 and the fourth in r3. Considering that + * Xen hypercalls have 5 arguments at most, the fifth paramter is passed + * in r4, differently from the procedure calling convention of using the + * stack for that case. + * + * The hypercall number is passed in r12. + * + * The return value is in r0. + * + * The hvc ISS is required to be 0xEA1, that is the Xen specific ARM + * hypercall tag. 
+ */ + +#include <linux/linkage.h> +#include <asm/assembler.h> +#include <asm/opcodes-virt.h> +#include <xen/interface/xen.h> + + +#define XEN_IMM 0xEA1 + +#define HYPERCALL_SIMPLE(hypercall) \ +ENTRY(HYPERVISOR_##hypercall) \ + mov r12, #__HYPERVISOR_##hypercall; \ + __HVC(XEN_IMM); \ + ret lr; \ +ENDPROC(HYPERVISOR_##hypercall) + +#define HYPERCALL0 HYPERCALL_SIMPLE +#define HYPERCALL1 HYPERCALL_SIMPLE +#define HYPERCALL2 HYPERCALL_SIMPLE +#define HYPERCALL3 HYPERCALL_SIMPLE +#define HYPERCALL4 HYPERCALL_SIMPLE + +#define HYPERCALL5(hypercall) \ +ENTRY(HYPERVISOR_##hypercall) \ + stmdb sp!, {r4} \ + ldr r4, [sp, #4] \ + mov r12, #__HYPERVISOR_##hypercall; \ + __HVC(XEN_IMM); \ + ldm sp!, {r4} \ + ret lr \ +ENDPROC(HYPERVISOR_##hypercall) + + .text + +HYPERCALL2(xen_version); +HYPERCALL3(console_io); +HYPERCALL3(grant_table_op); +HYPERCALL2(sched_op); +HYPERCALL2(event_channel_op); +HYPERCALL2(hvm_op); +HYPERCALL2(memory_op); +HYPERCALL2(physdev_op); +HYPERCALL3(vcpu_op); +HYPERCALL1(platform_op_raw); +HYPERCALL2(multicall); +HYPERCALL2(vm_assist); +HYPERCALL3(dm_op); + +ENTRY(privcmd_call) + stmdb sp!, {r4} + mov r12, r0 + mov r0, r1 + mov r1, r2 + mov r2, r3 + ldr r3, [sp, #8] + /* + * Privcmd calls are issued by the userspace. We need to allow the + * kernel to access the userspace memory before issuing the hypercall. + */ + uaccess_enable r4 + + /* r4 is loaded now as we use it as scratch register before */ + ldr r4, [sp, #4] + __HVC(XEN_IMM) + + /* + * Disable userspace access from kernel. This is fine to do it + * unconditionally as no set_fs(KERNEL_DS) is called before. 
+ */ + uaccess_disable r4 + + ldm sp!, {r4} + ret lr +ENDPROC(privcmd_call); diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c new file mode 100644 index 000000000..3d826c0b5 --- /dev/null +++ b/arch/arm/xen/mm.c @@ -0,0 +1,143 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include <linux/cpu.h> +#include <linux/dma-direct.h> +#include <linux/dma-map-ops.h> +#include <linux/gfp.h> +#include <linux/highmem.h> +#include <linux/export.h> +#include <linux/memblock.h> +#include <linux/of_address.h> +#include <linux/slab.h> +#include <linux/types.h> +#include <linux/vmalloc.h> +#include <linux/swiotlb.h> + +#include <xen/xen.h> +#include <xen/interface/grant_table.h> +#include <xen/interface/memory.h> +#include <xen/page.h> +#include <xen/xen-ops.h> +#include <xen/swiotlb-xen.h> + +#include <asm/cacheflush.h> +#include <asm/xen/hypercall.h> +#include <asm/xen/interface.h> + +static gfp_t xen_swiotlb_gfp(void) +{ + phys_addr_t base; + u64 i; + + for_each_mem_range(i, &base, NULL) { + if (base < (phys_addr_t)0xffffffff) { + if (IS_ENABLED(CONFIG_ZONE_DMA32)) + return __GFP_DMA32; + return __GFP_DMA; + } + } + + return GFP_KERNEL; +} + +static bool hypercall_cflush = false; + +/* buffers in highmem or foreign pages cannot cross page boundaries */ +static void dma_cache_maint(struct device *dev, dma_addr_t handle, + size_t size, u32 op) +{ + struct gnttab_cache_flush cflush; + + cflush.offset = xen_offset_in_page(handle); + cflush.op = op; + handle &= XEN_PAGE_MASK; + + do { + cflush.a.dev_bus_addr = dma_to_phys(dev, handle); + + if (size + cflush.offset > XEN_PAGE_SIZE) + cflush.length = XEN_PAGE_SIZE - cflush.offset; + else + cflush.length = size; + + HYPERVISOR_grant_table_op(GNTTABOP_cache_flush, &cflush, 1); + + cflush.offset = 0; + handle += cflush.length; + size -= cflush.length; + } while (size); +} + +/* + * Dom0 is mapped 1:1, and while the Linux page can span across multiple Xen + * pages, it is not possible for it to contain a mix of local and foreign Xen + * 
pages. Calling pfn_valid on a foreign mfn will always return false, so if + * pfn_valid returns true the page is local and we can use the native + * dma-direct functions, otherwise we call the Xen specific version. + */ +void xen_dma_sync_for_cpu(struct device *dev, dma_addr_t handle, + size_t size, enum dma_data_direction dir) +{ + if (dir != DMA_TO_DEVICE) + dma_cache_maint(dev, handle, size, GNTTAB_CACHE_INVAL); +} + +void xen_dma_sync_for_device(struct device *dev, dma_addr_t handle, + size_t size, enum dma_data_direction dir) +{ + if (dir == DMA_FROM_DEVICE) + dma_cache_maint(dev, handle, size, GNTTAB_CACHE_INVAL); + else + dma_cache_maint(dev, handle, size, GNTTAB_CACHE_CLEAN); +} + +bool xen_arch_need_swiotlb(struct device *dev, + phys_addr_t phys, + dma_addr_t dev_addr) +{ + unsigned int xen_pfn = XEN_PFN_DOWN(phys); + unsigned int bfn = XEN_PFN_DOWN(dma_to_phys(dev, dev_addr)); + + /* + * The swiotlb buffer should be used if + * - Xen doesn't have the cache flush hypercall + * - The Linux page refers to foreign memory + * - The device doesn't support coherent DMA requests + * + * The Linux page may span across multiple Xen pages, although + * it's not possible to have a mix of local and foreign Xen pages. + * Furthermore, range_straddles_page_boundary is already checking + * if the buffer is physically contiguous in the host RAM. + * + * Therefore we only need to check the first Xen page to know if we + * require a bounce buffer because the device doesn't support coherent + * memory and we are not able to flush the cache. 
+ */ + return (!hypercall_cflush && (xen_pfn != bfn) && + !dev_is_dma_coherent(dev)); +} + +static int __init xen_mm_init(void) +{ + struct gnttab_cache_flush cflush; + int rc; + + if (!xen_swiotlb_detect()) + return 0; + + /* we can work with the default swiotlb */ + if (!io_tlb_default_mem.nslabs) { + rc = swiotlb_init_late(swiotlb_size_or_default(), + xen_swiotlb_gfp(), NULL); + if (rc < 0) + return rc; + } + + cflush.op = 0; + cflush.a.dev_bus_addr = 0; + cflush.offset = 0; + cflush.length = 0; + if (HYPERVISOR_grant_table_op(GNTTABOP_cache_flush, &cflush, 1) != -ENOSYS) + hypercall_cflush = true; + return 0; +} +arch_initcall(xen_mm_init); diff --git a/arch/arm/xen/p2m.c b/arch/arm/xen/p2m.c new file mode 100644 index 000000000..309648c17 --- /dev/null +++ b/arch/arm/xen/p2m.c @@ -0,0 +1,210 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include <linux/memblock.h> +#include <linux/gfp.h> +#include <linux/export.h> +#include <linux/spinlock.h> +#include <linux/slab.h> +#include <linux/types.h> +#include <linux/dma-mapping.h> +#include <linux/vmalloc.h> +#include <linux/swiotlb.h> + +#include <xen/xen.h> +#include <xen/interface/memory.h> +#include <xen/grant_table.h> +#include <xen/page.h> +#include <xen/swiotlb-xen.h> + +#include <asm/cacheflush.h> +#include <asm/xen/hypercall.h> +#include <asm/xen/interface.h> + +struct xen_p2m_entry { + unsigned long pfn; + unsigned long mfn; + unsigned long nr_pages; + struct rb_node rbnode_phys; +}; + +static rwlock_t p2m_lock; +struct rb_root phys_to_mach = RB_ROOT; +EXPORT_SYMBOL_GPL(phys_to_mach); + +static int xen_add_phys_to_mach_entry(struct xen_p2m_entry *new) +{ + struct rb_node **link = &phys_to_mach.rb_node; + struct rb_node *parent = NULL; + struct xen_p2m_entry *entry; + int rc = 0; + + while (*link) { + parent = *link; + entry = rb_entry(parent, struct xen_p2m_entry, rbnode_phys); + + if (new->pfn == entry->pfn) + goto err_out; + + if (new->pfn < entry->pfn) + link = &(*link)->rb_left; + else + link = 
&(*link)->rb_right; + } + rb_link_node(&new->rbnode_phys, parent, link); + rb_insert_color(&new->rbnode_phys, &phys_to_mach); + goto out; + +err_out: + rc = -EINVAL; + pr_warn("%s: cannot add pfn=%pa -> mfn=%pa: pfn=%pa -> mfn=%pa already exists\n", + __func__, &new->pfn, &new->mfn, &entry->pfn, &entry->mfn); +out: + return rc; +} + +unsigned long __pfn_to_mfn(unsigned long pfn) +{ + struct rb_node *n; + struct xen_p2m_entry *entry; + unsigned long irqflags; + + read_lock_irqsave(&p2m_lock, irqflags); + n = phys_to_mach.rb_node; + while (n) { + entry = rb_entry(n, struct xen_p2m_entry, rbnode_phys); + if (entry->pfn <= pfn && + entry->pfn + entry->nr_pages > pfn) { + unsigned long mfn = entry->mfn + (pfn - entry->pfn); + read_unlock_irqrestore(&p2m_lock, irqflags); + return mfn; + } + if (pfn < entry->pfn) + n = n->rb_left; + else + n = n->rb_right; + } + read_unlock_irqrestore(&p2m_lock, irqflags); + + return INVALID_P2M_ENTRY; +} +EXPORT_SYMBOL_GPL(__pfn_to_mfn); + +int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops, + struct gnttab_map_grant_ref *kmap_ops, + struct page **pages, unsigned int count) +{ + int i; + + for (i = 0; i < count; i++) { + struct gnttab_unmap_grant_ref unmap; + int rc; + + if (map_ops[i].status) + continue; + if (likely(set_phys_to_machine(map_ops[i].host_addr >> XEN_PAGE_SHIFT, + map_ops[i].dev_bus_addr >> XEN_PAGE_SHIFT))) + continue; + + /* + * Signal an error for this slot. This in turn requires + * immediate unmapping. + */ + map_ops[i].status = GNTST_general_error; + unmap.host_addr = map_ops[i].host_addr, + unmap.handle = map_ops[i].handle; + map_ops[i].handle = INVALID_GRANT_HANDLE; + if (map_ops[i].flags & GNTMAP_device_map) + unmap.dev_bus_addr = map_ops[i].dev_bus_addr; + else + unmap.dev_bus_addr = 0; + + /* + * Pre-populate the status field, to be recognizable in + * the log message below. 
+ */ + unmap.status = 1; + + rc = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, + &unmap, 1); + if (rc || unmap.status != GNTST_okay) + pr_err_once("gnttab unmap failed: rc=%d st=%d\n", + rc, unmap.status); + } + + return 0; +} + +int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops, + struct gnttab_unmap_grant_ref *kunmap_ops, + struct page **pages, unsigned int count) +{ + int i; + + for (i = 0; i < count; i++) { + set_phys_to_machine(unmap_ops[i].host_addr >> XEN_PAGE_SHIFT, + INVALID_P2M_ENTRY); + } + + return 0; +} + +bool __set_phys_to_machine_multi(unsigned long pfn, + unsigned long mfn, unsigned long nr_pages) +{ + int rc; + unsigned long irqflags; + struct xen_p2m_entry *p2m_entry; + struct rb_node *n; + + if (mfn == INVALID_P2M_ENTRY) { + write_lock_irqsave(&p2m_lock, irqflags); + n = phys_to_mach.rb_node; + while (n) { + p2m_entry = rb_entry(n, struct xen_p2m_entry, rbnode_phys); + if (p2m_entry->pfn <= pfn && + p2m_entry->pfn + p2m_entry->nr_pages > pfn) { + rb_erase(&p2m_entry->rbnode_phys, &phys_to_mach); + write_unlock_irqrestore(&p2m_lock, irqflags); + kfree(p2m_entry); + return true; + } + if (pfn < p2m_entry->pfn) + n = n->rb_left; + else + n = n->rb_right; + } + write_unlock_irqrestore(&p2m_lock, irqflags); + return true; + } + + p2m_entry = kzalloc(sizeof(*p2m_entry), GFP_NOWAIT); + if (!p2m_entry) + return false; + + p2m_entry->pfn = pfn; + p2m_entry->nr_pages = nr_pages; + p2m_entry->mfn = mfn; + + write_lock_irqsave(&p2m_lock, irqflags); + rc = xen_add_phys_to_mach_entry(p2m_entry); + if (rc < 0) { + write_unlock_irqrestore(&p2m_lock, irqflags); + kfree(p2m_entry); + return false; + } + write_unlock_irqrestore(&p2m_lock, irqflags); + return true; +} +EXPORT_SYMBOL_GPL(__set_phys_to_machine_multi); + +bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn) +{ + return __set_phys_to_machine_multi(pfn, mfn, 1); +} +EXPORT_SYMBOL_GPL(__set_phys_to_machine); + +static int p2m_init(void) +{ + rwlock_init(&p2m_lock); 
+ return 0; +} +arch_initcall(p2m_init); |
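As a reading aid for the arch/arm/xen/p2m.c hunk above, here is a minimal user-space sketch of the range lookup that __pfn_to_mfn() performs. Each tree node covers pfn .. pfn + nr_pages - 1, and a hit returns the mfn offset by the distance from the start of the range. This is not the kernel code. It assumes a plain unbalanced binary tree in place of the kernel rb-tree and takes no lock, and the names p2m_node, p2m_insert, p2m_lookup and INVALID_ENTRY are hypothetical stand-ins that do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's INVALID_P2M_ENTRY. */
#define INVALID_ENTRY (~0UL)

/* One contiguous run: pfn + k maps to mfn + k for 0 <= k < nr_pages. */
struct p2m_node {
	unsigned long pfn;
	unsigned long mfn;
	unsigned long nr_pages;
	struct p2m_node *left;
	struct p2m_node *right;
};

/*
 * Insert a run keyed by its start pfn. As in xen_add_phys_to_mach_entry(),
 * only an identical start pfn is rejected; overlapping runs are not detected.
 */
static bool p2m_insert(struct p2m_node **root, unsigned long pfn,
		       unsigned long mfn, unsigned long nr_pages)
{
	struct p2m_node **link = root;
	struct p2m_node *node;

	assert(root != NULL);

	while (*link) {
		if (pfn == (*link)->pfn)
			return false;
		link = pfn < (*link)->pfn ? &(*link)->left : &(*link)->right;
	}

	node = calloc(1, sizeof(*node));
	if (!node)
		return false;

	node->pfn = pfn;
	node->mfn = mfn;
	node->nr_pages = nr_pages;
	*link = node;
	return true;
}

/* Walk the tree; a hit is any node whose run contains pfn. */
static unsigned long p2m_lookup(const struct p2m_node *node, unsigned long pfn)
{
	while (node) {
		if (node->pfn <= pfn && pfn - node->pfn < node->nr_pages)
			return node->mfn + (pfn - node->pfn);
		node = pfn < node->pfn ? node->left : node->right;
	}
	return INVALID_ENTRY;
}
```

With a run inserted as p2m_insert(&root, 100, 5000, 4), looking up pfn 102 yields 5002, while pfn 104 falls just past the run and yields INVALID_ENTRY. The containment test is written as pfn - node->pfn < node->nr_pages rather than node->pfn + node->nr_pages > pfn only to sidestep wrap-around in the sketch; the patch uses the latter form.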