path: root/arch/arm64/mm/init.c
author     Linus Torvalds <torvalds@linux-foundation.org>  2023-02-21 18:24:12 -0800
committer  Linus Torvalds <torvalds@linux-foundation.org>  2023-02-21 18:24:12 -0800
commit     5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch)
tree       cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /arch/arm64/mm/init.c
download   linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz
           linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:

 "Core:
   - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use.
   - Introduce sysctl to set default RPS configuration for new netdevs.
   - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers.
   - Expose all net/core sysctls inside netns.
   - Remove 4s sleep in netpoll if carrier is instantly detected on boot.
   - Add configurable limit of MDB entries per port, and port-vlan.
   - Continue populating drop reasons throughout the stack.
   - Retire a handful of legacy Qdiscs and classifiers.

  Protocols:
   - Support IPv4 big TCP (TSO frames larger than 64kB).
   - Add IP_LOCAL_PORT_RANGE socket option, to control the local port range on a socket-by-socket basis (a usage sketch follows after this log).
   - Track and report in procfs the number of MPTCP sockets used.
   - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager.
   - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4).
   - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC 8986).
   - ICMP: add per-rate limit counters.
   - Add support for user scanning requests in ieee802154.
   - Remove static WEP support.
   - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting.
   - Wi-Fi 7 EHT channel puncturing support (client & AP).

  BPF:
   - Add an rbtree data structure following the "next-gen data structure" precedent set by the recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type.
   - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata.
   - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect-metadata mode.
   - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
   - Remove the need for trace_printk_lock for the bpf_trace_printk and bpf_trace_vprintk helpers.
   - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscalls as a special case.
   - Significantly reduce the search time for module symbols by livepatch and BPF.
   - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals.
   - Add support for BPF trampoline on s390x and riscv64.
   - Add capability to export the XDP features supported by the NIC.
   - Add __bpf_kfunc tag for marking kernel functions as kfuncs.
   - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments.

  Netfilter:
   - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target.
   - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist.

  Driver API:
   - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms.
   - Separate C22 and C45 MDIO bus transactions more clearly.
   - Introduce new DCB table to control DSCP rewrite on egress.
   - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019), a modern version of shared-medium Ethernet.
   - Support for MAC Merge layer (IEEE 802.3-2018 clause 99), allowing preemption of low-priority frames by high-priority frames.
   - Add support for controlling MACsec offload using netlink SET.
   - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling.
   - Add TX frame aggregation parameters (for USB drivers).
   - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug.
   - Allow offloading of UDP NEW connections via act_ct.
   - Add support for per-action HW stats in TC.
   - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain).
   - Warn if the old Wireless Extensions user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using the nl80211 interface instead.
   - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance.

  New hardware / drivers:
   - Ethernet:
     - nVidia BlueField-3 support (control traffic driver)
     - Ethernet support for imx93 SoCs
     - Motorcomm yt8531 gigabit Ethernet PHY
     - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
     - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
     - Amlogic gxl MDIO mux
   - WiFi:
     - RealTek RTL8188EU (rtl8xxxu)
     - Qualcomm Wi-Fi 7 devices (ath12k)
   - CAN:
     - Renesas R-Car V4H

  Drivers:
   - Bluetooth:
     - Set Per Platform Antenna Gain (PPAG) for Intel controllers.
   - Ethernet NICs:
     - Intel (1G, igc):
       - support TSN / Qbv / packet scheduling features of the i226 model
     - Intel (100G, ice):
       - use GNSS subsystem instead of TTY
       - multi-buffer XDP support
       - extend support for GPIO pins to E823 devices
     - nVidia/Mellanox:
       - update the shared buffer configuration on PFC commands
       - implement PTP adjphase function for HW offset control
       - TC support for Geneve and GRE with VF tunnel offload
       - more efficient crypto key management method
       - multi-port eswitch support
     - Netronome/Corigine:
       - add DCB IEEE support
       - support IPsec offloading for NFP3800
     - Freescale/NXP (enetc):
       - support XDP_REDIRECT for XDP non-linear buffers
       - improve reconfig, avoid link flap and waiting for idle
       - support MAC Merge layer
     - Other NICs:
       - sfc/ef100: add basic devlink support for ef100
       - ionic: rx_push mode operation (writing descriptors via MMIO)
       - bnxt: use the auxiliary bus abstraction for RDMA
       - r8169: disable ASPM and reset bus in case of tx timeout
       - cpsw: support QSGMII mode for J721e CPSW9G
       - cpts: support pulse-per-second output
       - ngbe: add an mdio bus driver
       - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
       - r8152: handle devices with FW with NCM support
       - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
       - virtio-net: support multi-buffer XDP
       - virtio/vsock: replace virtio_vsock_pkt with sk_buff
       - tsnep: XDP support
   - Ethernet high-speed switches:
     - nVidia/Mellanox (mlxsw):
       - add support for latency TLV (in FW control messages)
     - Microchip (sparx5):
       - separate explicit and implicit traffic forwarding rules, make the implicit rules always active
       - add support for egress DSCP rewrite
       - IS0 VCAP support (Ingress Classification)
       - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.)
       - ES2 VCAP support (Egress Access Control)
       - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1)
   - Ethernet embedded switches:
     - Marvell (mv88e6xxx):
       - add MAB (port auth) offload support
       - enable PTP receive for mv88e6390
     - NXP (ocelot):
       - support MAC Merge layer
       - support for the vsc7512 internal copper phys
     - Microchip:
       - lan9303: convert to PHYLINK
       - lan966x: support TC flower filter statistics
       - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
       - lan937x: support Credit Based Shaper configuration
       - ksz9477: support Energy Efficient Ethernet
     - other:
       - qca8k: convert to regmap read/write API, use bulk operations
       - rswitch: improve TX timestamp accuracy
   - Intel WiFi (iwlwifi):
     - EHT (Wi-Fi 7) rate reporting
     - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware.
   - Qualcomm 802.11ax WiFi (ath11k):
     - IPQ5018 support
     - Fine Timing Measurement (FTM) responder role support
     - channel 177 support
   - MediaTek WiFi (mt76):
     - per-PHY LED support
     - mt7996: EHT (Wi-Fi 7) support
     - Wireless Ethernet Dispatch (WED) reset support
     - switch to using page pool allocator
   - RealTek WiFi (rtw89):
     - support new version of Bluetooth co-existence
   - Mobile:
     - rmnet: support TX aggregation"

* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
  page_pool: add a comment explaining the fragment counter usage
  net: ethtool: fix __ethtool_dev_mm_supported() implementation
  ethtool: pse-pd: Fix double word in comments
  xsk: add linux/vmalloc.h to xsk.c
  sefltests: netdevsim: wait for devlink instance after netns removal
  selftest: fib_tests: Always cleanup before exit
  net/mlx5e: Align IPsec ASO result memory to be as required by hardware
  net/mlx5e: TC, Set CT miss to the specific ct action instance
  net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  net/mlx5: Refactor tc miss handling to a single function
  net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  net/sched: flower: Support hardware miss to tc action
  net/sched: flower: Move filter handle initialization earlier
  net/sched: cls_api: Support hardware miss to tc action
  net/sched: Rename user cookie and act cookie
  sfc: fix builds without CONFIG_RTC_LIB
  sfc: clean up some inconsistent indentings
  net/mlx4_en: Introduce flexible array to silence overflow warning
  net: lan966x: Fix possible deadlock inside PTP
  net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
  ...
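A usage sketch for one of the items above, the new IP_LOCAL_PORT_RANGE socket
option. This is not taken from the merge itself; it assumes the option is set
at the IPPROTO_IP level with a 32-bit value that packs the lower port bound in
the low 16 bits and the upper bound in the high 16 bits, and that the uAPI
constant is 51 -- check include/uapi/linux/in.h from this release to confirm.

/*
 * Hypothetical example: restrict a socket's local ports to [60000, 60999].
 * The IP_LOCAL_PORT_RANGE value and encoding are assumptions, see note above.
 */
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IP_LOCAL_PORT_RANGE
#define IP_LOCAL_PORT_RANGE 51		/* assumed uAPI value */
#endif

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	uint32_t range = (60999u << 16) | 60000u;	/* high << 16 | low */

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	if (setsockopt(fd, IPPROTO_IP, IP_LOCAL_PORT_RANGE,
		       &range, sizeof(range)) < 0)
		perror("setsockopt(IP_LOCAL_PORT_RANGE)");
	return 0;
}

Per the changelog, the intent is per-socket control of the local port range,
e.g. letting processes that share one address carve out disjoint port slices.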
Diffstat (limited to 'arch/arm64/mm/init.c')
-rw-r--r--   arch/arm64/mm/init.c   523
1 file changed, 523 insertions(+), 0 deletions(-)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
new file mode 100644
index 000000000..58a0bb2c1
--- /dev/null
+++ b/arch/arm64/mm/init.c
@@ -0,0 +1,523 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Based on arch/arm/mm/init.c
+ *
+ * Copyright (C) 1995-2005 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/errno.h>
+#include <linux/swap.h>
+#include <linux/init.h>
+#include <linux/cache.h>
+#include <linux/mman.h>
+#include <linux/nodemask.h>
+#include <linux/initrd.h>
+#include <linux/gfp.h>
+#include <linux/memblock.h>
+#include <linux/sort.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+#include <linux/dma-direct.h>
+#include <linux/dma-map-ops.h>
+#include <linux/efi.h>
+#include <linux/swiotlb.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+#include <linux/kexec.h>
+#include <linux/crash_dump.h>
+#include <linux/hugetlb.h>
+#include <linux/acpi_iort.h>
+#include <linux/kmemleak.h>
+
+#include <asm/boot.h>
+#include <asm/fixmap.h>
+#include <asm/kasan.h>
+#include <asm/kernel-pgtable.h>
+#include <asm/kvm_host.h>
+#include <asm/memory.h>
+#include <asm/numa.h>
+#include <asm/sections.h>
+#include <asm/setup.h>
+#include <linux/sizes.h>
+#include <asm/tlb.h>
+#include <asm/alternative.h>
+#include <asm/xen/swiotlb-xen.h>
+
+/*
+ * We need to be able to catch inadvertent references to memstart_addr
+ * that occur (potentially in generic code) before arm64_memblock_init()
+ * executes, which assigns it its actual value. So use a default value
+ * that cannot be mistaken for a real physical address.
+ */
+s64 memstart_addr __ro_after_init = -1;
+EXPORT_SYMBOL(memstart_addr);
+
+/*
+ * If the corresponding config options are enabled, we create both ZONE_DMA
+ * and ZONE_DMA32. By default ZONE_DMA covers the 32-bit addressable memory
+ * unless restricted on specific platforms (e.g. 30-bit on Raspberry Pi 4).
+ * In that case, ZONE_DMA32 covers the rest of the 32-bit addressable
+ * memory, otherwise it is empty.
+ *
+ * Memory for the crash kernel is reserved either early or deferred,
+ * depending on the DMA zone configuration (ZONE_DMA):
+ *
+ * In the absence of the ZONE_DMA configs, arm64_dma_phys_limit is
+ * initialized here instead of in max_zone_phys(). This allows the crash
+ * kernel memory, which depends on arm64_dma_phys_limit, to be reserved
+ * early. Reserving it early lets block mappings (larger than page
+ * granularity) be created for all memory bank ranges, so a comparatively
+ * quicker boot is observed.
+ *
+ * If the ZONE_DMA configs are defined, the crash kernel reservation is
+ * delayed until the DMA zone ranges have been determined in
+ * zone_sizes_init(), so that the reservation does not overlap the DMA
+ * zone. The crash kernel boundaries are therefore not known when the bank
+ * memory ranges are mapped, so the range cannot be excluded from block
+ * mappings and page-granularity mappings are created for the entire
+ * memory range instead. Hence a slightly slower boot is observed.
+ *
+ * Note: Page-granularity mappings are necessary for crash kernel memory
+ * range for shrinking its size via /sys/kernel/kexec_crash_size interface.
+ */
+#if IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32)
+phys_addr_t __ro_after_init arm64_dma_phys_limit;
+#else
+phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
+#endif
+
+/* Current arm64 boot protocol requires 2MB alignment */
+#define CRASH_ALIGN SZ_2M
+
+#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX (PHYS_MASK + 1)
+
+#define DEFAULT_CRASH_KERNEL_LOW_SIZE (128UL << 20)
+
+static int __init reserve_crashkernel_low(unsigned long long low_size)
+{
+ unsigned long long low_base;
+
+ low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
+ if (!low_base) {
+ pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size);
+ return -ENOMEM;
+ }
+
+ pr_info("crashkernel low memory reserved: 0x%08llx - 0x%08llx (%lld MB)\n",
+ low_base, low_base + low_size, low_size >> 20);
+
+ crashk_low_res.start = low_base;
+ crashk_low_res.end = low_base + low_size - 1;
+ insert_resource(&iomem_resource, &crashk_low_res);
+
+ return 0;
+}
+
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves the memory area given by the "crashkernel=" kernel
+ * command line parameter. The reserved memory is used by the dump-capture
+ * kernel when the primary kernel crashes.
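+ *
+ * Illustrative examples (values are arbitrary): "crashkernel=512M" reserves
+ * 512 MiB below CRASH_ADDR_LOW_MAX, while "crashkernel=1G,high" (optionally
+ * combined with "crashkernel=256M,low") places the main reservation anywhere
+ * below CRASH_ADDR_HIGH_MAX plus a smaller companion region in low memory.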
+ */
+static void __init reserve_crashkernel(void)
+{
+ unsigned long long crash_base, crash_size;
+ unsigned long long crash_low_size = 0;
+ unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
+ char *cmdline = boot_command_line;
+ int ret;
+ bool fixed_base = false;
+
+ if (!IS_ENABLED(CONFIG_KEXEC_CORE))
+ return;
+
+ /* crashkernel=X[@offset] */
+ ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
+ &crash_size, &crash_base);
+ if (ret == -ENOENT) {
+ ret = parse_crashkernel_high(cmdline, 0, &crash_size, &crash_base);
+ if (ret || !crash_size)
+ return;
+
+ /*
+ * crashkernel=Y,low can be specified or not, but invalid value
+ * is not allowed.
+ */
+ ret = parse_crashkernel_low(cmdline, 0, &crash_low_size, &crash_base);
+ if (ret == -ENOENT)
+ crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
+ else if (ret)
+ return;
+
+ crash_max = CRASH_ADDR_HIGH_MAX;
+ } else if (ret || !crash_size) {
+ /* The specified value is invalid */
+ return;
+ }
+
+ crash_size = PAGE_ALIGN(crash_size);
+
+ /* User specifies base address explicitly. */
+ if (crash_base) {
+ fixed_base = true;
+ crash_max = crash_base + crash_size;
+ }
+
+retry:
+ crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
+ crash_base, crash_max);
+ if (!crash_base) {
+ /*
+ * If the first attempt was for low memory, fall back to
+ * high memory, the minimum required low memory will be
+ * reserved later.
+ */
+ if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) {
+ crash_max = CRASH_ADDR_HIGH_MAX;
+ crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
+ goto retry;
+ }
+
+ pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
+ crash_size);
+ return;
+ }
+
+ if ((crash_base > CRASH_ADDR_LOW_MAX - crash_low_size) &&
+ crash_low_size && reserve_crashkernel_low(crash_low_size)) {
+ memblock_phys_free(crash_base, crash_size);
+ return;
+ }
+
+ pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
+ crash_base, crash_base + crash_size, crash_size >> 20);
+
+ /*
+ * The crashkernel memory will be removed from the kernel linear
+ * map. Inform kmemleak so that it won't try to access it.
+ */
+ kmemleak_ignore_phys(crash_base);
+ if (crashk_low_res.end)
+ kmemleak_ignore_phys(crashk_low_res.start);
+
+ crashk_res.start = crash_base;
+ crashk_res.end = crash_base + crash_size - 1;
+ insert_resource(&iomem_resource, &crashk_res);
+}
+
+/*
+ * Return the maximum physical address for a zone accessible by the given bits
+ * limit. If DRAM starts above 32-bit, expand the zone to the maximum
+ * available memory, otherwise cap it at 32-bit.
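+ *
+ * For example (illustrative values): with zone_bits == 30 and DRAM starting
+ * at 2 GiB, the start is above the 30-bit mask but below 4 GiB, so the zone
+ * is capped at the 32-bit boundary (or the end of DRAM, whichever is lower);
+ * had DRAM started above 4 GiB, the zone would cover all of memory instead.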
+ */
+static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
+{
+ phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
+ phys_addr_t phys_start = memblock_start_of_DRAM();
+
+ if (phys_start > U32_MAX)
+ zone_mask = PHYS_ADDR_MAX;
+ else if (phys_start > zone_mask)
+ zone_mask = U32_MAX;
+
+ return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
+}
+
+static void __init zone_sizes_init(void)
+{
+ unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
+ unsigned int __maybe_unused acpi_zone_dma_bits;
+ unsigned int __maybe_unused dt_zone_dma_bits;
+ phys_addr_t __maybe_unused dma32_phys_limit = max_zone_phys(32);
+
+#ifdef CONFIG_ZONE_DMA
+ acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
+ dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
+ zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
+ arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
+ max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
+#endif
+#ifdef CONFIG_ZONE_DMA32
+ max_zone_pfns[ZONE_DMA32] = PFN_DOWN(dma32_phys_limit);
+ if (!arm64_dma_phys_limit)
+ arm64_dma_phys_limit = dma32_phys_limit;
+#endif
+ max_zone_pfns[ZONE_NORMAL] = max_pfn;
+
+ free_area_init(max_zone_pfns);
+}
+
+int pfn_is_map_memory(unsigned long pfn)
+{
+ phys_addr_t addr = PFN_PHYS(pfn);
+
+ /* avoid false positives for bogus PFNs, see comment in pfn_valid() */
+ if (PHYS_PFN(addr) != pfn)
+ return 0;
+
+ return memblock_is_map_memory(addr);
+}
+EXPORT_SYMBOL(pfn_is_map_memory);
+
+static phys_addr_t memory_limit __ro_after_init = PHYS_ADDR_MAX;
+
+/*
+ * Limit the memory size that was specified via FDT.
+ */
+static int __init early_mem(char *p)
+{
+ if (!p)
+ return 1;
+
+ memory_limit = memparse(p, &p) & PAGE_MASK;
+ pr_notice("Memory limited to %lldMB\n", memory_limit >> 20);
+
+ return 0;
+}
+early_param("mem", early_mem);
+
+void __init arm64_memblock_init(void)
+{
+ s64 linear_region_size = PAGE_END - _PAGE_OFFSET(vabits_actual);
+
+ /*
+ * Corner case: 52-bit VA capable systems running KVM in nVHE mode may
+ * be limited in their ability to support a linear map that exceeds 51
+ * bits of VA space, depending on the placement of the ID map. Given
+ * that the placement of the ID map may be randomized, let's simply
+ * limit the kernel's linear map to 51 bits as well if we detect this
+ * configuration.
+ */
+ if (IS_ENABLED(CONFIG_KVM) && vabits_actual == 52 &&
+ is_hyp_mode_available() && !is_kernel_in_hyp_mode()) {
+ pr_info("Capping linear region to 51 bits for KVM in nVHE mode on LVA capable hardware.\n");
+ linear_region_size = min_t(u64, linear_region_size, BIT(51));
+ }
+
+ /* Remove memory above our supported physical address size */
+ memblock_remove(1ULL << PHYS_MASK_SHIFT, ULLONG_MAX);
+
+ /*
+ * Select a suitable value for the base of physical memory.
+ */
+ memstart_addr = round_down(memblock_start_of_DRAM(),
+ ARM64_MEMSTART_ALIGN);
+
+ if ((memblock_end_of_DRAM() - memstart_addr) > linear_region_size)
+ pr_warn("Memory doesn't fit in the linear mapping, VA_BITS too small\n");
+
+ /*
+ * Remove the memory that we will not be able to cover with the
+ * linear mapping. Take care not to clip the kernel which may be
+ * high in memory.
+ */
+ memblock_remove(max_t(u64, memstart_addr + linear_region_size,
+ __pa_symbol(_end)), ULLONG_MAX);
+ if (memstart_addr + linear_region_size < memblock_end_of_DRAM()) {
+ /* ensure that memstart_addr remains sufficiently aligned */
+ memstart_addr = round_up(memblock_end_of_DRAM() - linear_region_size,
+ ARM64_MEMSTART_ALIGN);
+ memblock_remove(0, memstart_addr);
+ }
+
+ /*
+ * If we are running with a 52-bit kernel VA config on a system that
+ * does not support it, we have to place the available physical
+ * memory in the 48-bit addressable part of the linear region, i.e.,
+ * we have to move it upward. Since memstart_addr represents the
+ * physical address of PAGE_OFFSET, we have to *subtract* from it.
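+ * With vabits_actual == 48, _PAGE_OFFSET(48) - _PAGE_OFFSET(52) equals
+ * 2^52 - 2^48, so the subtraction below makes the linear mapping of
+ * physical memory start at _PAGE_OFFSET(48) instead of _PAGE_OFFSET(52).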
+ */
+ if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
+ memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);
+
+ /*
+ * Apply the memory limit if it was set. Since the kernel may be loaded
+ * high up in memory, add back the kernel region that must be accessible
+ * via the linear mapping.
+ */
+ if (memory_limit != PHYS_ADDR_MAX) {
+ memblock_mem_limit_remove_map(memory_limit);
+ memblock_add(__pa_symbol(_text), (u64)(_end - _text));
+ }
+
+ if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && phys_initrd_size) {
+ /*
+ * Add back the memory we just removed if doing so made the initrd
+ * inaccessible via the linear mapping. Otherwise, this is a no-op.
+ */
+ u64 base = phys_initrd_start & PAGE_MASK;
+ u64 size = PAGE_ALIGN(phys_initrd_start + phys_initrd_size) - base;
+
+ /*
+ * We can only add back the initrd memory if we don't end up
+ * with more memory than we can address via the linear mapping.
+ * It is up to the bootloader to position the kernel and the
+ * initrd reasonably close to each other (i.e., within 32 GB of
+ * each other) so that all granule/#levels combinations can
+ * always access both.
+ */
+ if (WARN(base < memblock_start_of_DRAM() ||
+ base + size > memblock_start_of_DRAM() +
+ linear_region_size,
+ "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
+ phys_initrd_size = 0;
+ } else {
+ memblock_add(base, size);
+ memblock_clear_nomap(base, size);
+ memblock_reserve(base, size);
+ }
+ }
+
+ if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
+ extern u16 memstart_offset_seed;
+ u64 mmfr0 = read_cpuid(ID_AA64MMFR0_EL1);
+ int parange = cpuid_feature_extract_unsigned_field(
+ mmfr0, ID_AA64MMFR0_EL1_PARANGE_SHIFT);
+ s64 range = linear_region_size -
+ BIT(id_aa64mmfr0_parange_to_phys_shift(parange));
+
+ /*
+ * If the size of the linear region exceeds, by a sufficient
+ * margin, the size of the region that the physical memory can
+ * span, randomize the linear region as well.
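+ * memstart_offset_seed is a 16-bit random value, so the expression below
+ * picks one of roughly range/ARM64_MEMSTART_ALIGN evenly spaced offsets
+ * and applies it to memstart_addr, randomizing where physical memory
+ * lands within the linear region.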
+ */
+ if (memstart_offset_seed > 0 && range >= (s64)ARM64_MEMSTART_ALIGN) {
+ range /= ARM64_MEMSTART_ALIGN;
+ memstart_addr -= ARM64_MEMSTART_ALIGN *
+ ((range * memstart_offset_seed) >> 16);
+ }
+ }
+
+ /*
+ * Register the kernel text, kernel data, initrd, and initial
+ * pagetables with memblock.
+ */
+ memblock_reserve(__pa_symbol(_stext), _end - _stext);
+ if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && phys_initrd_size) {
+ /* the generic initrd code expects virtual addresses */
+ initrd_start = __phys_to_virt(phys_initrd_start);
+ initrd_end = initrd_start + phys_initrd_size;
+ }
+
+ early_init_fdt_scan_reserved_mem();
+
+ if (!defer_reserve_crashkernel())
+ reserve_crashkernel();
+
+ high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
+}
+
+void __init bootmem_init(void)
+{
+ unsigned long min, max;
+
+ min = PFN_UP(memblock_start_of_DRAM());
+ max = PFN_DOWN(memblock_end_of_DRAM());
+
+ early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);
+
+ max_pfn = max_low_pfn = max;
+ min_low_pfn = min;
+
+ arch_numa_init();
+
+ /*
+ * must be done after arch_numa_init() which calls numa_init() to
+ * initialize node_online_map that gets used in hugetlb_cma_reserve()
+ * while allocating required CMA size across online nodes.
+ */
+#if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
+ arm64_hugetlb_cma_reserve();
+#endif
+
+ dma_pernuma_cma_reserve();
+
+ kvm_hyp_reserve();
+
+ /*
+ * sparse_init() tries to allocate memory from memblock, so must be
+ * done after the fixed reservations
+ */
+ sparse_init();
+ zone_sizes_init();
+
+ /*
+ * Reserve the CMA area after arm64_dma_phys_limit was initialised.
+ */
+ dma_contiguous_reserve(arm64_dma_phys_limit);
+
+ /*
+ * request_standard_resources() depends on crashkernel's memory being
+ * reserved, so do it here.
+ */
+ if (defer_reserve_crashkernel())
+ reserve_crashkernel();
+
+ memblock_dump_all();
+}
+
+/*
+ * mem_init() marks the free areas in the mem_map and tells us how much memory
+ * is free. This is done after various parts of the system have claimed their
+ * memory after the kernel image.
+ */
+void __init mem_init(void)
+{
+ swiotlb_init(max_pfn > PFN_DOWN(arm64_dma_phys_limit), SWIOTLB_VERBOSE);
+
+ /* this will put all unused low memory onto the freelists */
+ memblock_free_all();
+
+ /*
+ * Check boundaries twice: Some fundamental inconsistencies can be
+ * detected at build time already.
+ */
+#ifdef CONFIG_COMPAT
+ BUILD_BUG_ON(TASK_SIZE_32 > DEFAULT_MAP_WINDOW_64);
+#endif
+
+ /*
+ * Selected page table levels should match when derived from
+ * scratch using the virtual address range and page size.
+ */
+ BUILD_BUG_ON(ARM64_HW_PGTABLE_LEVELS(CONFIG_ARM64_VA_BITS) !=
+ CONFIG_PGTABLE_LEVELS);
+
+ if (PAGE_SIZE >= 16384 && get_num_physpages() <= 128) {
+ extern int sysctl_overcommit_memory;
+ /*
+ * On a machine this small we won't get anywhere without
+ * overcommit, so turn it on by default.
+ */
+ sysctl_overcommit_memory = OVERCOMMIT_ALWAYS;
+ }
+}
+
+void free_initmem(void)
+{
+ free_reserved_area(lm_alias(__init_begin),
+ lm_alias(__init_end),
+ POISON_FREE_INITMEM, "unused kernel");
+ /*
+ * Unmap the __init region but leave the VM area in place. This
+ * prevents the region from being reused for kernel modules, which
+ * is not supported by kallsyms.
+ */
+ vunmap_range((u64)__init_begin, (u64)__init_end);
+}
+
+void dump_mem_limit(void)
+{
+ if (memory_limit != PHYS_ADDR_MAX) {
+ pr_emerg("Memory Limit: %llu MB\n", memory_limit >> 20);
+ } else {
+ pr_emerg("Memory Limit: none\n");
+ }
+}