From 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 21 Feb 2023 18:24:12 -0800 Subject: Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ... --- lib/objagg.c | 1051 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1051 insertions(+) create mode 100644 lib/objagg.c (limited to 'lib/objagg.c') diff --git a/lib/objagg.c b/lib/objagg.c new file mode 100644 index 000000000..1e248629e --- /dev/null +++ b/lib/objagg.c @@ -0,0 +1,1051 @@ +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0 +/* Copyright (c) 2018 Mellanox Technologies. All rights reserved */ + +#include +#include +#include +#include +#include +#include +#include + +#define CREATE_TRACE_POINTS +#include + +struct objagg_hints { + struct rhashtable node_ht; + struct rhashtable_params ht_params; + struct list_head node_list; + unsigned int node_count; + unsigned int root_count; + unsigned int refcount; + const struct objagg_ops *ops; +}; + +struct objagg_hints_node { + struct rhash_head ht_node; /* member of objagg_hints->node_ht */ + struct list_head list; /* member of objagg_hints->node_list */ + struct objagg_hints_node *parent; + unsigned int root_id; + struct objagg_obj_stats_info stats_info; + unsigned long obj[]; +}; + +static struct objagg_hints_node * +objagg_hints_lookup(struct objagg_hints *objagg_hints, void *obj) +{ + if (!objagg_hints) + return NULL; + return rhashtable_lookup_fast(&objagg_hints->node_ht, obj, + objagg_hints->ht_params); +} + +struct objagg { + const struct objagg_ops *ops; + void *priv; + struct rhashtable obj_ht; + struct rhashtable_params ht_params; + struct list_head obj_list; + unsigned int obj_count; + struct ida root_ida; + struct objagg_hints *hints; +}; + +struct objagg_obj { + struct rhash_head ht_node; /* member of objagg->obj_ht */ + struct list_head list; /* member of objagg->obj_list */ + struct objagg_obj *parent; /* if the object is nested, this + * holds pointer to parent, otherwise NULL + */ + union { + void *delta_priv; /* user delta private */ + void *root_priv; /* user root private */ + }; + unsigned int root_id; + unsigned int refcount; /* counts number of users of this object + * including nested objects + */ + struct objagg_obj_stats stats; + unsigned long obj[]; +}; + +static unsigned int objagg_obj_ref_inc(struct objagg_obj *objagg_obj) +{ + return ++objagg_obj->refcount; +} + +static unsigned int objagg_obj_ref_dec(struct objagg_obj *objagg_obj) +{ + return --objagg_obj->refcount; +} + +static void objagg_obj_stats_inc(struct objagg_obj *objagg_obj) +{ + objagg_obj->stats.user_count++; + objagg_obj->stats.delta_user_count++; + if (objagg_obj->parent) + objagg_obj->parent->stats.delta_user_count++; +} + +static void objagg_obj_stats_dec(struct objagg_obj *objagg_obj) +{ + objagg_obj->stats.user_count--; + objagg_obj->stats.delta_user_count--; + if (objagg_obj->parent) + objagg_obj->parent->stats.delta_user_count--; +} + +static bool objagg_obj_is_root(const struct objagg_obj *objagg_obj) +{ + /* Nesting is not supported, so we can use ->parent + * to figure out if the object is root. + */ + return !objagg_obj->parent; +} + +/** + * objagg_obj_root_priv - obtains root private for an object + * @objagg_obj: objagg object instance + * + * Note: all locking must be provided by the caller. + * + * Either the object is root itself when the private is returned + * directly, or the parent is root and its private is returned + * instead. + * + * Returns a user private root pointer. + */ +const void *objagg_obj_root_priv(const struct objagg_obj *objagg_obj) +{ + if (objagg_obj_is_root(objagg_obj)) + return objagg_obj->root_priv; + WARN_ON(!objagg_obj_is_root(objagg_obj->parent)); + return objagg_obj->parent->root_priv; +} +EXPORT_SYMBOL(objagg_obj_root_priv); + +/** + * objagg_obj_delta_priv - obtains delta private for an object + * @objagg_obj: objagg object instance + * + * Note: all locking must be provided by the caller. + * + * Returns user private delta pointer or NULL in case the passed + * object is root. + */ +const void *objagg_obj_delta_priv(const struct objagg_obj *objagg_obj) +{ + if (objagg_obj_is_root(objagg_obj)) + return NULL; + return objagg_obj->delta_priv; +} +EXPORT_SYMBOL(objagg_obj_delta_priv); + +/** + * objagg_obj_raw - obtains object user private pointer + * @objagg_obj: objagg object instance + * + * Note: all locking must be provided by the caller. + * + * Returns user private pointer as was passed to objagg_obj_get() by "obj" arg. + */ +const void *objagg_obj_raw(const struct objagg_obj *objagg_obj) +{ + return objagg_obj->obj; +} +EXPORT_SYMBOL(objagg_obj_raw); + +static struct objagg_obj *objagg_obj_lookup(struct objagg *objagg, void *obj) +{ + return rhashtable_lookup_fast(&objagg->obj_ht, obj, objagg->ht_params); +} + +static int objagg_obj_parent_assign(struct objagg *objagg, + struct objagg_obj *objagg_obj, + struct objagg_obj *parent, + bool take_parent_ref) +{ + void *delta_priv; + + delta_priv = objagg->ops->delta_create(objagg->priv, parent->obj, + objagg_obj->obj); + if (IS_ERR(delta_priv)) + return PTR_ERR(delta_priv); + + /* User returned a delta private, that means that + * our object can be aggregated into the parent. + */ + objagg_obj->parent = parent; + objagg_obj->delta_priv = delta_priv; + if (take_parent_ref) + objagg_obj_ref_inc(objagg_obj->parent); + trace_objagg_obj_parent_assign(objagg, objagg_obj, + parent, + parent->refcount); + return 0; +} + +static int objagg_obj_parent_lookup_assign(struct objagg *objagg, + struct objagg_obj *objagg_obj) +{ + struct objagg_obj *objagg_obj_cur; + int err; + + list_for_each_entry(objagg_obj_cur, &objagg->obj_list, list) { + /* Nesting is not supported. In case the object + * is not root, it cannot be assigned as parent. + */ + if (!objagg_obj_is_root(objagg_obj_cur)) + continue; + err = objagg_obj_parent_assign(objagg, objagg_obj, + objagg_obj_cur, true); + if (!err) + return 0; + } + return -ENOENT; +} + +static void __objagg_obj_put(struct objagg *objagg, + struct objagg_obj *objagg_obj); + +static void objagg_obj_parent_unassign(struct objagg *objagg, + struct objagg_obj *objagg_obj) +{ + trace_objagg_obj_parent_unassign(objagg, objagg_obj, + objagg_obj->parent, + objagg_obj->parent->refcount); + objagg->ops->delta_destroy(objagg->priv, objagg_obj->delta_priv); + __objagg_obj_put(objagg, objagg_obj->parent); +} + +static int objagg_obj_root_id_alloc(struct objagg *objagg, + struct objagg_obj *objagg_obj, + struct objagg_hints_node *hnode) +{ + unsigned int min, max; + int root_id; + + /* In case there are no hints available, the root id is invalid. */ + if (!objagg->hints) { + objagg_obj->root_id = OBJAGG_OBJ_ROOT_ID_INVALID; + return 0; + } + + if (hnode) { + min = hnode->root_id; + max = hnode->root_id; + } else { + /* For objects with no hint, start after the last + * hinted root_id. + */ + min = objagg->hints->root_count; + max = ~0; + } + + root_id = ida_alloc_range(&objagg->root_ida, min, max, GFP_KERNEL); + + if (root_id < 0) + return root_id; + objagg_obj->root_id = root_id; + return 0; +} + +static void objagg_obj_root_id_free(struct objagg *objagg, + struct objagg_obj *objagg_obj) +{ + if (!objagg->hints) + return; + ida_free(&objagg->root_ida, objagg_obj->root_id); +} + +static int objagg_obj_root_create(struct objagg *objagg, + struct objagg_obj *objagg_obj, + struct objagg_hints_node *hnode) +{ + int err; + + err = objagg_obj_root_id_alloc(objagg, objagg_obj, hnode); + if (err) + return err; + objagg_obj->root_priv = objagg->ops->root_create(objagg->priv, + objagg_obj->obj, + objagg_obj->root_id); + if (IS_ERR(objagg_obj->root_priv)) { + err = PTR_ERR(objagg_obj->root_priv); + goto err_root_create; + } + trace_objagg_obj_root_create(objagg, objagg_obj); + return 0; + +err_root_create: + objagg_obj_root_id_free(objagg, objagg_obj); + return err; +} + +static void objagg_obj_root_destroy(struct objagg *objagg, + struct objagg_obj *objagg_obj) +{ + trace_objagg_obj_root_destroy(objagg, objagg_obj); + objagg->ops->root_destroy(objagg->priv, objagg_obj->root_priv); + objagg_obj_root_id_free(objagg, objagg_obj); +} + +static struct objagg_obj *__objagg_obj_get(struct objagg *objagg, void *obj); + +static int objagg_obj_init_with_hints(struct objagg *objagg, + struct objagg_obj *objagg_obj, + bool *hint_found) +{ + struct objagg_hints_node *hnode; + struct objagg_obj *parent; + int err; + + hnode = objagg_hints_lookup(objagg->hints, objagg_obj->obj); + if (!hnode) { + *hint_found = false; + return 0; + } + *hint_found = true; + + if (!hnode->parent) + return objagg_obj_root_create(objagg, objagg_obj, hnode); + + parent = __objagg_obj_get(objagg, hnode->parent->obj); + if (IS_ERR(parent)) + return PTR_ERR(parent); + + err = objagg_obj_parent_assign(objagg, objagg_obj, parent, false); + if (err) { + *hint_found = false; + err = 0; + goto err_parent_assign; + } + + return 0; + +err_parent_assign: + objagg_obj_put(objagg, parent); + return err; +} + +static int objagg_obj_init(struct objagg *objagg, + struct objagg_obj *objagg_obj) +{ + bool hint_found; + int err; + + /* First, try to use hints if they are available and + * if they provide result. + */ + err = objagg_obj_init_with_hints(objagg, objagg_obj, &hint_found); + if (err) + return err; + + if (hint_found) + return 0; + + /* Try to find if the object can be aggregated under an existing one. */ + err = objagg_obj_parent_lookup_assign(objagg, objagg_obj); + if (!err) + return 0; + /* If aggregation is not possible, make the object a root. */ + return objagg_obj_root_create(objagg, objagg_obj, NULL); +} + +static void objagg_obj_fini(struct objagg *objagg, + struct objagg_obj *objagg_obj) +{ + if (!objagg_obj_is_root(objagg_obj)) + objagg_obj_parent_unassign(objagg, objagg_obj); + else + objagg_obj_root_destroy(objagg, objagg_obj); +} + +static struct objagg_obj *objagg_obj_create(struct objagg *objagg, void *obj) +{ + struct objagg_obj *objagg_obj; + int err; + + objagg_obj = kzalloc(sizeof(*objagg_obj) + objagg->ops->obj_size, + GFP_KERNEL); + if (!objagg_obj) + return ERR_PTR(-ENOMEM); + objagg_obj_ref_inc(objagg_obj); + memcpy(objagg_obj->obj, obj, objagg->ops->obj_size); + + err = objagg_obj_init(objagg, objagg_obj); + if (err) + goto err_obj_init; + + err = rhashtable_insert_fast(&objagg->obj_ht, &objagg_obj->ht_node, + objagg->ht_params); + if (err) + goto err_ht_insert; + list_add(&objagg_obj->list, &objagg->obj_list); + objagg->obj_count++; + trace_objagg_obj_create(objagg, objagg_obj); + + return objagg_obj; + +err_ht_insert: + objagg_obj_fini(objagg, objagg_obj); +err_obj_init: + kfree(objagg_obj); + return ERR_PTR(err); +} + +static struct objagg_obj *__objagg_obj_get(struct objagg *objagg, void *obj) +{ + struct objagg_obj *objagg_obj; + + /* First, try to find the object exactly as user passed it, + * perhaps it is already in use. + */ + objagg_obj = objagg_obj_lookup(objagg, obj); + if (objagg_obj) { + objagg_obj_ref_inc(objagg_obj); + return objagg_obj; + } + + return objagg_obj_create(objagg, obj); +} + +/** + * objagg_obj_get - gets an object within objagg instance + * @objagg: objagg instance + * @obj: user-specific private object pointer + * + * Note: all locking must be provided by the caller. + * + * Size of the "obj" memory is specified in "objagg->ops". + * + * There are 3 main options this function wraps: + * 1) The object according to "obj" already exist. In that case + * the reference counter is incrementes and the object is returned. + * 2) The object does not exist, but it can be aggregated within + * another object. In that case, user ops->delta_create() is called + * to obtain delta data and a new object is created with returned + * user-delta private pointer. + * 3) The object does not exist and cannot be aggregated into + * any of the existing objects. In that case, user ops->root_create() + * is called to create the root and a new object is created with + * returned user-root private pointer. + * + * Returns a pointer to objagg object instance in case of success, + * otherwise it returns pointer error using ERR_PTR macro. + */ +struct objagg_obj *objagg_obj_get(struct objagg *objagg, void *obj) +{ + struct objagg_obj *objagg_obj; + + objagg_obj = __objagg_obj_get(objagg, obj); + if (IS_ERR(objagg_obj)) + return objagg_obj; + objagg_obj_stats_inc(objagg_obj); + trace_objagg_obj_get(objagg, objagg_obj, objagg_obj->refcount); + return objagg_obj; +} +EXPORT_SYMBOL(objagg_obj_get); + +static void objagg_obj_destroy(struct objagg *objagg, + struct objagg_obj *objagg_obj) +{ + trace_objagg_obj_destroy(objagg, objagg_obj); + --objagg->obj_count; + list_del(&objagg_obj->list); + rhashtable_remove_fast(&objagg->obj_ht, &objagg_obj->ht_node, + objagg->ht_params); + objagg_obj_fini(objagg, objagg_obj); + kfree(objagg_obj); +} + +static void __objagg_obj_put(struct objagg *objagg, + struct objagg_obj *objagg_obj) +{ + if (!objagg_obj_ref_dec(objagg_obj)) + objagg_obj_destroy(objagg, objagg_obj); +} + +/** + * objagg_obj_put - puts an object within objagg instance + * @objagg: objagg instance + * @objagg_obj: objagg object instance + * + * Note: all locking must be provided by the caller. + * + * Symmetric to objagg_obj_get(). + */ +void objagg_obj_put(struct objagg *objagg, struct objagg_obj *objagg_obj) +{ + trace_objagg_obj_put(objagg, objagg_obj, objagg_obj->refcount); + objagg_obj_stats_dec(objagg_obj); + __objagg_obj_put(objagg, objagg_obj); +} +EXPORT_SYMBOL(objagg_obj_put); + +/** + * objagg_create - creates a new objagg instance + * @ops: user-specific callbacks + * @objagg_hints: hints, can be NULL + * @priv: pointer to a private data passed to the ops + * + * Note: all locking must be provided by the caller. + * + * The purpose of the library is to provide an infrastructure to + * aggregate user-specified objects. Library does not care about the type + * of the object. User fills-up ops which take care of the specific + * user object manipulation. + * + * As a very stupid example, consider integer numbers. For example + * number 8 as a root object. That can aggregate number 9 with delta 1, + * number 10 with delta 2, etc. This example is implemented as + * a part of a testing module in test_objagg.c file. + * + * Each objagg instance contains multiple trees. Each tree node is + * represented by "an object". In the current implementation there can be + * only roots and leafs nodes. Leaf nodes are called deltas. + * But in general, this can be easily extended for intermediate nodes. + * In that extension, a delta would be associated with all non-root + * nodes. + * + * Returns a pointer to newly created objagg instance in case of success, + * otherwise it returns pointer error using ERR_PTR macro. + */ +struct objagg *objagg_create(const struct objagg_ops *ops, + struct objagg_hints *objagg_hints, void *priv) +{ + struct objagg *objagg; + int err; + + if (WARN_ON(!ops || !ops->root_create || !ops->root_destroy || + !ops->delta_check || !ops->delta_create || + !ops->delta_destroy)) + return ERR_PTR(-EINVAL); + + objagg = kzalloc(sizeof(*objagg), GFP_KERNEL); + if (!objagg) + return ERR_PTR(-ENOMEM); + objagg->ops = ops; + if (objagg_hints) { + objagg->hints = objagg_hints; + objagg_hints->refcount++; + } + objagg->priv = priv; + INIT_LIST_HEAD(&objagg->obj_list); + + objagg->ht_params.key_len = ops->obj_size; + objagg->ht_params.key_offset = offsetof(struct objagg_obj, obj); + objagg->ht_params.head_offset = offsetof(struct objagg_obj, ht_node); + + err = rhashtable_init(&objagg->obj_ht, &objagg->ht_params); + if (err) + goto err_rhashtable_init; + + ida_init(&objagg->root_ida); + + trace_objagg_create(objagg); + return objagg; + +err_rhashtable_init: + kfree(objagg); + return ERR_PTR(err); +} +EXPORT_SYMBOL(objagg_create); + +/** + * objagg_destroy - destroys a new objagg instance + * @objagg: objagg instance + * + * Note: all locking must be provided by the caller. + */ +void objagg_destroy(struct objagg *objagg) +{ + trace_objagg_destroy(objagg); + ida_destroy(&objagg->root_ida); + WARN_ON(!list_empty(&objagg->obj_list)); + rhashtable_destroy(&objagg->obj_ht); + if (objagg->hints) + objagg_hints_put(objagg->hints); + kfree(objagg); +} +EXPORT_SYMBOL(objagg_destroy); + +static int objagg_stats_info_sort_cmp_func(const void *a, const void *b) +{ + const struct objagg_obj_stats_info *stats_info1 = a; + const struct objagg_obj_stats_info *stats_info2 = b; + + if (stats_info1->is_root != stats_info2->is_root) + return stats_info2->is_root - stats_info1->is_root; + if (stats_info1->stats.delta_user_count != + stats_info2->stats.delta_user_count) + return stats_info2->stats.delta_user_count - + stats_info1->stats.delta_user_count; + return stats_info2->stats.user_count - stats_info1->stats.user_count; +} + +/** + * objagg_stats_get - obtains stats of the objagg instance + * @objagg: objagg instance + * + * Note: all locking must be provided by the caller. + * + * The returned structure contains statistics of all object + * currently in use, ordered by following rules: + * 1) Root objects are always on lower indexes than the rest. + * 2) Objects with higher delta user count are always on lower + * indexes. + * 3) In case more objects have the same delta user count, + * the objects are ordered by user count. + * + * Returns a pointer to stats instance in case of success, + * otherwise it returns pointer error using ERR_PTR macro. + */ +const struct objagg_stats *objagg_stats_get(struct objagg *objagg) +{ + struct objagg_stats *objagg_stats; + struct objagg_obj *objagg_obj; + int i; + + objagg_stats = kzalloc(struct_size(objagg_stats, stats_info, + objagg->obj_count), GFP_KERNEL); + if (!objagg_stats) + return ERR_PTR(-ENOMEM); + + i = 0; + list_for_each_entry(objagg_obj, &objagg->obj_list, list) { + memcpy(&objagg_stats->stats_info[i].stats, &objagg_obj->stats, + sizeof(objagg_stats->stats_info[0].stats)); + objagg_stats->stats_info[i].objagg_obj = objagg_obj; + objagg_stats->stats_info[i].is_root = + objagg_obj_is_root(objagg_obj); + if (objagg_stats->stats_info[i].is_root) + objagg_stats->root_count++; + i++; + } + objagg_stats->stats_info_count = i; + + sort(objagg_stats->stats_info, objagg_stats->stats_info_count, + sizeof(struct objagg_obj_stats_info), + objagg_stats_info_sort_cmp_func, NULL); + + return objagg_stats; +} +EXPORT_SYMBOL(objagg_stats_get); + +/** + * objagg_stats_put - puts stats of the objagg instance + * @objagg_stats: objagg instance stats + * + * Note: all locking must be provided by the caller. + */ +void objagg_stats_put(const struct objagg_stats *objagg_stats) +{ + kfree(objagg_stats); +} +EXPORT_SYMBOL(objagg_stats_put); + +static struct objagg_hints_node * +objagg_hints_node_create(struct objagg_hints *objagg_hints, + struct objagg_obj *objagg_obj, size_t obj_size, + struct objagg_hints_node *parent_hnode) +{ + unsigned int user_count = objagg_obj->stats.user_count; + struct objagg_hints_node *hnode; + int err; + + hnode = kzalloc(sizeof(*hnode) + obj_size, GFP_KERNEL); + if (!hnode) + return ERR_PTR(-ENOMEM); + memcpy(hnode->obj, &objagg_obj->obj, obj_size); + hnode->stats_info.stats.user_count = user_count; + hnode->stats_info.stats.delta_user_count = user_count; + if (parent_hnode) { + parent_hnode->stats_info.stats.delta_user_count += user_count; + } else { + hnode->root_id = objagg_hints->root_count++; + hnode->stats_info.is_root = true; + } + hnode->stats_info.objagg_obj = objagg_obj; + + err = rhashtable_insert_fast(&objagg_hints->node_ht, &hnode->ht_node, + objagg_hints->ht_params); + if (err) + goto err_ht_insert; + + list_add(&hnode->list, &objagg_hints->node_list); + hnode->parent = parent_hnode; + objagg_hints->node_count++; + + return hnode; + +err_ht_insert: + kfree(hnode); + return ERR_PTR(err); +} + +static void objagg_hints_flush(struct objagg_hints *objagg_hints) +{ + struct objagg_hints_node *hnode, *tmp; + + list_for_each_entry_safe(hnode, tmp, &objagg_hints->node_list, list) { + list_del(&hnode->list); + rhashtable_remove_fast(&objagg_hints->node_ht, &hnode->ht_node, + objagg_hints->ht_params); + kfree(hnode); + } +} + +struct objagg_tmp_node { + struct objagg_obj *objagg_obj; + bool crossed_out; +}; + +struct objagg_tmp_graph { + struct objagg_tmp_node *nodes; + unsigned long nodes_count; + unsigned long *edges; +}; + +static int objagg_tmp_graph_edge_index(struct objagg_tmp_graph *graph, + int parent_index, int index) +{ + return index * graph->nodes_count + parent_index; +} + +static void objagg_tmp_graph_edge_set(struct objagg_tmp_graph *graph, + int parent_index, int index) +{ + int edge_index = objagg_tmp_graph_edge_index(graph, index, + parent_index); + + __set_bit(edge_index, graph->edges); +} + +static bool objagg_tmp_graph_is_edge(struct objagg_tmp_graph *graph, + int parent_index, int index) +{ + int edge_index = objagg_tmp_graph_edge_index(graph, index, + parent_index); + + return test_bit(edge_index, graph->edges); +} + +static unsigned int objagg_tmp_graph_node_weight(struct objagg_tmp_graph *graph, + unsigned int index) +{ + struct objagg_tmp_node *node = &graph->nodes[index]; + unsigned int weight = node->objagg_obj->stats.user_count; + int j; + + /* Node weight is sum of node users and all other nodes users + * that this node can represent with delta. + */ + + for (j = 0; j < graph->nodes_count; j++) { + if (!objagg_tmp_graph_is_edge(graph, index, j)) + continue; + node = &graph->nodes[j]; + if (node->crossed_out) + continue; + weight += node->objagg_obj->stats.user_count; + } + return weight; +} + +static int objagg_tmp_graph_node_max_weight(struct objagg_tmp_graph *graph) +{ + struct objagg_tmp_node *node; + unsigned int max_weight = 0; + unsigned int weight; + int max_index = -1; + int i; + + for (i = 0; i < graph->nodes_count; i++) { + node = &graph->nodes[i]; + if (node->crossed_out) + continue; + weight = objagg_tmp_graph_node_weight(graph, i); + if (weight >= max_weight) { + max_weight = weight; + max_index = i; + } + } + return max_index; +} + +static struct objagg_tmp_graph *objagg_tmp_graph_create(struct objagg *objagg) +{ + unsigned int nodes_count = objagg->obj_count; + struct objagg_tmp_graph *graph; + struct objagg_tmp_node *node; + struct objagg_tmp_node *pnode; + struct objagg_obj *objagg_obj; + int i, j; + + graph = kzalloc(sizeof(*graph), GFP_KERNEL); + if (!graph) + return NULL; + + graph->nodes = kcalloc(nodes_count, sizeof(*graph->nodes), GFP_KERNEL); + if (!graph->nodes) + goto err_nodes_alloc; + graph->nodes_count = nodes_count; + + graph->edges = bitmap_zalloc(nodes_count * nodes_count, GFP_KERNEL); + if (!graph->edges) + goto err_edges_alloc; + + i = 0; + list_for_each_entry(objagg_obj, &objagg->obj_list, list) { + node = &graph->nodes[i++]; + node->objagg_obj = objagg_obj; + } + + /* Assemble a temporary graph. Insert edge X->Y in case Y can be + * in delta of X. + */ + for (i = 0; i < nodes_count; i++) { + for (j = 0; j < nodes_count; j++) { + if (i == j) + continue; + pnode = &graph->nodes[i]; + node = &graph->nodes[j]; + if (objagg->ops->delta_check(objagg->priv, + pnode->objagg_obj->obj, + node->objagg_obj->obj)) { + objagg_tmp_graph_edge_set(graph, i, j); + + } + } + } + return graph; + +err_edges_alloc: + kfree(graph->nodes); +err_nodes_alloc: + kfree(graph); + return NULL; +} + +static void objagg_tmp_graph_destroy(struct objagg_tmp_graph *graph) +{ + bitmap_free(graph->edges); + kfree(graph->nodes); + kfree(graph); +} + +static int +objagg_opt_simple_greedy_fillup_hints(struct objagg_hints *objagg_hints, + struct objagg *objagg) +{ + struct objagg_hints_node *hnode, *parent_hnode; + struct objagg_tmp_graph *graph; + struct objagg_tmp_node *node; + int index; + int j; + int err; + + graph = objagg_tmp_graph_create(objagg); + if (!graph) + return -ENOMEM; + + /* Find the nodes from the ones that can accommodate most users + * and cross them out of the graph. Save them to the hint list. + */ + while ((index = objagg_tmp_graph_node_max_weight(graph)) != -1) { + node = &graph->nodes[index]; + node->crossed_out = true; + hnode = objagg_hints_node_create(objagg_hints, + node->objagg_obj, + objagg->ops->obj_size, + NULL); + if (IS_ERR(hnode)) { + err = PTR_ERR(hnode); + goto out; + } + parent_hnode = hnode; + for (j = 0; j < graph->nodes_count; j++) { + if (!objagg_tmp_graph_is_edge(graph, index, j)) + continue; + node = &graph->nodes[j]; + if (node->crossed_out) + continue; + node->crossed_out = true; + hnode = objagg_hints_node_create(objagg_hints, + node->objagg_obj, + objagg->ops->obj_size, + parent_hnode); + if (IS_ERR(hnode)) { + err = PTR_ERR(hnode); + goto out; + } + } + } + + err = 0; +out: + objagg_tmp_graph_destroy(graph); + return err; +} + +struct objagg_opt_algo { + int (*fillup_hints)(struct objagg_hints *objagg_hints, + struct objagg *objagg); +}; + +static const struct objagg_opt_algo objagg_opt_simple_greedy = { + .fillup_hints = objagg_opt_simple_greedy_fillup_hints, +}; + + +static const struct objagg_opt_algo *objagg_opt_algos[] = { + [OBJAGG_OPT_ALGO_SIMPLE_GREEDY] = &objagg_opt_simple_greedy, +}; + +static int objagg_hints_obj_cmp(struct rhashtable_compare_arg *arg, + const void *obj) +{ + struct rhashtable *ht = arg->ht; + struct objagg_hints *objagg_hints = + container_of(ht, struct objagg_hints, node_ht); + const struct objagg_ops *ops = objagg_hints->ops; + const char *ptr = obj; + + ptr += ht->p.key_offset; + return ops->hints_obj_cmp ? ops->hints_obj_cmp(ptr, arg->key) : + memcmp(ptr, arg->key, ht->p.key_len); +} + +/** + * objagg_hints_get - obtains hints instance + * @objagg: objagg instance + * @opt_algo_type: type of hints finding algorithm + * + * Note: all locking must be provided by the caller. + * + * According to the algo type, the existing objects of objagg instance + * are going to be went-through to assemble an optimal tree. We call this + * tree hints. These hints can be later on used for creation of + * a new objagg instance. There, the future object creations are going + * to be consulted with these hints in order to find out, where exactly + * the new object should be put as a root or delta. + * + * Returns a pointer to hints instance in case of success, + * otherwise it returns pointer error using ERR_PTR macro. + */ +struct objagg_hints *objagg_hints_get(struct objagg *objagg, + enum objagg_opt_algo_type opt_algo_type) +{ + const struct objagg_opt_algo *algo = objagg_opt_algos[opt_algo_type]; + struct objagg_hints *objagg_hints; + int err; + + objagg_hints = kzalloc(sizeof(*objagg_hints), GFP_KERNEL); + if (!objagg_hints) + return ERR_PTR(-ENOMEM); + + objagg_hints->ops = objagg->ops; + objagg_hints->refcount = 1; + + INIT_LIST_HEAD(&objagg_hints->node_list); + + objagg_hints->ht_params.key_len = objagg->ops->obj_size; + objagg_hints->ht_params.key_offset = + offsetof(struct objagg_hints_node, obj); + objagg_hints->ht_params.head_offset = + offsetof(struct objagg_hints_node, ht_node); + objagg_hints->ht_params.obj_cmpfn = objagg_hints_obj_cmp; + + err = rhashtable_init(&objagg_hints->node_ht, &objagg_hints->ht_params); + if (err) + goto err_rhashtable_init; + + err = algo->fillup_hints(objagg_hints, objagg); + if (err) + goto err_fillup_hints; + + if (WARN_ON(objagg_hints->node_count != objagg->obj_count)) { + err = -EINVAL; + goto err_node_count_check; + } + + return objagg_hints; + +err_node_count_check: +err_fillup_hints: + objagg_hints_flush(objagg_hints); + rhashtable_destroy(&objagg_hints->node_ht); +err_rhashtable_init: + kfree(objagg_hints); + return ERR_PTR(err); +} +EXPORT_SYMBOL(objagg_hints_get); + +/** + * objagg_hints_put - puts hints instance + * @objagg_hints: objagg hints instance + * + * Note: all locking must be provided by the caller. + */ +void objagg_hints_put(struct objagg_hints *objagg_hints) +{ + if (--objagg_hints->refcount) + return; + objagg_hints_flush(objagg_hints); + rhashtable_destroy(&objagg_hints->node_ht); + kfree(objagg_hints); +} +EXPORT_SYMBOL(objagg_hints_put); + +/** + * objagg_hints_stats_get - obtains stats of the hints instance + * @objagg_hints: hints instance + * + * Note: all locking must be provided by the caller. + * + * The returned structure contains statistics of all objects + * currently in use, ordered by following rules: + * 1) Root objects are always on lower indexes than the rest. + * 2) Objects with higher delta user count are always on lower + * indexes. + * 3) In case multiple objects have the same delta user count, + * the objects are ordered by user count. + * + * Returns a pointer to stats instance in case of success, + * otherwise it returns pointer error using ERR_PTR macro. + */ +const struct objagg_stats * +objagg_hints_stats_get(struct objagg_hints *objagg_hints) +{ + struct objagg_stats *objagg_stats; + struct objagg_hints_node *hnode; + int i; + + objagg_stats = kzalloc(struct_size(objagg_stats, stats_info, + objagg_hints->node_count), + GFP_KERNEL); + if (!objagg_stats) + return ERR_PTR(-ENOMEM); + + i = 0; + list_for_each_entry(hnode, &objagg_hints->node_list, list) { + memcpy(&objagg_stats->stats_info[i], &hnode->stats_info, + sizeof(objagg_stats->stats_info[0])); + if (objagg_stats->stats_info[i].is_root) + objagg_stats->root_count++; + i++; + } + objagg_stats->stats_info_count = i; + + sort(objagg_stats->stats_info, objagg_stats->stats_info_count, + sizeof(struct objagg_obj_stats_info), + objagg_stats_info_sort_cmp_func, NULL); + + return objagg_stats; +} +EXPORT_SYMBOL(objagg_hints_stats_get); + +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_AUTHOR("Jiri Pirko "); +MODULE_DESCRIPTION("Object aggregation manager"); -- cgit v1.2.3