diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /Documentation/networking/vrf.rst | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'Documentation/networking/vrf.rst')
-rw-r--r-- | Documentation/networking/vrf.rst | 464 |
1 files changed, 464 insertions, 0 deletions
diff --git a/Documentation/networking/vrf.rst b/Documentation/networking/vrf.rst new file mode 100644 index 000000000..0a9a6f968 --- /dev/null +++ b/Documentation/networking/vrf.rst @@ -0,0 +1,464 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==================================== +Virtual Routing and Forwarding (VRF) +==================================== + +The VRF Device +============== + +The VRF device combined with ip rules provides the ability to create virtual +routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the +Linux network stack. One use case is the multi-tenancy problem where each +tenant has their own unique routing tables and in the very least need +different default gateways. + +Processes can be "VRF aware" by binding a socket to the VRF device. Packets +through the socket then use the routing table associated with the VRF +device. An important feature of the VRF device implementation is that it +impacts only Layer 3 and above so L2 tools (e.g., LLDP) are not affected +(ie., they do not need to be run in each VRF). The design also allows +the use of higher priority ip rules (Policy Based Routing, PBR) to take +precedence over the VRF device rules directing specific traffic as desired. + +In addition, VRF devices allow VRFs to be nested within namespaces. For +example network namespaces provide separation of network interfaces at the +device layer, VLANs on the interfaces within a namespace provide L2 separation +and then VRF devices provide L3 separation. + +Design +------ +A VRF device is created with an associated route table. Network interfaces +are then enslaved to a VRF device:: + + +-----------------------------+ + | vrf-blue | ===> route table 10 + +-----------------------------+ + | | | + +------+ +------+ +-------------+ + | eth1 | | eth2 | ... | bond1 | + +------+ +------+ +-------------+ + | | + +------+ +------+ + | eth8 | | eth9 | + +------+ +------+ + +Packets received on an enslaved device and are switched to the VRF device +in the IPv4 and IPv6 processing stacks giving the impression that packets +flow through the VRF device. Similarly on egress routing rules are used to +send packets to the VRF device driver before getting sent out the actual +interface. This allows tcpdump on a VRF device to capture all packets into +and out of the VRF as a whole\ [1]_. Similarly, netfilter\ [2]_ and tc rules +can be applied using the VRF device to specify rules that apply to the VRF +domain as a whole. + +.. [1] Packets in the forwarded state do not flow through the device, so those + packets are not seen by tcpdump. Will revisit this limitation in a + future release. + +.. [2] Iptables on ingress supports PREROUTING with skb->dev set to the real + ingress device and both INPUT and PREROUTING rules with skb->dev set to + the VRF device. For egress POSTROUTING and OUTPUT rules can be written + using either the VRF device or real egress device. + +Setup +----- +1. VRF device is created with an association to a FIB table. + e.g,:: + + ip link add vrf-blue type vrf table 10 + ip link set dev vrf-blue up + +2. An l3mdev FIB rule directs lookups to the table associated with the device. + A single l3mdev rule is sufficient for all VRFs. The VRF device adds the + l3mdev rule for IPv4 and IPv6 when the first device is created with a + default preference of 1000. Users may delete the rule if desired and add + with a different priority or install per-VRF rules. + + Prior to the v4.8 kernel iif and oif rules are needed for each VRF device:: + + ip ru add oif vrf-blue table 10 + ip ru add iif vrf-blue table 10 + +3. Set the default route for the table (and hence default route for the VRF):: + + ip route add table 10 unreachable default metric 4278198272 + + This high metric value ensures that the default unreachable route can + be overridden by a routing protocol suite. FRRouting interprets + kernel metrics as a combined admin distance (upper byte) and priority + (lower 3 bytes). Thus the above metric translates to [255/8192]. + +4. Enslave L3 interfaces to a VRF device:: + + ip link set dev eth1 master vrf-blue + + Local and connected routes for enslaved devices are automatically moved to + the table associated with VRF device. Any additional routes depending on + the enslaved device are dropped and will need to be reinserted to the VRF + FIB table following the enslavement. + + The IPv6 sysctl option keep_addr_on_down can be enabled to keep IPv6 global + addresses as VRF enslavement changes:: + + sysctl -w net.ipv6.conf.all.keep_addr_on_down=1 + +5. Additional VRF routes are added to associated table:: + + ip route add table 10 ... + + +Applications +------------ +Applications that are to work within a VRF need to bind their socket to the +VRF device:: + + setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1); + +or to specify the output device using cmsg and IP_PKTINFO. + +By default the scope of the port bindings for unbound sockets is +limited to the default VRF. That is, it will not be matched by packets +arriving on interfaces enslaved to an l3mdev and processes may bind to +the same port if they bind to an l3mdev. + +TCP & UDP services running in the default VRF context (ie., not bound +to any VRF device) can work across all VRF domains by enabling the +tcp_l3mdev_accept and udp_l3mdev_accept sysctl options:: + + sysctl -w net.ipv4.tcp_l3mdev_accept=1 + sysctl -w net.ipv4.udp_l3mdev_accept=1 + +These options are disabled by default so that a socket in a VRF is only +selected for packets in that VRF. There is a similar option for RAW +sockets, which is enabled by default for reasons of backwards compatibility. +This is so as to specify the output device with cmsg and IP_PKTINFO, but +using a socket not bound to the corresponding VRF. This allows e.g. older ping +implementations to be run with specifying the device but without executing it +in the VRF. This option can be disabled so that packets received in a VRF +context are only handled by a raw socket bound to the VRF, and packets in the +default VRF are only handled by a socket not bound to any VRF:: + + sysctl -w net.ipv4.raw_l3mdev_accept=0 + +netfilter rules on the VRF device can be used to limit access to services +running in the default VRF context as well. + +Using VRF-aware applications (applications which simultaneously create sockets +outside and inside VRFs) in conjunction with ``net.ipv4.tcp_l3mdev_accept=1`` +is possible but may lead to problems in some situations. With that sysctl +value, it is unspecified which listening socket will be selected to handle +connections for VRF traffic; ie. either a socket bound to the VRF or an unbound +socket may be used to accept new connections from a VRF. This somewhat +unexpected behavior can lead to problems if sockets are configured with extra +options (ex. TCP MD5 keys) with the expectation that VRF traffic will +exclusively be handled by sockets bound to VRFs, as would be the case with +``net.ipv4.tcp_l3mdev_accept=0``. Finally and as a reminder, regardless of +which listening socket is selected, established sockets will be created in the +VRF based on the ingress interface, as documented earlier. + +-------------------------------------------------------------------------------- + +Using iproute2 for VRFs +======================= +iproute2 supports the vrf keyword as of v4.7. For backwards compatibility this +section lists both commands where appropriate -- with the vrf keyword and the +older form without it. + +1. Create a VRF + + To instantiate a VRF device and associate it with a table:: + + $ ip link add dev NAME type vrf table ID + + As of v4.8 the kernel supports the l3mdev FIB rule where a single rule + covers all VRFs. The l3mdev rule is created for IPv4 and IPv6 on first + device create. + +2. List VRFs + + To list VRFs that have been created:: + + $ ip [-d] link show type vrf + NOTE: The -d option is needed to show the table id + + For example:: + + $ ip -d link show type vrf + 11: mgmt: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 + link/ether 72:b3:ba:91:e2:24 brd ff:ff:ff:ff:ff:ff promiscuity 0 + vrf table 1 addrgenmode eui64 + 12: red: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 + link/ether b6:6f:6e:f6:da:73 brd ff:ff:ff:ff:ff:ff promiscuity 0 + vrf table 10 addrgenmode eui64 + 13: blue: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 + link/ether 36:62:e8:7d:bb:8c brd ff:ff:ff:ff:ff:ff promiscuity 0 + vrf table 66 addrgenmode eui64 + 14: green: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 + link/ether e6:28:b8:63:70:bb brd ff:ff:ff:ff:ff:ff promiscuity 0 + vrf table 81 addrgenmode eui64 + + + Or in brief output:: + + $ ip -br link show type vrf + mgmt UP 72:b3:ba:91:e2:24 <NOARP,MASTER,UP,LOWER_UP> + red UP b6:6f:6e:f6:da:73 <NOARP,MASTER,UP,LOWER_UP> + blue UP 36:62:e8:7d:bb:8c <NOARP,MASTER,UP,LOWER_UP> + green UP e6:28:b8:63:70:bb <NOARP,MASTER,UP,LOWER_UP> + + +3. Assign a Network Interface to a VRF + + Network interfaces are assigned to a VRF by enslaving the netdevice to a + VRF device:: + + $ ip link set dev NAME master NAME + + On enslavement connected and local routes are automatically moved to the + table associated with the VRF device. + + For example:: + + $ ip link set dev eth0 master mgmt + + +4. Show Devices Assigned to a VRF + + To show devices that have been assigned to a specific VRF add the master + option to the ip command:: + + $ ip link show vrf NAME + $ ip link show master NAME + + For example:: + + $ ip link show vrf red + 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000 + link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff + 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000 + link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff + 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN mode DEFAULT group default qlen 1000 + link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff + + + Or using the brief output:: + + $ ip -br link show vrf red + eth1 UP 02:00:00:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP> + eth2 UP 02:00:00:00:02:03 <BROADCAST,MULTICAST,UP,LOWER_UP> + eth5 DOWN 02:00:00:00:02:06 <BROADCAST,MULTICAST> + + +5. Show Neighbor Entries for a VRF + + To list neighbor entries associated with devices enslaved to a VRF device + add the master option to the ip command:: + + $ ip [-6] neigh show vrf NAME + $ ip [-6] neigh show master NAME + + For example:: + + $ ip neigh show vrf red + 10.2.1.254 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE + 10.2.2.254 dev eth2 lladdr 5e:54:01:6a:ee:80 REACHABLE + + $ ip -6 neigh show vrf red + 2002:1::64 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE + + +6. Show Addresses for a VRF + + To show addresses for interfaces associated with a VRF add the master + option to the ip command:: + + $ ip addr show vrf NAME + $ ip addr show master NAME + + For example:: + + $ ip addr show vrf red + 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000 + link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff + inet 10.2.1.2/24 brd 10.2.1.255 scope global eth1 + valid_lft forever preferred_lft forever + inet6 2002:1::2/120 scope global + valid_lft forever preferred_lft forever + inet6 fe80::ff:fe00:202/64 scope link + valid_lft forever preferred_lft forever + 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000 + link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff + inet 10.2.2.2/24 brd 10.2.2.255 scope global eth2 + valid_lft forever preferred_lft forever + inet6 2002:2::2/120 scope global + valid_lft forever preferred_lft forever + inet6 fe80::ff:fe00:203/64 scope link + valid_lft forever preferred_lft forever + 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN group default qlen 1000 + link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff + + Or in brief format:: + + $ ip -br addr show vrf red + eth1 UP 10.2.1.2/24 2002:1::2/120 fe80::ff:fe00:202/64 + eth2 UP 10.2.2.2/24 2002:2::2/120 fe80::ff:fe00:203/64 + eth5 DOWN + + +7. Show Routes for a VRF + + To show routes for a VRF use the ip command to display the table associated + with the VRF device:: + + $ ip [-6] route show vrf NAME + $ ip [-6] route show table ID + + For example:: + + $ ip route show vrf red + unreachable default metric 4278198272 + broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.2 + 10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.2 + local 10.2.1.2 dev eth1 proto kernel scope host src 10.2.1.2 + broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.2 + broadcast 10.2.2.0 dev eth2 proto kernel scope link src 10.2.2.2 + 10.2.2.0/24 dev eth2 proto kernel scope link src 10.2.2.2 + local 10.2.2.2 dev eth2 proto kernel scope host src 10.2.2.2 + broadcast 10.2.2.255 dev eth2 proto kernel scope link src 10.2.2.2 + + $ ip -6 route show vrf red + local 2002:1:: dev lo proto none metric 0 pref medium + local 2002:1::2 dev lo proto none metric 0 pref medium + 2002:1::/120 dev eth1 proto kernel metric 256 pref medium + local 2002:2:: dev lo proto none metric 0 pref medium + local 2002:2::2 dev lo proto none metric 0 pref medium + 2002:2::/120 dev eth2 proto kernel metric 256 pref medium + local fe80:: dev lo proto none metric 0 pref medium + local fe80:: dev lo proto none metric 0 pref medium + local fe80::ff:fe00:202 dev lo proto none metric 0 pref medium + local fe80::ff:fe00:203 dev lo proto none metric 0 pref medium + fe80::/64 dev eth1 proto kernel metric 256 pref medium + fe80::/64 dev eth2 proto kernel metric 256 pref medium + ff00::/8 dev red metric 256 pref medium + ff00::/8 dev eth1 metric 256 pref medium + ff00::/8 dev eth2 metric 256 pref medium + unreachable default dev lo metric 4278198272 error -101 pref medium + +8. Route Lookup for a VRF + + A test route lookup can be done for a VRF:: + + $ ip [-6] route get vrf NAME ADDRESS + $ ip [-6] route get oif NAME ADDRESS + + For example:: + + $ ip route get 10.2.1.40 vrf red + 10.2.1.40 dev eth1 table red src 10.2.1.2 + cache + + $ ip -6 route get 2002:1::32 vrf red + 2002:1::32 from :: dev eth1 table red proto kernel src 2002:1::2 metric 256 pref medium + + +9. Removing Network Interface from a VRF + + Network interfaces are removed from a VRF by breaking the enslavement to + the VRF device:: + + $ ip link set dev NAME nomaster + + Connected routes are moved back to the default table and local entries are + moved to the local table. + + For example:: + + $ ip link set dev eth0 nomaster + +-------------------------------------------------------------------------------- + +Commands used in this example:: + + cat >> /etc/iproute2/rt_tables.d/vrf.conf <<EOF + 1 mgmt + 10 red + 66 blue + 81 green + EOF + + function vrf_create + { + VRF=$1 + TBID=$2 + + # create VRF device + ip link add ${VRF} type vrf table ${TBID} + + if [ "${VRF}" != "mgmt" ]; then + ip route add table ${TBID} unreachable default metric 4278198272 + fi + ip link set dev ${VRF} up + } + + vrf_create mgmt 1 + ip link set dev eth0 master mgmt + + vrf_create red 10 + ip link set dev eth1 master red + ip link set dev eth2 master red + ip link set dev eth5 master red + + vrf_create blue 66 + ip link set dev eth3 master blue + + vrf_create green 81 + ip link set dev eth4 master green + + + Interface addresses from /etc/network/interfaces: + auto eth0 + iface eth0 inet static + address 10.0.0.2 + netmask 255.255.255.0 + gateway 10.0.0.254 + + iface eth0 inet6 static + address 2000:1::2 + netmask 120 + + auto eth1 + iface eth1 inet static + address 10.2.1.2 + netmask 255.255.255.0 + + iface eth1 inet6 static + address 2002:1::2 + netmask 120 + + auto eth2 + iface eth2 inet static + address 10.2.2.2 + netmask 255.255.255.0 + + iface eth2 inet6 static + address 2002:2::2 + netmask 120 + + auto eth3 + iface eth3 inet static + address 10.2.3.2 + netmask 255.255.255.0 + + iface eth3 inet6 static + address 2002:3::2 + netmask 120 + + auto eth4 + iface eth4 inet static + address 10.2.4.2 + netmask 255.255.255.0 + + iface eth4 inet6 static + address 2002:4::2 + netmask 120 |