diff options
author | 2023-02-21 18:24:12 -0800 | |
---|---|---|
committer | 2023-02-21 18:24:12 -0800 | |
commit | 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch) | |
tree | cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /Documentation/networking/dsa/sja1105.rst | |
download | linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip |
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Diffstat (limited to 'Documentation/networking/dsa/sja1105.rst')
-rw-r--r-- | Documentation/networking/dsa/sja1105.rst | 445 |
1 files changed, 445 insertions, 0 deletions
diff --git a/Documentation/networking/dsa/sja1105.rst b/Documentation/networking/dsa/sja1105.rst new file mode 100644 index 000000000..e0219c145 --- /dev/null +++ b/Documentation/networking/dsa/sja1105.rst @@ -0,0 +1,445 @@ +========================= +NXP SJA1105 switch driver +========================= + +Overview +======== + +The NXP SJA1105 is a family of 10 SPI-managed automotive switches: + +- SJA1105E: First generation, no TTEthernet +- SJA1105T: First generation, TTEthernet +- SJA1105P: Second generation, no TTEthernet, no SGMII +- SJA1105Q: Second generation, TTEthernet, no SGMII +- SJA1105R: Second generation, no TTEthernet, SGMII +- SJA1105S: Second generation, TTEthernet, SGMII +- SJA1110A: Third generation, TTEthernet, SGMII, integrated 100base-T1 and + 100base-TX PHYs +- SJA1110B: Third generation, TTEthernet, SGMII, 100base-T1, 100base-TX +- SJA1110C: Third generation, TTEthernet, SGMII, 100base-T1, 100base-TX +- SJA1110D: Third generation, TTEthernet, SGMII, 100base-T1 + +Being automotive parts, their configuration interface is geared towards +set-and-forget use, with minimal dynamic interaction at runtime. They +require a static configuration to be composed by software and packed +with CRC and table headers, and sent over SPI. + +The static configuration is composed of several configuration tables. Each +table takes a number of entries. Some configuration tables can be (partially) +reconfigured at runtime, some not. Some tables are mandatory, some not: + +============================= ================== ============================= +Table Mandatory Reconfigurable +============================= ================== ============================= +Schedule no no +Schedule entry points if Scheduling no +VL Lookup no no +VL Policing if VL Lookup no +VL Forwarding if VL Lookup no +L2 Lookup no no +L2 Policing yes no +VLAN Lookup yes yes +L2 Forwarding yes partially (fully on P/Q/R/S) +MAC Config yes partially (fully on P/Q/R/S) +Schedule Params if Scheduling no +Schedule Entry Points Params if Scheduling no +VL Forwarding Params if VL Forwarding no +L2 Lookup Params no partially (fully on P/Q/R/S) +L2 Forwarding Params yes no +Clock Sync Params no no +AVB Params no no +General Params yes partially +Retagging no yes +xMII Params yes no +SGMII no yes +============================= ================== ============================= + + +Also the configuration is write-only (software cannot read it back from the +switch except for very few exceptions). + +The driver creates a static configuration at probe time, and keeps it at +all times in memory, as a shadow for the hardware state. When required to +change a hardware setting, the static configuration is also updated. +If that changed setting can be transmitted to the switch through the dynamic +reconfiguration interface, it is; otherwise the switch is reset and +reprogrammed with the updated static configuration. + +Switching features +================== + +The driver supports the configuration of L2 forwarding rules in hardware for +port bridging. The forwarding, broadcast and flooding domain between ports can +be restricted through two methods: either at the L2 forwarding level (isolate +one bridge's ports from another's) or at the VLAN port membership level +(isolate ports within the same bridge). The final forwarding decision taken by +the hardware is a logical AND of these two sets of rules. + +The hardware tags all traffic internally with a port-based VLAN (pvid), or it +decodes the VLAN information from the 802.1Q tag. Advanced VLAN classification +is not possible. Once attributed a VLAN tag, frames are checked against the +port's membership rules and dropped at ingress if they don't match any VLAN. +This behavior is available when switch ports are enslaved to a bridge with +``vlan_filtering 1``. + +Normally the hardware is not configurable with respect to VLAN awareness, but +by changing what TPID the switch searches 802.1Q tags for, the semantics of a +bridge with ``vlan_filtering 0`` can be kept (accept all traffic, tagged or +untagged), and therefore this mode is also supported. + +Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but +all bridges should have the same level of VLAN awareness (either both have +``vlan_filtering`` 0, or both 1). + +Topology and loop detection through STP is supported. + +Offloads +======== + +Time-aware scheduling +--------------------- + +The switch supports a variation of the enhancements for scheduled traffic +specified in IEEE 802.1Q-2018 (formerly 802.1Qbv). This means it can be used to +ensure deterministic latency for priority traffic that is sent in-band with its +gate-open event in the network schedule. + +This capability can be managed through the tc-taprio offload ('flags 2'). The +difference compared to the software implementation of taprio is that the latter +would only be able to shape traffic originated from the CPU, but not +autonomously forwarded flows. + +The device has 8 traffic classes, and maps incoming frames to one of them based +on the VLAN PCP bits (if no VLAN is present, the port-based default is used). +As described in the previous sections, depending on the value of +``vlan_filtering``, the EtherType recognized by the switch as being VLAN can +either be the typical 0x8100 or a custom value used internally by the driver +for tagging. Therefore, the switch ignores the VLAN PCP if used in standalone +or bridge mode with ``vlan_filtering=0``, as it will not recognize the 0x8100 +EtherType. In these modes, injecting into a particular TX queue can only be +done by the DSA net devices, which populate the PCP field of the tagging header +on egress. Using ``vlan_filtering=1``, the behavior is the other way around: +offloaded flows can be steered to TX queues based on the VLAN PCP, but the DSA +net devices are no longer able to do that. To inject frames into a hardware TX +queue with VLAN awareness active, it is necessary to create a VLAN +sub-interface on the DSA master port, and send normal (0x8100) VLAN-tagged +towards the switch, with the VLAN PCP bits set appropriately. + +Management traffic (having DMAC 01-80-C2-xx-xx-xx or 01-19-1B-xx-xx-xx) is the +notable exception: the switch always treats it with a fixed priority and +disregards any VLAN PCP bits even if present. The traffic class for management +traffic has a value of 7 (highest priority) at the moment, which is not +configurable in the driver. + +Below is an example of configuring a 500 us cyclic schedule on egress port +``swp5``. The traffic class gate for management traffic (7) is open for 100 us, +and the gates for all other traffic classes are open for 400 us:: + + #!/bin/bash + + set -e -u -o pipefail + + NSEC_PER_SEC="1000000000" + + gatemask() { + local tc_list="$1" + local mask=0 + + for tc in ${tc_list}; do + mask=$((${mask} | (1 << ${tc}))) + done + + printf "%02x" ${mask} + } + + if ! systemctl is-active --quiet ptp4l; then + echo "Please start the ptp4l service" + exit + fi + + now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }') + # Phase-align the base time to the start of the next second. + sec=$(echo "${now}" | gawk -F. '{ print $1; }') + base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))" + + tc qdisc add dev swp5 parent root handle 100 taprio \ + num_tc 8 \ + map 0 1 2 3 5 6 7 \ + queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ + base-time ${base_time} \ + sched-entry S $(gatemask 7) 100000 \ + sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \ + flags 2 + +It is possible to apply the tc-taprio offload on multiple egress ports. There +are hardware restrictions related to the fact that no gate event may trigger +simultaneously on two ports. The driver checks the consistency of the schedules +against this restriction and errors out when appropriate. Schedule analysis is +needed to avoid this, which is outside the scope of the document. + +Routing actions (redirect, trap, drop) +-------------------------------------- + +The switch is able to offload flow-based redirection of packets to a set of +destination ports specified by the user. Internally, this is implemented by +making use of Virtual Links, a TTEthernet concept. + +The driver supports 2 types of keys for Virtual Links: + +- VLAN-aware virtual links: these match on destination MAC address, VLAN ID and + VLAN PCP. +- VLAN-unaware virtual links: these match on destination MAC address only. + +The VLAN awareness state of the bridge (vlan_filtering) cannot be changed while +there are virtual link rules installed. + +Composing multiple actions inside the same rule is supported. When only routing +actions are requested, the driver creates a "non-critical" virtual link. When +the action list also contains tc-gate (more details below), the virtual link +becomes "time-critical" (draws frame buffers from a reserved memory partition, +etc). + +The 3 routing actions that are supported are "trap", "drop" and "redirect". + +Example 1: send frames received on swp2 with a DA of 42:be:24:9b:76:20 to the +CPU and to swp3. This type of key (DA only) when the port's VLAN awareness +state is off:: + + tc qdisc add dev swp2 clsact + tc filter add dev swp2 ingress flower skip_sw dst_mac 42:be:24:9b:76:20 \ + action mirred egress redirect dev swp3 \ + action trap + +Example 2: drop frames received on swp2 with a DA of 42:be:24:9b:76:20, a VID +of 100 and a PCP of 0:: + + tc filter add dev swp2 ingress protocol 802.1Q flower skip_sw \ + dst_mac 42:be:24:9b:76:20 vlan_id 100 vlan_prio 0 action drop + +Time-based ingress policing +--------------------------- + +The TTEthernet hardware abilities of the switch can be constrained to act +similarly to the Per-Stream Filtering and Policing (PSFP) clause specified in +IEEE 802.1Q-2018 (formerly 802.1Qci). This means it can be used to perform +tight timing-based admission control for up to 1024 flows (identified by a +tuple composed of destination MAC address, VLAN ID and VLAN PCP). Packets which +are received outside their expected reception window are dropped. + +This capability can be managed through the offload of the tc-gate action. As +routing actions are intrinsic to virtual links in TTEthernet (which performs +explicit routing of time-critical traffic and does not leave that in the hands +of the FDB, flooding etc), the tc-gate action may never appear alone when +asking sja1105 to offload it. One (or more) redirect or trap actions must also +follow along. + +Example: create a tc-taprio schedule that is phase-aligned with a tc-gate +schedule (the clocks must be synchronized by a 1588 application stack, which is +outside the scope of this document). No packet delivered by the sender will be +dropped. Note that the reception window is larger than the transmission window +(and much more so, in this example) to compensate for the packet propagation +delay of the link (which can be determined by the 1588 application stack). + +Receiver (sja1105):: + + tc qdisc add dev swp2 clsact + now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \ + sec=$(echo $now | awk -F. '{print $1}') && \ + base_time="$(((sec + 2) * 1000000000))" && \ + echo "base time ${base_time}" + tc filter add dev swp2 ingress flower skip_sw \ + dst_mac 42:be:24:9b:76:20 \ + action gate base-time ${base_time} \ + sched-entry OPEN 60000 -1 -1 \ + sched-entry CLOSE 40000 -1 -1 \ + action trap + +Sender:: + + now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \ + sec=$(echo $now | awk -F. '{print $1}') && \ + base_time="$(((sec + 2) * 1000000000))" && \ + echo "base time ${base_time}" + tc qdisc add dev eno0 parent root taprio \ + num_tc 8 \ + map 0 1 2 3 4 5 6 7 \ + queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ + base-time ${base_time} \ + sched-entry S 01 50000 \ + sched-entry S 00 50000 \ + flags 2 + +The engine used to schedule the ingress gate operations is the same that the +one used for the tc-taprio offload. Therefore, the restrictions regarding the +fact that no two gate actions (either tc-gate or tc-taprio gates) may fire at +the same time (during the same 200 ns slot) still apply. + +To come in handy, it is possible to share time-triggered virtual links across +more than 1 ingress port, via flow blocks. In this case, the restriction of +firing at the same time does not apply because there is a single schedule in +the system, that of the shared virtual link:: + + tc qdisc add dev swp2 ingress_block 1 clsact + tc qdisc add dev swp3 ingress_block 1 clsact + tc filter add block 1 flower skip_sw dst_mac 42:be:24:9b:76:20 \ + action gate index 2 \ + base-time 0 \ + sched-entry OPEN 50000000 -1 -1 \ + sched-entry CLOSE 50000000 -1 -1 \ + action trap + +Hardware statistics for each flow are also available ("pkts" counts the number +of dropped frames, which is a sum of frames dropped due to timing violations, +lack of destination ports and MTU enforcement checks). Byte-level counters are +not available. + +Limitations +=========== + +The SJA1105 switch family always performs VLAN processing. When configured as +VLAN-unaware, frames carry a different VLAN tag internally, depending on +whether the port is standalone or under a VLAN-unaware bridge. + +The virtual link keys are always fixed at {MAC DA, VLAN ID, VLAN PCP}, but the +driver asks for the VLAN ID and VLAN PCP when the port is under a VLAN-aware +bridge. Otherwise, it fills in the VLAN ID and PCP automatically, based on +whether the port is standalone or in a VLAN-unaware bridge, and accepts only +"VLAN-unaware" tc-flower keys (MAC DA). + +The existing tc-flower keys that are offloaded using virtual links are no +longer operational after one of the following happens: + +- port was standalone and joins a bridge (VLAN-aware or VLAN-unaware) +- port is part of a bridge whose VLAN awareness state changes +- port was part of a bridge and becomes standalone +- port was standalone, but another port joins a VLAN-aware bridge and this + changes the global VLAN awareness state of the bridge + +The driver cannot veto all these operations, and it cannot update/remove the +existing tc-flower filters either. So for proper operation, the tc-flower +filters should be installed only after the forwarding configuration of the port +has been made, and removed by user space before making any changes to it. + +Device Tree bindings and board design +===================================== + +This section references ``Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml`` +and aims to showcase some potential switch caveats. + +RMII PHY role and out-of-band signaling +--------------------------------------- + +In the RMII spec, the 50 MHz clock signals are either driven by the MAC or by +an external oscillator (but not by the PHY). +But the spec is rather loose and devices go outside it in several ways. +Some PHYs go against the spec and may provide an output pin where they source +the 50 MHz clock themselves, in an attempt to be helpful. +On the other hand, the SJA1105 is only binary configurable - when in the RMII +MAC role it will also attempt to drive the clock signal. To prevent this from +happening it must be put in RMII PHY role. +But doing so has some unintended consequences. +In the RMII spec, the PHY can transmit extra out-of-band signals via RXD[1:0]. +These are practically some extra code words (/J/ and /K/) sent prior to the +preamble of each frame. The MAC does not have this out-of-band signaling +mechanism defined by the RMII spec. +So when the SJA1105 port is put in PHY role to avoid having 2 drivers on the +clock signal, inevitably an RMII PHY-to-PHY connection is created. The SJA1105 +emulates a PHY interface fully and generates the /J/ and /K/ symbols prior to +frame preambles, which the real PHY is not expected to understand. So the PHY +simply encodes the extra symbols received from the SJA1105-as-PHY onto the +100Base-Tx wire. +On the other side of the wire, some link partners might discard these extra +symbols, while others might choke on them and discard the entire Ethernet +frames that follow along. This looks like packet loss with some link partners +but not with others. +The take-away is that in RMII mode, the SJA1105 must be let to drive the +reference clock if connected to a PHY. + +RGMII fixed-link and internal delays +------------------------------------ + +As mentioned in the bindings document, the second generation of devices has +tunable delay lines as part of the MAC, which can be used to establish the +correct RGMII timing budget. +When powered up, these can shift the Rx and Tx clocks with a phase difference +between 73.8 and 101.7 degrees. +The catch is that the delay lines need to lock onto a clock signal with a +stable frequency. This means that there must be at least 2 microseconds of +silence between the clock at the old vs at the new frequency. Otherwise the +lock is lost and the delay lines must be reset (powered down and back up). +In RGMII the clock frequency changes with link speed (125 MHz at 1000 Mbps, 25 +MHz at 100 Mbps and 2.5 MHz at 10 Mbps), and link speed might change during the +AN process. +In the situation where the switch port is connected through an RGMII fixed-link +to a link partner whose link state life cycle is outside the control of Linux +(such as a different SoC), then the delay lines would remain unlocked (and +inactive) until there is manual intervention (ifdown/ifup on the switch port). +The take-away is that in RGMII mode, the switch's internal delays are only +reliable if the link partner never changes link speeds, or if it does, it does +so in a way that is coordinated with the switch port (practically, both ends of +the fixed-link are under control of the same Linux system). +As to why would a fixed-link interface ever change link speeds: there are +Ethernet controllers out there which come out of reset in 100 Mbps mode, and +their driver inevitably needs to change the speed and clock frequency if it's +required to work at gigabit. + +MDIO bus and PHY management +--------------------------- + +The SJA1105 does not have an MDIO bus and does not perform in-band AN either. +Therefore there is no link state notification coming from the switch device. +A board would need to hook up the PHYs connected to the switch to any other +MDIO bus available to Linux within the system (e.g. to the DSA master's MDIO +bus). Link state management then works by the driver manually keeping in sync +(over SPI commands) the MAC link speed with the settings negotiated by the PHY. + +By comparison, the SJA1110 supports an MDIO slave access point over which its +internal 100base-T1 PHYs can be accessed from the host. This is, however, not +used by the driver, instead the internal 100base-T1 and 100base-TX PHYs are +accessed through SPI commands, modeled in Linux as virtual MDIO buses. + +The microcontroller attached to the SJA1110 port 0 also has an MDIO controller +operating in master mode, however the driver does not support this either, +since the microcontroller gets disabled when the Linux driver operates. +Discrete PHYs connected to the switch ports should have their MDIO interface +attached to an MDIO controller from the host system and not to the switch, +similar to SJA1105. + +Port compatibility matrix +------------------------- + +The SJA1105 port compatibility matrix is: + +===== ============== ============== ============== +Port SJA1105E/T SJA1105P/Q SJA1105R/S +===== ============== ============== ============== +0 xMII xMII xMII +1 xMII xMII xMII +2 xMII xMII xMII +3 xMII xMII xMII +4 xMII xMII SGMII +===== ============== ============== ============== + + +The SJA1110 port compatibility matrix is: + +===== ============== ============== ============== ============== +Port SJA1110A SJA1110B SJA1110C SJA1110D +===== ============== ============== ============== ============== +0 RevMII (uC) RevMII (uC) RevMII (uC) RevMII (uC) +1 100base-TX 100base-TX 100base-TX + or SGMII SGMII +2 xMII xMII xMII xMII + or SGMII or SGMII +3 xMII xMII xMII + or SGMII or SGMII SGMII + or 2500base-X or 2500base-X or 2500base-X +4 SGMII SGMII SGMII SGMII + or 2500base-X or 2500base-X or 2500base-X or 2500base-X +5 100base-T1 100base-T1 100base-T1 100base-T1 +6 100base-T1 100base-T1 100base-T1 100base-T1 +7 100base-T1 100base-T1 100base-T1 100base-T1 +8 100base-T1 100base-T1 n/a n/a +9 100base-T1 100base-T1 n/a n/a +10 100base-T1 n/a n/a n/a +===== ============== ============== ============== ============== |