tt_ethernet_cable_present
Name
Prometheus Metric Name
tt_ethernet_cable_present
Metric Path (tt-telemetry)
Schema:
{hostname}/tray{tray}/chip{chip}/channel{channel}/tt_ethernet_cable_present
Example path:
bh-glx-c09u02/tray1/chip2/channel0/tt_ethernet_cable_present
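As an illustration of the schema above, the following sketch splits a metric path into its components. The helper name parse_metric_path and the regular expression are assumptions made for this example and are not part of tt-telemetry.

```python
import re

# Pattern derived from the schema shown above (illustrative only).
PATH_RE = re.compile(
    r"^(?P<hostname>[^/]+)/tray(?P<tray>\d+)/chip(?P<chip>\d+)"
    r"/channel(?P<channel>\d+)/tt_ethernet_cable_present$"
)


def parse_metric_path(path):
    """Split a tt_ethernet_cable_present path into its components."""
    match = PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a tt_ethernet_cable_present path: {path!r}")
    return {
        "hostname": match["hostname"],
        "tray": int(match["tray"]),
        "chip": int(match["chip"]),
        "channel": int(match["channel"]),
    }


print(parse_metric_path(
    "bh-glx-c09u02/tray1/chip2/channel0/tt_ethernet_cable_present"
))
```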
Description
Whether a QSFP-DD cable is physically connected at the port associated with a
given Ethernet channel. Unlike tt_ethernet_link_up, which reflects the Ethernet
link state on the ASIC, this metric reflects only physical connector presence,
as detected via the QSFP-DD EEPROM on the I2C bus behind the BMC. A port with no
cable inserted reports false; a port with a cable inserted reports true
regardless of whether the link has come up.
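Because the two metrics are independent, reading them together can narrow down a fault. The sketch below is one possible interpretation; the function name diagnose_port and the returned strings are hypothetical and do not come from tt-telemetry.

```python
def diagnose_port(cable_present, link_up):
    """Combine tt_ethernet_cable_present and tt_ethernet_link_up readings.

    Both arguments are the boolean metric values for the same channel.
    The returned strings are illustrative labels, not tt-telemetry output.
    """
    if cable_present and link_up:
        return "cable inserted, link up"
    if cable_present and not link_up:
        return "cable inserted, link down"
    if not cable_present and not link_up:
        return "no cable inserted"
    # Link up without a detected cable: the two sources disagree.
    return "inconsistent: link up but no cable detected"


print(diagnose_port(cable_present=True, link_up=False))
```

Note that a true reading may also be the startup default, so the first case is only meaningful once a scan has populated the cache.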
How it works
QSFP connectivity is probed via a sequence of raw IPMI commands (issued with
ipmitool) that walk the CPLD and I2C MUX to the QSFP-DD EEPROM and read
register 0. Each ipmitool invocation is wrapped with timeout 5s so a stuck
I2C transaction cannot hang the polling thread.
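The idea of bounding each external command can be sketched as follows. This example swaps in Python's subprocess timeout for the timeout 5s wrapper described above, and it runs a placeholder command rather than ipmitool; the helper name run_bounded and its return shape are assumptions.

```python
import subprocess
import sys


def run_bounded(argv, timeout_s=5.0):
    """Run a command with an upper bound on its runtime.

    Returns (returncode, combined output), or None if the command did not
    finish within timeout_s. The output of failed commands is kept because
    an error response can still carry useful data.
    """
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return None
    return done.returncode, done.stdout + done.stderr


# Placeholder command; a real poller would pass its ipmitool argv here.
print(run_bounded([sys.executable, "-c", "print('ok')"]))
```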
Because each port check issues several ipmitool commands, and because I2C
access is serialized through the BMC, a full scan of all 4 UBBs × 14 ports
(56 total) takes tens of seconds on Blackhole Galaxy. The IPMI service runs a
dedicated background poller thread that issues one full scan approximately
every 60 seconds and caches the per-port status in memory. Metric update
cycles are O(1) cache lookups and never call ipmitool directly.
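The split between a slow background scan and cheap cached reads could look roughly like the sketch below. The class name CablePresenceCache, the scan_port callback, and the (tray, port_id) key layout are assumptions for illustration; error handling and the startup default are left out to keep the example short.

```python
import threading


class CablePresenceCache:
    """Background poller that caches per-port cable status (sketch)."""

    def __init__(self, scan_port, ports, interval_s=60.0):
        self._scan_port = scan_port      # slow callable: (tray, port_id) -> bool
        self._ports = list(ports)        # (tray, port_id) pairs to scan
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._status = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def scan_once(self):
        """Scan every port; the lock is held only for the cache write."""
        for key in self._ports:
            present = self._scan_port(*key)
            with self._lock:
                self._status[key] = present

    def _run(self):
        while not self._stop.is_set():
            self.scan_once()
            # Sleep until the next cycle, waking early if stop() is called.
            self._stop.wait(self._interval_s)

    def lookup(self, tray, port_id):
        """O(1) read used by metric updates; None if not yet scanned."""
        with self._lock:
            return self._status.get((tray, port_id))


# Stand-in scanner: pretends odd-numbered ports have a cable.
cache = CablePresenceCache(lambda tray, port_id: port_id % 2 == 1,
                           [(1, 1), (1, 2)])
cache.scan_once()
print(cache.lookup(1, 1), cache.lookup(1, 2))
```

Keeping the slow scan outside the lock means metric readers are never blocked for longer than a single dictionary write.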
A per-port scan can resolve three ways:
EEPROM read returns 0xff (on the success path or embedded in an error response from a write NAK) → port is empty, cached as false.
EEPROM read returns valid data → cable is present, cached as true.
Scan genuinely errors (timeout, BMC unresponsive, I2C bus wedged) → the previous cached value for that port is retained; health metrics flip to unhealthy for that cycle.
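The three outcomes above map onto a small cache-update rule, sketched here. The ScanOutcome enum and apply_outcome function are hypothetical names; how the real service classifies an EEPROM response is not shown.

```python
from enum import Enum


class ScanOutcome(Enum):
    EMPTY = "empty"      # EEPROM read returned 0xff
    PRESENT = "present"  # EEPROM read returned valid data
    ERROR = "error"      # timeout, BMC unresponsive, I2C bus wedged


def apply_outcome(cache, key, outcome):
    """Update cache in place; return True if the scan counts as healthy."""
    if outcome is ScanOutcome.EMPTY:
        cache[key] = False
        return True
    if outcome is ScanOutcome.PRESENT:
        cache[key] = True
        return True
    # Genuine error: leave the previous cached value untouched.
    return False


cache = {}
key = (1, 3)  # (tray, port_id)
apply_outcome(cache, key, ScanOutcome.PRESENT)
healthy = apply_outcome(cache, key, ScanOutcome.ERROR)
print(cache[key], healthy)
```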
Initial / pre-scan value
Because the first scan takes ~30–60 seconds, the metric defaults to true
(connected) at startup and remains true until the first scan result for the
corresponding (tray, port_id) entry lands in the cache. This avoids spurious
“cable missing” alarms during the initial polling window, but it also means
that a true value in this window does not confirm that a cable is inserted.
Consult tt_ipmi_service_healthy to determine whether the poller is actually
healthy.
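A minimal sketch of the startup default, assuming the cache is a dictionary keyed by (tray, port_id). Both function names are hypothetical; cable_confirmed shows one way a consumer might separate a confirmed cable from the optimistic default.

```python
def cable_present(cache, tray, port_id):
    """Metric value: defaults to True until a scan result is cached."""
    return cache.get((tray, port_id), True)


def cable_confirmed(cache, tray, port_id, service_healthy):
    """True only if a cached scan result says present and the poller is healthy.

    service_healthy stands in for the tt_ipmi_service_healthy reading.
    """
    return service_healthy and cache.get((tray, port_id)) is True


cache = {}  # nothing scanned yet
print(cable_present(cache, 1, 1), cable_confirmed(cache, 1, 1, True))
```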
When the metric is created
This metric is only created for Ethernet channels whose physical link is of
port type QSFP_DD. Channels corresponding to trace links, linking-board
ports, or other non-QSFP port types do not produce this metric. Port-type
information comes from the Factory System Descriptor (FSD); if no FSD is
provided at startup, no instances of this metric are created and a single
warning is emitted to the log. IPMI must be explicitly enabled with
--enable_ipmi; otherwise this metric is not created.
Currently only the Blackhole Galaxy architecture is supported by the underlying IPMI I2C layer.
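The creation conditions can be summarized as a filter over channels, sketched below. The Channel record, the function name, and the "TRACE" port-type string are assumptions; only QSFP_DD is named in the text above, and the architecture check is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Channel:
    tray: int
    chip: int
    channel: int
    port_type: Optional[str]  # taken from the FSD; None if unknown


def channels_with_metric(channels, fsd_loaded, ipmi_enabled):
    """Return the channels for which the metric would be created."""
    if not fsd_loaded or not ipmi_enabled:
        # No FSD or no --enable_ipmi: no instances at all.
        return []
    return [c for c in channels if c.port_type == "QSFP_DD"]


channels = [
    Channel(tray=1, chip=2, channel=0, port_type="QSFP_DD"),
    Channel(tray=1, chip=2, channel=1, port_type="TRACE"),  # hypothetical type
]
print(channels_with_metric(channels, fsd_loaded=True, ipmi_enabled=True))
```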
Values
Type: Boolean
Units: None
Allowable values:
True (1): A QSFP-DD cable is physically inserted at this port, or the first scan has not yet populated this port’s cache entry (startup default).
False (0): The most recent successful scan confirmed the port is empty.
Prometheus Labels
| Label Name | Value |
|---|---|
| hostname | The host from which the metric was collected. |
| hall | The datacenter hall where the host is located. Sourced from the Factory System Descriptor (FSD). |
| aisle | The datacenter aisle where the host is located. Sourced from the Factory System Descriptor (FSD). |
| rack | The rack number where the host is located. Sourced from the Factory System Descriptor (FSD). |
| shelf_u | The shelf U position in the rack where the host is located. Sourced from the Factory System Descriptor (FSD). |
| tray | The tray (UBB) that the device is located on. |
| chip | The ASIC location within the tray. |
| channel | The Ethernet channel number on the chip. |
| port_type | The physical port type. Always QSFP_DD. |
| port_id | The physical port ID (tray-scoped, 1-based, from the FSD). |
| qsfp_designator | The QSFP connector reference designator. |
| remote_hostname | The hostname of the remote endpoint. Present when remote endpoint information is available. |
| remote_tray | The tray number of the remote endpoint. Present when remote endpoint information is available. |
| remote_chip | The ASIC location of the remote endpoint. Present when remote endpoint information is available. |
| remote_channel | The Ethernet channel of the remote endpoint. Present when remote endpoint information is available. |
| remote_hall | The datacenter hall of the remote endpoint. Present when remote endpoint information is available. |
| remote_aisle | The datacenter aisle of the remote endpoint. Present when remote endpoint information is available. |
| remote_rack | The rack number of the remote endpoint. Present when remote endpoint information is available. |
| remote_shelf_u | The shelf U position of the remote endpoint. Present when remote endpoint information is available. |