tt_ethernet_cable_present

Name

Prometheus Metric Name

tt_ethernet_cable_present

Metric Path (tt-telemetry)

Schema:

{hostname}/tray{tray}/chip{chip}/channel{channel}/tt_ethernet_cable_present

Example path:

bh-glx-c09u02/tray1/chip2/channel0/tt_ethernet_cable_present
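As an illustration of the schema above, the following sketch splits a metric path into its components. The function and type names are hypothetical and not part of tt-telemetry, and the pattern assumes that tray, chip, and channel are always decimal integers.

```python
import re
from typing import NamedTuple


class CablePresentPath(NamedTuple):
    """Components of a tt_ethernet_cable_present metric path (hypothetical helper type)."""
    hostname: str
    tray: int
    chip: int
    channel: int


# Mirrors the documented schema:
# {hostname}/tray{tray}/chip{chip}/channel{channel}/tt_ethernet_cable_present
_PATH_RE = re.compile(
    r"^(?P<hostname>[^/]+)/tray(?P<tray>\d+)/chip(?P<chip>\d+)"
    r"/channel(?P<channel>\d+)/tt_ethernet_cable_present$"
)


def parse_cable_present_path(path: str) -> CablePresentPath:
    """Parse a metric path, raising ValueError if it does not match the schema."""
    match = _PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a tt_ethernet_cable_present path: {path!r}")
    return CablePresentPath(
        hostname=match["hostname"],
        tray=int(match["tray"]),
        chip=int(match["chip"]),
        channel=int(match["channel"]),
    )


def format_cable_present_path(parts: CablePresentPath) -> str:
    """Inverse of parse_cable_present_path."""
    return (
        f"{parts.hostname}/tray{parts.tray}/chip{parts.chip}"
        f"/channel{parts.channel}/tt_ethernet_cable_present"
    )
```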

Description

Whether a QSFP-DD cable is physically connected at the port associated with a given Ethernet channel. Unlike tt_ethernet_link_up, which reflects the Ethernet link state on the ASIC, this metric reflects only physical connector presence, as detected via the QSFP-DD EEPROM on the I2C bus behind the BMC. A port with no cable inserted reports false; a port with a cable inserted reports true regardless of whether the link has come up.

How it works

QSFP connectivity is probed via a sequence of raw IPMI commands (issued with ipmitool) that walk the CPLD and I2C MUX to the QSFP-DD EEPROM and read register 0. Each ipmitool invocation is wrapped with timeout 5s so a stuck I2C transaction cannot hang the polling thread.
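The timeout wrapping described above might look roughly like the sketch below. It is not the service's actual code: the function names are hypothetical, and the ipmitool arguments passed in are placeholders rather than the real CPLD/MUX/EEPROM command sequence, which is not reproduced here. The sketch relies on the GNU coreutils `timeout` convention of exiting with status 124 when the time limit is hit.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

TIMEOUT_SPEC = "5s"       # per-invocation limit stated in the text above
TIMEOUT_EXIT_CODE = 124   # exit status GNU `timeout` uses when the limit is hit


def with_timeout(ipmitool_args: Sequence[str]) -> List[str]:
    """Build the argv for one ipmitool invocation wrapped in `timeout 5s`."""
    return ["timeout", TIMEOUT_SPEC, "ipmitool", *ipmitool_args]


def run_ipmitool(
    ipmitool_args: Sequence[str],
    runner: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> Tuple[bool, str]:
    """Run one wrapped command.

    Returns (timed_out, combined_output). Output from failed commands is kept,
    because an error response can still carry useful bytes.
    """
    result = runner(with_timeout(ipmitool_args), capture_output=True, text=True)
    timed_out = result.returncode == TIMEOUT_EXIT_CODE
    return timed_out, (result.stdout or "") + (result.stderr or "")
```

The `runner` parameter exists only so the wrapper can be exercised without a BMC; in normal use it defaults to `subprocess.run`.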

Because each port check issues several ipmitool commands, and because I2C access is serialized through the BMC, a full scan of all 4 UBBs × 14 ports (56 total) takes tens of seconds on Blackhole Galaxy. The IPMI service runs a dedicated background poller thread that issues one full scan approximately every 60 seconds and caches the per-port status in memory. Metric update cycles are O(1) cache lookups and never call ipmitool directly.
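A minimal sketch of the poller-plus-cache pattern described above, assuming a hypothetical `scan_all` callable that performs one full scan and returns a mapping from (tray, port_id) to a boolean. The class and method names are illustrative and are not taken from the IPMI service.

```python
import threading
from typing import Callable, Dict, Optional, Tuple

PortKey = Tuple[int, int]  # (tray, port_id)


class CablePresenceCache:
    """Background poller that caches per-port status (illustrative sketch)."""

    def __init__(self, scan_all: Callable[[], Dict[PortKey, bool]],
                 interval_s: float = 60.0) -> None:
        self._scan_all = scan_all          # hypothetical: runs one slow full scan
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._status: Dict[PortKey, bool] = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def poll_once(self) -> None:
        """Run one full scan outside the lock, then publish the results."""
        results = self._scan_all()
        with self._lock:
            self._status.update(results)

    def _run(self) -> None:
        while not self._stop.is_set():
            self.poll_once()
            # Sleep until the next cycle, waking early if stop() is called.
            self._stop.wait(self._interval_s)

    def get(self, tray: int, port_id: int) -> Optional[bool]:
        """O(1) lookup used by metric updates; None means not yet scanned."""
        with self._lock:
            return self._status.get((tray, port_id))
```

Running the scan outside the lock keeps metric updates from blocking for the tens of seconds a full scan can take; only the short dictionary update holds the lock.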

A per-port scan resolves in one of three ways:

  • EEPROM read returns 0xff (on the success path or embedded in an error response from a write NAK) → port is empty, cached as false.

  • EEPROM read returns valid data → cable is present, cached as true.

  • Scan genuinely errors (timeout, BMC unresponsive, I2C bus wedged) → the previous cached value for that port is retained; health metrics flip to unhealthy for that cycle.
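The three outcomes above can be sketched as a small classification step followed by a cache update. This is a simplified illustration: it assumes the caller has already reduced the scan to either the byte read from register 0 (including a 0xff recovered from a write-NAK error response) or `None` for a genuine error, and it treats any byte other than 0xff as valid data. All names are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

PortKey = Tuple[int, int]  # (tray, port_id)


class ScanOutcome(Enum):
    EMPTY = "empty"      # EEPROM read returned 0xff
    PRESENT = "present"  # EEPROM read returned valid data
    ERROR = "error"      # timeout, BMC unresponsive, I2C bus wedged


def classify(register0: Optional[int]) -> ScanOutcome:
    """Map a register-0 read (or None for a failed scan) to an outcome."""
    if register0 is None:
        return ScanOutcome.ERROR
    if register0 == 0xFF:
        return ScanOutcome.EMPTY
    return ScanOutcome.PRESENT


def apply_outcome(cache: Dict[PortKey, bool], key: PortKey,
                  outcome: ScanOutcome) -> bool:
    """Update the cache for one port; return True if the scan was healthy."""
    if outcome is ScanOutcome.ERROR:
        # Retain whatever was cached before; report the cycle as unhealthy.
        return False
    cache[key] = outcome is ScanOutcome.PRESENT
    return True
```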

Initial / pre-scan value

Because the first scan takes ~30–60 seconds, the metric defaults to true (connected) at startup and remains true until the first scan result for the corresponding (tray, port_id) entry lands in the cache. This avoids spurious “cable missing” alarms during the initial polling window. From the metric value alone, a startup default is indistinguishable from a confirmed true, so consult tt_ipmi_service_healthy to determine whether the poller is actually healthy.
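The startup default amounts to a cache lookup with a fallback of true. The sketch below assumes the cache is a plain dictionary keyed by (tray, port_id); the helper names are hypothetical.

```python
from typing import Dict, Tuple

PortKey = Tuple[int, int]  # (tray, port_id)
STARTUP_DEFAULT = True     # reported until the first scan result lands


def cable_present(cache: Dict[PortKey, bool], tray: int, port_id: int) -> bool:
    """Value reported by the metric: cached status, else the startup default."""
    return cache.get((tray, port_id), STARTUP_DEFAULT)


def is_confirmed(cache: Dict[PortKey, bool], tray: int, port_id: int) -> bool:
    """True once at least one successful scan has populated this port's entry."""
    return (tray, port_id) in cache
```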

When the metric is created

This metric is created only when all of the following hold:

  • The Ethernet channel's physical link is of port type QSFP_DD. Channels corresponding to trace links, linking-board ports, or other non-QSFP port types do not produce this metric.

  • A Factory System Descriptor (FSD) is provided at startup, because port-type information comes from the FSD. If no FSD is provided, no instances of this metric are created and a single warning is emitted to the log.

  • IPMI is explicitly enabled with --enable_ipmi; otherwise this metric is not created.
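The creation conditions above reduce to a simple predicate. The sketch below is illustrative only: the function name and parameters are hypothetical, and port types are represented as plain strings for simplicity.

```python
from typing import Optional


def should_create_cable_present_metric(
    port_type: Optional[str],  # port type from the FSD, or None if unknown
    fsd_provided: bool,        # whether an FSD was supplied at startup
    ipmi_enabled: bool,        # whether --enable_ipmi was passed
) -> bool:
    """Return True only if every documented creation condition is met."""
    if not ipmi_enabled:
        return False
    if not fsd_provided:
        # Without an FSD there is no port-type information at all.
        return False
    return port_type == "QSFP_DD"
```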

Currently only the Blackhole Galaxy architecture is supported by the underlying IPMI I2C layer.

Values

Type: Boolean

Units: None

Allowable values:

  • True (1): A QSFP-DD cable is physically inserted at this port, or the first scan has not yet populated this port’s cache entry (startup default).

  • False (0): The most recent successful scan confirmed the port is empty.

Prometheus Labels

Label Name | Value
--- | ---
hostname | The host from which the metric was collected.
hall | The datacenter hall where the host is located. Sourced from the Factory System Descriptor (FSD).
aisle | The datacenter aisle where the host is located. Sourced from the Factory System Descriptor (FSD).
rack | The rack number where the host is located. Sourced from the Factory System Descriptor (FSD).
shelf_u | The shelf U position in the rack where the host is located. Sourced from the Factory System Descriptor (FSD).
tray | The tray (UBB) that the device is located on.
chip | The ASIC location within the tray.
channel | The Ethernet channel number on the chip.
port_type | The physical port type. Always QSFP_DD for this metric (by construction).
port_id | The physical port ID (tray-scoped, 1-based, from the FSD).
qsfp_designator | The QSFP connector reference designator (e.g., j101j404). Unique within a tray but not across trays. Maps 1:1 with (tray, port_id).
remote_hostname | The hostname of the remote endpoint. Present when remote endpoint information is available.
remote_tray | The tray number of the remote endpoint. Present when remote endpoint information is available.
remote_chip | The ASIC location of the remote endpoint. Present when remote endpoint information is available.
remote_channel | The Ethernet channel of the remote endpoint. Present when remote endpoint information is available.
remote_hall | The datacenter hall of the remote endpoint. Present when remote endpoint information is available.
remote_aisle | The datacenter aisle of the remote endpoint. Present when remote endpoint information is available.
remote_rack | The rack number of the remote endpoint. Present when remote endpoint information is available.
remote_shelf_u | The shelf U position of the remote endpoint. Present when remote endpoint information is available.