tt_fabric_tx_heartbeat_total


Name

Prometheus Metric Name

tt_fabric_tx_heartbeat_total

Metric Path (tt-telemetry)

Schema:

{hostname}/tray{tray}/chip{chip}/channel{channel}/fabric/erisc{erisc_core}/tt_fabric_tx_heartbeat_total

Example path:

bh-glx-c09u02/tray1/chip2/channel0/fabric/erisc0/tt_fabric_tx_heartbeat_total
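A tool that consumes tt-telemetry paths may need to split them back into the schema fields above. The following is a minimal Python sketch, not part of tt-telemetry itself. The names `HeartbeatPath` and `parse_heartbeat_path` are hypothetical. It assumes that the hostname contains no `/` and that tray, chip, channel and eRISC indices are decimal integers, as in the example path.

```python
import re
from dataclasses import dataclass

# Regex derived from the documented schema (assumption: numeric fields are decimal):
# {hostname}/tray{tray}/chip{chip}/channel{channel}/fabric/erisc{erisc_core}/tt_fabric_tx_heartbeat_total
_PATH_RE = re.compile(
    r"^(?P<hostname>[^/]+)/tray(?P<tray>\d+)/chip(?P<chip>\d+)"
    r"/channel(?P<channel>\d+)/fabric/erisc(?P<erisc_core>\d+)"
    r"/tt_fabric_tx_heartbeat_total$"
)


@dataclass(frozen=True)
class HeartbeatPath:
    """Hypothetical container for the fields encoded in the metric path."""
    hostname: str
    tray: int
    chip: int
    channel: int
    erisc_core: int

    def to_path(self) -> str:
        """Render the tt-telemetry metric path for this location."""
        return (
            f"{self.hostname}/tray{self.tray}/chip{self.chip}"
            f"/channel{self.channel}/fabric/erisc{self.erisc_core}"
            "/tt_fabric_tx_heartbeat_total"
        )


def parse_heartbeat_path(path: str) -> HeartbeatPath:
    """Split a tt-telemetry metric path into its schema fields."""
    m = _PATH_RE.match(path)
    if m is None:
        raise ValueError(f"not a tt_fabric_tx_heartbeat_total path: {path!r}")
    return HeartbeatPath(
        hostname=m["hostname"],
        tray=int(m["tray"]),
        chip=int(m["chip"]),
        channel=int(m["channel"]),
        erisc_core=int(m["erisc_core"]),
    )


# Usage with the example path from this page:
p = parse_heartbeat_path(
    "bh-glx-c09u02/tray1/chip2/channel0/fabric/erisc0/tt_fabric_tx_heartbeat_total"
)
print(p)
```

`to_path()` is the inverse of the parser, so a parsed path round-trips to the same string.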

Description

The transmit heartbeat counter for a specific eRISC core on a fabric Ethernet channel. This counter increments as the eRISC core processes transmit operations. If it stops incrementing between successive reads while fabric telemetry is enabled, the eRISC TX path may be hung. This metric is available on Wormhole B0 and Blackhole devices.

Note: Fabric metrics are updated by fabric firmware only when workloads are run with fabric telemetry explicitly enabled, for example by setting the environment variable TT_METAL_FABRIC_TELEMETRY to 1.
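If workloads are launched from a Python harness, the environment variable can be set only for the child process. This is a minimal sketch. The helper names are hypothetical, and the workload command in the usage comment is a placeholder rather than a real binary.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def fabric_telemetry_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with fabric telemetry enabled.

    The base mapping (os.environ by default) is copied, never modified.
    """
    env = dict(os.environ if base is None else base)
    env["TT_METAL_FABRIC_TELEMETRY"] = "1"
    return env


def run_with_fabric_telemetry(cmd: Sequence[str]) -> int:
    """Launch a workload with TT_METAL_FABRIC_TELEMETRY=1; return its exit code."""
    return subprocess.run(list(cmd), env=fabric_telemetry_env(), check=False).returncode


# Usage (hypothetical command; substitute your own fabric workload):
# run_with_fabric_telemetry(["./my_fabric_workload"])
```

Because the variable is set on a copied environment, the parent process and any other workloads it launches are unaffected.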

Values

Type: Unsigned Integer

Units: None

Allowable values: A non-negative integer that increments over time. The absolute value is not meaningful; what matters is that it continues to change between successive reads.
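Because only the change between reads matters, hang detection comes down to counting consecutive unchanged samples. The following is a minimal sketch that assumes you already poll the value periodically. `HeartbeatWatch` and its `stall_threshold` default are hypothetical tuning choices, not documented values. Also keep in mind that the counter will not advance at all unless fabric telemetry is enabled, and in that case every channel would look stalled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HeartbeatWatch:
    """Flag a possible TX hang when the heartbeat value stops changing.

    stall_threshold: consecutive unchanged reads tolerated before reporting
    a stall (an assumed tuning parameter, not a documented value).
    """
    stall_threshold: int = 3
    _last: Optional[int] = field(default=None, init=False)
    _unchanged: int = field(default=0, init=False)

    def observe(self, value: int) -> bool:
        """Record one read and return True if the counter looks stalled."""
        if self._last is not None and value == self._last:
            self._unchanged += 1
        else:
            # Any change counts as progress; the absolute value is ignored.
            self._unchanged = 0
        self._last = value
        return self._unchanged >= self.stall_threshold
```

Comparing only for equality, rather than checking that the value grows, follows the note above that the absolute value is not meaningful.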

Prometheus Labels

Label Name | Value
--- | ---
hostname | The host from which the metric was collected.
hall | The datacenter hall where the host is located. Sourced from the Factory System Descriptor (FSD).
aisle | The datacenter aisle where the host is located. Sourced from the Factory System Descriptor (FSD).
rack | The rack number where the host is located. Sourced from the Factory System Descriptor (FSD).
shelf_u | The shelf U position in the rack where the host is located. Sourced from the Factory System Descriptor (FSD).
tray | The tray (UBB) on which the device is located.
chip | The ASIC location within the tray.
channel | The Ethernet channel number on the chip.
port_type | The physical port type (e.g., QSFP, backplane). Present when topology information is available.
port_id | The physical port ID. Present when topology information is available.
remote_hostname | The hostname of the remote endpoint. Present when remote endpoint information is available.
remote_tray | The tray number of the remote endpoint. Present when remote endpoint information is available.
remote_chip | The ASIC location of the remote endpoint. Present when remote endpoint information is available.
remote_channel | The Ethernet channel of the remote endpoint. Present when remote endpoint information is available.
remote_hall | The datacenter hall of the remote endpoint. Present when remote endpoint information is available.
remote_aisle | The datacenter aisle of the remote endpoint. Present when remote endpoint information is available.
remote_rack | The rack number of the remote endpoint. Present when remote endpoint information is available.
remote_shelf_u | The shelf U position of the remote endpoint. Present when remote endpoint information is available.
erisc_core | The eRISC core index (0 for Wormhole, 0-1 for Blackhole).