tt_eth_firmware_signature


Name

Prometheus Metric Name

tt_eth_firmware_signature

Metric Path (tt-telemetry)

Schema:

{hostname}/tray{tray}/chip{chip}/channel{channel}/tt_eth_firmware_signature

Example path:

bh-glx-c09u02/tray1/chip2/channel0/tt_eth_firmware_signature
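As an illustration of the schema, the following minimal sketch splits a metric path into its components. The function name parse_metric_path and the regular expression are hypothetical helpers, not part of tt-telemetry, and the sketch assumes that tray, chip, and channel are always decimal integers, as in the example path.

```python
import re

# Hypothetical pattern mirroring the documented schema:
# {hostname}/tray{tray}/chip{chip}/channel{channel}/tt_eth_firmware_signature
PATH_RE = re.compile(
    r"^(?P<hostname>[^/]+)/tray(?P<tray>\d+)/chip(?P<chip>\d+)"
    r"/channel(?P<channel>\d+)/tt_eth_firmware_signature$"
)


def parse_metric_path(path: str) -> dict:
    """Split a tt_eth_firmware_signature path into its components."""
    match = PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a tt_eth_firmware_signature path: {path!r}")
    fields = match.groupdict()
    return {
        "hostname": fields["hostname"],
        # Assumption: the numeric components are plain decimal integers.
        "tray": int(fields["tray"]),
        "chip": int(fields["chip"]),
        "channel": int(fields["channel"]),
    }


if __name__ == "__main__":
    print(parse_metric_path(
        "bh-glx-c09u02/tray1/chip2/channel0/tt_eth_firmware_signature"
    ))
```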

Description

The 16-bit ERISC firmware signature for a specific Ethernet channel, read directly from the heartbeat word in the Ethernet core’s L1 memory.

The heartbeat word is a 32-bit value: the upper 16 bits carry a signature identifying which firmware is running on the core, and the lower 16 bits are a counter the firmware increments while alive. This metric reports the signature half only (already shifted into the low 16 bits of the reported value). The companion metric tt_ethernet_heartbeat reports whether the counter half is advancing.
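The split described above can be sketched in a few lines. This is an illustration of the bit layout only, not the collector's actual code; the function name split_heartbeat is hypothetical, and the sample word is made up for the example.

```python
from typing import Tuple


def split_heartbeat(word: int) -> Tuple[int, int]:
    """Split a 32-bit heartbeat word into (signature, counter).

    The upper 16 bits are the firmware signature, which is what this
    metric reports; the lower 16 bits are the liveness counter.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError(f"heartbeat word out of 32-bit range: {word!r}")
    signature = (word >> 16) & 0xFFFF  # shifted into the low 16 bits
    counter = word & 0xFFFF
    return signature, counter


if __name__ == "__main__":
    # Made-up sample word: signature 0xDCBA, counter 7.
    signature, counter = split_heartbeat(0xDCBA0007)
    print(hex(signature), counter)
```

Because the signature is already shifted down, the reported metric value always fits in 16 bits regardless of the counter's state.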

The heartbeat word is read from:

  • Wormhole B0: ETH_HEARTBEAT_ADDR (0x1C) in the Ethernet core’s L1.

  • Blackhole: the heartbeat[0] field of the eth_status_t struct inside BOOT_RESULTS_ADDR (0x7CC70).

The layout of the 32-bit heartbeat word is identical on both architectures; only its address differs.

This metric is collected unconditionally on every Ethernet core; no ARC telemetry or topology information is required to populate it.

Observed values

| Signature | Meaning |
| --- | --- |
| 0xABCD (43981) | Base ERISC firmware is running on the core. |
| 0xDCBA (56506) | AERISC software fabric has taken over the core. See aerisc_context_switch in tt_metal/hw/inc/internal/tt-1xx/blackhole/eth_fw_api.h, which writes 0xdcba0000 \| counter into the heartbeat word to mark “software has taken over.” This is the authoritative indicator that the fabric firmware is the one driving the core. |
| Other | A firmware variant not enumerated here, an uninitialized/corrupt heartbeat word, or a core that is not running ERISC firmware at all. |

The constants are mirrored in src/include/telemetry/ethernet/ethernet_helpers.hpp (AERISC_FABRIC_HEARTBEAT_SIGNATURE) and in the fabric-view frontend at src/frontend/static/js/fabric/constants.js. Note that the UMD-defined tt::umd::erisc_firmware::FABRIC_HEARTBEAT_SIGNATURE (0xAABB) does not match what is actually written to L1 in practice; when checking for fabric firmware, compare against 0xDCBA.
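A consumer of this metric might map the reported value to a label along the following lines. This is a minimal sketch: the constant and function names are hypothetical and local to the example, and only the two signatures documented on this page are recognized.

```python
# Hypothetical local constants; the values come from the signatures
# documented on this page.
BASE_ERISC_SIGNATURE = 0xABCD    # 43981
AERISC_FABRIC_SIGNATURE = 0xDCBA  # 56506


def describe_signature(value: int) -> str:
    """Return a short label for a tt_eth_firmware_signature value."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"signature out of 16-bit range: {value!r}")
    if value == BASE_ERISC_SIGNATURE:
        return "base ERISC firmware"
    if value == AERISC_FABRIC_SIGNATURE:
        return "AERISC software fabric"
    # Anything else: an unlisted variant, an uninitialized or corrupt
    # heartbeat word, or a core not running ERISC firmware.
    return "unknown"


if __name__ == "__main__":
    for sample in (43981, 56506, 0xAABB):
        print(hex(sample), "->", describe_signature(sample))
```

Note that 0xAABB falls through to "unknown" here, consistent with the observation that it is not the value written to L1 in practice.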

Values

Type: Unsigned Integer

Units: None

Allowable values: A 16-bit value (0..65535) carrying the firmware signature. See the table above for known signatures.

Prometheus Labels

| Label Name | Value |
| --- | --- |
| hostname | The host from which the metric was collected. |
| hall | The datacenter hall where the host is located. Sourced from the Factory System Descriptor (FSD). |
| aisle | The datacenter aisle where the host is located. Sourced from the Factory System Descriptor (FSD). |
| rack | The rack number where the host is located. Sourced from the Factory System Descriptor (FSD). |
| shelf_u | The shelf U position in the rack where the host is located. Sourced from the Factory System Descriptor (FSD). |
| tray | The tray (UBB) that the device is located on. |
| chip | The ASIC location within the tray. |
| channel | The Ethernet channel number on the chip. |
| port_type | The physical port type (e.g., QSFP, backplane). Present when topology information is available. |
| port_id | The physical port ID. Present when topology information is available. |
| remote_hostname | The hostname of the remote endpoint. Present when remote endpoint information is available. |
| remote_tray | The tray number of the remote endpoint. Present when remote endpoint information is available. |
| remote_chip | The ASIC location of the remote endpoint. Present when remote endpoint information is available. |
| remote_channel | The Ethernet channel of the remote endpoint. Present when remote endpoint information is available. |
| remote_hall | The datacenter hall of the remote endpoint. Present when remote endpoint information is available. |
| remote_aisle | The datacenter aisle of the remote endpoint. Present when remote endpoint information is available. |
| remote_rack | The rack number of the remote endpoint. Present when remote endpoint information is available. |
| remote_shelf_u | The shelf U position of the remote endpoint. Present when remote endpoint information is available. |