tt_expected_chip_count

Name

Prometheus Metric Name

tt_expected_chip_count

Metric Path (tt-telemetry)

Schema:

{hostname}/tt_expected_chip_count

Example path:

wh-glx-c12u04/tt_expected_chip_count

Description

The number of Tenstorrent chips a Galaxy host is expected to expose. This is a fixed reference value used alongside tt_chip_count to detect a mismatch between what the host should have and what UMD currently sees on the PCI bus.

The metric is created only when the host is identified as a Galaxy system. Identification is based on the Factory System Descriptor (FSD): the host is treated as a Galaxy if the FSD lists a board location for it with board type UBB (Wormhole Galaxy) or UBB_BLACKHOLE (Blackhole Galaxy). Because the check reads the FSD rather than the device, the metric is available only when a valid FSD that includes this host is provided. If no FSD is configured, or the host is absent from it, the metric is not emitted, even on a Galaxy machine. It is likewise not emitted on non-Galaxy hosts.
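The detection rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the tt-telemetry implementation: the FSD is modeled as a plain list of dictionaries with hypothetical keys "host" and "board_type", which do not reflect the real FSD format, and the function name is invented for the example.

```python
from typing import Optional

# Board types that mark a host as a Galaxy system, per the description above.
GALAXY_BOARD_TYPES = {"UBB", "UBB_BLACKHOLE"}


def is_galaxy_host(fsd_boards: Optional[list], hostname: str) -> bool:
    """Return True if the FSD lists a Galaxy board location for hostname.

    fsd_boards is a hypothetical, simplified view of the FSD: a list of
    dicts with assumed keys "host" and "board_type". None means that no
    FSD was configured.
    """
    if not fsd_boards:
        # No FSD (or an empty one): the host cannot be identified as a
        # Galaxy, so the metric would not be emitted.
        return False
    return any(
        entry.get("host") == hostname
        and entry.get("board_type") in GALAXY_BOARD_TYPES
        for entry in fsd_boards
    )
```

Note that a Galaxy machine missing from the FSD yields False here, which matches the behavior described above: the decision depends on the FSD contents, not on the hardware.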

The value is set to 32 once, when the metric is created, and is not refreshed on later collection cycles.

On creation, if the live tt_chip_count differs from the expected value, tt-telemetry logs an error (Chip count mismatch).
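The creation-time behavior can be summarized with a short Python sketch. It is an illustration only: the function name, the logger name, and the exact wording of the log message beyond "Chip count mismatch" are assumptions, and whether the host is a Galaxy is passed in as a plain boolean.

```python
import logging
from typing import Optional

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("tt_telemetry_sketch")

# Fixed expected value on Galaxy hosts, as documented above.
EXPECTED_GALAXY_CHIP_COUNT = 32


def create_expected_chip_count(is_galaxy: bool, live_chip_count: int) -> Optional[int]:
    """Return the metric value, or None when the metric is not emitted.

    Intended to be called once, at metric creation. The returned value is
    never recomputed on later collection cycles.
    """
    if not is_galaxy:
        # Non-Galaxy host (or host not found in the FSD): no metric.
        return None
    if live_chip_count != EXPECTED_GALAXY_CHIP_COUNT:
        # The message format past "Chip count mismatch" is an assumption.
        logger.error(
            "Chip count mismatch: expected %d, found %d",
            EXPECTED_GALAXY_CHIP_COUNT,
            live_chip_count,
        )
    return EXPECTED_GALAXY_CHIP_COUNT
```

In this sketch a mismatch does not change the reported value. The metric stays at 32 and only the log records the discrepancy, so ongoing detection relies on comparing the metric with tt_chip_count.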

Values

Type: Unsigned Integer

Units: None

Allowable values: 32 on Galaxy hosts. The value is constant for the lifetime of the collector process after the metric is first created.

Prometheus Labels

Label Name

Value

hostname

The host from which the metric was collected.

hall

The datacenter hall where the host is located. Sourced from the Factory System Descriptor (FSD).

aisle

The datacenter aisle where the host is located. Sourced from the Factory System Descriptor (FSD).

rack

The rack number where the host is located. Sourced from the Factory System Descriptor (FSD).

shelf_u

The shelf U position in the rack where the host is located. Sourced from the Factory System Descriptor (FSD).
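To show how these labels attach to a sample, the following Python sketch renders one line in the Prometheus text exposition format. The helper name is hypothetical, and any label values passed to it are placeholders rather than real deployment data. The real exporter may order or escape labels differently.

```python
def render_sample(name: str, labels: dict, value: int) -> str:
    """Render one sample line in Prometheus text exposition format."""

    def escape(text: str) -> str:
        # Label values escape backslash, double quote, and newline.
        return (
            text.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )

    # Labels are emitted in the insertion order of the dict.
    body = ",".join(f'{key}="{escape(str(val))}"' for key, val in labels.items())
    return f"{name}{{{body}}} {value}"
```

For example, calling render_sample("tt_expected_chip_count", {"hostname": "wh-glx-c12u04"}, 32) produces a line with the metric name, the hostname label in braces, and the value 32. The hall, aisle, rack, and shelf_u labels would be added in the same way with values taken from the FSD.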