tt_thermal_trip_count
Name
Prometheus Metric Name
tt_thermal_trip_count
Metric Path (tt-telemetry)
Schema:
{hostname}/tray{tray}/chip{chip}/tt_thermal_trip_count
Example path:
bh-glx-c09u02/tray1/chip2/tt_thermal_trip_count
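For tooling that reads tt-telemetry paths directly, the following is a minimal Python sketch of building and parsing paths that follow the schema above. It assumes that `tray` and `chip` are always integers and that the hostname contains no `/`. `build_path`, `parse_path`, and `MetricLocation` are hypothetical helpers, not part of tt-telemetry.

```python
from typing import NamedTuple

METRIC_NAME = "tt_thermal_trip_count"


class MetricLocation(NamedTuple):
    """Components of a tt-telemetry metric path (hypothetical helper type)."""
    hostname: str
    tray: int
    chip: int


def build_path(hostname: str, tray: int, chip: int) -> str:
    """Format {hostname}/tray{tray}/chip{chip}/tt_thermal_trip_count."""
    return f"{hostname}/tray{tray}/chip{chip}/{METRIC_NAME}"


def parse_path(path: str) -> MetricLocation:
    """Split a path into hostname, tray, and chip; raise ValueError if malformed.

    Assumes the hostname contains no '/' and that tray/chip are integers.
    """
    parts = path.split("/")
    if len(parts) != 4 or parts[3] != METRIC_NAME:
        raise ValueError(f"not a {METRIC_NAME} path: {path!r}")
    hostname, tray_part, chip_part, _ = parts
    if not (tray_part.startswith("tray") and chip_part.startswith("chip")):
        raise ValueError(f"unexpected tray/chip segments in {path!r}")
    # int() raises ValueError if the suffix is empty or non-numeric.
    tray = int(tray_part[len("tray"):])
    chip = int(chip_part[len("chip"):])
    return MetricLocation(hostname, tray, chip)


if __name__ == "__main__":
    path = build_path("bh-glx-c09u02", 1, 2)
    print(path)              # bh-glx-c09u02/tray1/chip2/tt_thermal_trip_count
    print(parse_path(path))  # MetricLocation(hostname='bh-glx-c09u02', tray=1, chip=2)
```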
Description
The cumulative count of thermal trip events since the last reset (a system reboot or a tt-smi reset). The chip temperature is monitored, and when it exceeds a safe threshold, a thermal trip is triggered. A non-zero value therefore indicates that the device has experienced thermal throttling events since the last reset. This metric is created only if the device reports thermal trip data.
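Because the count restarts after a reset, the difference between two samples is not always the number of new trips. The sketch below assumes that any decrease in the value means a reset occurred and that the counter restarted from 0. Trips recorded between the earlier sample and the reset cannot be recovered. This is a simplified form of the reset handling that Prometheus's `increase()` function applies to counters. `new_trips` and `total_new_trips` are hypothetical helpers.

```python
def new_trips(previous: int, current: int) -> int:
    """Thermal trips that occurred between two consecutive samples.

    Assumption: the value only decreases after a reset (reboot or tt-smi
    reset), and the counter restarts from 0. After a reset, every trip
    counted in `current` happened post-reset, so `current` is returned.
    """
    if previous < 0 or current < 0:
        raise ValueError("trip counts must be non-negative")
    if current >= previous:
        return current - previous
    return current  # decrease detected: treat it as a reset


def total_new_trips(samples: list[int]) -> int:
    """Sum new trips across a chronologically ordered list of samples."""
    return sum(new_trips(a, b) for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    # The drop from 3 to 1 is treated as a reset followed by one new trip.
    print(total_new_trips([0, 0, 2, 3, 1, 1]))  # 4
```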
Values
Type: Unsigned Integer
Units: None
Allowable values:
A non-negative integer representing the number of thermal trip events. A value of 0 means no thermal trips have occurred since the last reset.
Prometheus Labels
| Label Name | Value |
|---|---|
| hostname | The host from which the metric was collected. |
| hall | The datacenter hall where the host is located. Sourced from the Factory System Descriptor (FSD). |
| aisle | The datacenter aisle where the host is located. Sourced from the Factory System Descriptor (FSD). |
| rack | The rack number where the host is located. Sourced from the Factory System Descriptor (FSD). |
| shelf_u | The shelf U position in the rack where the host is located. Sourced from the Factory System Descriptor (FSD). |
| tray | The tray (UBB) that the device is located on. |
| chip | The ASIC location within the tray. |
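The location labels make it possible to pinpoint which chips have tripped. The following is a minimal sketch that assumes samples have already been scraped into a hypothetical `Sample` type holding the label set (as strings) and the value (as an integer). `chips_with_trips` is likewise hypothetical. Missing labels are filled with an empty string, so the function still works if a series lacks one of the FSD-sourced labels.

```python
from dataclasses import dataclass

# Label names from the Prometheus Labels table, ordered coarse to fine.
LOCATION_LABELS = ("hall", "aisle", "rack", "shelf_u", "hostname", "tray", "chip")


@dataclass
class Sample:
    """One scraped tt_thermal_trip_count series (hypothetical shape)."""
    labels: dict[str, str]
    value: int


def chips_with_trips(samples: list[Sample]) -> list[tuple[tuple[str, ...], int]]:
    """Return (location, count) for every chip with a non-zero trip count.

    The location is the tuple of LOCATION_LABELS values, with "" for any
    missing label. Results are sorted by count (highest first), then location.
    """
    flagged = [
        (tuple(s.labels.get(name, "") for name in LOCATION_LABELS), s.value)
        for s in samples
        if s.value > 0
    ]
    flagged.sort(key=lambda item: (-item[1], item[0]))
    return flagged
```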