tt_dram_corrected_edc_write_errors
Name
Prometheus Metric Name
tt_dram_corrected_edc_write_errors
Metric Path (tt-telemetry)
Schema:
{hostname}/tray{tray}/chip{chip}/dram/module{module}/tt_dram_corrected_edc_write_errors
Example path:
bh-glx-c09u02/tray1/chip2/dram/module0/tt_dram_corrected_edc_write_errors
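The schema can be split back into its components when post-processing tt-telemetry paths. The following is a minimal sketch, not part of tt-telemetry: the function name `parse_metric_path` and the regular expression are assumptions derived only from the schema and example path above (in particular, that tray, chip, and module are decimal integers and that the hostname contains no `/`).

```python
import re

# Assumed pattern, derived from the documented schema:
# {hostname}/tray{tray}/chip{chip}/dram/module{module}/{metric}
_PATH_RE = re.compile(
    r"^(?P<hostname>[^/]+)"
    r"/tray(?P<tray>\d+)"
    r"/chip(?P<chip>\d+)"
    r"/dram/module(?P<module>\d+)"
    r"/(?P<metric>[^/]+)$"
)


def parse_metric_path(path: str) -> dict:
    """Split a DRAM metric path into its components (hypothetical helper)."""
    match = _PATH_RE.match(path)
    if match is None:
        raise ValueError(f"path does not match the schema: {path!r}")
    parts = match.groupdict()
    # Convert the numeric indices from strings to integers.
    for key in ("tray", "chip", "module"):
        parts[key] = int(parts[key])
    return parts


parsed = parse_metric_path(
    "bh-glx-c09u02/tray1/chip2/dram/module0/tt_dram_corrected_edc_write_errors"
)
print(parsed)
```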
Description
The cumulative count of corrected EDC (Error Detection and Correction) write errors for a DRAM module. These are errors that were detected and corrected by the DRAM’s error correction mechanism during write operations. The counter saturates at 255. This metric is only created if GDDR telemetry is available on the device.
Values
Type: Unsigned Integer
Units: None
Allowable values:
An integer from 0 to 255. A value of 0 indicates no corrected write errors. The counter saturates at 255 and does not increment further, so a reading of 255 means that at least 255 corrected write errors have occurred.
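Because the counter saturates, a consumer may want to treat 255 as a lower bound rather than an exact count. The sketch below illustrates one way to do this; the function name `describe_write_error_count` and the wording of the returned strings are illustrative assumptions, not part of any Tenstorrent tooling.

```python
SATURATION_LIMIT = 255  # documented saturation value of the counter


def describe_write_error_count(value: int) -> str:
    """Return a human-readable reading of the counter (hypothetical helper)."""
    if not 0 <= value <= SATURATION_LIMIT:
        raise ValueError(f"value outside the allowable range 0..255: {value}")
    if value == 0:
        return "no corrected write errors"
    if value == SATURATION_LIMIT:
        # The counter stops at 255, so the true count may be higher.
        return "saturated: at least 255 corrected write errors"
    return f"{value} corrected write error(s)"


for sample in (0, 17, 255):
    print(sample, "->", describe_write_error_count(sample))
```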
Prometheus Labels
| Label Name | Value |
|---|---|
| hostname | The host from which the metric was collected. |
| hall | The datacenter hall where the host is located. Sourced from the Factory System Descriptor (FSD). |
| aisle | The datacenter aisle where the host is located. Sourced from the Factory System Descriptor (FSD). |
| rack | The rack number where the host is located. Sourced from the Factory System Descriptor (FSD). |
| shelf_u | The shelf U position in the rack where the host is located. Sourced from the Factory System Descriptor (FSD). |
| tray | The tray (UBB) that the device is located on. |
| chip | The ASIC location within the tray. |
| module | The DRAM/GDDR module index on the chip. |
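The labels above can be used to narrow a Prometheus query to a single module. The sketch below builds a PromQL instant-vector selector string from label values; the helper name `build_selector` is hypothetical, and the label values shown are taken from the example path earlier on this page rather than from a real deployment.

```python
METRIC_NAME = "tt_dram_corrected_edc_write_errors"


def _escape(value) -> str:
    """Escape a label value for use inside a double-quoted PromQL string."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def build_selector(**labels) -> str:
    """Build a PromQL selector for the metric (hypothetical helper)."""
    # Sort label names so the output is deterministic.
    matchers = ",".join(
        f'{name}="{_escape(value)}"' for name, value in sorted(labels.items())
    )
    return f"{METRIC_NAME}{{{matchers}}}" if matchers else METRIC_NAME


selector = build_selector(hostname="bh-glx-c09u02", tray=1, chip=2, module=0)
print(selector)
```

The resulting string can be pasted into the Prometheus expression browser or passed to an HTTP API client; label values are always matched as strings, which is why the integer indices are quoted.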