tt_dram_corrected_edc_write_errors

Name

Prometheus Metric Name

tt_dram_corrected_edc_write_errors

Metric Path (tt-telemetry)

Schema:

{hostname}/tray{tray}/chip{chip}/dram/module{module}/tt_dram_corrected_edc_write_errors

Example path:

bh-glx-c09u02/tray1/chip2/dram/module0/tt_dram_corrected_edc_write_errors
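A minimal sketch of how a consumer might split a path of this form back into its components. The helper name parse_metric_path and the DramMetricPath type are hypothetical and not part of tt-telemetry, and the sketch assumes the tray, chip, and module indices are decimal integers, as in the example path.

```python
import re
from typing import NamedTuple

METRIC_NAME = "tt_dram_corrected_edc_write_errors"

# Mirrors the schema:
# {hostname}/tray{tray}/chip{chip}/dram/module{module}/<metric name>
# Assumption: tray, chip, and module are decimal integers.
_PATH_RE = re.compile(
    r"^(?P<hostname>[^/]+)/tray(?P<tray>\d+)/chip(?P<chip>\d+)"
    r"/dram/module(?P<module>\d+)/" + re.escape(METRIC_NAME) + r"$"
)


class DramMetricPath(NamedTuple):
    """Hypothetical container for the path components."""
    hostname: str
    tray: int
    chip: int
    module: int


def parse_metric_path(path: str) -> DramMetricPath:
    """Split a metric path into hostname, tray, chip, and module."""
    match = _PATH_RE.match(path)
    if match is None:
        raise ValueError(f"path does not match the schema: {path!r}")
    return DramMetricPath(
        hostname=match["hostname"],
        tray=int(match["tray"]),
        chip=int(match["chip"]),
        module=int(match["module"]),
    )


if __name__ == "__main__":
    example = (
        "bh-glx-c09u02/tray1/chip2/dram/module0/"
        "tt_dram_corrected_edc_write_errors"
    )
    print(parse_metric_path(example))
```

Anchoring the pattern on the metric name means that paths belonging to other metrics are rejected rather than silently parsed.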

Description

The cumulative count of corrected EDC (Error Detection and Correction) write errors for a DRAM module. These are errors that were detected and corrected by the DRAM’s error correction mechanism during write operations. The counter saturates at 255. This metric is only created if GDDR telemetry is available on the device.

Values

Type: Unsigned Integer

Units: None

Allowable values: An integer from 0 to 255. A value of 0 indicates no corrected write errors. Because the counter saturates at 255 and does not increment further, a value of 255 means that 255 or more corrected write errors have occurred.
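Because of the saturation, a reading of 255 is a lower bound rather than an exact count. The sketch below shows one way a consumer could account for that when comparing two samples. The function names are hypothetical, and the behavior on a decreasing value is an assumption: this page does not say whether or when the counter resets, so the sketch simply refuses to guess.

```python
from typing import Tuple

SATURATION_VALUE = 255  # the counter stops incrementing at this value


def is_saturated(value: int) -> bool:
    """Return True if the reading is at the saturation value."""
    if not 0 <= value <= SATURATION_VALUE:
        raise ValueError(f"value out of range 0..{SATURATION_VALUE}: {value}")
    return value == SATURATION_VALUE


def errors_between(previous: int, current: int) -> Tuple[int, bool]:
    """Return (increase, exact) for two successive readings.

    exact is False when the current reading is saturated, in which case
    the increase is only a lower bound on the true number of new errors.
    """
    saturated = is_saturated(current)
    is_saturated(previous)  # range check only
    if current < previous:
        # Assumption: reset behavior is not documented here, so a
        # decrease is reported instead of being interpreted.
        raise ValueError("counter decreased; reset behavior is not specified")
    return current - previous, not saturated


if __name__ == "__main__":
    print(errors_between(3, 10))
    print(errors_between(250, 255))
```

Returning the exactness flag alongside the difference lets a caller decide whether a saturated module needs separate attention instead of treating 255 as an ordinary count.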

Prometheus Labels

Label Name: Value

hostname: The host from which the metric was collected.

hall: The datacenter hall where the host is located. Sourced from the Factory System Descriptor (FSD).

aisle: The datacenter aisle where the host is located. Sourced from the Factory System Descriptor (FSD).

rack: The rack number where the host is located. Sourced from the Factory System Descriptor (FSD).

shelf_u: The shelf U position in the rack where the host is located. Sourced from the Factory System Descriptor (FSD).

tray: The tray (UBB) that the device is located on.

chip: The ASIC location within the tray.

module: The DRAM/GDDR module index on the chip.
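
As an illustration of how these labels can be used, the sketch below builds a Prometheus instant-vector selector for one module. The build_selector helper is hypothetical, and the label values shown are assumptions taken from the example path; the exact value format of each label (for instance, whether tray is exposed as "1") should be checked against a live scrape.

```python
from typing import Mapping


def _escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline for a label matcher."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def build_selector(metric: str, labels: Mapping[str, object]) -> str:
    """Build a selector of the form metric{name="value",...}.

    Labels are emitted in sorted order so the output is deterministic.
    """
    matchers = ",".join(
        f'{name}="{_escape_label_value(str(value))}"'
        for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


if __name__ == "__main__":
    # Assumed label values, based on the example path on this page.
    print(
        build_selector(
            "tt_dram_corrected_edc_write_errors",
            {"hostname": "bh-glx-c09u02", "tray": 1, "chip": 2, "module": 0},
        )
    )
```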