tt_device_dram_used_megabytes

Name

Prometheus Metric Name

tt_device_dram_used_megabytes

Metric Path (tt-telemetry)

Schema:

{hostname}/tray{tray}/chip{chip}/dram/tt_device_dram_used_megabytes

Example path:

bh-glx-c09u02/tray1/chip2/dram/tt_device_dram_used_megabytes
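The schema above can be parsed or generated mechanically. The following is a minimal sketch, not part of tt-telemetry: the helper names `parse_metric_path` and `build_metric_path` are hypothetical, and it assumes that tray and chip are decimal integers and that the hostname contains no `/`.

```python
import re

METRIC_NAME = "tt_device_dram_used_megabytes"

# Pattern for {hostname}/tray{tray}/chip{chip}/dram/<metric name>.
# Assumption: tray and chip are decimal integers; hostname has no "/".
PATH_RE = re.compile(
    r"^(?P<hostname>[^/]+)/tray(?P<tray>\d+)/chip(?P<chip>\d+)/dram/"
    + re.escape(METRIC_NAME)
    + r"$"
)


def parse_metric_path(path):
    """Return (hostname, tray, chip) for a matching path, else None."""
    match = PATH_RE.match(path)
    if match is None:
        return None
    return match.group("hostname"), int(match.group("tray")), int(match.group("chip"))


def build_metric_path(hostname, tray, chip):
    """Build a metric path that follows the schema."""
    return f"{hostname}/tray{tray}/chip{chip}/dram/{METRIC_NAME}"
```

Applied to the example path, `parse_metric_path` yields the hostname `bh-glx-c09u02`, tray 1 and chip 2.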

Description

The amount of DRAM currently allocated on the chip by tt-metal, summed across every process attached to the chip, in mebibytes (1 MiB = 1024×1024 B), rounded to the nearest integer.
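The byte-to-mebibyte conversion described above can be sketched as follows. This is an illustration, not the tt-telemetry implementation: the function name is hypothetical, and the handling of exact ties (rounding half up) is an assumption, since the text only says "nearest integer".

```python
MIB = 1024 * 1024  # bytes per mebibyte


def bytes_to_mib_rounded(num_bytes):
    """Convert a byte count to whole mebibytes, rounded to the nearest integer.

    Assumption: exact halves round up; the source does not specify tie handling.
    """
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    # Integer arithmetic avoids floating-point error on large byte counts.
    return (num_bytes + MIB // 2) // MIB
```

One consequence of the rounding: a chip with less than half a mebibyte allocated in total reports 0, the same value as a chip with no live allocations.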

The underlying byte count is read from the total_dram_allocated field of tt-metal’s per-device shared-memory allocator-stats region (/dev/shm/tt_device_<asic_id>_memory). tt-telemetry maps the region read-only and never writes to it.

If no tt-metal process has ever touched the chip on this host, the shared-memory (SHM) file is absent and the metric reports 0; the reader retries on every cycle. If the SHM region exists but its layout version differs from the one tt-telemetry was built against, the metric reports 0 and a warning is logged once. The layout contract is the version field of the SHM struct (currently 3).
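The fallback behavior described above (absent file reports 0 and is retried, version mismatch reports 0 with a single warning) can be sketched as below. This is not the tt-telemetry reader. The class name is hypothetical, and the binary layout is invented purely for illustration (a little-endian `uint32` version, 4 padding bytes, then a `uint64` `total_dram_allocated`); the real struct layout is defined by tt-metal. Treating a truncated file as unavailable and rounding ties up are also assumptions.

```python
import logging
import mmap
import os
import struct

log = logging.getLogger("dram_reader_sketch")

EXPECTED_VERSION = 3  # layout version named in the text above
MIB = 1024 * 1024

# HYPOTHETICAL layout, for illustration only:
# uint32 version, 4 bytes padding, uint64 total_dram_allocated (little-endian).
HEADER = struct.Struct("<I4xQ")


class DramUsedReader:
    """Reads the metric value from one allocator-stats file (hypothetical helper)."""

    def __init__(self, shm_path):
        self.shm_path = shm_path
        self._warned = False  # ensures the version warning is logged only once

    def read_megabytes(self):
        try:
            fd = os.open(self.shm_path, os.O_RDONLY)
        except FileNotFoundError:
            return 0  # file absent: report 0; the next call tries again
        try:
            if os.fstat(fd).st_size < HEADER.size:
                return 0  # assumption: a truncated file counts as unavailable
            # Map read-only so this reader can never modify the region.
            with mmap.mmap(fd, HEADER.size, access=mmap.ACCESS_READ) as region:
                version, total_bytes = HEADER.unpack_from(region, 0)
        finally:
            os.close(fd)
        if version != EXPECTED_VERSION:
            if not self._warned:
                log.warning("layout version %d, expected %d; reporting 0",
                            version, EXPECTED_VERSION)
                self._warned = True
            return 0
        return (total_bytes + MIB // 2) // MIB  # nearest MiB, ties round up
```

Calling `read_megabytes()` once per collection cycle gives the retry behavior for free, because the file is reopened on every call.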

For multi-chip mesh devices (e.g. N300, Galaxy) the shared-memory region currently aggregates allocations across the gateway chip and any remote chips reached through it, all reported under the gateway’s tray/chip labels. Per-chip breakdown for mesh devices is a planned follow-up.

Values

Type: Unsigned Integer

Units: Labeled as megabytes (MB), but the value is in mebibytes (1 MiB = 1024×1024 B), rounded to the nearest integer.

Allowable values: A non-negative integer. 0 means either no allocations are live or the SHM region is unavailable for this chip.

Prometheus Labels

Label Name    Value
hostname      The host from which the metric was collected.
hall          The datacenter hall where the host is located. Sourced from the Factory System Descriptor (FSD).
aisle         The datacenter aisle where the host is located. Sourced from the Factory System Descriptor (FSD).
rack          The rack number where the host is located. Sourced from the Factory System Descriptor (FSD).
shelf_u       The shelf U position in the rack where the host is located. Sourced from the Factory System Descriptor (FSD).
tray          The tray (UBB) that the device is located on.
chip          The ASIC location within the tray.
unit          The unit of measurement. Always "MB" for this metric.