tt_pcie_link_alive

Name

Prometheus Metric Name

tt_pcie_link_alive

Metric Path (tt-telemetry)

Schema:

{hostname}/tray{tray}/chip{chip}/pcie/tt_pcie_link_alive

Example path:

bh-glx-c09u02/tray1/chip2/pcie/tt_pcie_link_alive
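The schema above can be split back into its fields with a small parser. This is a minimal sketch, not part of tt-telemetry: the helper name `parse_link_alive_path` and the `LinkAlivePath` type are hypothetical, and it assumes that `tray` and `chip` are decimal integers, as in the example path.

```python
import re
from typing import NamedTuple

# Hypothetical pattern mirroring the documented schema:
#   {hostname}/tray{tray}/chip{chip}/pcie/tt_pcie_link_alive
# Assumption: tray and chip are decimal integers, as in the example path.
_PATH_RE = re.compile(
    r"^(?P<hostname>[^/]+)/tray(?P<tray>\d+)/chip(?P<chip>\d+)"
    r"/pcie/tt_pcie_link_alive$"
)


class LinkAlivePath(NamedTuple):
    hostname: str
    tray: int
    chip: int


def parse_link_alive_path(path: str) -> LinkAlivePath:
    """Split a tt_pcie_link_alive metric path into hostname, tray, and chip."""
    match = _PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a tt_pcie_link_alive path: {path!r}")
    return LinkAlivePath(
        hostname=match["hostname"],
        tray=int(match["tray"]),
        chip=int(match["chip"]),
    )


parsed = parse_link_alive_path("bh-glx-c09u02/tray1/chip2/pcie/tt_pcie_link_alive")
print(parsed.hostname, parsed.tray, parsed.chip)
```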

Description

Indicates whether the chip’s PCIe link responds to a host read. On each collection cycle, the telemetry server asks UMD’s hang detector to read a BAR register whose value is guaranteed never to legitimately be 0xFFFFFFFF. If the read returns 0xFFFFFFFF, the PCIe link has silently dropped, and any subsequent reads will also return this all-ones fault signature. Recovery usually requires a board reset.
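The decision rule described above reduces to a single comparison against the all-ones value. The sketch below only illustrates that rule; it is not UMD's hang detector, and the function name `link_alive_from_read` is hypothetical. It assumes the probe yields a 32-bit register value.

```python
# All-ones value that a read returns once the PCIe link has dropped.
PCIE_FAULT_SIGNATURE = 0xFFFFFFFF


def link_alive_from_read(register_value: int) -> bool:
    """Map a 32-bit probe read to the boolean tt_pcie_link_alive value.

    Hypothetical helper: the probe register never legitimately holds
    0xFFFFFFFF, so that value is treated as a hung link.
    """
    # Mask to 32 bits so wider integers compare like a register read.
    return (register_value & 0xFFFFFFFF) != PCIE_FAULT_SIGNATURE


print(link_alive_from_read(0x12345678))  # healthy read
print(link_alive_from_read(0xFFFFFFFF))  # fault signature
```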

This metric is only created for MMIO-capable chips on Wormhole and Blackhole architectures, since those are the devices for which UMD provides a PCIe hang detector. Remote chips and other architectures are skipped (no metric is emitted).
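The eligibility rule in the paragraph above can be written as a simple predicate. This is a sketch under stated assumptions: the function name, the architecture strings, and the `is_mmio_capable` flag are hypothetical and do not reflect how tt-telemetry or UMD represent chips.

```python
# Assumption: architectures are identified by these lowercase strings.
ARCHES_WITH_HANG_DETECTOR = {"wormhole", "blackhole"}


def emits_link_alive_metric(arch: str, is_mmio_capable: bool) -> bool:
    """Return True if tt_pcie_link_alive would be created for this chip.

    Only MMIO-capable Wormhole and Blackhole chips qualify; remote chips
    and other architectures are skipped, so no metric is emitted for them.
    """
    return is_mmio_capable and arch.lower() in ARCHES_WITH_HANG_DETECTOR


print(emits_link_alive_metric("Blackhole", True))
print(emits_link_alive_metric("Wormhole", False))
```

Because skipped chips emit no metric at all, a missing series for a remote chip is expected and should not be read as a hung link.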

Values

Type: Boolean

Units: None

Allowable values:

  • True (1): The chip responded normally to the PCIe probe read.

  • False (0): The chip returned the 0xFFFFFFFF fault signature; the PCIe link is hung.

Prometheus Labels

  • hostname: The host from which the metric was collected.

  • hall: The datacenter hall where the host is located. Sourced from the Factory System Descriptor (FSD).

  • aisle: The datacenter aisle where the host is located. Sourced from the Factory System Descriptor (FSD).

  • rack: The rack number where the host is located. Sourced from the Factory System Descriptor (FSD).

  • shelf_u: The shelf U position in the rack where the host is located. Sourced from the Factory System Descriptor (FSD).

  • tray: The tray (UBB) that the device is located on.

  • chip: The ASIC location within the tray.
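To show how these labels might be consumed, the sketch below scans Prometheus text exposition output for `tt_pcie_link_alive` samples whose value is 0 and returns their labels. The function name and the sample label values are hypothetical, and the parser is deliberately simplified: it assumes label values contain no escaped quotes or closing braces.

```python
import re

# Matches samples such as: tt_pcie_link_alive{hostname="h",tray="1",chip="2"} 0
_SAMPLE_RE = re.compile(r"^tt_pcie_link_alive\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)")
_LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def hung_chips(exposition_text: str) -> list[dict[str, str]]:
    """Return the label set of every tt_pcie_link_alive sample equal to 0."""
    hung = []
    for line in exposition_text.splitlines():
        match = _SAMPLE_RE.match(line.strip())
        if match is None:
            continue  # comment line, blank line, or a different metric
        if float(match["value"]) == 0.0:
            hung.append(dict(_LABEL_RE.findall(match["labels"])))
    return hung


# Hypothetical scrape output; label values are illustrative only.
sample = """\
# TYPE tt_pcie_link_alive gauge
tt_pcie_link_alive{hostname="bh-glx-c09u02",tray="1",chip="2"} 1
tt_pcie_link_alive{hostname="bh-glx-c09u02",tray="1",chip="3"} 0
"""
for labels in hung_chips(sample):
    print(labels["hostname"], labels["tray"], labels["chip"])
```

The `tray` and `chip` labels together identify the affected ASIC on a host, which is usually what an operator needs when deciding which board to reset.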

© Copyright 2026, Tenstorrent AI ULC. Last updated on Jul 01, 2026.
