Release Notes
tt-operator v0.1
The first release of the Tenstorrent Operator. tt-operator packages the components needed to run Tenstorrent devices under Kubernetes into a single umbrella Helm chart, validated end to end on real Tenstorrent hardware.
Highlights
One-step install of the full stack from the published Helm chart.
Tenstorrent device discovery. Nodes that have Tenstorrent devices are automatically labeled, so workloads and operands schedule only where a Tenstorrent device is present.
Driver lifecycle. Declaratively install the `tt-kmd` kernel driver, upgrade it between versions, and scope it to selected nodes with the `TenstorrentDriverPolicy` resource. During an upgrade the operator cordons and drains the node and pauses telemetry, so that telemetry releases the device before the driver is replaced.
Firmware flashing via the `TenstorrentFirmwarePolicy` resource.
Telemetry. A Prometheus `/metrics` endpoint that reports per-device health, with topology-aware identity labels.
Continuous operations. In-place `helm upgrade` and clean `helm uninstall`.
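The driver-upgrade ordering in the highlights exists because a kernel module generally cannot be unloaded while a process still holds one of its device files open. The sketch below is a plain-Python simulation of that ordering, not the operator's code. `SimulatedNode`, `DeviceBusyError`, `upgrade_driver`, and the step names are all hypothetical. The resume-telemetry and uncordon steps at the end are assumptions, and so is the idea that telemetry survives a drain (which would be the case if it runs as a DaemonSet).

```python
from dataclasses import dataclass, field


class DeviceBusyError(RuntimeError):
    """Hypothetical: raised when the driver is swapped while the device is still open."""


@dataclass
class SimulatedNode:
    # Hypothetical model of one node; not the operator's real types.
    driver_version: str
    cordoned: bool = False
    # Names of processes holding the Tenstorrent device open.
    open_handles: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def cordon(self):
        self.cordoned = True  # no new pods are scheduled here
        self.log.append("cordon")

    def drain(self):
        # Evicting workload pods closes their handles. Telemetry is assumed
        # to run as a DaemonSet, which a drain leaves running, so it keeps its handle.
        self.open_handles = {h for h in self.open_handles if h == "telemetry"}
        self.log.append("drain")

    def pause_telemetry(self):
        self.open_handles.discard("telemetry")
        self.log.append("pause-telemetry")

    def swap_driver(self, new_version):
        # Mirrors the kernel refusing to unload a module that is still in use.
        if self.open_handles:
            raise DeviceBusyError(f"device still open by {sorted(self.open_handles)}")
        self.driver_version = new_version
        self.log.append(f"driver={new_version}")

    def resume_telemetry(self):
        self.open_handles.add("telemetry")
        self.log.append("resume-telemetry")

    def uncordon(self):
        self.cordoned = False
        self.log.append("uncordon")


def upgrade_driver(node, new_version):
    """Ordering sketch: every device user lets go before the driver changes."""
    node.cordon()
    node.drain()
    node.pause_telemetry()
    node.swap_driver(new_version)
    node.resume_telemetry()  # assumption: not stated in the release notes
    node.uncordon()          # assumption: not stated in the release notes
```

If you call `drain()` and then go straight to `swap_driver()`, skipping `pause_telemetry()`, the simulation raises `DeviceBusyError`. That failure is the situation the pause step avoids.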
Supported features
| Capability | Resource or surface |
|---|---|
| Device labeling | Node Feature Discovery PCI label |
| Driver install, upgrade, scoping | `TenstorrentDriverPolicy` |
| Firmware flashing | `TenstorrentFirmwarePolicy` |
| Telemetry | Prometheus `/metrics` endpoint |
| Fabric topology resolution | Fabric Manager gRPC API |
| Device allocation | Dynamic Resource Allocation (DRA) |
| Multi-node scheduling | JobSet and PMIx |
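The device-labeling row depends on Node Feature Discovery (NFD) PCI labels. The sketch below shows how such labels can be matched to find nodes with a Tenstorrent device. It assumes NFD's default label format, `feature.node.kubernetes.io/pci-<class>_<vendor>.present`. The vendor ID `1e52` and the class `1200` (processing accelerator) are assumptions, and these notes do not state which exact label key the operator uses. All function names are hypothetical, and node labels are passed in as plain dictionaries rather than fetched from a cluster.

```python
NFD_PREFIX = "feature.node.kubernetes.io/"
TENSTORRENT_VENDOR_ID = "1e52"         # assumption: Tenstorrent PCI vendor ID
PROCESSING_ACCELERATOR_CLASS = "1200"  # assumption: PCI class/subclass of the devices


def nfd_pci_label(pci_class, vendor):
    """Build an NFD PCI label key in the default <class>_<vendor> form."""
    return f"{NFD_PREFIX}pci-{pci_class}_{vendor}.present"


def has_tenstorrent_device(node_labels):
    """True if any NFD PCI label on the node names the Tenstorrent vendor ID."""
    suffix = f"_{TENSTORRENT_VENDOR_ID}.present"
    return any(
        key.startswith(NFD_PREFIX + "pci-") and key.endswith(suffix) and value == "true"
        for key, value in node_labels.items()
    )


def tenstorrent_nodes(nodes):
    """Reduce a {node_name: labels} mapping to the sorted names of Tenstorrent nodes."""
    return sorted(name for name, labels in nodes.items() if has_tenstorrent_device(labels))
```

The helper matches on the vendor suffix instead of one exact key. That way it also tolerates devices reporting a different PCI class, as long as NFD keeps the default class/vendor label fields.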
Requirements
Kubernetes 1.27 or later. Device allocation through Dynamic Resource Allocation (DRA) requires 1.33 or later.
cert-manager installed on the cluster. It is required by the bundled PMIx admission webhook. See Prerequisites.
Network access to the container registry hosting the Tenstorrent images.
See Platform support for the validated devices and environments.
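The two Kubernetes version thresholds above can be checked before installing. The sketch below parses the server version string, for example the `serverVersion.gitVersion` field from `kubectl version -o json`, and compares it against both minimums. Function names are hypothetical. The sketch checks only the version and does not cover cert-manager or registry access.

```python
import re

MIN_BASE = (1, 27)  # minimum Kubernetes version for the stack
MIN_DRA = (1, 33)   # minimum for device allocation through DRA


def parse_k8s_version(git_version):
    """Extract (major, minor) from strings like 'v1.30.2' or 'v1.29.3-eks-adc7111'."""
    match = re.match(r"^v?(\d+)\.(\d+)", git_version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def check_requirements(git_version):
    """Report which version minimums this server meets (tuples compare element-wise)."""
    version = parse_k8s_version(git_version)
    return {"base_install": version >= MIN_BASE, "dra": version >= MIN_DRA}
```

A `True` for `dra` means only that the server is new enough. DRA also needs resolvable fabric topology on the node.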
Known limitations
Device allocation through Dynamic Resource Allocation (DRA) requires the node's fabric topology to be resolvable. On systems without staged topology, devices are not yet published as schedulable resources.
Air-gapped and private-registry installs are not yet covered by a documented workflow.