Install
Prerequisites
Per-node:
Linux, x86_64. Build-tested on Ubuntu 22.04 (jammy) with an HWE 6.x kernel. Other distros work as long as the operator’s base image (currently ubuntu:22.04) is libc/libssl-compatible with the host; see Troubleshooting → tt-smi can’t execute on host.
linux-headers-$(uname -r) installed on the host. The builder pod builds tt-kmd against /lib/modules/$(uname -r)/build; if the headers aren’t there, the pod CrashLoops with "host has no kernel build tree". Typically apt install linux-headers-generic-hwe-22.04 or equivalent.
gcc-12 is the implicit compiler for HWE 6.8 kernels. The builder image ships gcc-12; if the host kernel was compiled with a different gcc (cat /proc/version to check), the build will fail with "gcc-12: not found" from the kernel Makefile. Rebuild the builder image with the matching gcc version.
feature.node.kubernetes.io/pci-1200_1e52.present=true label on Tenstorrent nodes. node-feature-discovery emits this automatically; see NFD setup below.
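The per-node checks above can be run by hand before installing anything. The following is a minimal sketch, not part of the project: the function names are made up for illustration, and the parsing of /proc/version assumes the usual "gcc... (distro info) X.Y.Z" layout, which may not hold on every distro.

```shell
#!/usr/bin/env bash
# Preflight sketch for one node. Run it on the host, not inside a pod.
# Function names are illustrative, not part of tt-k8s-driver-manager.

BUILDER_GCC_MAJOR=12  # the prerequisites state that the builder image ships gcc-12

# Extract the major gcc version from a /proc/version-style string.
# Prints nothing if the string does not have the expected layout.
kernel_gcc_major() {
  printf '%s\n' "$1" | sed -nE 's/.*gcc[^ ]* \([^)]*\) ([0-9]+)\..*/\1/p'
}

# Succeeds when the kernel build tree for the given release exists.
# Defaults to the running kernel.
has_kernel_headers() {
  [ -d "/lib/modules/${1:-$(uname -r)}/build" ]
}

if has_kernel_headers; then
  echo "kernel headers: found for $(uname -r)"
else
  echo "kernel headers: missing for $(uname -r)"
fi

if [ -r /proc/version ]; then
  host_gcc="$(kernel_gcc_major "$(cat /proc/version)")"
  if [ "$host_gcc" = "$BUILDER_GCC_MAJOR" ]; then
    echo "kernel compiler: gcc-$host_gcc matches the builder image"
  else
    echo "kernel compiler: gcc-${host_gcc:-unknown}, builder image has gcc-$BUILDER_GCC_MAJOR"
  fi
fi
```

A "missing" or mismatch line here corresponds to the CrashLoop and compiler failures described in the list above, so it is cheaper to catch them at this stage than from builder pod logs.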
Per-cluster:
Kubernetes 1.27+ (anything that supports kubebuilder v1).
Helm 3.8+ (for OCI registry support).
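To check the two cluster-level minimums from a workstation, a version comparison along these lines may help. This is a sketch under assumptions: version_ge is a hypothetical helper, it relies on GNU sort -V, and the kubelet version of the first node is used as a stand-in for the cluster version.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B (a leading "v" is ignored).
# Hypothetical helper; relies on GNU sort -V for version ordering.
version_ge() {
  local have="${1#v}" want="${2#v}"
  [ "$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)" = "$want" ]
}

if command -v helm >/dev/null 2>&1; then
  helm_version="$(helm version --template '{{.Version}}')"
  if version_ge "$helm_version" 3.8.0; then
    echo "helm $helm_version: ok"
  else
    echo "helm $helm_version: too old, need 3.8+"
  fi
fi

if command -v kubectl >/dev/null 2>&1; then
  # Kubelet version of the first node, used as a proxy for the cluster version.
  kubelet_version="$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}')"
  if [ -z "$kubelet_version" ]; then
    echo "kubernetes: could not read a node version"
  elif version_ge "$kubelet_version" 1.27; then
    echo "kubernetes $kubelet_version: ok"
  else
    echo "kubernetes $kubelet_version: too old, need 1.27+"
  fi
fi
```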
Install via Helm (driver-manager only)
helm install tt-k8s-driver-manager \
oci://ghcr.io/tenstorrent/helm/tt-k8s-driver-manager \
--namespace tt-k8s-driver-manager-system --create-namespace
This installs:
The controller-manager Deployment (one pod, leader-elected).
The CRDs TenstorrentDriverPolicy and TenstorrentFirmwarePolicy.
RBAC: a controller ClusterRole for managing those CRs + DaemonSets + Jobs, plus an installer ClusterRole granting per-pod nodes:patch so the builder can label its own node.
Two ServiceAccounts in the install namespace: one for the controller, one for the per-CR builder/flasher pods.
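One way to compare the list above with what the chart actually renders is to count objects by kind before installing. The helper below is a sketch: summarize_kinds is a made-up name, and it only counts top-level "kind:" lines in the manifest stream it reads on stdin.

```shell
#!/usr/bin/env bash
# Count rendered Kubernetes objects by kind. Reads manifests on stdin.
# Only top-level "kind:" lines are counted, so nested references such as
# roleRef.kind are ignored. summarize_kinds is an illustrative name.
summarize_kinds() {
  awk '/^kind: / { count[$2]++ } END { for (k in count) print k, count[k] }' | sort
}

# Example (needs Helm 3.8+ and registry access, so it is left commented out):
#   helm template tt-k8s-driver-manager \
#     oci://ghcr.io/tenstorrent/helm/tt-k8s-driver-manager \
#     --namespace tt-k8s-driver-manager-system --include-crds | summarize_kinds
```

The --include-crds flag matters only if the chart ships its CRDs in the crds/ directory, which helm template skips by default; that layout is an assumption here.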
It does not install node-feature-discovery. Use the tt-operator umbrella chart if you want NFD installed for you, or install NFD separately.
Install via the umbrella chart (driver-manager + NFD)
helm repo add node-feature-discovery https://kubernetes-sigs.github.io/node-feature-discovery/charts
helm install tt-operator oci://ghcr.io/tenstorrent/helm-charts/tt-operator \
--namespace tt-operator-system --create-namespace
Brings up node-feature-discovery + tt-k8s-driver-manager in one release. The
tt-k8s-driver-manager.* block in the umbrella’s values.yaml is forwarded
to the subchart unchanged.
NFD setup
If you didn’t install via the umbrella, install NFD separately:
helm repo add node-feature-discovery https://kubernetes-sigs.github.io/node-feature-discovery/charts
helm repo update
helm install nfd node-feature-discovery/node-feature-discovery \
--version 0.18.3 \
--namespace node-feature-discovery --create-namespace \
--set 'worker.config.core.featureSources={pci}' \
--set 'worker.config.core.labelSources={pci}'
Tenstorrent’s PCI class (1200 — Processing Accelerator) is in NFD’s
default whitelist, so the pci-1200_1e52.present label appears
automatically once NFD’s worker is running on a node. The featureSources
/ labelSources restriction keeps NFD from emitting hundreds of
unrelated labels (CPU, memory, network, etc.) that we don’t use.
Verifying the install
$ kubectl -n tt-k8s-driver-manager-system get all
NAME                                               READY   STATUS
pod/tt-k8s-driver-manager-controller-...           1/1     Running
deployment.apps/tt-k8s-driver-manager-controller   1/1
Then check NFD labelled your Tenstorrent nodes:
$ kubectl get nodes -L feature.node.kubernetes.io/pci-1200_1e52.present
NAME     STATUS   PRESENT
node-1   Ready    true
node-2   Ready    true
node-3   Ready    true
If PRESENT is empty on a node that has a Tenstorrent card, NFD isn’t
seeing the device — check the NFD worker pod’s logs on that node.
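Before digging through NFD logs, it can be worth confirming that the host itself sees the card. The sketch below scans sysfs for vendor 0x1e52 and prints the label NFD would be expected to emit. It assumes NFD's default label fields (class and vendor) and that the label uses the first four hex digits of the sysfs class value; nfd_pci_label is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Build the NFD label expected for a PCI device from its sysfs class and
# vendor values, e.g. class 0x120000 and vendor 0x1e52.
# Assumption: default NFD label fields (class, vendor), 4-hex-digit class.
nfd_pci_label() {
  local class="${1#0x}" vendor="${2#0x}"
  printf 'feature.node.kubernetes.io/pci-%s_%s.present\n' "${class:0:4}" "$vendor"
}

# Print the expected label for every Tenstorrent (vendor 0x1e52) device
# visible on this host. Prints nothing if no such device is found.
for dev in /sys/bus/pci/devices/*; do
  [ -r "$dev/vendor" ] && [ -r "$dev/class" ] || continue
  vendor="$(cat "$dev/vendor")"
  [ "$vendor" = "0x1e52" ] || continue
  echo "${dev##*/}: $(nfd_pci_label "$(cat "$dev/class")" "$vendor")"
done
```

If this prints nothing on a node that should have a card, the problem is probably below NFD (PCIe enumeration or hardware). If it prints a label that the node lacks, the NFD worker is the place to look. From the cluster side, kubectl get nodes -l '!feature.node.kubernetes.io/pci-1200_1e52.present' lists the nodes that do not carry the label.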
Apply a CR to actually install the driver — see Driver Management.
Uninstall
helm -n tt-k8s-driver-manager-system uninstall tt-k8s-driver-manager
What helm uninstall removes:
Controller Deployment + ServiceAccounts + RBAC.
What it does NOT remove (intentional):
The CRDs (Helm convention: CRDs survive uninstall to avoid losing CRs).
The DaemonSets the controller created (they were owned by the CRs, not by Helm). Delete the CRs first if you want full cleanup:
kubectl delete ttdp --all
Any state on the hosts: /var/cache/tt-kmd/*, /usr/local/bin/tt-smi. The kernel module stays loaded too. Clean these manually if you’re tearing down a node; see Fully clean a host.
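The cluster-side part of a full teardown can be sketched as a small script that defaults to a dry run. The run and teardown helpers are illustrative names, not something the chart provides; the two commands themselves are the ones given in this section.

```shell
#!/usr/bin/env bash
# Dry-run teardown sketch. Set DRY_RUN=0 to actually execute the commands.
NAMESPACE="tt-k8s-driver-manager-system"
RELEASE="tt-k8s-driver-manager"

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'would run: %s\n' "$*"
  else
    "$@"
  fi
}

teardown() {
  # 1. Delete the CRs first; the DaemonSets they own go away with them.
  run kubectl delete ttdp --all
  # 2. Remove the controller Deployment, ServiceAccounts and RBAC.
  run helm -n "$NAMESPACE" uninstall "$RELEASE"
}

teardown
```

This leaves the CRDs and all host state in place, as described above. If you also created TenstorrentFirmwarePolicy objects, they would need their own kubectl delete; their short name is not given in this section.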