Configuration Reference
All tt-operator settings are Helm values. To override a value, pass --set <key>=<value> at install or upgrade time, or use a values file with -f values.yaml. See Installation for worked examples, including enabling and disabling components and pinning images.
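
As a minimal sketch of the values-file route, the fragment below toggles a few components using keys from the Values table. The true/false choices are illustrative assumptions, not recommended settings. The release name and chart reference in the comments are placeholders, since this page does not state them.

```yaml
# values.yaml: illustrative overrides only. The keys come from the Values
# table; the chosen true/false settings are assumptions for this example.
jobset:
  enabled: true
tt-k8s-driver-manager:
  enabled: true
tt-telemetry:
  enabled: false   # e.g. leave out the alpha telemetry component

# Apply with (<release> and <chart-ref> are placeholders):
#   helm upgrade --install <release> <chart-ref> -f values.yaml
#
# The same single override on the command line would look like:
#   helm upgrade --install <release> <chart-ref> --set tt-telemetry.enabled=false
```

A values file is usually easier to review and keep in version control than a long chain of --set flags, which is why it is shown first here.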
tt-operator
Umbrella chart for Tenstorrent on Kubernetes. It installs node-feature-discovery (NFD) for device labelling; tt-k8s-driver-manager for the tt-kmd, firmware, and tt-smi lifecycle; tt-fabric-manager for cluster fabric/interconnect management; tt-dra-driver for device scheduling based on Dynamic Resource Allocation (DRA); tt-telemetry for device metrics export; jobset for multi-job workloads; and kubepmix for PMIx-aware MPI scheduling.
Requirements

| Repository | Name | Version |
|---|---|---|
| https://kubernetes-sigs.github.io/node-feature-discovery/charts | node-feature-discovery | 0.18.3 |
| oci://ghcr.io/tenstorrent/helm | tt-dra-driver | 0.0.36 |
| oci://ghcr.io/tenstorrent/helm | tt-fabric-manager | 0.2.28 |
| oci://ghcr.io/tenstorrent/helm | tt-k8s-driver-manager | 0.0.5 |
| oci://ghcr.io/tenstorrent/helm | tt-telemetry | 0.1.1 |
| oci://registry.k8s.io/jobset/charts | jobset | 0.12.0 |

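
For orientation, an umbrella chart normally declares requirements like these in the dependencies list of its Chart.yaml. The sketch below copies the names, versions, and repositories from the table above. Each condition line is an assumption inferred from the <subchart>.enabled keys in the Values table, not something this page confirms.

```yaml
# Chart.yaml fragment (sketch). name/version/repository are taken from the
# Requirements table; every `condition` entry is an assumption.
dependencies:
  - name: node-feature-discovery
    version: "0.18.3"
    repository: https://kubernetes-sigs.github.io/node-feature-discovery/charts
    condition: node-feature-discovery.enabled
  - name: tt-dra-driver
    version: "0.0.36"
    repository: oci://ghcr.io/tenstorrent/helm
    condition: tt-dra-driver.enabled
  - name: tt-fabric-manager
    version: "0.2.28"
    repository: oci://ghcr.io/tenstorrent/helm
    condition: tt-fabric-manager.enabled
  - name: tt-k8s-driver-manager
    version: "0.0.5"
    repository: oci://ghcr.io/tenstorrent/helm
    condition: tt-k8s-driver-manager.enabled
  - name: tt-telemetry
    version: "0.1.1"
    repository: oci://ghcr.io/tenstorrent/helm
    condition: tt-telemetry.enabled
  - name: jobset
    version: "0.12.0"
    repository: oci://registry.k8s.io/jobset/charts
    condition: jobset.enabled
```
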
Values

| Key | Type | Default | Description |
|---|---|---|---|
| jobset.enabled | bool | | Install JobSet (kubernetes-sigs/jobset) for grouped multi-node Job lifecycle. No external prereqs. |
| kubepmix.enabled | bool | | Install the kubepmix mutating webhook (injects PMIx env into multi-node training Jobs). Requires cert-manager pre-installed. |
| kubepmix.namespace | string | | Namespace for the kubepmix webhook resources. Defaults to a dedicated kube-pmix namespace; set to your release namespace to consolidate. |
| node-feature-discovery.enabled | bool | | Install node-feature-discovery so Tenstorrent nodes get the feature.node.kubernetes.io/pci-1200_1e52.present label. |
| node-feature-discovery.worker.config.core.featureSources | list | | NFD feature sources to enable. Restricted to PCI, the only source needed to detect Tenstorrent devices. |
| node-feature-discovery.worker.config.core.labelSources | list | | NFD label sources to enable. Restricted to PCI to avoid the CPU/memory cost and label clutter of the full source set. |
| node-feature-discovery.worker.tolerations | list | | Tolerations for the NFD worker DaemonSet. Defaults to universal so the PCI label reaches tainted node pools. |
| tt-dra-driver.enabled | bool | | Install tt-dra-driver (DRA kubelet plugin publishing devices as ResourceSlices). Requires k8s 1.33+, tt-fabric-manager.enabled, and the DynamicResourceAllocation feature gate. |
| tt-dra-driver.kubeletPlugin.fabricManagerAgentAddress | string | | Address of the in-cluster TTFM agent Service the DRA plugin calls for GetTopology. |
| tt-fabric-manager.enabled | bool | | Install tt-fabric-manager (TTFM): inter-card/inter-host fabric topology + GetTopology gRPC. Opt-in; most single-node / non-Galaxy setups don’t need it. |
| tt-fabric-manager.fullnameOverride | string | | Drop the release prefix so TTFM Service names render as tt-fabric-manager-{controller,agent}. |
| tt-k8s-driver-manager.enabled | bool | | Install tt-k8s-driver-manager (controllers, CRDs, and per-node images for tt-kmd, firmware flashing, and tt-smi). Disable to manage drivers yourself. |
| tt-telemetry.daemonset | object | | Collector DaemonSet nodeAffinity: schedule on TT-equipped nodes and honor the tenstorrent.com/deploy.tt-telemetry drain gate. |
| tt-telemetry.enabled | bool | | Install tt-telemetry (device telemetry → Prometheus endpoint + web GUI). Alpha software. |
| tt-telemetry.image.repository | string | | tt-telemetry image repository, pinned to the published org-namespace image. |
| tt-telemetry.image.tag | string | | tt-telemetry image tag. Keep in lockstep with the tt-telemetry subchart version in Chart.yaml. |
| tt-telemetry.namespace | string | | Namespace for tt-telemetry resources. The subchart hardcodes this (no Release.Namespace fallback), so match your install namespace. |
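
Several of these keys only make sense together, so a combined sketch may help. The fragment below enables tt-dra-driver alongside tt-fabric-manager, which the table lists as a requirement, and spells out the PCI-only NFD settings the descriptions refer to. The Default column is empty on this page, so the concrete values here (the pci source name, the operator: Exists toleration, the fullnameOverride string) are assumptions inferred from the descriptions, and the angle-bracket entries are placeholders to fill in.

```yaml
# values.yaml (sketch): DRA scheduling plus its stated dependency.
tt-fabric-manager:
  enabled: true                          # required by tt-dra-driver per the table
  fullnameOverride: tt-fabric-manager    # assumed value; yields tt-fabric-manager-{controller,agent}
tt-dra-driver:
  enabled: true                          # also needs k8s 1.33+ and the DynamicResourceAllocation feature gate
  # kubeletPlugin:
  #   fabricManagerAgentAddress: "<ttfm-agent-service-address>"   # placeholder; default not shown here

# NFD limited to PCI detection, reachable on tainted node pools.
node-feature-discovery:
  enabled: true
  worker:
    config:
      core:
        featureSources: ["pci"]          # assumed spelling of the PCI source
        labelSources: ["pci"]
    tolerations:
      - operator: Exists                 # assumed form of a "universal" toleration

# The telemetry subchart has no Release.Namespace fallback, so set it explicitly.
tt-telemetry:
  enabled: true
  namespace: "<install-namespace>"       # placeholder; match the namespace you install into
```

Leaving fabricManagerAgentAddress commented out keeps whatever default the chart ships. Override it only if the TTFM agent Service is renamed or lives in a different namespace.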