Driver Management
Driver-manager installs and maintains a specific tt-kmd version on each
Tenstorrent node via a TenstorrentDriverPolicy (short name: ttdp).
The minimum CR
apiVersion: driver.tenstorrent.com/v1alpha1
kind: TenstorrentDriverPolicy
metadata:
  name: default
spec:
  version: "2.8.0"
  nodeAffinity: {}
What happens:
1. Controller creates a privileged DaemonSet named ttdrv-<crname> in the operator namespace.
2. Pod template’s nodeAffinity is spec.nodeAffinity ∧ feature.node.kubernetes.io/pci-1200_1e52.present=true ∧ !exists(driver.tenstorrent.com/skip). So pods schedule only on Tenstorrent nodes that aren’t opted out.
3. Each pod’s entrypoint:
   1. Probes for an existing host install (DKMS + /usr/src/tenstorrent-*). If detected → label node install-mode=host, idle, don’t touch anything.
   2. Otherwise, compares /sys/module/tenstorrent/version against spec.version. Match → idle. Mismatch with refcnt=0 → rmmod, clone tt-kmd at the requested tag, make modules against host kernel headers, insmod. Mismatch with refcnt>0 (workload holding the device) → fail loudly with the holder PIDs.
   3. Copies the bundled self-contained tt-smi binary to /host/usr/local/bin/tt-smi.
   4. Stamps kmd-version, tt-smi.driver.tenstorrent.com/version, and install-mode labels on the node it’s running on.
   5. exec sleep infinity, so the pod stays Ready; readiness probe rechecks /sys/module/tenstorrent/version every 10s.
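The entrypoint branches above can be sketched as a pure decision function. This is an illustrative model only, not the shipped entrypoint: the function name and the returned action names are hypothetical, and the handling of a node with no module loaded yet (treated here as a plain rebuild) is an assumption the text does not spell out.

```python
from typing import Optional


def decide_action(host_install: bool, loaded: Optional[str],
                  expected: str, refcnt: int) -> str:
    """Return the next step for one node (action names are hypothetical)."""
    if host_install:
        # Host-managed install detected: label the node and stand down.
        return "idle-host"
    if loaded == expected:
        # The requested version is already loaded.
        return "idle"
    if loaded is not None and refcnt > 0:
        # A workload still holds the device, so unloading is refused.
        return "fail-busy"
    # Version mismatch with refcnt=0, or (assumption) nothing loaded yet:
    # rmmod if needed, build the requested tag, insmod.
    return "rebuild"
```

For example, `decide_action(False, "2.7.0", "2.8.0", 0)` selects the rebuild branch, while the same call with `refcnt=2` selects the failure branch.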
Spec fields
| Field | Default | Purpose |
|---|---|---|
| | required | tt-kmd release tag, minus |
| | required | Standard |
| | | Soft stop. Controller stops reconciling; existing DS keeps running. Useful for blast-radius pauses without deleting the CR. |
| | | Pass 1: cordon + evict pods that |
| | | Pass 2: full-node |
| | | Restricts pass 2 to pods matching this selector ( |
| | | Evict bare pods (no controller) instead of skipping. Applies to both passes. |
| | | Pass 2 evicts pods with |
| | | Per-node drain deadline. |
| | | Last resort: if |
| | chart’s | Per-CR override of the builder image. Useful for canary-ing a new builder. |
| | | Override for the above. Set to |
CR examples
Whole-fleet install
apiVersion: driver.tenstorrent.com/v1alpha1
kind: TenstorrentDriverPolicy
metadata:
  name: fleet
spec:
  version: "2.8.0"
  nodeAffinity: {}
Multiple versions across pools
Two non-overlapping CRs. Nodes are pinned to pools via a custom label
(here tt.tenstorrent.com/pool):
apiVersion: driver.tenstorrent.com/v1alpha1
kind: TenstorrentDriverPolicy
metadata: { name: prod }
spec:
  version: "2.7.0"
  nodeAffinity: { matchLabels: { tt.tenstorrent.com/pool: prod } }
---
apiVersion: driver.tenstorrent.com/v1alpha1
kind: TenstorrentDriverPolicy
metadata: { name: canary }
spec:
  version: "2.8.0"
  nodeAffinity: { matchLabels: { tt.tenstorrent.com/pool: canary } }
The selectors must be disjoint — overlapping CRs both spawn DS pods on
the shared nodes, and the second pod loses (rmmod fails because the
first’s module is loaded).
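A quick pre-flight check for the disjointness requirement can be sketched as follows. It only covers equality-based matchLabels selectors like the ones above, and the helper name is hypothetical. A `False` result means an overlap is possible, not that some node actually carries both label sets.

```python
from typing import Mapping


def provably_disjoint(a: Mapping[str, str], b: Mapping[str, str]) -> bool:
    """True if no node can satisfy both matchLabels selectors.

    Two equality-based selectors can never match the same node when they
    require different values for the same label key.
    """
    return any(key in b and b[key] != value for key, value in a.items())


prod = {"tt.tenstorrent.com/pool": "prod"}
canary = {"tt.tenstorrent.com/pool": "canary"}
print(provably_disjoint(prod, canary))  # the two pool CRs above
print(provably_disjoint(prod, {}))      # an empty selector matches every node
```

Note that an empty selector, as used in the whole-fleet example, can never be disjoint from another CR.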
Pause for incident response
kubectl patch ttdp default --type merge -p '{"spec":{"paused":true}}'
The controller stops reconciling. Pods keep running with whatever version they last loaded. Unpause when ready:
kubectl patch ttdp default --type merge -p '{"spec":{"paused":false}}'
Per-CR installer image (canary)
spec:
  version: "2.8.0"
  installer:
    image: ghcr.io/tenstorrent/tt-k8s-driver-manager-builder:sha-abc1234
    imagePullPolicy: Always
Upgrade flow
Bump spec.version:
kubectl patch ttdp default --type merge -p '{"spec":{"version":"2.8.0"}}'
Per-node state machine (mirrors ttfwp):
Pending → Cordoning → Draining → Upgrading → Uncordoning → Done
↘ Failed
Cordoning / Draining / Uncordoning are skipped when
upgradePolicy.drain.enable=false. Per-node state is surfaced on
status.nodes[] — see Watch progress.
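As a small illustration of the state machine above, the sketch below lists the states a node passes through on a successful upgrade, with and without draining. The function name is hypothetical, and the Failed branch is left out because the diagram does not say which states can lead to it.

```python
from typing import List

FULL_PATH = ["Pending", "Cordoning", "Draining",
             "Upgrading", "Uncordoning", "Done"]
# States that are skipped when upgradePolicy.drain.enable=false.
DRAIN_STATES = {"Cordoning", "Draining", "Uncordoning"}


def happy_path(drain_enabled: bool) -> List[str]:
    """Return the ordered states for a successful per-node upgrade."""
    if drain_enabled:
        return list(FULL_PATH)
    return [state for state in FULL_PATH if state not in DRAIN_STATES]
```

With draining disabled, `happy_path(False)` reduces to Pending, Upgrading, Done.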
What the controller does:
1. Before bumping the DS template, cordons matched nodes and flips controller.deployGates labels (default tenstorrent.com/deploy.tt-telemetry=false) so sibling DaemonSets that hold /dev/tenstorrent evict themselves.
2. Drains each node in two passes: pass 1 evicts pods that hostPath-mount /dev/tenstorrent; pass 2 (gated by drain.fullNode, default on) runs full kubectl drain semantics.
3. Re-renders the DS pod template with the new TT_KMD_VERSION env and a sha256 driver.tenstorrent.com/template-hash. K8s rolling update kicks in with maxUnavailable: 1.
4. On each new pod start, entrypoint sees LOADED != EXPECTED, checks refcnt; with the drain done it’s normally 0. With forceUnload=true, holders are SIGKILLed via a /proc/*/fd walk before rmmod. Then either pulls from /var/cache/tt-kmd/<kver>/<new-version>/ (cache hit, ~5s total) or clones tt-kmd + make modules (cache miss, ~30-90s) and insmods.
5. Pod’s readiness probe (/sys/module/tenstorrent/version matches $TT_KMD_VERSION) passes; controller uncordons the node and removes the deploy-gate labels (sibling DSes reschedule); rolling update advances to the next node.
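The /proc/*/fd walk mentioned for forceUnload can be sketched like this. It is a read-only illustration that only finds holder PIDs and does not kill anything. The function name, the prefix match on the device path, and the `proc_root` parameter (added so the walk can be pointed at a test directory) are assumptions, not the operator's actual code.

```python
import os
from typing import List


def find_holders(device_prefix: str = "/dev/tenstorrent",
                 proc_root: str = "/proc") -> List[int]:
    """Return PIDs that have an open file descriptor under device_prefix."""
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed between listdir and readlink
            if target.startswith(device_prefix):
                holders.append(int(entry))
                break  # one matching fd is enough for this PID
    return sorted(holders)
```

Seeing other processes' descriptors requires sufficient privileges, which a privileged pod with access to the host PID namespace would have.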
A 3-node cluster upgrade takes ~1–2 min cache-cold, ~30s cache-warm (after a previous upgrade on this CR has already populated the cache).
Watch progress
$ kubectl get ttdp default -o jsonpath='{.status.nodes}' | jq
[
{"name":"node-1","currentVersion":"2.8.0","state":"Done"},
{"name":"node-2","currentVersion":"2.7.0","state":"Draining"},
{"name":"node-3","currentVersion":"2.7.0","state":"Pending"}
]
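For scripting around a rollout, the same status can be summarised programmatically. The sketch below parses the sample output shown above and lists nodes that have not reached Done; the helper name is hypothetical, and only the three fields visible in the sample are assumed to exist.

```python
import json
from typing import List

# Sample copied from the kubectl output above.
SAMPLE = """[
  {"name":"node-1","currentVersion":"2.8.0","state":"Done"},
  {"name":"node-2","currentVersion":"2.7.0","state":"Draining"},
  {"name":"node-3","currentVersion":"2.7.0","state":"Pending"}
]"""


def unfinished(nodes_json: str) -> List[str]:
    """Return 'name (state)' for every node that is not yet Done."""
    nodes = json.loads(nodes_json)
    return [f"{n['name']} ({n['state']})" for n in nodes if n["state"] != "Done"]


print(unfinished(SAMPLE))
```

In practice the JSON string would come from the `kubectl get ttdp ... -o jsonpath` command rather than a constant.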
Deploy gates
controller.deployGates in the chart’s values.yaml is the list of
node-label keys the controller flips off (=false) during a kmd
upgrade and removes on uncordon. Sibling DaemonSets that consume
/dev/tenstorrent (tt-telemetry, future workloads) must include
NotIn ["false"] on the same label key in their nodeAffinity — the
chart-level pattern. Default list:
tenstorrent.com/deploy.tt-telemetry. Set to [] to disable the
gate-flip entirely.
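To make the gate semantics concrete, here is a minimal sketch of the match expression a sibling DaemonSet would carry and of how it evaluates against a node's labels. The helper names are hypothetical. The key/operator/values shape follows the standard Kubernetes node selector requirement, where NotIn also matches nodes that do not carry the label at all.

```python
from typing import Dict, List, Mapping


def gate_requirement(gate_key: str) -> Dict[str, object]:
    """matchExpressions entry a sibling DaemonSet adds for one gate key."""
    return {"key": gate_key, "operator": "NotIn", "values": ["false"]}


def gate_allows(node_labels: Mapping[str, str], gate_keys: List[str]) -> bool:
    """True if every gate permits scheduling on a node with these labels.

    A missing label passes, which is why removing the label on uncordon
    lets the sibling DaemonSets reschedule.
    """
    return all(node_labels.get(key) != "false" for key in gate_keys)


KEY = "tenstorrent.com/deploy.tt-telemetry"
print(gate_requirement(KEY))
print(gate_allows({KEY: "false"}, [KEY]))  # during a kmd upgrade
print(gate_allows({}, [KEY]))              # after uncordon
```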
Downgrade
Same flow; just patch spec.version to a lower number. The previous
version’s .ko is already cached if you’ve used it before.
Roll back via Helm
For driver-manager controller upgrades (not driver upgrades), use Helm:
helm -n tt-k8s-driver-manager-system rollback tt-k8s-driver-manager
CRs are unaffected; the controller pod restarts with the older image.
What gets put on hosts
Per node, after a successful reconcile:
/var/cache/tt-kmd/
└── 6.8.0-111-generic/
    ├── 2.7.0/tenstorrent.ko   ← cached build from a previous CR
    └── 2.8.0/tenstorrent.ko   ← currently loaded
/usr/local/bin/tt-smi          ← self-contained binary (PyInstaller build,
                                 no host Python needed)
The kernel module itself is in-kernel — /sys/module/tenstorrent/ shows
it; lsmod | grep tenstorrent confirms; the on-disk .ko is only
consulted at insmod time.
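The cache layout shown above maps directly to a path lookup. The sketch below builds the expected .ko path for a kernel release and tt-kmd version and checks whether it exists; the function names are hypothetical, and the layout is taken only from the tree in this section.

```python
import os


def cached_module_path(kernel_release: str, kmd_version: str,
                       cache_root: str = "/var/cache/tt-kmd") -> str:
    """Path where a cached build for this kernel and version would live."""
    return os.path.join(cache_root, kernel_release, kmd_version,
                        "tenstorrent.ko")


def is_cache_hit(kernel_release: str, kmd_version: str,
                 cache_root: str = "/var/cache/tt-kmd") -> bool:
    """True if a previously built module is present in the cache."""
    return os.path.isfile(
        cached_module_path(kernel_release, kmd_version, cache_root))
```

On a node, the running kernel release can be read with `os.uname().release`, for example `is_cache_hit(os.uname().release, "2.8.0")`.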
The operator does NOT touch:
- /lib/modules/<kver>/updates/dkms/: that’s where DKMS would install. The containerized builder uses insmod from its cache, not modprobe / DKMS. Hosts that previously had a DKMS install will still have files there; the operator just ignores them and detects them as host-managed.
- modules.alias, modules.dep, modules-load.d. The operator’s containerized model doesn’t participate in the kernel module auto-load chain.
Node labels the operator sets
| Label | Value | When |
|---|---|---|
| | running module’s version (from | When the builder pod’s readiness probe passes |
| | | After the builder copies tt-smi to host |
| | | Set every reconcile based on host-install detection |
Workloads requiring a specific tt-kmd version can nodeSelector against
driver.tenstorrent.com/kmd-version:
nodeSelector:
  driver.tenstorrent.com/kmd-version: "2.8.0"
This is honest: the label reflects what’s loaded right now, not what the CR says it wants. During a rolling upgrade, the label flips per-node as each pod’s new version becomes Ready.
Mixed mode
driver-manager auto-detects nodes that already have a host-managed tt-kmd install (e.g. via apt/DKMS from a config-management tool) and stands down on them. Signals checked:
- /var/lib/dkms/tenstorrent/ exists: DKMS tracks the module.
- /usr/src/tenstorrent-<v>/dkms.conf exists: DKMS source registered.
If either is present, the builder pod labels the node
driver.tenstorrent.com/install-mode=host and idles without:
- running rmmod or insmod;
- copying tt-smi to /usr/local/bin;
- writing to /var/cache/tt-kmd/.
The pod still propagates kmd-version to the node label, so observability
works the same as for container-managed nodes.
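The two detection signals can be sketched as a single check. This is an illustrative model, not the builder's actual probe: the function name is hypothetical, and `host_root` is an assumed parameter for wherever the host filesystem is visible (the tt-smi copy step suggests /host inside the pod).

```python
import glob
import os


def host_install_detected(host_root: str = "/") -> bool:
    """True if either DKMS signal for tt-kmd is present under host_root."""
    # Signal 1: DKMS tracks the module.
    dkms_state = os.path.join(host_root, "var/lib/dkms/tenstorrent")
    # Signal 2: a DKMS source tree is registered for some version.
    dkms_sources = os.path.join(host_root, "usr/src/tenstorrent-*/dkms.conf")
    return os.path.isdir(dkms_state) or bool(glob.glob(dkms_sources))
```

Either signal alone is enough, matching the "if either is present" rule above.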
To force a host-managed node into container mode, remove the host’s DKMS state — see Migrating from DKMS for the per-node vacate procedure plus cluster-side coordination, or Fully clean a host for the broader operator-side sweep. To do the inverse (keep operator out entirely, even from labelling), use the skip label.
Skip label
kubectl label node <name> driver.tenstorrent.com/skip=true opts the node
out of all driver-manager reconciliation. The DaemonSet’s nodeAffinity
includes a DoesNotExist requirement on this label, so labelling a
running node immediately evicts its installer pod (no rmmod first — the
loaded module stays, but pod-level state goes away).
Remove with kubectl label node <name> driver.tenstorrent.com/skip-.
Use this for hosts that are under hard manual management (debug nodes, mid-incident hosts you don’t want touched, etc.). For “host has its own install but the operator should still observe it,” use auto-detected mixed mode instead.
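Putting the skip label together with the other two scheduling conditions, the DaemonSet's node selection can be modelled as a predicate. This is a sketch that assumes spec.nodeAffinity uses equality-based matchLabels as in the examples; the function name is hypothetical. Because the requirement is DoesNotExist, the label's value is irrelevant: even skip=false opts the node out.

```python
from typing import Mapping

PCI_LABEL = "feature.node.kubernetes.io/pci-1200_1e52.present"
SKIP_LABEL = "driver.tenstorrent.com/skip"


def pod_schedules(node_labels: Mapping[str, str],
                  match_labels: Mapping[str, str]) -> bool:
    """True if an installer pod would be placed on a node with these labels."""
    matches_policy = all(node_labels.get(k) == v
                         for k, v in match_labels.items())
    has_device = node_labels.get(PCI_LABEL) == "true"
    # DoesNotExist: the mere presence of the key opts the node out.
    opted_out = SKIP_LABEL in node_labels
    return matches_policy and has_device and not opted_out
```

For instance, a Tenstorrent node labelled `driver.tenstorrent.com/skip=false` is still excluded; the label has to be removed for the node to rejoin reconciliation.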
kubectl plugin
hack/plugins/kubectl-tt-driver collapses CR + DaemonSet + per-pod state
into one view. Install:
make install-plugins # copies to ~/.local/bin/
kubectl tt driver
Output is a single table: per-CR rows, per-node columns, ANSI color for
state. See hack/plugins/kubectl-tt-driver --help
for subcommands.