Community · Open Source · Tenstorrent Ecosystem

A hidden dimension of Tenstorrent awesomeness

A curated directory of projects, tools, models, and research for Tenstorrent hardware — contributed by the community and our team. Browse by category or search across all entries.

107 Projects

12 Categories

Open Source

Browse by category

🚀 Getting Started

The essential first steps — installer, core SDKs, and guided onboarding

tt-metal

TT-NN operator library and TT-Metalium low-level kernel programming model. The primary SDK for devel…

6 entries Browse →

🤖 AI & Models

Running, serving, and experimenting with AI models

tt-bio

Boltz-2 biomolecular model for drug discovery on Tenstorrent Blackhole. Supports single-card and mul…

24 entries Browse →

🕵️ AI Agents

Agentic systems and AI assistants running on TT hardware

dstack

Vendor-agnostic orchestration for training, inference, and agentic workloads across NVIDIA, AMD, TPU…

4 entries Browse →

⚙️ Custom Kernels & Low-Level

Metalium/tt-lang kernel authoring; anything sub-compiler

tt-tiny

Minimal Python code to access and program the Tenstorrent Blackhole chip directly — George Hotz's ex…

22 entries Browse →

🔨 Compilers & Frontends

Getting PyTorch/JAX/ONNX/CUDA models onto TT hardware

BarraCUDA

Open-source CUDA compiler targeting multiple GPU architectures including Tenstorrent. Compiles .cu f…

14 entries Browse →

🛠 Dev Tools & Debugging

Profiling, visualization, and debugging workloads

nvtop

htop-style process monitor for GPUs and AI accelerators. Supports AMD, Apple, Huawei, Intel, NVIDIA,…

18 entries Browse →

🖥 Hardware & System

Drivers, firmware, monitoring, and hardware management

tt-kmd

Tenstorrent kernel module driver. The Linux kernel module required to interface with Tenstorrent PCI…

17 entries Browse →

☁️ Cloud & Orchestration

Kubernetes, cloud deployment, and multi-node infrastructure

TT Console

Browser-based cloud console for exploring AI on Tenstorrent hardware. Run LLM inference, image and v…

5 entries Browse →

🔩 RISC-V & Architecture

ISA, simulation, and running Linux on TT silicon

tt-bh-linux

Linux demo for the Tenstorrent Blackhole P100/P150 card RISC-V cores. Boot a real Linux kernel on th…

14 entries Browse →

🔬 Research & Papers

Academic papers, theses, and HPC experiments

tt-tutorial (HPC)

Tutorial on Tenstorrent hardware for HPC researchers from the RISC-V Testbed project at Edinburgh/EP…

14 entries Browse →

🎮 Games & Demos

Creative, playful, and proof-of-concept projects

tt-zork-and-more

A Tenstorrent fork of Infocom's Zork I (and more!), running a Z-machine interpreter at least four di…

11 entries Browse →

📚 Guides, Tutorials & Education

Getting-started content, blog posts, lessons, courses

Programming Tenstorrent Processors

Deep-dive into the Tenstorrent architecture and Metalium programming model — circular buffers, kerne…

16 entries Browse →

🚀 Getting Started

nvtop community 10739⭐

htop-style process monitor for GPUs and AI accelerators. Supports AMD, Apple, Huawei, Inte…

dstack community 2160⭐

Vendor-agnostic orchestration for training, inference, and agentic workloads across NVIDIA…

BarraCUDA community 1697⭐

Open-source CUDA compiler targeting multiple GPU architectures including Tenstorrent. Comp…

tt-tiny community 66⭐

Minimal Python code to access and program the Tenstorrent Blackhole chip directly — George…

tt-sim community 13⭐

Community-built Tenstorrent architecture simulator written in Python. Runs without hardwar…

tt-iree community 12⭐

IREE (Intermediate Representation Execution Environment) ML compiler ported to Tenstorrent…

triton-tenstorrent community 11⭐

OpenAI Triton compiler plugin for Tenstorrent hardware. Write Triton kernels and target Te…

bhx community 4⭐

Boot stock Linux cloud images on the SiFive X280 RISC-V cores inside Tenstorrent Blackhole…

tt-bio community

Boltz-2 biomolecular model for drug discovery on Tenstorrent Blackhole. Supports single-ca…

· Jan 31, 2026

Programming Tenstorrent Processors community

Deep-dive into the Tenstorrent architecture and Metalium programming model — circular buff…

· Apr 21, 2025

Tenstorrent SFPU Kernel Series — Jason Davies community

Sponsored series of deep technical articles on implementing optimal SFPU kernels for the T…

· Nov 12, 2025

Tenstorrent Blackhole Architecture Guide community

A 6,500-word community deep dive into the Blackhole p100a architecture: the tile model (Te…

· Feb 28, 2026

grayskull-attention community 38⭐

FlashAttention-style attention kernel implemented entirely in on-chip SRAM on the Tenstorr…

tt-twitch community 28⭐

A Tenstorrent Grayskull kernel written live on Twitch by George Hotz. 120-core grid demons…

koyeb/tenstorrent-examples community 18⭐

Example applications and deployment configurations for running AI workloads on Tenstorrent…

blackhole-py community 14⭐

Pure Python driver for Tenstorrent Blackhole cards providing direct low-level hardware acc…

tenstorrent-tiny-examples community 14⭐

Simple C++ kernel experiments on a GraySkull e75 chip. Hands-on examples for learning the …

ttnn-helloworld-cpp community 14⭐

Minimal working example of using Tenstorrent TTNN in C++. The simplest possible starting p…

TT-GoL community 12⭐

Conway's Game of Life implemented on Tenstorrent hardware using TT-Metal kernels.

ttMandelbrot community 7⭐

Mandelbrot Set fractal renderer running on Tenstorrent hardware. A classic demo showcasing…

TT-Metal Mini Template community 7⭐

Minimal working CMake project template for starting a new TT-Metal project from scratch. G…

tt-tutorial (HPC) community 7⭐

Tutorial on Tenstorrent hardware for HPC researchers from the RISC-V Testbed project at Ed…

ttPEAK community 6⭐

clpeak-style peak-performance benchmark for Tenstorrent devices using TT-Metalium. Measure…

tenstorrent.nix community 6⭐

Nix flake packaging the Tenstorrent software stack for NixOS and Nix users. Reproducible, …

current community 5⭐

High-level parallel programming framework for Tenstorrent accelerators, abstracting TT-Met…

ttVecAdd community 5⭐

Minimal vector-addition example on Tenstorrent devices using TT-Metalium. A clean hello-wo…

ttas community 4⭐

ttas is a hacker-friendly assembler/disassembler for Tensix on Wormhole. It turns assembly…

tt-tutorial (Korean) community 4⭐

Comprehensive tutorials for the Tenstorrent software stack in Korean. Jupyter notebooks co…

Collective Operations on Wormhole n150 (Sapienza University of Rome) community 4⭐

Master's thesis implementing and benchmarking five allreduce algorithms (Swing, Recursive …

libtt-metal-cxx community 2⭐

Rust crate that exposes the TT-Metal host API through a C++ bridge via cxx.rs — covering d…

gsplat_tt community 1⭐

Port of Gaussian Splatting (3D scene reconstruction from 2D images) to Tenstorrent hardwar…

A Gentle Guide: Tenstorrent Card on Arch Linux with Metalium community

Step-by-step guide to getting a Tenstorrent card running on Arch Linux with the full Metal…

· Jul 7, 2024

Thoughts and Logs After Messing with Tenstorrent Grayskull community

Honest field notes from getting a Grayskull card running and writing first Metalium kernel…

· Jun 2, 2024

Tenstorrent Architecture — W&M CSCI654 Advanced Computer Architecture community

Lecture 20 from William & Mary's graduate Computer Architecture course. Frames Tenstorrent…

· Oct 9, 2024

Attention in SRAM on Tenstorrent Grayskull community

A fused kernel for the Grayskull architecture implementing Transformer self-attention enti…

· Jul 18, 2024

Exploring Fast Fourier Transforms on the Tenstorrent Wormhole community

Ports the Cooley-Tukey FFT algorithm to the Wormhole n300 RISC-V accelerator. The Wormhole…

· Jun 18, 2025

Assessing Tenstorrent Grayskull RISC-V MatMul Acceleration for LLMs community

Evaluates the Tenstorrent Grayskull e75 RISC-V accelerator for matrix multiplication at re…

· May 9, 2025

Porting Strategies for Gravitational N-Body Simulations on Tenstorrent Wormhole community

Evaluates three strategies for scaling an N-body code across multiple Tenstorrent Wormhole…

· May 4, 2026

Accelerating Gravitational N-Body Simulations on Tenstorrent Wormhole community

Accelerates an astrophysical N-body simulation on the Wormhole n300. Achieves 2× speedup a…

Nov 16, 2025

Numerical Kernels on a Spatial Accelerator: Tenstorrent Wormhole community

Implements three numerical kernels and composes them into a conjugate gradient solver on W…

Mar 24, 2026

Accelerating Stencils on the Tenstorrent Grayskull RISC-V Accelerator community

Explores stencil computation on the Grayskull PCIe RISC-V accelerator. Early academic work…

Sep 27, 2024

Stencil Computations on Tenstorrent Wormhole community

Maps 2D 5-point stencil computations onto the Tenstorrent Wormhole RISC-V AI dataflow acce…

May 8, 2026

SwiftNPU: Scalable Shape-Flexible Allocation for Inter-Core Connected NPUs community

Makes multi-tenant NPU sharing practical for Blackhole-class hardware using polynomial-tim…

Apr 27, 2026

TileLoom: Automatic Dataflow Planning for Spatial Dataflow Accelerators community

Compiler system that automatically generates efficient dataflow plans for tile-based langu…

· Dec 17, 2025

Rewriting TTS Inference Economics: Lightning V2 on Tenstorrent vs. NVIDIA L40S community

Shows that Text-to-Speech inference on Tenstorrent Lightning V2 achieves 4× lower cost tha…

· Mar 24, 2026

tt-zork-and-more affiliated 2⭐

A Tenstorrent fork of Infocom's Zork I (and more!), running a Z-machine interpreter at lea…

Local AI Agents on Tenstorrent affiliated

Three agentic projects running fully on-device: local AI agents on QuietBox 2, a coding as…

Video Generation on Tenstorrent affiliated

Three lesson-projects covering on-device video synthesis: frame-by-frame diffusion with tt…

tensix-viz affiliated

Hardware topology visualizer for Tenstorrent chips — from individual chip to full cluster.…

Tenstorrent Cookbook: Particle Life Simulator affiliated

Particle Life simulation on Tenstorrent hardware — an emergent-behavior N-body system wher…

CS Fundamentals on Tenstorrent Hardware affiliated

Seven-module computer science curriculum taught on real Tenstorrent hardware. Covers RISC-…

tt-lang-models affiliated 7⭐

A growing collection of models that use tt-lang for some or all of their implementation. R…

tt-qb-lights affiliated 2⭐

Sync your Tenstorrent Quietbox's RGB lighting to accelerator utilization status. Visual fe…

gemma4 affiliated 1⭐

Gemma 4 language model implemented in tt-lang (e4b variant) for direct execution on Tensto…

open-oasis affiliated 1⭐

tt-lang inference script for Oasis 500M — an interactive video world model running on Tens…

tt-model-runner affiliated 1⭐

Discover, load, and benchmark models with a GUI and TUI for tt-inference-server. Makes exp…

tt-claw affiliated

A Tenstorrent-powered claw machine that rewards players with real prizes. The QuietBox 2 r…

dflash affiliated

DFlash: Block Diffusion for Flash Speculative Decoding on Tenstorrent hardware using tt-la…

diamond affiliated

DIAMOND: Atari game-playing agent implemented on Tenstorrent hardware via tt-lang. Diffusi…

Engram affiliated

A Tenstorrent port of the DeepSeek Engram model using tt-lang. Brings DeepSeek's memory-ef…

Stable Diffusion XL on Tenstorrent affiliated

On-device image generation with Stable Diffusion XL running entirely on Tenstorrent hardwa…

tt-forge-compiletron affiliated

Compile more than 100 models on tt-forge in a display format suitable for demos. Comprehen…

Image Classification with TT-Forge affiliated

End-to-end image classification project using TT-Forge — compile and run a PyTorch classif…

tt-warp affiliated

Warp terminal plugin for Tenstorrent — integrates hardware status, model management, and d…

Tensix Grid Playground affiliated

Interactive browser-based visualizer of the Tenstorrent Tensix grid architecture. Explore …

Tenstorrent Cookbook: Conway's Game of Life affiliated

TT-Metalium implementation of Conway's Game of Life as a cookbook recipe. Each generation …

Custom Model Training on Tenstorrent affiliated

Eight-lesson series covering the full custom training workflow on TT hardware: dataset fun…

Tenstorrent Cookbook: Core Recipes affiliated

Three hands-on TT-Metalium kernel recipes: a Mandelbrot fractal explorer, real-time audio …

tt-bh-linux official 55⭐

Linux demo for the Tenstorrent Blackhole P100/P150 card RISC-V cores. Boot a real Linux ke…

TT Console official

Browser-based cloud console for exploring AI on Tenstorrent hardware. Run LLM inference, i…

tt-metal official 1518⭐

TT-NN operator library and TT-Metalium low-level kernel programming model. The primary SDK…

tt-buda official 314⭐

TT-BUDA: Tenstorrent's original Python compiler and runtime for AI workloads. Legacy stack…

tt-forge official 289⭐

Tenstorrent's MLIR-based compiler frontend. Enables running AI workloads from PyTorch, ONN…

tt-mlir official 280⭐

Tenstorrent MLIR compiler — the core compiler infrastructure shared by tt-forge and other …

riscv-ocelot official 255⭐

The Berkeley Out-of-Order Machine with V-EXT (RISC-V Vector Extension) support. Tenstorren…

ttsim official 122⭐

Fast full-system simulator of Tenstorrent Wormhole and Blackhole hardware. Runs TT-Metaliu…

whisper official 88⭐

RISC-V Instruction Set Simulator (ISS) used by Tenstorrent for processor verification. Pow…

tt-xla official 68⭐

PJRT device plugin for Tenstorrent hardware. Enables JAX, PyTorch/XLA, and other XLA-based…

RiESCUE official 66⭐

RISC-V Directed Test Framework and Compliance Suite. Comprehensive test infrastructure for…

tt-kmd official apt* 65⭐

Tenstorrent kernel module driver. The Linux kernel module required to interface with Tenst…

tt-buda-demos official 64⭐

Repository of model demos using TT-Buda. The largest collection of pre-compiled model exam…

tt-forge-onnx official 64⭐

ONNX graph compiler for Tenstorrent hardware. Optimizes and transforms ONNX model graphs f…

tt-smi official pip 61⭐

Tenstorrent System Management Interface — monitor device telemetry, issue board-level rese…

tt-inference-server official 58⭐

Production-ready model serving for Tenstorrent hardware with OpenAI-compatible REST API. S…

ttnn-visualizer official pip 52⭐

Comprehensive tool for visualizing and analyzing model execution on Tenstorrent hardware. …

tt-llk official 52⭐

Tenstorrent Low-Level Kernels: the C++ library that directly programs the RISC-V cores ins…

Jun 5, 2025

tt-lang official pippip 51⭐

Python-based DSL that sits between TT-NN and TT-Metalium — expresses custom fused kernels …

TT-Studio official 48⭐

Web-based GUI for deploying and chatting with AI models on Tenstorrent hardware. Handles a…

WallaBMC official 46⭐

Lightweight BMC (Baseboard Management Controller) for STM32 and similar MCUs, with Web UI,…

tt-umd official 43⭐

User-mode driver for Tenstorrent hardware. The userspace layer that sits between the kerne…

tt-system-firmware official 39⭐

System firmware for Tenstorrent hardware. Low-level system initialization and control firm…

luwen official cargo 34⭐

Tenstorrent system interface library written in Rust. Low-level Rust bindings for communic…

tt-tvm official 31⭐

TVM for Tenstorrent ASICs. Brings the Apache TVM compiler stack to Tenstorrent hardware, e…

tensix-isa-simulator official 29⭐

ISA-level simulator for the Tensix compute engine. Simulates the matrix, vector, and scala…

tt-torch official 25⭐

Frontend integration for PyTorch with tt-mlir. Compile PyTorch models directly to Tenstorr…

tt-firmware official apt* 24⭐

Tenstorrent firmware repository. Board management and control firmware for Tenstorrent acc…

tt-installer official 23⭐

Install the complete Tenstorrent software stack with one command. Handles drivers, firmwar…

tt-exalens official pip 21⭐

Low-level hardware debugger for Tenstorrent devices. Inspect register state, memory conten…

tt-topology official pip 16⭐

Configure Ethernet routing on multi-card Tenstorrent systems. Flash NB cards to use specif…

tt-npe official 14⭐

Network-on-chip Performance Estimator for Tenstorrent Tensix-based devices. Model and esti…

tt-blacksmith official 13⭐

Optimized training recipes for a variety of ML models on Tenstorrent hardware, powered by …

tt-example-apps official 13⭐

End-to-end AI applications running on Tenstorrent AI accelerators. Complete application ex…

tt-flash official pip 13⭐

Tenstorrent firmware update utility. Flash new firmware onto Tenstorrent accelerator cards…

tt-vscode-toolkit official 7⭐

48 interactive lessons covering the full Tenstorrent developer path — from hardware detect…

Dec 18, 2025

tt-toplike official 2⭐

A vibrant htop-style visualizer for Tenstorrent hardware written in Rust. Real-time proces…

tt-local-generator official 1⭐

Generate infinite videos and images (and imaginative prompts to inspire them) on Tenstorre…

tt-animatediff official

Generates short, temporally coherent animated GIFs using the AnimateDiff model on Tenstorr…

🏷 Recent Releases

33 releases

ttsim official v1.8.3

2026-06-13T17:31:51Z

tt-inference-server official v0.16.0

2026-06-12T18:21:42Z

tt-smi official v5.3.0

2026-06-12T15:35:05Z

tt-system-firmware official v19.11.0

2026-06-11T14:56:55Z

tt-exalens official v0.3.23

2026-06-11T14:42:44Z

dstack community 0.20.24

2026-06-11T13:55:33Z

tt-animatediff official v0.6.0

2026-06-10T22:16:43Z

ttnn-visualizer official v0.89.0

2026-06-10T18:50:24Z

tensix-viz affiliated v1.1.0

2026-06-09T22:19:42Z

tt-local-generator official v0.7.4

2026-06-09T21:59:39Z

tt-vscode-toolkit official v0.0.465

2026-06-09T20:32:59Z

tt-kmd official ttkmd-2.9.0

2026-06-09T13:25:19Z

tt-metal official v0.72.0

2026-06-09T01:30:48Z

tt-toplike official v0.6.2

2026-06-08T19:42:59Z

tt-umd official v0.9.6

2026-06-03T10:59:12Z

tt-flash official v3.8.0

2026-06-01T18:04:27Z

BarraCUDA community v0.5.0

2026-05-29T04:30:28Z

tt-forge official 1.2.0

2026-05-28T09:59:56Z

tt-forge-onnx official 1.2.0

2026-05-28T09:57:37Z

tt-xla official 1.2.0

2026-05-28T09:50:59Z

ttas community v0.1.0

2026-05-28T07:08:35Z

TT-Studio official v2.6.0

2026-05-20T17:04:32Z

whisper official 1.861

2026-05-11T15:44:36Z

tt-sim community v1.0

2026-05-11T13:07:42Z

tt-bh-linux official v0.11

2026-04-13T15:10:59Z

luwen official v0.8.5

2026-03-30T21:03:56Z

tt-installer official v2.2.1

2026-03-16T18:54:29Z

tt-topology official v1.2.19

2026-02-26T21:14:41Z

tt-firmware official v19.6.0

2026-02-20T16:53:34Z

nvtop community 3.3.2

2026-02-08T17:57:16Z

RiESCUE official v1.7.0

2025-12-03T19:29:44Z

tt-torch official 0.4.0

2025-09-29T22:23:47Z

tt-buda official v0.19.3

2024-09-24T21:01:08Z

Select an entry to see details

nvtop

community★ featured

by Syllo · C · GPL-3.0 · 10739⭐ ·

htop-style process monitor for GPUs and AI accelerators. Supports AMD, Apple, Huawei, Intel, NVIDIA, Qualcomm — and Tenstorrent. Real-time utilization, memory, and process info in a terminal UI.

Links

📦 Repo

Releases

LATEST 3.3.2 2026-02-08T17:57:16Z Release notes ↗

⬇ nvtop-3.3.2-x86_64.AppImage

4 previous releases

3.3.1 2026-01-18T13:12:34Z

3.3.0 2026-01-16T13:28:09Z

3.2.0 2025-03-29T11:26:44Z

3.1.0 2024-02-23T15:04:44Z

See all releases on GitHub ↗

monitoring tui htop process-monitor terminal

Works on

wormhole blackhole

dstack

community★ featured

by dstackai · Python · MPL-2.0 · 2160⭐ ·

Vendor-agnostic orchestration for training, inference, and agentic workloads across NVIDIA, AMD, TPU, and Tenstorrent on clouds, Kubernetes, and bare metal.

Links

📦 Repo 🌐 Website

Releases

LATEST 0.20.24 2026-06-11T13:55:33Z Release notes ↗

4 previous releases

0.20.25rc1pre 2026-06-12T15:38:06Z

0.20.23 2026-06-04T10:20:34Z

0.20.22 2026-05-28T10:24:19Z

0.20.21 2026-05-21T12:43:40Z

See all releases on GitHub ↗

orchestration kubernetes cloud multi-vendor

BarraCUDA

community★ featured

by Zaneham · C · 1697⭐ ·

Open-source CUDA compiler targeting multiple GPU architectures including Tenstorrent. Compiles .cu files to run on AMD and Tenstorrent hardware without modification.

Links

📦 Repo

Releases

LATEST v0.5.0 2026-05-29T04:30:28Z Release notes ↗

See all releases on GitHub ↗

cuda compiler cross-platform blackhole

Works on

blackhole

tt-tiny

community★ featured

by geohot · Python · 66⭐ ·

Minimal Python code to access and program the Tenstorrent Blackhole chip directly — George Hotz's exploration of TT hardware programmability with pointed commentary on the architecture.

Links

📦 Repo

blackhole low-level exploration

Works on

blackhole

tt-sim

community★ featured

by mesham · Python · 13⭐ ·

Community-built Tenstorrent architecture simulator written in Python. Runs without hardware — useful for researchers and developers exploring the Tensix architecture offline.

Links

📦 Repo

Releases

LATEST v1.0 2026-05-11T13:07:42Z Release notes ↗

See all releases on GitHub ↗

simulator architecture no-hardware research

tt-iree

community★ featured

by swote-git · C++ · Apache-2.0 · 12⭐ ·

IREE (Intermediate Representation Execution Environment) ML compiler ported to Tenstorrent AI accelerators. Brings the IREE compiler ecosystem to TT hardware.

Links

📦 Repo

iree compiler mlir inference

Works on

wormhole blackhole

triton-tenstorrent

community★ featured

by kernelize-ai · C++ · 11⭐ ·

OpenAI Triton compiler plugin for Tenstorrent hardware. Write Triton kernels and target Tensix cores — brings the Triton ML kernel ecosystem to TT devices.

Links

📦 Repo

triton openai-triton compiler kernels

Works on

wormhole blackhole

bhx

community★ featured

by olofj · Rust · 4⭐ ·

Boot stock Linux cloud images on the SiFive X280 RISC-V cores inside Tenstorrent Blackhole AI accelerators. Per-card Rust daemon with virtio-mmio block/net/console and U-Boot/EFI support.

Links

📦 Repo

📋 Changelog

# Changelog

Notable changes per release. Format loosely follows
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/);
this project does not yet promise SemVer compatibility on the RPC
wire format or library API surface (we're not 1.0).

## Unreleased

V2 virtio-dispatch redesign. The kick ring + completion ring + host-
side throttle that grew up around #184 are gone; in their place is a
per-(slot, queue) dirty bitmap in BRISC L1. The bitmap is level-
sensitive — guest QUEUE_NOTIFY storms coalesce into a single set
byte, so the dispatch path can't fall behind under any burst. Wire
incompatible with 0.9.0; `TENSIX_PROTOCOL_VERSION` bumped 4 → 5.

### Added

- **V2 dirty-bitmap dispatch** (`#187` / `#188` / `#189`). BRISC
  writes 1 to `CTRL_OFF_DIRTY[slot][queue]` on every guest
  QUEUE_NOTIFY; the daemon's `Dispatcher` clears the byte and
  dispatches each pass. Replaces V1's 2048-entry kick ring +
  daemon-side `consume_kick_ring_pass` consumer.
- **V2 processed-cursor table** at `CTRL_OFF_PROCESSED`. Daemon
  publishes `used.idx` after each successful dispatch so
  warm-resume reads cursors directly without re-probing guest
  DRAM.
- **`bhx_notify_events_total`, `bhx_dispatch_passes_total`,
  `bhx_dispatch_queues_drained`** Prometheus counters surface the
  new dispatch path. The burst regression test (`scripts/
  soak_virtio_burst.py`) asserts `dispatch_passes_total > 0` to
  confirm the workload reached the new path.
- **`scripts/soak_virtio_burst.py`** — multi-queue burst regression
  test. Sustains 16-job direct=1 fio randwrite + a tight
  `printf` loop to `/dev/console`, samples `/metrics` every 1 s,
  and verifies the daemon log contains zero
  `kick.*drop|rescue|throttle.*ENGAGE` matches.
- **`DaemonState.chip_reset_this_session`** flag — gates
  `maybe_opportunistic_reset_board` so 4-way parallel cold boots
  reset the chip exactly once, not once per L2CPU. Without this
  the second-and-later resets blip the chip while earlier-booted
  L2CPUs hold mmap pages, SIGBUSing their workers.
- **`Dispatcher` (was `KickPoller`)** with documented testability
  seam (`CtrlL1Access` trait); `drain_dirty_bitmap` is unit-tested
  against an in-memory L1 fake covering all five visit/clear
  semantics cases plus the address-formula pins.

### Changed

- **`KickPoller` → `Dispatcher`**, plus `kick_poller` → `dispatcher`
  field on `DaemonState`, `tensix-kick-poller` → `tensix-dispatcher`
  thread name, `[kick-poller]` → `[dispatcher]` log tag,
  `kicks_consumed` → `dispatches_total`,
  `last_kick_slot_queue` → `last_dispatch_slot_queue`. Pure
  rename; no behavior change. V1 vocabulary scrubbed throughout
  the codebase (firmware, daemon, scripts, docs).
- **`CTRL_SIZE` shrinks 36 KiB → 4 KiB**. V2 footprint is ~1.5 KiB;
  the rest is reserved for future fields.
- **Stats-page offsets repacked** — V1 `STATS_OFF_KICK_DROPS`,
  `STATS_OFF_COMPL_EVENTS`, `STATS_OFF_LAST_COMPL` retired with
  V1 (#190); deprecated PRECAP / BLINDCAP / POSTCAP slots dropp

blackhole risc-v linux boot virtio

Works on

blackhole

tt-bio

community★ featured

by moritztng · Python · MIT · Jan 31, 2026

Boltz-2 biomolecular model for drug discovery on Tenstorrent Blackhole. Supports single-card and multi-card configurations — QuietBox (4×) and Galaxy (32×). Approaches physics-based FEP accuracy at 1000× the speed.

Links

📦 Repo 🎤 FOSDEM 2026 — Drug Discovery on Tenstorrent Hardware

drug-discovery blackhole inference biology multi-card

Works on

blackhole quietbox galaxy

Programming Tenstorrent Processors

community★ featured

by · Apr 21, 2025

Deep-dive into the Tenstorrent architecture and Metalium programming model — circular buffers, kernel synchronization, NoC routing, and where the footguns are. The honest guide to thinking in Tensix.

Links

📝 clehaxze.tw — April 2025

metalium programming-model tensix noc circular-buffers blog

Works on

wormhole blackhole

Tenstorrent SFPU Kernel Series — Jason Davies

community★ featured

by jasondavies · Nov 12, 2025

Sponsored series of deep technical articles on implementing optimal SFPU kernels for the Tenstorrent Wormhole and Blackhole vector units. Covers where, typecasting, 16/32-bit integer multiplication, cube root, and accurate sin/cos/tan — with cycle counts, assembly walkthroughs, and Blackhole vs Wormhole comparisons throughout.

Links

📝 Optimal "where" on Tenstorrent 📝 32-bit Integer Multiplication on Tenstorrent 📝 Typecast on Tenstorrent 📝 16-bit Integer Multiplication on Tenstorrent 📝 Cube Root on Tenstorrent 📝 Accurate sin/cos/tan on Tenstorrent

sfpu assembly vector-unit cycle-counting wormhole blackhole optimization sponsored

Works on

wormhole blackhole

Tenstorrent Blackhole Architecture Guide

community★ featured

by · Feb 28, 2026

A 6,500-word community deep dive into the Blackhole p100a architecture: the tile model (Tensix, DRAM, SiFive x280 L2CPU, Ethernet, PCIe, NoC arc), firmware startup sequence, MOP micro-op processor, replay buffer, FPU/SFPU sync, and the anatomy of a kernel. From the author of blackhole-py.

Links

📝 anuraagw.me — February 2026

blackhole architecture tensix noc sifive-x280 firmware mop sfpu deep-dive blog

Works on

blackhole

grayskull-attention

community

by moritztng · TeX · MIT · 38⭐ ·

FlashAttention-style attention kernel implemented entirely in on-chip SRAM on the Tenstorrent Grayskull chip using TT-Metalium. Pioneering work in low-level attention on TT hardware.

Links

📦 Repo

attention grayskull metalium sram kernel

Works on

grayskull

tt-twitch

community

by geohot · C++ · 28⭐ ·

A Tenstorrent Grayskull kernel written live on Twitch by George Hotz. 120-core grid demonstration of live kernel programming.

Links

📦 Repo

grayskull kernel live-coding demo

Works on

grayskull

koyeb/tenstorrent-examples

community

by koyeb · Dockerfile · 18⭐ ·

Example applications and deployment configurations for running AI workloads on Tenstorrent hardware via Koyeb's cloud platform.

Links

📦 Repo 🌐 Koyeb blog post

cloud koyeb deployment examples

blackhole-py

community

by boopdotpng · Python · MIT · 14⭐ ·

Pure Python driver for Tenstorrent Blackhole cards providing direct low-level hardware access without going through the full TT-Metal stack.

Links

📦 Repo

driver python blackhole low-level hardware-access

Works on

blackhole

tenstorrent-tiny-examples

community

by jaebaek · C++ · 14⭐ ·

Simple C++ kernel experiments on a GraySkull e75 chip. Hands-on examples for learning the TT-Metal programming model at the metal level.

Links

📦 Repo

examples grayskull cpp learning

Works on

grayskull

ttnn-helloworld-cpp

community

by marty1885 · C++ · 14⭐ ·

Minimal working example of using Tenstorrent TTNN in C++. The simplest possible starting point for C++ developers targeting TT hardware with TTNN.

Links

📦 Repo

c++ ttnn hello-world template

Works on

wormhole blackhole

TT-GoL

community

by JushBJJ · C++ · 12⭐ ·

Conway's Game of Life implemented on Tenstorrent hardware using TT-Metal kernels.

Links

📦 Repo

game-of-life demo kernels

ttMandelbrot

community

by marty1885 · C · 0BSD · 7⭐ ·

Mandelbrot Set fractal renderer running on Tenstorrent hardware. A classic demo showcasing parallel compute on Tensix cores.

Links

📦 Repo

mandelbrot demo fractals parallel

TT-Metal Mini Template

community

by JushBJJ · C++ · 7⭐ ·

Minimal working CMake project template for starting a new TT-Metal project from scratch. Good starting point for community kernel development.

Links

📦 Repo

template cmake starter boilerplate

tt-tutorial (HPC)

community

by RISCVtestbed · C++ · BSD-3-Clause · 7⭐ ·

Tutorial on Tenstorrent hardware for HPC researchers from the RISC-V Testbed project at Edinburgh/EPCC. Covers Wormhole from an HPC parallel-computing perspective.

Links

📦 Repo

tutorial hpc epcc edinburgh wormhole

Works on

wormhole

ttPEAK

community

by TT-Bounty-Hunters · C++ · ISC · 6⭐ ·

clpeak-style peak-performance benchmark for Tenstorrent devices using TT-Metalium. Measures theoretical peak throughput across operations — useful for hardware characterization.

Links

📦 Repo

benchmark performance clpeak metalium

Works on

wormhole blackhole

tenstorrent.nix

community

by RossComputerGuy · Nix · LGPL-2.1 · 6⭐ ·

Nix flake packaging the Tenstorrent software stack for NixOS and Nix users. Reproducible, declarative installation of TT drivers and tools.

Links

📦 Repo

nix nixos packaging flake reproducible

current

community

by seansiddens · C++ · 5⭐ ·

High-level parallel programming framework for Tenstorrent accelerators, abstracting TT-Metal into a research-oriented programming model for parallel computation.

Links

📦 Repo

framework parallel abstraction research

Works on

wormhole blackhole

ttVecAdd

community

by marty1885 · C++ · ISC · 5⭐ ·

Minimal vector-addition example on Tenstorrent devices using TT-Metalium. A clean hello-world for the TT-Metal kernel programming model in C++.

Links

📦 Repo

vector-add example metalium hello-world

ttas

community

by Zaneham · C · Apache-2.0 · 4⭐ ·

ttas is a hacker-friendly assembler/disassembler for Tensix on Wormhole. It turns assembly into the exact 32-bit words the hardware runs, and turns binaries back into readable instructions using the same shared instruction table.

Links

📦 Repo

Releases

LATEST v0.1.0 2026-05-28T07:08:35Z Release notes ↗

1 previous release

v0.0.1 2026-05-27T15:19:11Z

See all releases on GitHub ↗

assembler

Works on

wormhole

tt-tutorial (Korean)

community

by changh95 · Jupyter Notebook · 4⭐ ·

Comprehensive tutorials for the Tenstorrent software stack in Korean. Jupyter notebooks covering the full developer path from hardware setup to model inference.

Links

📦 Repo

tutorial korean jupyter getting-started

Works on

wormhole

Collective Operations on Wormhole n150 (Sapienza University of Rome)

community

by Charles Heron (Sapienza University of Rome) · 4⭐ ·

Master's thesis implementing and benchmarking five allreduce algorithms (Swing, Recursive Doubling, Bandwidth Optimal, Latency Optimal, Shared Memory) on the Wormhole n150. Bandwidth Optimal achieved best performance, approaching within 2× of theoretical optimal.

Links

📦 Repo

allreduce collective-ops wormhole mpi bandwidth

Works on

wormhole

libtt-metal-cxx

community

by Knight-Ops · Rust · 2⭐ ·

Rust crate that exposes the TT-Metal host API through a C++ bridge via cxx.rs — covering device management, program/kernel creation (from source file or inline string), circular buffers, semaphores, runtime arguments, sharded buffers, and MeshDevice workflows, with hardware-backed integration tests.

Links

📦 Repo

rust bindings cxx tt-metal ffi host-api

Works on

wormhole blackhole

gsplat_tt

community

by Kovelja009 · Python · 1⭐ ·

Port of Gaussian Splatting (3D scene reconstruction from 2D images) to Tenstorrent hardware.

Links

📦 Repo

gaussian-splatting computer-vision 3d-reconstruction blackhole

Works on

blackhole

A Gentle Guide: Tenstorrent Card on Arch Linux with Metalium

community

by · Jul 7, 2024

Step-by-step guide to getting a Tenstorrent card running on Arch Linux with the full Metalium stack. Practical troubleshooting from someone who did it the hard way first.

Links

📝 clehaxze.tw — July 2024

arch-linux metalium installation blog getting-started

Works on

grayskull wormhole

Thoughts and Logs After Messing with Tenstorrent Grayskull

community

by · Jun 2, 2024

Honest field notes from getting a Grayskull card running and writing first Metalium kernels. Covers setup pitfalls, processor hangs, memory protection quirks, and what makes Metalium compelling despite early rough edges.

Links

📝 clehaxze.tw — June 2024

grayskull metalium getting-started blog honest-review

Works on

grayskull

Tenstorrent Architecture — W&M CSCI654 Advanced Computer Architecture

community

by · Oct 9, 2024

Lecture 20 from William & Mary's graduate Computer Architecture course. Frames Tenstorrent in the landscape between GPUs and TPUs, draws comparisons to Cerebras and SambaNova, then dives deep into the Wormhole chip and Tensix core: the 5 RISC-V core design, SFPU, NoC, and dataflow execution model.

Links

🎥 Lecture 20 — Tenstorrent Architecture (YouTube)

lecture architecture wormhole tensix risc-v sfpu noc academia

Works on

wormhole

Attention in SRAM on Tenstorrent Grayskull

community

by · Jul 18, 2024

A fused kernel for the Grayskull architecture implementing Transformer self-attention entirely within SRAM. Combines matrix multiply, attention score scaling, and Softmax without DRAM accesses, achieving significant speedups over non-fused implementations.

Links

📄 arXiv:2407.13885

attention transformer sram grayskull kernel risc-v

Works on

grayskull

Exploring Fast Fourier Transforms on the Tenstorrent Wormhole

community

by · Jun 18, 2025

Ports the Cooley-Tukey FFT algorithm to the Wormhole n300 RISC-V accelerator. The Wormhole draws 8× less power and consumes 2.8× less energy than a 24-core Xeon Platinum for a 2D FFT. ISC 2025.

Links

📄 arXiv:2506.15437 📝 University of Edinburgh

fft wormhole hpc risc-v energy-efficiency epcc

Works on

wormhole

Assessing Tenstorrent Grayskull RISC-V MatMul Acceleration for LLMs

community

by · May 9, 2025

Evaluates the Tenstorrent Grayskull e75 RISC-V accelerator for matrix multiplication at reduced numerical precision (BFP8 and LoFi), a fundamental kernel in LLM inference computation.

Links

📄 arXiv:2505.06085

matmul grayskull risc-v bfp8 lofi llm precision

Works on

grayskull

Porting Strategies for Gravitational N-Body Simulations on Tenstorrent Wormhole

community

by · May 4, 2026

Evaluates three strategies for scaling an N-body code across multiple Tenstorrent Wormhole accelerators. Builds on the established performance of single-card N-body work to explore parallelism via the on-chip NoC and multi-accelerator configurations.

Links

📄 arXiv:2605.02744

n-body astrophysics hpc wormhole risc-v multi-accelerator simulation

Works on

wormhole

Accelerating Gravitational N-Body Simulations on Tenstorrent Wormhole

community

Nov 16, 2025

Accelerates an astrophysical N-body simulation on the Wormhole n300. Achieves 2× speedup and 2× energy savings over a highly optimized CPU implementation. SC '25 Workshop.

Links

📄 arXiv:2509.19294 📝 ACM SC '25

n-body astrophysics hpc wormhole risc-v simulation

Works on

wormhole

Numerical Kernels on a Spatial Accelerator: Tenstorrent Wormhole

community

Mar 24, 2026

Implements three numerical kernels and composes them into a conjugate gradient solver on Wormhole. Demonstrates AI accelerators merit consideration for HPC workloads traditionally dominated by CPUs and GPUs. 2026.

Links

📄 arXiv:2603.23343

numerical-methods hpc conjugate-gradient wormhole sparse

Works on

wormhole

Accelerating Stencils on the Tenstorrent Grayskull RISC-V Accelerator

community

Sep 27, 2024

Explores stencil computation on the Grayskull PCIe RISC-V accelerator. Early academic work examining TT hardware for HPC stencil workloads. 2024.

Links

📄 arXiv:2409.18835

stencil hpc grayskull risc-v

Works on

grayskull

Stencil Computations on Tenstorrent Wormhole

community

May 8, 2026

Maps 2D 5-point stencil computations onto the Tenstorrent Wormhole RISC-V AI dataflow accelerator via two implementations: element-wise decomposition (Axpy) and matrix-multiplication reformulation (MatMul). Profiling shows the isolated Wormhole kernel is competitive with CPU execution, with PCIe transfers and initialization driving end-to-end overhead; Axpy achieves lower energy than the CPU baseline at large scales. Identifies architectural and software directions for making AI accelerators viable for HPC stencil workloads. 2025.

Links

📄 arXiv:2605.07599

stencil hpc wormhole risc-v energy-efficiency benchmarks dataflow

Works on

wormhole

SwiftNPU: Scalable Shape-Flexible Allocation for Inter-Core Connected NPUs

community

Apr 27, 2026

Makes multi-tenant NPU sharing practical for Blackhole-class hardware using polynomial-time allocation algorithms. Delivers up to 1.37× higher utilization and 1.14× faster workload completion. Up to 890,000× faster than NP-hard baselines.

Links

📄 ACM DL

multi-tenant allocation blackhole npu scheduling

Works on

blackhole

TileLoom: Automatic Dataflow Planning for Spatial Dataflow Accelerators

community

by · Dec 17, 2025

Compiler system that automatically generates efficient dataflow plans for tile-based languages on spatial accelerators including Tenstorrent Wormhole. Exploits on-chip network forwarding between processing elements to reduce DRAM pressure.

Links

📄 arXiv:2512.22168

compiler dataflow spatial-accelerator tile-based on-chip-network wormhole

Works on

wormhole

Rewriting TTS Inference Economics: Lightning V2 on Tenstorrent vs. NVIDIA L40S

community

by · Mar 24, 2026

Shows that Text-to-Speech inference on Tenstorrent Lightning V2 achieves 4× lower cost than NVIDIA L40S. Applies BlockFloat8 (BFP8) and low-fidelity (LoFi) precision strategies to TTS despite their greater numerical fragility compared to LLMs.

Links

📄 arXiv:2604.03279

tts text-to-speech inference bfp8 lofi cost-efficiency precision

Works on

wormhole

tt-zork-and-more

affiliated ⑂ historicalsource/zork1★ featured

by tsingletaryTT · Python · 2⭐ ·

A Tenstorrent fork of Infocom's Zork I (and more!), running a Z-machine interpreter at least four different ways on TT hardware. The most fun you can have with an AI accelerator.

Links

📦 Repo 🌐 Website

zork z-machine interactive-fiction demo fun

Local AI Agents on Tenstorrent

affiliated★ featured

by ·

Three agentic projects running fully on-device: local AI agents on QuietBox 2, a coding assistant powered by Aider against a local inference server, and the OpenClaw AI assistant on QuietBox 2. No cloud APIs — all inference runs on TT hardware.

Links

📖 Local AI Agents on QuietBox 2 📖 Coding Assistant with Aider

agents local-llm aider coding-assistant quietbox on-device

Works on

wormhole blackhole quietbox

Video Generation on Tenstorrent

affiliated★ featured

by ·

Three lesson-projects covering on-device video synthesis: frame-by-frame diffusion with tt-local-generator, native AnimateDiff video animation, and video generation on QuietBox 2. All run entirely on TT hardware with no cloud dependency.

Links

📖 Video Generation via Frame-by-Frame Diffusion 📖 Native Video Animation with AnimateDiff 📖 Video Generation on QuietBox 2

video-generation diffusion animatediff tt-local-generator quietbox on-device

Works on

wormhole blackhole quietbox

tensix-viz

affiliated★ featured

by tsingletaryTT · JavaScript ·

Hardware topology visualizer for Tenstorrent chips — from individual chip to full cluster. Interactive JavaScript visualization of Tensix core layout and NoC connections.

Links

📦 Repo 🌐 Website

Releases

LATEST v1.1.0 2026-06-09T22:19:42Z Release notes ↗

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to tensix-viz are documented here.

## [1.1.0] - 2026-06-09

### Fixed

- **Heatmap: non-tensix cells no longer painted by heat overlay** (`src/chip.js` `_drawHeatmap`)
  Commit 76dca80 added `coreType !== 'tensix'` guards to the pre-built artifacts but never to
  the source. The guards are now in `src/chip.js` so the next build preserves them. Without this
  fix, DRAM (col 5 on Wormhole), ETH (row 6 on Wormhole), and PCIe (col 8 on Blackhole) cells
  were colored by the heatmap overlay and could inflate `maxVal`, compressing the visible range
  for all tensix cells.

- **Memory overlay: stale phase not rendered after `reset()` on `showMemory: true` instances**
  (`src/chip.js` `reset()` and constructor)
  After calling `viz.activate(mode)` followed by `viz.reset()` on a canvas created with
  `showMemory: true`, `_memPhase` retained the frozen `_mem` object from the animation closure.
  `reset()` calls `render()` at the end, which caused `_drawMemoryLayer()` to run with stale data,
  producing a faint DRAM glow and L1 fill bars on an otherwise blank chip. `reset()` now sets
  `this._memPhase = null`; the field is also explicitly initialized to `null` in the constructor.

- **Canvas context: `getContext('2d')` moved to after canvas sizing**
  (`src/chip.js` constructor)
  The 2D context was obtained before `canvas.width`/`canvas.height` were assigned. Assigning to
  `canvas.width` resets all context state per spec, making the early `getContext` call redundant
  and inconsistent with the intent. `this.ctx` is now assigned after the sizing block so the
  obtained context reflects the final dimensions.

### Added

- **Responsive canvas sizing** (`src/chip.js` constructor)
  If `canvas.parentElement` exists and `clientWidth` is smaller than the canvas's intrinsic
  `width` attribute, logical dimensions are capped to the container width and height is scaled
  proportionally. Applies at construction time; re-create the instance for later resizes.

- **Float label boundary clamping** (overridden `render()`)
  The floating tooltip label is now clamped so its pill box never overflows any canvas edge.
  `rawCx`/`rawCy` are constrained by `Math.max(w/2+margin, Math.min(logicalW-w/2-margin, raw*))`.

## [1.0.0] - 2026-05-18

Initial public release.

visualization topology noc hardware

Works on

wormhole blackhole

Live chip topology

Blackhole · P100 / P150 / P300c · 140 Tensix cores

Wormhole · N150 / N300 · 64 Tensix cores

mode

Tenstorrent Cookbook: Particle Life Simulator

affiliated★ featured

by ·

Particle Life simulation on Tenstorrent hardware — an emergent-behavior N-body system where simple attraction/repulsion rules between species produce complex lifelike patterns. Cookbook recipe demonstrating parallel N-body compute on Tensix.

Links

📖 Cookbook Recipe 5: Particle Life Simulator

particle-life n-body simulation emergent cookbook demo

Works on

wormhole blackhole

CS Fundamentals on Tenstorrent Hardware

affiliated★ featured

by ·

Seven-module computer science curriculum taught on real Tenstorrent hardware. Covers RISC-V architecture, memory hierarchy, parallel computing, networks and NoC, synchronization, abstraction layers, and computational complexity — all grounded in what is physically happening on the chip.

Links

📖 Module 1: RISC-V & Computer Architecture 📖 Module 2: The Memory Hierarchy 📖 Module 3: Parallel Computing 📖 Module 4: Networks and Communication 📖 Module 5: Synchronization 📖 Module 6: Abstraction Layers 📖 Module 7: Computational Complexity in Practice

computer-science curriculum risc-v parallelism memory noc education

Works on

wormhole blackhole

tt-lang-models

affiliated

by zoecarver · Python · 7⭐ ·

A growing collection of models that use tt-lang for some or all of their implementation. Reference implementations for bringing modern models to the tt-lang DSL.

Links

📦 Repo

tt-lang models dsl reference

tt-qb-lights

affiliated

by tsingletaryTT · Rust · 2⭐ ·

Sync your Tenstorrent Quietbox's RGB lighting to accelerator utilization status. Visual feedback for hardware activity in real time.

Links

📦 Repo

quietbox rgb hardware fun

Works on

quietbox

gemma4

affiliated

by zoecarver · Python · 1⭐ ·

Gemma 4 language model implemented in tt-lang (e4b variant) for direct execution on Tenstorrent hardware.

Links

📦 Repo

gemma llm tt-lang inference

Works on

blackhole

open-oasis

affiliated ⑂ etched-ai/open-oasis

by zoecarver · Python · 1⭐ ·

tt-lang inference script for Oasis 500M — an interactive video world model running on Tenstorrent hardware via the tt-lang DSL.

Links

📦 Repo

video world-model oasis tt-lang inference

Works on

blackhole

tt-model-runner

affiliated

by tsingletaryTT · Python · 1⭐ ·

Discover, load, and benchmark models with a GUI and TUI for tt-inference-server. Makes exploring available models on Tenstorrent hardware as easy as browsing a catalog.

Links

📦 Repo

gui tui models inference benchmark

Works on

wormhole blackhole quietbox

tt-claw

affiliated

by tsingletaryTT · Shell ·

A Tenstorrent-powered claw machine that rewards players with real prizes. The QuietBox 2 runs local AI inference to act as an agent controlling the claw hardware — the OpenClaw AI assistant lesson builds directly on this project.

Links

📦 Repo 📖 OpenClaw AI Assistant on QuietBox 2

claw-machine agents hardware quietbox physical on-device

Works on

quietbox

dflash

affiliated ⑂ z-lab/dflash

by zoecarver · Python ·

DFlash: Block Diffusion for Flash Speculative Decoding on Tenstorrent hardware using tt-lang. Combines block diffusion with speculative decoding for faster inference.

Links

📦 Repo 🌐 Website

speculative-decoding diffusion tt-lang inference

diamond

affiliated ⑂ eloialonso/diamond

by zoecarver · Python ·

DIAMOND: Atari game-playing agent implemented on Tenstorrent hardware via tt-lang. Diffusion-based world model for reinforcement learning.

Links

📦 Repo 🌐 Website

atari reinforcement-learning world-model tt-lang

Engram

affiliated ⑂ deepseek-ai/Engram

by zoecarver · Python ·

A Tenstorrent port of the DeepSeek Engram model using tt-lang. Brings DeepSeek's memory-efficient architecture to TT hardware.

Links

📦 Repo

deepseek engram tt-lang inference

Works on

blackhole

Stable Diffusion XL on Tenstorrent

affiliated

by ·

On-device image generation with Stable Diffusion XL running entirely on Tenstorrent hardware. Full inference pipeline with no cloud dependency.

Links

📖 Image Generation with Stable Diffusion XL

stable-diffusion sdxl image-generation diffusion on-device

Works on

wormhole blackhole

tt-forge-compiletron

affiliated

by tsingletaryTT · Python ·

Compile more than 100 models on tt-forge in a display format suitable for demos. Comprehensive showcase of tt-forge model compatibility.

Links

📦 Repo

tt-forge models demo compilation

Image Classification with TT-Forge

affiliated

by ·

End-to-end image classification project using TT-Forge — compile and run a PyTorch classification model on Tenstorrent hardware with no kernel authoring required.

Links

📖 Image Classification with TT-Forge

forge image-classification pytorch compiler inference

Works on

wormhole blackhole

tt-warp

affiliated

by tsingletaryTT · Python ·

Warp terminal plugin for Tenstorrent — integrates hardware status, model management, and developer workflows directly into the Warp terminal.

Links

📦 Repo

warp terminal plugin developer-experience

Tensix Grid Playground

affiliated

by ·

Interactive browser-based visualizer of the Tenstorrent Tensix grid architecture. Explore the NoC, core layout, and dataflow patterns without hardware — a great companion for learning kernel programming.

Links

🚀 Tensix Grid Playground (interactive)

visualization interactive noc tensix browser architecture

Tenstorrent Cookbook: Conway's Game of Life

affiliated

by ·

TT-Metalium implementation of Conway's Game of Life as a cookbook recipe. Each generation is a full parallel kernel dispatch over the grid — a clean introduction to stateful compute on Tensix cores.

Links

📖 Cookbook Recipe 1: Conway's Game of Life

game-of-life demo cookbook parallel metalium

Works on

wormhole blackhole

Custom Model Training on Tenstorrent

affiliated

by ·

Eight-lesson series covering the full custom training workflow on TT hardware: dataset fundamentals, configuration patterns, fine-tuning, multi-device distributed training, experiment tracking, model architecture basics, and training from scratch.

Links

📖 Understanding Custom Training 📖 Dataset Fundamentals 📖 Configuration Patterns 📖 Fine-tuning Basics 📖 Multi-Device Training 📖 Experiment Tracking 📖 Model Architecture Basics 📖 Training from Scratch

training fine-tuning multi-device distributed experiment-tracking curriculum

Works on

wormhole blackhole

Tenstorrent Cookbook: Core Recipes

affiliated

by ·

Three hands-on TT-Metalium kernel recipes: a Mandelbrot fractal explorer, real-time audio signal processing pipeline, and custom image filter stack. Each recipe is a complete kernel project with full source in the lesson.

Links

📖 Tenstorrent Cookbook Overview 📖 Recipe 3: Mandelbrot Fractal Explorer 📖 Recipe 2: Audio Signal Processing 📖 Recipe 4: Custom Image Filters

cookbook mandelbrot audio image-processing metalium demo

Works on

wormhole blackhole

tt-bh-linux

official★ featured

C · GPL-2.0 · 55⭐ ·

Linux demo for the Tenstorrent Blackhole P100/P150 card RISC-V cores. Boot a real Linux kernel on the 16 high-performance RISC-V cores built into the Blackhole chip.

Links

📦 Repo 🌐 Website

Releases

LATEST v0.11 2026-04-13T15:10:59Z Release notes ↗

⬇ tt-bh-disk-image.zip ⬇ tt-bh-linux.zip

4 previous releases

v0.10 2026-02-11T22:41:22Z

v0.9 2025-10-14T20:56:23Z

v0.5 2025-10-01T15:40:57Z

v0.4 2025-08-09T18:05:10Z

See all releases on GitHub ↗

linux risc-v blackhole bare-metal boot

Works on

blackhole

TT Console

official★ featured

Browser-based cloud console for exploring AI on Tenstorrent hardware. Run LLM inference, image and video generation, and browse the supported model catalog in-browser — backed by Tenstorrent accelerators. Cloud hardware access and advanced workflows (deployments, agents) available in staged rollout.

Links

🌐 console.tenstorrent.com

cloud console inference playground llm image-generation video-generation demo

Works on

wormhole blackhole

tt-metal

official

C++ · Apache-2.0 · 1518⭐ ·

TT-NN operator library and TT-Metalium low-level kernel programming model. The primary SDK for developing on Tenstorrent hardware — from high-level tensor ops to bare-metal RISC-V kernels.

Links

📦 Repo 🌐 Website

Releases

LATEST v0.72.0 2026-06-09T01:30:48Z Release notes ↗

🐧 tt-metalium-dev_0.72.0.ubuntu22.04_amd64.deb 🐧 tt-metalium-dev_0.72.0.ubuntu24.04_amd64.deb 🐧 tt-metalium-examples_0.72.0.ubuntu22.04_amd64.deb 🐧 tt-metalium-examples_0.72.0.ubuntu24.04_amd64.deb +15 more

5 previous releases

v0.73.0-dev20260615pre 2026-06-15T04:51:14Z

v0.72.0-rc9pre 2026-06-15T05:34:46Z

v0.73.0-dev20260614pre 2026-06-14T03:17:11Z

v0.72.0-rc8pre 2026-06-14T05:15:58Z

v0.73.0-dev20260613pre 2026-06-13T19:28:30Z

See all releases on GitHub ↗

metalium ttnn sdk kernels core

Works on

grayskull wormhole blackhole ttsim

tt-buda

official

Python · Apache-2.0 · 314⭐ ·

TT-BUDA: Tenstorrent's original Python compiler and runtime for AI workloads. Legacy stack — tt-forge is the recommended successor, but tt-buda has the largest model demo library.

Links

📦 Repo

Releases

LATEST v0.19.3 2024-09-24T21:01:08Z Release notes ↗

⬇ pybuda-gs-v0.19.3-ubuntu-20-04-amd64-python3.8.zip ⬇ pybuda-gs-v0.19.3-ubuntu-22-04-amd64-python3.10.zip ⬇ pybuda-wh.b0-v0.19.3-ubuntu-20-04-amd64-python3.8.zip ⬇ pybuda-wh.b0-v0.19.3-ubuntu-22-04-amd64-python3.10.zip

4 previous releases

v0.18.2 2024-07-18T15:58:39Z

v0.17.0-alpha 2024-06-05T20:07:29Z

v0.15.0-alpha 2024-05-23T19:53:00Z

v0.12.3 2024-05-10T22:25:40Z

See all releases on GitHub ↗

legacy compiler pytorch buda

Works on

grayskull wormhole

tt-forge

official

Python · Apache-2.0 · 289⭐ ·

Tenstorrent's MLIR-based compiler frontend. Enables running AI workloads from PyTorch, ONNX, and other frameworks on all Tenstorrent hardware configurations through an open-source, general, and performant compiler.

Links

📦 Repo 🌐 Website

Releases

LATEST 1.2.0 2026-05-28T09:59:56Z Release notes ↗

5 previous releases

1.3.0.dev20260615003539 2026-06-15T01:20:39Z

1.3.0.dev20260614003409 2026-06-14T01:53:07Z

1.3.0.dev20260613003624 2026-06-13T01:28:10Z

1.3.0.dev20260609002802 2026-06-09T01:16:05Z

1.3.0.dev20260607003211 2026-06-07T01:27:37Z

See all releases on GitHub ↗

mlir compiler pytorch onnx frontend

Works on

wormhole blackhole ttsim

tt-mlir

official

C++ · Apache-2.0 · 280⭐ ·

Tenstorrent MLIR compiler — the core compiler infrastructure shared by tt-forge and other frontends. Handles graph optimization, lowering, and code generation for Tensix hardware.

Links

📦 Repo 🌐 Website

Releases

5 releases

0.9.0.dev20260221pre 2026-02-21T04:31:50Z

0.9.0.dev20260220pre 2026-02-20T04:34:35Z

0.9.0.dev20260219pre 2026-02-19T04:37:24Z

0.9.0.dev20260218pre 2026-02-18T04:38:21Z

0.9.0.dev20260217pre 2026-02-17T04:37:09Z

See all releases on GitHub ↗

mlir compiler backend optimization

Works on

wormhole blackhole

riscv-ocelot

official ⑂ riscv-boom/riscv-boom

SystemVerilog · Apache-2.0 · 255⭐ ·

The Berkeley Out-of-Order Machine with V-EXT (RISC-V Vector Extension) support. Tenstorrent's research-grade out-of-order RISC-V core with vector extension.

Links

📦 Repo

risc-v out-of-order vector-extension processor-design

ttsim

official

C++ · Apache-2.0 · 122⭐ ·

Fast full-system simulator of Tenstorrent Wormhole and Blackhole hardware. Runs TT-Metalium workloads on any Linux/x86_64 system without physical silicon. Bit-exact results relative to hardware.

Links

📦 Repo 📖 Lesson

Releases

LATEST v1.8.3 2026-06-13T17:31:51Z Release notes ↗

4 previous releases

v1.8.2 2026-06-11T20:20:19Z

v1.8.1 2026-06-10T17:33:34Z

v1.8.0 2026-06-09T17:23:20Z

v1.7.3 2026-06-05T22:44:41Z

See all releases on GitHub ↗

simulator no-hardware bit-exact wormhole blackhole

Works on

ttsim

whisper

official ⑂ chipsalliance/VeeR-ISS

C++ · Apache-2.0 · 88⭐ ·

RISC-V Instruction Set Simulator (ISS) used by Tenstorrent for processor verification. Powers the co-simulation architecture checker.

Links

📦 Repo

Releases

LATEST 1.861 2026-05-11T15:44:36Z Release notes ↗

See all releases on GitHub ↗

risc-v iss simulator verification

tt-xla

official

Python · Apache-2.0 · 68⭐ ·

PJRT device plugin for Tenstorrent hardware. Enables JAX, PyTorch/XLA, and other XLA-based frameworks to target TT accelerators.

Links

📦 Repo 📖 JAX and PyTorch/XLA on Tenstorrent 🌐 Website

Releases

LATEST 1.2.0 2026-05-28T09:50:59Z Release notes ↗

5 previous releases

1.3.0.dev20260615003539 2026-06-15T01:13:05Z

1.3.0.dev20260614003409 2026-06-14T01:44:58Z

1.3.0.dev20260613003624 2026-06-13T01:16:13Z

1.3.0.dev20260612003634 2026-06-12T01:14:41Z

1.3.0.dev20260611003559 2026-06-11T01:13:24Z

See all releases on GitHub ↗

xla pjrt jax pytorch

Works on

wormhole blackhole

RiESCUE

official

Python · Apache-2.0 · 66⭐ ·

RISC-V Directed Test Framework and Compliance Suite. Comprehensive test infrastructure for verifying RISC-V processor implementations against the specification.

Links

📦 Repo 🌐 Website

Releases

LATEST v1.7.0 2025-12-03T19:29:44Z Release notes ↗

4 previous releases

v1.5.0 2025-11-17T21:58:14Z

v1.3.0 2025-11-06T20:12:13Z

v1.1.2 2025-10-16T17:21:43Z

v0.2.5 2025-07-10T00:59:12Z

See all releases on GitHub ↗

risc-v testing compliance verification

tt-kmd

official

C · GPL-2.0 · 65⭐ ·

Tenstorrent kernel module driver. The Linux kernel module required to interface with Tenstorrent PCIe accelerator cards.

Links

📦 Repo

Releases

LATEST ttkmd-2.9.0 2026-06-09T13:25:19Z Release notes ↗

⬇ tenstorrent-dkms-2.9.0-1.noarch.rpm 🐧 tenstorrent-dkms_2.9.0_all.deb

🐧 apt install ttkmd

⚙ Requires PPA — setup instructions ↗

4 previous releases

ttkmd-2.9.99-testingpre 2026-06-12T17:39:45Z

ttkmd-2.9.0-rc1pre 2026-05-26T19:10:10Z

ttkmd-2.8.0 2026-04-06T18:58:39Z

ttkmd-2.8.0-rc1pre 2026-04-04T01:30:29Z

See all releases on GitHub ↗

kernel-module driver linux pcie

Works on

grayskull wormhole blackhole

tt-buda-demos

official

Python · Apache-2.0 · 64⭐ ·

Repository of model demos using TT-Buda. The largest collection of pre-compiled model examples for Tenstorrent hardware — BERT, ResNet, YOLO, GPT-2, Whisper, and many more.

Links

📦 Repo

demos models bert resnet yolo gpt2

Works on

grayskull wormhole

tt-forge-onnx

official

Python · Apache-2.0 · 64⭐ ·

ONNX graph compiler for Tenstorrent hardware. Optimizes and transforms ONNX model graphs for efficient execution on Tensix accelerators. Used as a backend by tt-forge for ONNX model ingestion.

Links

📦 Repo

Releases

LATEST 1.2.0 2026-05-28T09:57:37Z Release notes ↗

5 previous releases

1.3.0.dev20260615011951 2026-06-15T01:41:50Z

1.3.0.dev20260614012704 2026-06-14T01:45:56Z

1.3.0.dev20260613011732 2026-06-13T01:37:06Z

1.3.0.dev20260612012108 2026-06-12T01:42:56Z

1.3.0.dev20260611014638 2026-06-11T02:07:54Z

See all releases on GitHub ↗

onnx compiler graph-optimization mlir

Works on

wormhole blackhole

tt-smi

official

Python · Apache-2.0 · 61⭐ ·

Tenstorrent System Management Interface — monitor device telemetry, issue board-level resets, and inspect hardware health. The nvidia-smi equivalent for Tenstorrent hardware.

Links

📦 Repo

Releases

LATEST v5.3.0 2026-06-12T15:35:05Z Release notes ↗

🐍 tt_smi-5.3.0-py3-none-any.whl

🐍 pip install tt-smi

4 previous releases

v5.2.0 2026-05-14T17:26:26Z

v5.1.1 2026-05-12T22:18:05Z

v5.1.0 2026-05-11T16:23:13Z

v5.0.1 2026-04-24T11:39:48Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 3.0.26 - 29/07/25
- Added single tray galaxy reset option
- Bumped luwen from 0.7.5 -> 0.7.10
  - Chip detect now doesn't wait for eth to train for the 6U galaxy's, allowing multi tray resets to happen independently
- Updated readme with the new reset option

## 3.0.25 - 29/07/25
- Added packaging

## 3.0.24 - 04/07/25
- Now users have 2 galay reset modes available
  - glx_reset: resets the galaxy, informs users if there has been an eth failure
  - glx_reset_auto: resets the galaxy upto 3 times if eth failures are detected

## 3.0.23 - 03/07/25
- Bumped luwen 0.7.3 -> 0.7.5 to fix cargo lock compatibilty issue

## 3.0.22 - 02/07/25
- Bumped tt-tools-common 1.4.16 -> 1.4.17
- Bumped luwen 0.7.2 -> 0.7.3
- Bumped smi 3.0.21 -> 3.0.22

## 3.0.21 - 26/06/25

- Added option to not re-init chips after reset
- Updated galaxy 6u reset option from --ubb_reset to -glx_reset
- Removed the a3 arc message before doing a 6u reset, meaning we can reset even when chips are not pcie accessible
- Added eth link check and return failure if any of the eth links have a LINK_INACTIVE_FAIL_DUMMY_PACKET failure

## 3.0.20 - 04/06/25

- Chore - bumped tt-tools-common version to fix driver version check for compatability with tt-kmd 2.0.0

## 3.0.19 - 30/04/25

- Fixed an issue preventing the telemetry thread from being dispatched when the user clicked tab 2

## 3.0.18 - 22/05/25

- Added BH and WH UBB board type support
- Removed the dependency on tt-tools-common for this info

## 3.0.17 - 13/05/25

- Added proper telemetry heartbeat checks for Grayskull

## 3.0.16 - 12/05/25

- Used new ResetTypes from tools-common to simplify reset code
- Added a heartbeat spinner to the telemetry pane. We expect this spinner to update about twice per second. If the spinner is not moving, this indicates new telemetry is not being fetched.

## 3.0.15 - 24/04/25

- Patch for the ubb_reset to just discover local only post reset. Looks like eth port status 2 has been re-used to mean connected and pyluwen waits for it to clear, leading to eth timeout.

## 3.0.14 - 21/04/25

- Added wh ubb reset via command line `tt-smi --ubb_reset`. Intention is that this command line option will be removed and integrated into `tt-smi -r` after we update board detection with the correct external naming.
- Removed some unused imports and code - no functional changes

## 3.0.13 - 21/03/25

- Removed get\_sw\_versions

## 3.0.12 - 21/03/25

- Chore - bumped luwen version to include eth fw version check fix

## 3.0.11 - 13/03/25

- Chore - bumped luwen version to include enable chips with external connections but no routing

## 3.0.10 - 10/03/25

- Chore - bumped luwen version to include protoc lib detection check

## 3.0.9 - 07/03/25

- Chore - bumped luwen v

monitoring telemetry smi hardware-management

Works on

grayskull wormhole blackhole

tt-inference-server

official

Python · Apache-2.0 · 58⭐ ·

Production-ready model serving for Tenstorrent hardware with OpenAI-compatible REST API. Supports continuous batching, multiple models, and all TT hardware configurations.

Links

📦 Repo 📖 Production Inference lesson (VSCode Toolkit)

Releases

LATEST v0.16.0 2026-06-12T18:21:42Z Release notes ↗

⬇ v0.16.0-release_artifacts.zip

4 previous releases

v0.15.0 2026-05-29T15:55:11Z

v0.14.0 2026-05-15T22:34:02Z

v0.13.0 2026-04-24T20:21:26Z

v0.10.1 2026-04-08T09:58:17Z

See all releases on GitHub ↗

serving openai-compatible production rest-api

Works on

wormhole blackhole quietbox galaxy

ttnn-visualizer

official

TypeScript · Apache-2.0 · 52⭐ ·

Comprehensive tool for visualizing and analyzing model execution on Tenstorrent hardware. Interactive graphs, memory plots, tensor details, buffer overviews, operation flow graphs, and multi-instance support.

Links

📦 Repo

Releases

LATEST v0.89.0 2026-06-10T18:50:24Z Release notes ↗

🐍 ttnn_visualizer-0.89.0-py3-none-any.whl

🐍 pip install ttnn-visualizer

4 previous releases

v0.88.0 2026-06-03T20:23:29Z

v0.87.0 2026-05-27T17:30:12Z

v0.86.0 2026-05-20T18:34:19Z

v0.85.0 2026-05-13T20:31:18Z

See all releases on GitHub ↗

visualization profiling memory operations graphs

Works on

wormhole blackhole

tt-llk

official

C++ · Apache-2.0 · 52⭐ · Jun 5, 2025

Tenstorrent Low-Level Kernels: the C++ library that directly programs the RISC-V cores inside each Tensix compute engine. TRISC0 (unpack), TRISC1 (math/FPU/SFPU), and TRISC2 (pack) are all programmed through this layer — it is the interface between TT-Metal kernel code and bare silicon.

Links

📦 Repo 📝 Top-level architecture overview

tensix risc-v llk trisc brisc ncrisc low-level compute-engine

Works on

grayskull wormhole blackhole

tt-lang

official

Python · Apache-2.0 · 51⭐ ·

Python-based DSL that sits between TT-NN and TT-Metalium — expresses custom fused kernels with progressive disclosure, compiling directly to Tensix. Ships an integrated functional simulator (no hardware needed), line-by-line performance metrics, and AI-agent-friendly tooling. Two packages: tt-lang (compiler + hardware, requires ttnn) and tt-lang-sim (simulator only, works on Linux/macOS without Tenstorrent hardware).

Links

📦 Repo 🌐 Website 📖 Introduction to tt-lang

📋 Changelog

# Changelog

All notable changes to TT-Lang will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Version 1.1.1

### Compiler

- Fix for live-interval boundary computation (issue [#536](../../issues/536))
- Fix for all-zero results in FP32 reductions (issue # [#533](../../issues/533))
- Fix for inferred `pop` and `push` (issues [#536](../../issues/536), [#554](../../issues/554))
- Fix for write pointer tracking on pipe sender accross iterations (issue [#578](../../issues/578))
- Fix to report data type mismatch error
- Fix to report DFB over allocation error (issue [#511](../../issues/511))
- Support for pipenet predicates `is_src`, `is_dst` and `is_active` (issue [#541](../../issues/541))
- Support for `ttl.math.typecast`

### Simulator

- Support for inferred `pop`, `push` and `copy`'s transfer handle `wait`
- Support for pipenet predicates `is_src`, `is_dst` and `is_active`
- Support `all_gather`
- Support `bfloat8_b`
- Improved/actionable error messages
- Improved performance by simulating math in FP32

### Infrastructure

- TT-Lang installable with `pip install tt-lang` for full installation and `pip install tt-lang-sim` for simulator only
- [Matmul benchmarks](benchmarks/matmul/README.md)

## Version 1.0.0

### Compiler

- Support `+=` syntax in conjunction with dot product (`@`) lowered to packer L1 accumulation
- Support implicit temporary compute-kernel-local DFBs
- Support `ttl.Pipenet`
- Support implicit `ttl.Block.push` and `ttl.Block.pop`
- Support implicit `ttl.Transfer.wait`
- Support for `expm1`, `exp2`, `ceil`, `sign`, `gelu`, `silu`, `hardsigmoid`, `square`, `softsign`, `signbit`, `frac`, `trunc` in `ttl.math`

### Simulator

- Support for `ttl.GroupTransfer`
- SPMD and mesh device simulation support
- Support for `ttnn.all_reduce` CCLs
- Use tracing to report statistics with `tt-lang-sim-stats`
- Remote L1 reads/writes statistics

### Examples and documentation
- Matmul tutorial

## Version 0.1.8

### Compiler

- Support for dot product operator (`@`) with lowering to [`ckernel::matmul_block`](https://docs.tenstorrent.com/tt-metal/v0.55.0/tt-metalium/tt_metal/apis/kernel_apis/compute/matmul_block.html)
- Support for fusing matmul and certain elementwise operations
- Support lowering to `pack_tile_block`
- Support for `ttl.math.fill`, `ttl.math.reduce_sum`, `ttl.math.reduce_max`, and `ttl.math.transpose`
- Support for arbitrary sub-blocking including dot product K-dimension to allow maximizing L1 usage and reuse
- Support for `sin`, `cos`, `tan`, `asin`, `acos`, `atan` in `ttl.math`
- Support for L1 sharded tensors
- Support for tensors with BF8 data type
- SPMD support (`ttnn.open_mesh_device`)

### Simulator

- Track L1 space and number of DFBs usage and warn when exceeded
- Support for tensors with row-major layout
- Support for L1 sharded tensors

### Examples and documentat

Install

🐍 pip install tt-lang 🐍 pip install tt-lang-sim

dsl python kernels tt-lang simulator kernel-fusion

Works on

wormhole blackhole ttsim

TT-Studio

official

TypeScript · Apache-2.0 · 48⭐ ·

Web-based GUI for deploying and chatting with AI models on Tenstorrent hardware. Handles all technical setup automatically — deploy models, run inference, and explore capabilities through a simple browser interface.

Links

📦 Repo

Releases

LATEST v2.6.0 2026-05-20T17:04:32Z Release notes ↗

4 previous releases

v2.5.0 2026-04-20T17:03:48Z

v2.4.1 2026-03-24T15:09:57Z

v2.1.0 2025-10-04T01:33:59Z

v2.0.1 2025-07-21T19:53:40Z

See all releases on GitHub ↗

web-ui gui models chat deployment

Works on

wormhole blackhole quietbox

WallaBMC

official

C · Apache-2.0 · 46⭐ ·

Lightweight BMC (Baseboard Management Controller) for STM32 and similar MCUs, with Web UI, Redfish API, and HTTPS support. Built on Zephyr RTOS. Used in Tenstorrent systems.

Links

📦 Repo

bmc stm32 redfish zephyr embedded

tt-umd

official

C++ · Apache-2.0 · 43⭐ ·

User-mode driver for Tenstorrent hardware. The userspace layer that sits between the kernel module and higher-level SDKs.

Links

📦 Repo

Releases

LATEST v0.9.6 2026-06-03T10:59:12Z Release notes ↗

🐍 tt_umd-0.9.6-cp310-cp310-manylinux_2_28_aarch64.whl 🐍 tt_umd-0.9.6-cp310-cp310-manylinux_2_28_x86_64.whl 🐍 tt_umd-0.9.6-cp311-cp311-manylinux_2_28_aarch64.whl 🐍 tt_umd-0.9.6-cp311-cp311-manylinux_2_28_x86_64.whl +12 more

4 previous releases

v0.9.5-dev.260424pre 2026-04-30T10:50:47Z

v0.9.4pre 2026-03-18T21:42:11Z

v0.9.3pre 2026-02-24T18:29:21Z

v0.9.1pre 2026-01-23T22:54:19Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

## [0.9.5] - 2026-05-12

### Changed

Hardware hang detection for NOC and PCIe.
Tracy profiler integration with instrumentation across TLB, PCIe and sysmem paths.
DeviceProtocol ported to TTDevice, including DMA migration.
SocDescriptor split into static (SocArchDescriptor) and runtime parts.
LITERAL coordinate system in CoreCoord.
Multicast to all TENSIX cores.
SMN support.
SWEmuleChip software emulation chip and Quasar simulation support (incl. 4GB TLB).
Unified UmdException/UMD_ASSERT/UMD_THROW error handling across the codebase.

## [0.9.4] - 2026-03-18

### Changed

TopologyDiscoveryOptions refactoring.
TopologyDiscoveryOption to retrain ETH links on 6u.
TLBs for TTsim.
DRAM retrain support.
DeviceProtocol changes.
Simulator in TTDevice changes.
ETH heartbeat check.

## [0.9.3] - 2026-02-24

### Changed

Sigbus safe read write API.
Remove 4U related code.
Implement BH SPI as well, so full SPI support.
P150 expects harvested cores.
TT_VISIBLE_DEVICES uses logical IDs.

## [0.9.2] - 2026-02-09

### Changed

SPI interface for Wormhole.
PCI BDF based sorting and filtering.
Multicast PCI DMA.
Support Blackhole loudbox.
Many code fixes and test enhancements.

## [0.9.1] - 2026-01-23

### Changed

Started publishing to pypi.

## [0.9.0] - 2026-01-23

### Changed

Warm reset notification and callback implementation.

## [0.8.6] - 2026-01-20

### Changed

Make predicting ETH FW from CMFW optional in TopologyDiscovery.

## [0.8.4] - 2026-01-16

### Changed

Use older manylinux image

## [0.8.3] - 2026-01-15

### Changed

Reverted remote discovery issue

## [0.8.2] - 2026-01-15

### Changed

Support warm reset without secondary bus reset.
Expose subsystem vendor id.

## [0.8.1] - 2026-01-15

### Changed

Support dma functions on TTDevice layer

## [0.8.0] - 2026-01-14

### Changed

Many functional fixes and minor changes.
Final fixes needed for integration into tt-smi.
Also contains adjustments needed for integration into exalens.

## [0.7.0] - 2025-11-29

### Changed

Changed to a more generic arc_msg API.

## [0.6.0] - 2025-11-24

### Changed

Change the usage of TLBs such that KMD is in control of TLB allocation instead of UMD.
TLBs are now allocated using KMD's dedicated API.

## [0.5.3] - 2025-11-14

### Changed

Added generation of .deb and .rpm packages.
Added three separate packages (runtime, development and python).

## [0.5.1] - 2025-11-12

### Changed

Manylinux builds and Pypi test publishing.
Many smaller fixes and improvements.

## [0.4.0] - 2025-10-18

### Changed

Removed old type names.

## [0.3.0] - 2025-10-17

### Changed

Many smaller fixes and improvements.
TTsim support improvements.
JTAG support improvement.
Fixing CMake install path.
Further work on integrating new KMD TLBs.

## [0.2.0] - 2025-09-15

### Changed

A couple of smaller fixes and improvements, including L2CPU harvesting, fixes for new FW. Better TTSim support. Further JTAG support.
Introduced new soft reset API.
Introduced lite fabric initial version.

user-mode-driver umd hardware-interface

Works on

grayskull wormhole blackhole

tt-system-firmware

official

C · Apache-2.0 · 39⭐ ·

System firmware for Tenstorrent hardware. Low-level system initialization and control firmware that runs on-device.

Links

📦 Repo 🌐 Website

Releases

LATEST v19.11.0 2026-06-11T14:56:55Z Release notes ↗

📦 fw-pack-v19.11.0-recovery.tar.gz 📦 manufacturing-artifacts-v19.11.0.tar.gz 🐧 tt-firmware_19.11.0-ubuntu.1_all.deb ⬇ tt-system-firmware-v19.11.0.zip

4 previous releases

v19.11.0-rc1pre 2026-06-05T19:29:28Z

v19.10.0 2026-06-01T13:21:59Z

v19.10.0-rc2pre 2026-05-27T12:36:30Z

v19.10.0-rc1pre 2026-05-08T19:27:36Z

See all releases on GitHub ↗

firmware system embedded

Works on

wormhole blackhole

luwen

official

Rust · Apache-2.0 · 34⭐ ·

Tenstorrent system interface library written in Rust. Low-level Rust bindings for communicating with and managing TT hardware.

Links

📦 Repo

Releases

LATEST v0.8.5 2026-03-30T21:03:56Z Release notes ↗

🦀 cargo add luwen

4 previous releases

v0.8.4 2026-03-26T19:34:59Z

v0.8.3 2026-03-26T16:02:34Z

v0.8.2 2026-03-23T18:58:20Z

v0.8.1 2025-12-17T21:16:21Z

See all releases on GitHub ↗

rust system-interface low-level bindings

Works on

grayskull wormhole blackhole

tt-tvm

official

Python · Apache-2.0 · 31⭐ ·

TVM for Tenstorrent ASICs. Brings the Apache TVM compiler stack to Tenstorrent hardware, enabling model compilation from TensorFlow, PyTorch, ONNX, and more.

Links

📦 Repo

tvm compiler tensorflow onnx

Works on

grayskull wormhole blackhole

tensix-isa-simulator

official

C++ · Apache-2.0 · 29⭐ ·

ISA-level simulator for the Tensix compute engine. Simulates the matrix, vector, and scalar units inside each Tensix core.

Links

📦 Repo

tensix isa simulator compute-engine

Works on

ttsim

tt-torch

official

Python · Apache-2.0 · 25⭐ ·

Frontend integration for PyTorch with tt-mlir. Compile PyTorch models directly to Tenstorrent hardware via torch.compile integration.

Links

📦 Repo 🌐 Website

Releases

LATEST 0.4.0 2025-09-29T22:23:47Z Release notes ↗

🐍 tt_torch-0.4.0-cp311-cp311-linux_x86_64.whl

5 previous releases

0.5.0.dev20251008pre 2025-10-08T05:36:07Z

0.5.0.dev20251007pre 2025-10-07T04:22:29Z

0.5.0.dev20251006pre 2025-10-06T04:21:23Z

0.5.0.dev20251005pre 2025-10-05T04:38:19Z

0.5.0.dev20251004pre 2025-10-04T04:22:15Z

See all releases on GitHub ↗

pytorch torch-compile frontend

Works on

wormhole blackhole

tt-firmware

official

Apache-2.0 · 24⭐ ·

Tenstorrent firmware repository. Board management and control firmware for Tenstorrent accelerator cards.

Links

📦 Repo

Releases

LATEST v19.6.0 2026-02-20T16:53:34Z Release notes ↗

🐧 apt install tt-firmware

⚙ Requires PPA — setup instructions ↗

4 previous releases

v19.5.0 2026-02-04T18:22:15Z

v19.4.2 2026-01-05T23:32:14Z

v19.4.1 2025-12-19T17:06:37Z

v19.4.0 2025-12-16T05:38:23Z

See all releases on GitHub ↗

firmware bmc board-management

Works on

wormhole blackhole

tt-installer

official

Shell · Apache-2.0 · 23⭐ ·

Install the complete Tenstorrent software stack with one command. Handles drivers, firmware, Python environment, and SDK setup automatically.

Links

📦 Repo 📖 Modern Setup lesson (VSCode Toolkit)

Releases

LATEST v2.2.1 2026-03-16T18:54:29Z Release notes ↗

4 previous releases

v2.2.0 2026-03-10T19:52:29Z

v2.1.0 2026-01-14T19:34:46Z

v2.0.0 2025-12-05T20:38:41Z

v1.11.0 2025-12-02T20:02:43Z

See all releases on GitHub ↗

installation setup one-command getting-started

Works on

wormhole blackhole

tt-exalens

official

Python · Apache-2.0 · 21⭐ ·

Low-level hardware debugger for Tenstorrent devices. Inspect register state, memory contents, and kernel execution at the hardware level.

Links

📦 Repo

Releases

LATEST v0.3.23 2026-06-11T14:42:44Z Release notes ↗

🐍 tt_exalens-0.3.23-cp310-cp310-manylinux_2_34_aarch64.whl 🐍 tt_exalens-0.3.23-cp310-cp310-manylinux_2_34_x86_64.whl 🐍 tt_exalens-0.3.23-cp311-cp311-manylinux_2_34_aarch64.whl 🐍 tt_exalens-0.3.23-cp311-cp311-manylinux_2_34_x86_64.whl +4 more

🐍 pip install tt-exalens

4 previous releases

v0.3.22pre 2026-06-10T08:04:28Z

v0.3.21pre 2026-06-05T13:13:06Z

v0.3.20pre 2026-05-30T10:14:29Z

v0.3.19pre 2026-05-13T09:25:52Z

See all releases on GitHub ↗

debugger low-level hardware registers

Works on

wormhole blackhole

tt-topology

official

Python · Apache-2.0 · 16⭐ ·

Configure Ethernet routing on multi-card Tenstorrent systems. Flash NB cards to use specific ETH routing configurations for scale-out deployments.

Links

📦 Repo

Releases

LATEST v1.2.19 2026-02-26T21:14:41Z Release notes ↗

🐧 tt-topology_1.2.19_all-ubuntu-22.04.deb 🐧 tt-topology_1.2.19_all-ubuntu-24.04.deb 🐧 tt-topology_1.2.19_all-ubuntu-latest.deb 🐍 tt_topology-1.2.19-py3-none-any.whl

🐍 pip install tt-topology

4 previous releases

v1.2.18 2026-01-30T22:07:17Z

v1.2.17 2026-01-29T19:20:46Z

v1.2.16 2025-12-08T16:43:42Z

v1.2.15 2025-11-04T16:03:26Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 1.2.11 - 17/06/2025

### Updated

- Updated mesh coord generation to be connection type agnostic
- Added failure and exit if mesh type detected, but not enough connections
- Added warning in README about lack of supoort for BH and 6U boards

## 1.2.10 - 05/06/2025

### Updated

- Bumped tt-tools-common version to fix driver version check for compatability with tt-kmd 2.0.0

## 1.2.9 - 30/05/2025

### Updated

- Bug fix for https://github.com/tenstorrent/tt-topology/issues/39. Now the tool will use a DFS longest path to determine a linear layout if its not a fully connected graph.
- Updated initial device detection - now it needs full noc access for octopus and list options

## 1.2.8 - 08/05/2025

### Updated

- Fixed issue where tool would fail when PCI interfaces don't start from ID 0
- Now using actual PCI interface IDs from devices instead of assuming sequential numbering

## 1.2.7 - 07/05/2025

### Updated

- Use tools-common 1.4.15
- Use type checking in octopus reset

## 1.2.6 - 05/05/2025

### Updated

- Bug fix: added "ignore-eth" flag to first chip detect to avoid eth training loops forever and truly detect pcie only chips
- Chore: bumped luwen

## 1.2.5 - 15/04/2025

### Updated

- When flashing to isolated mode, we now flash the WH ethernet ports to a disabled state,
  in order to prevent their use.

## 1.2.4 - 02/04/2025

### Updated

- You can now run `tt-topology -l isolated` to flash cards to the default (non-connected) state
- Users are now warned about missing or loose cables

## 1.2.3 - 21/03/2025

### Fixed

- Bumped luwen (0.6.2 -> 0.6.3) to include eth version check bug for TG setup

## 1.2.2 - 13/03/2025

### Fixed

- Bumped luwen version to make it more robust against eth fw updates

## 1.2.1 - 13/03/2025

### Fixed

- Moved the spi reads after the reset to increase stability during M3 L2R copy
- Bumped luwen version

## 1.2.0 - 06/03/2025

### Fixed

- Updated how local eth board info is calculated to make it agnostic to eth fw version
- bumped tt-tools-common version
- Added traceback printing when catching exceptions in main.

## 1.1.5 - 14/05/2024

### Updated

- Bumped luwen (0.3.8) and tt_tools_common (1.4.3) lib versions
- Removed unused python libraries

## 1.1.4 - 25/03/2024

### Fixed
- Changed detect_chips with detect_chips_with_callback to enable detailed debug info.

## 1.1.3 - 22/03/2024

### Fixed
- Bumped tt-tools-common version to avoid pip discrepancy.

## 1.1.2 - 22/03/2024

### Fixed
- Fixed command line bug when no args are provided.

## 1.1.1 - 21/03/2024

### Fixed
- Fixed reference to pyluwen lib

## 1.1.0 - 12/03/2024

### Added
- Octopus Configuration (4 n150s connected to 1 galaxy)


## 1.0.2 - 12/03/2024

### Fixed
- Dependency bug with tt_tools

topology ethernet multi-card routing

Works on

wormhole blackhole

tt-npe

official

C++ · Apache-2.0 · 14⭐ ·

Network-on-chip Performance Estimator for Tenstorrent Tensix-based devices. Model and estimate NoC utilization before running kernels on hardware.

Links

📦 Repo

noc performance estimator profiling

Works on

wormhole blackhole

tt-blacksmith

official

Python · Apache-2.0 · 13⭐ ·

Optimized training recipes for a variety of ML models on Tenstorrent hardware, powered by the TT-Forge compiler stack. Reference implementations for fine-tuning and training from scratch.

Links

📦 Repo 🌐 Website

training fine-tuning recipes pytorch

Works on

wormhole blackhole

tt-example-apps

official

Jupyter Notebook · Apache-2.0 · 13⭐ ·

End-to-end AI applications running on Tenstorrent AI accelerators. Complete application examples from retrieval-augmented generation to image generation pipelines.

Links

📦 Repo

rag applications end-to-end examples

Works on

wormhole blackhole

tt-flash

official

Python · Apache-2.0 · 13⭐ ·

Tenstorrent firmware update utility. Flash new firmware onto Tenstorrent accelerator cards from the command line.

Links

📦 Repo

Releases

LATEST v3.8.0 2026-06-01T18:04:27Z Release notes ↗

🐍 tt_flash-3.8.0-py3-none-any.whl

🐍 pip install tt-flash

4 previous releases

v3.7.0 2026-05-15T19:32:29Z

v3.6.5 2026-04-16T19:43:11Z

v3.6.4 2026-04-10T14:44:44Z

v3.6.3 2026-04-08T15:38:59Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 3.4.0 - 30/07/25

- Bump pyyaml 6.0.1 -> 6.0.2
- Improve error message formatting
- No longer have to use --force for flashing BH cards

## 3.3.5 - 03/07/25

- Bump luwen 0.7.3 -> 0.7.5

## 3.3.4 - 02/07/25

- Bump tt-tools-common 1.4.16 -> 1.4.17
- Bump luwen 0.6.4 -> 0.7.3

## 3.3.3 - 05/06/2025

- Bumped tt-tools-common version to fix driver version check for compatability with tt-kmd 2.0.0

## 3.3.2 - 14/05/2025

- Bump tt-tools-common version to latest

## 3.2.0 - 12/03/2025

### Updated

- luwen version bump to bring inline with tt-smi; provides stability fixes

## 3.1.3 - 06/03/2025

### Added

- luwen version bump to include bh arc init checks

## 3.1.2 - 28/02/2025

### Added

- Support for more BH cards: p100a, p150, and p150c

## 3.1.1 - 06/01/2025

### Updated

- Bumped luwen version to accomodate Maturin updates

## 3.1.0 - 29/10/2024

### Added

- Support for flashing the BH tt-boot-fs file format
- Bumped luwen version to 0.4.6 to allow resets when chip is inaccessible

## 3.0.2 - 17/10/2024

### Fixed
- Unbound variable when exception is thrown when getting current fw-version

## 3.0.1 - 16/10/2024

### Changed
- Bumped luwen version to 0.4.5 to resolve false positives on bad chip detection

## 3.0.0 - 23/08/2024

- NO BREAKING CHANGES! Major version bump to signify new generation of product.
- Added support for p100

## 2.2.0 - 19/07/2024

### Updated
- Added support for an alternative spi flash configuration via a new version of luwen

## 2.0.8 - 14/05/2024

### Updated
- Bumped luwen (0.3.8) and tt_tools_common (1.4.3) lib versions

## 2.0.1 - 2.0.7
- Dependency updates

## 2.0.0
- WH flash release

## 1.0.0

- GS flash release

firmware-update flash utility

Works on

grayskull wormhole blackhole

tt-vscode-toolkit

official

TypeScript · Apache-2.0 · 7⭐ · Dec 18, 2025

48 interactive lessons covering the full Tenstorrent developer path — from hardware detection to custom training — with click-to-run commands and hardware auto-detection. Available in VSCode and code-server.

Links

📦 Repo 📖 All 48 lessons 📖 RISC-V Programming Guide

Releases

LATEST v0.0.465 2026-06-09T20:32:59Z Release notes ↗

4 previous releases

v0.0.454 2026-06-05T18:44:37Z

v0.0.453 2026-05-29T17:21:23Z

v0.0.447 2026-05-18T22:27:56Z

v0.0.438 2026-05-11T16:43:12Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to the TT-VSCode-Toolkit will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

---

## [0.0.503] - 2026-06-12
### Fixed
- **Self-review fixes** — 29 issues confirmed by automated adversarial review: corrected MESH_DEVICE enum values in code-fence comments (T3000→T3K, n150→N150, Galaxy→GALAXY throughout vllm-production, image-generation, bounty-program, version-compatibility, step-zero, README); removed `<sup>™/</sup>` HTML injected into yaml/bash code fences (ct3-configuration-patterns, step-zero); fixed api-server print string back to `tt-metal ready`; corrected `p100`→`P100` in hardware-detection and QB_follows prose; completed T3K→T3000 normalization in ttsim/cookbook-particle-life prose callouts; fixed FAQ duplicate stale QB2 paragraph and TTNN/tt-metal prose table cells; fixed bare `tt-metal` prose in image-generation (→ TT-Metalium); updated link display text in cookbook-overview and tt-inference-server (github URL→ product name); clarified Vale config comments (ProductNames.yml T3000 exception, Terminology.yml link-text caveat).

## [0.0.502] - 2026-06-11
### Fixed
- **QB2 → TT-QuietBox 2 in llms.txt** — the LLM context file (consumed by the content website) had 11 prose `QB2` references; all replaced with `TT-QuietBox 2`; URL slugs (`qb2-*`) left untouched.

## [0.0.501] - 2026-06-11
### Fixed
- **QB2 → TT-QuietBox 2 prose normalization** — replaced all `QB2` shorthand in prose with the full `TT-QuietBox 2` product name across `ttsim-twenty-and-ten.md`, `cookbook-particle-life.md`, and `FAQ.md`; lesson title slugs (`qb2-*`) and command IDs left untouched.

## [0.0.500] - 2026-06-11

### Changed

- **Version bump** — increment to 0.0.500 after merging copyedit branch with origin/main; consolidates copyedit normalization (hardware IDs, TT-Metalium™/TT-NN™ trademarks, TT-QuietBox naming) with main's ttsim, AnimateDiff Phase 2.5, and mobile improvements.

---

## [0.0.477] - 2026-05-27

### Changed

- **Prose copyedit pass** — fixed TT-Forge<sup>™</sup> trademark placement in `tt-xla-jax.md`; updated `STYLE_GUIDE.md` hardware casing rules (`n150`/`n300`/`T3000`/`p300c`, capitalized `Galaxy`); normalized hardware IDs and `TTNN`→`TT-NN` in prose and sample output; renamed `TT Metal`→`TT-Metalium` in `tt-inference-server.md`. Extended `normalize-hardware-copy.js` and `normalize-ttnn-copy.js`; added `normalize-tt-metal-copy.js`. Polished `STYLE_GUIDE.md` trademark examples; fixed `normalize-open-source-copy.js` to skip inline code; added `plans/vscode-toolkit-copyedit-pr.md` PR summary.

---

## [0.0.476] - 2026-05-27

### Changed

- **TT-Metalium<sup>™</sup> and TT-NN<sup>™</sup> trademarks** — first prose mention per page now uses `TT-Metalium` and `TT-NN` (trademark, not registered). Updated `scripts/add-tt-product-trademarks.js` and `STYLE_GUIDE.md`; migrated pri

vscode lessons interactive getting-started code-server

Works on

wormhole blackhole quietbox ttsim

tt-toplike

official

Rust · Apache-2.0 · 2⭐ ·

A vibrant htop-style visualizer for Tenstorrent hardware written in Rust. Real-time process and utilization view for TT accelerators.

Links

📦 Repo 🌐 Website

Releases

LATEST v0.6.2 2026-06-08T19:42:59Z Release notes ↗

🐧 tt-toplike-app_0.6.2_amd64_jammy.deb 🐧 tt-toplike-app_0.6.2_amd64_noble.deb 🐧 tt-toplike_0.6.2_amd64_jammy.deb 🐧 tt-toplike_0.6.2_amd64_noble.deb

4 previous releases

v0.6.1 2026-06-02T23:46:12Z

v0.6.0 2026-05-26T23:49:21Z

v0.5.0 2026-04-29T17:39:43Z

v0.4.3 2026-04-25T20:02:37Z

See all releases on GitHub ↗

monitoring htop rust real-time

Works on

wormhole blackhole

tt-local-generator

official

Python · Apache-2.0 · 1⭐ ·

Generate infinite videos and images (and imaginative prompts to inspire them) on Tenstorrent's Quietbox 2. Fully local generative media pipeline.

Links

📦 Repo 🌐 Website

Releases

LATEST v0.7.4 2026-06-09T21:59:39Z Release notes ↗

🐧 tt-local-generator-models-all_0.7.4_all.deb 🐧 tt-local-generator_0.7.4_amd64.deb 🐧 tt-model-animate_0.7.4_all.deb 🐧 tt-model-flux_0.7.4_all.deb +5 more

4 previous releases

v0.3.4 2026-05-26T23:52:15Z

v0.3.3 2026-05-26T17:04:35Z

v0.2.6 2026-05-07T18:20:34Z

v0.2.2 2026-04-27T16:26:06Z

See all releases on GitHub ↗

video-generation image-generation quietbox generative

Works on

quietbox

tt-animatediff

official

Python · Apache-2.0 ·

Generates short, temporally coherent animated GIFs using the AnimateDiff model on Tenstorrent hardware. Phase 1 runs the correct SD 1.4 + MotionAdapter architecture on CPU; Phase 2 accelerates spatial denoising on Blackhole using the TTNN UNet. Produces vibrant 8-frame animations in ~15 s/frame on a P300C.

Links

📦 Repo 📖 Native Video Animation with AnimateDiff (VSCode Toolkit)

Releases

LATEST v0.6.0 2026-06-10T22:16:43Z Release notes ↗

1 previous release

v0.1.0 2026-06-04T22:31:14Z

See all releases on GitHub ↗

animatediff video-generation stable-diffusion diffusion gif blackhole

Works on

blackhole