Community · Open Source · Tenstorrent Ecosystem

A hidden dimension of Tenstorrent awesomeness

A curated directory of projects, tools, models, and research for Tenstorrent hardware — contributed by the community and our team. Browse by category or search across all entries.

128 Projects

12 Categories

Browse by category

🚀 Getting Started

The essential first steps — installer, core SDKs, and guided onboarding

tt-metal official

TT-NN operator library and TT-Metalium low-level kernel programming model. The primary SDK for devel…

tt-forge official

Tenstorrent's MLIR-based compiler frontend. Enables running AI workloads from PyTorch, ONNX, and oth…

tt-vscode-toolkit official

48 interactive lessons covering the full Tenstorrent developer path — from hardware detection to cus…

8 entries Browse →

🤖 AI & Models

Running, serving, and experimenting with AI models

TT Console official

Browser-based cloud console for exploring AI on Tenstorrent hardware. Run LLM inference, image and v…

tt-bio affiliated

Boltz-2 biomolecular model for drug discovery on Tenstorrent Blackhole. Supports single-card and mul…

koyeb/tenstorrent-examples community

Example applications and deployment configurations for running AI workloads on Tenstorrent hardware …

26 entries Browse →

🕵️ AI Agents

Agentic systems and AI assistants running on TT hardware

tt-example-apps official

End-to-end AI applications running on Tenstorrent AI accelerators. Complete application examples fro…

Local AI Agents on Tenstorrent affiliated

Three agentic projects running fully on-device: local AI agents on QuietBox 2, a coding assistant po…

dstack community

Vendor-agnostic orchestration for training, inference, and agentic workloads across NVIDIA, AMD, TPU…

4 entries Browse →

⚙️ Custom Kernels & Low-Level

Metalium/tt-lang kernel authoring; anything sub-compiler

tt-lang official

Python-based DSL that sits between TT-NN and TT-Metalium — expresses custom fused kernels with progr…

Tenstorrent Cookbook: Particle Life Simulator affiliated

Particle Life simulation on Tenstorrent hardware — an emergent-behavior N-body system where simple a…

tt-tiny community

Minimal Python code to access and program the Tenstorrent Blackhole chip directly — George Hotz's ex…

26 entries Browse →

🔨 Compilers & Frontends

Getting PyTorch/JAX/ONNX/CUDA models onto TT hardware

tt-mlir official

Tenstorrent MLIR compiler — the core compiler infrastructure shared by tt-forge and other frontends.…

tt-forge-compiletron affiliated

Compile more than 100 models on tt-forge in a display format suitable for demos. Comprehensive showc…

BarraCUDA community

Open-source CUDA compiler targeting multiple GPU architectures including Tenstorrent. Compiles .cu f…

16 entries Browse →

🛠 Dev Tools & Debugging

Profiling, visualization, and debugging workloads

ttsim official

Fast full-system simulator of Tenstorrent Wormhole and Blackhole hardware. Runs TT-Metalium workload…

tensix-viz affiliated

Hardware topology visualizer for Tenstorrent chips — from individual chip to full cluster. Interacti…

nvtop community

htop-style process monitor for GPUs and AI accelerators. Supports AMD, Apple, Huawei, Intel, NVIDIA,…

24 entries Browse →

🖥 Hardware & System

Drivers, firmware, monitoring, and hardware management

tt-kmd official

Tenstorrent kernel module driver. The Linux kernel module required to interface with Tenstorrent PCI…

tt-qb-lights affiliated

Sync your Tenstorrent Quietbox's RGB lighting to accelerator utilization status. Visual feedback for…

blackhole-py community

Pure Python driver for Tenstorrent Blackhole cards providing direct low-level hardware access withou…

23 entries Browse →

☁️ Cloud & Orchestration

Kubernetes, cloud deployment, and multi-node infrastructure

tt-inference-server official

Production-ready model serving for Tenstorrent hardware with OpenAI-compatible REST API. Supports co…

tt-topology official

Configure Ethernet routing on multi-card Tenstorrent systems. Flash NB cards to use specific ETH rou…

Cloud-Native Support official

Official documentation hub for running Tenstorrent accelerators on Kubernetes. Centers on tt-operato…

7 entries Browse →

🔩 RISC-V & Architecture

ISA, simulation, and running Linux on TT silicon

tt-bh-linux official

Linux demo for the Tenstorrent Blackhole P100/P150 card RISC-V cores. Boot a real Linux kernel on th…

CS Fundamentals on Tenstorrent Hardware affiliated

Seven-module computer science curriculum taught on real Tenstorrent hardware. Covers RISC-V architec…

tt-sim community

Community-built Tenstorrent architecture simulator written in Python. Runs without hardware — useful…

16 entries Browse →

🔬 Research & Papers

Academic papers, theses, and HPC experiments

tt-isa-documentation official

Low-level ISA and microarchitecture documentation for Tenstorrent AI architectures (Grayskull, Wormh…

polaris official

A high-level AI simulator from Tenstorrent for modeling and exploring AI accelerator and workload pe…

tt-tutorial (HPC) community

Tutorial on Tenstorrent hardware for HPC researchers from the RISC-V Testbed project at Edinburgh/EP…

18 entries Browse →

🎮 Games & Demos

Creative, playful, and proof-of-concept projects

tt-animatediff official

Generates short, temporally coherent animated GIFs using the AnimateDiff model on Tenstorrent hardwa…

tt-zork-and-more affiliated

A Tenstorrent fork of Infocom's Zork I (and more!), running a Z-machine interpreter at least four di…

TT-GoL community

Conway's Game of Life implemented on Tenstorrent hardware using TT-Metal kernels.

11 entries Browse →

📚 Guides, Tutorials & Education

Getting-started content, blog posts, lessons, courses

tt-installer official

Install the complete Tenstorrent software stack with one command. Handles drivers, firmware, Python …

Custom Model Training on Tenstorrent affiliated

Eight-lesson series covering the full custom training workflow on TT hardware: dataset fundamentals,…

Programming Tenstorrent Processors community

Deep-dive into the Tenstorrent architecture and Metalium programming model — circular buffers, kerne…

19 entries Browse →

Fresh from Planet Tenstorrent

Planet Tenstorrent is the ecosystem's live feed — new releases, articles, papers, talks, and community posts from across the Tenstorrent world, gathered in one place and updated daily. The latest five:

🏷 release affiliated Jul 20, 2026

tt-bio v0.3.2

moritztng/tt-bio

SaProt-1.3b was running on fabricated weights due to a config mismatch that strict=False happily hid—the checkpoint has 66 layers at 1280 hidden dim, not 40 layers at 2560—and now from_pretrained reads the real config and refuses to build on shape mismatches, bringing the model to near-parity (X_emb=0.99508, just shy of the 0.9987–0.9996 ESMC band due to bf16 accumulation over those 66 residual layers). Perf-gate false positives between machines of the same card type are fixed by adding machine-id (socket.gethostname()) under the card-type baseline block. The release adds multi-card data-parallel fanout for SaProt embeddings via --devices, seeds perf baselines for esmc-300m, esmc-6b, saprot-650m, and boltz2-affinity (activating their regression gates), hardens the Boltz-2 and Protenix-v2 flagship legs with 5+5 seeds and alignment-free metrics (TM-score, CA-lDDT), widens the Boltz-2 affinity leg to three targets with pose and pocket metrics, root-causes and fixes a FKBP12 affinity GAP from unseeded global RNG draws in the worker, and adds HSA (L585) as the first L300–800 pharma-realistic target—plus drops ProteinMPNN entirely (CPU-only, redundant with BoltzGen's inverse-fold path).

tt-bio View release ↗

🏷 release affiliated Jul 19, 2026

tt-bio v0.3.1

moritztng/tt-bio

This release brings two new capabilities that expand tt-bio's toolkit for protein design workflows: SaProt structure-aware embeddings and ProteinMPNN inverse folding for fixed-backbone design. SaProt fuses amino-acid and Foldseek-3Di vocabularies into an ESM-2 encoder, letting you embed proteins with awareness of their 3D structure; ProteinMPNN then lets you generate sequences that fold back to a given backbone—the natural next step after running a structure predictor like Boltz-2 or bringing your own backbone. Both features are fully gated and validated on Blackhole P150a hardware with no performance regression across the existing model lineup (Boltz-2, ESMFold2, Protenix-v2, OpenDDE, ESMC), embedding parity tests confirm numerical accuracy, and ProteinMPNN runs on CPU via data-parallel fanout rather than on-device porting. See docs/proteinmpnn-port.md for details on the design choice.

tt-bio View release ↗

🏷 release affiliated Jul 17, 2026

tt-bio v0.3.0

moritztng/tt-bio

This release brings antibody-antigen co-folding via OpenDDE (built on Protenix-v2 with a structural-token expander) alongside a fused-RoPE attention kernel that speeds up embedding without accuracy loss, plus optional diffusion trace replay that collapses per-step host dispatch overhead across Boltz-2, BoltzGen, Protenix-v2, and OpenDDE. The release also establishes perf and UX regression gates as standing test legs, fixing two release-blocking issues: OF3 pairformer gates now gracefully skip when missing host keys, and the perf gate now compares against hardware-specific baselines so P150a runs aren't falsely judged against P300c numbers. All shipped models pass gate thresholds with no performance regressions observed.

tt-bio View release ↗

🏷 release official Jul 16, 2026

tt-toplike v0.7.40

tenstorrent/tt-toplike

The release introduces HivemindSweeper, a new mode for system monitoring, alongside a shift to distributing packages via ppa.tenstorrent.com for easier installation.

tt-toplike View release ↗

🏷 release official Jul 16, 2026

tt-vscode-toolkit v0.1.19

tenstorrent/tt-vscode-toolkit

The VS Code toolkit now includes a comprehensive guide to joyfully monkeypatching TT-NN without modifying your precious tt-metal checkouts.

tt-vscode-toolkit View release ↗

🪐 Explore Planet Tenstorrent →

About this site · you found it

This site is two old traditions sharing one orbit: an awesome list and a planet. Open source is an ecosystem the way space is — moonshot projects igniting into galaxies of forks and stars, and planet sites keeping the whole universe in view. (Around here the metaphor is load-bearing: we ship hardware called Galaxy.) Both traditions are gifts from decades of that culture, and both deserve some tribute.

🕶 The awesome list

Humans curating links for other humans is the oldest genre on the web — Yahoo! began life in 1994 as "Jerry and David's Guide to the World Wide Web", and the volunteer-run DMOZ / Open Directory Project kept hand-sorted order for two decades. In 2014, Sindre Sorhus distilled that instinct into a GitHub-native microformat with sindresorhus/awesome: one README, a ruthless curation bar ("only awesome things"), and pull requests as the editorial process. The awesome manifesto turned list-making into a commons — thousands of lists, lists of lists, and giants like awesome-python, awesome-selfhosted, and awesome-go. Synth heads are gloriously covered too: awesome-musicdsp, awesome-audio-dsp, awesome-webaudio, and awesome-supercollider. Because the format is halfway to being a database, people have long rendered lists into websites — tt-awesome just commits to the bit: every entry is a JSON file, and the README, this site, data.json, and the feeds are all built from the same source.

🪐 The planet

In the early 2000s, Jeff Waugh and Scott James Remnant wrote Planet, a little Python feed aggregator that river-merged a community's blogs into one page — and free software communities never looked back. Planet GNOME, Planet KDE, Planet Debian, Planet Gentoo, Planet Ubuntu, Planet Fedora (the Red Hat family), and Planet Mozilla are all still ticking decades later, many having passed through Sam Ruby's Planet Venus rewrite along the way. The lineage runs deeper still: before planets there were blogrolls, and before blogrolls, web rings — WebRing was built in 1995 by a teenaged Sage Weil, who grew up to create Ceph, which is about as open-source-full-circle as a story gets. Planet Tenstorrent carries that torch for this ecosystem.

⚡ Why we bother

Curated commons like these only exist because communities keep them alive in the open. That's not nostalgia to us — it's the plan. Open source is our past, present, and future.

🚀 Getting Started

tt-metal official apt*apt* 1592⭐

TT-NN operator library and TT-Metalium low-level kernel programming model. The primary SDK…

tt-forge official 329⭐

Tenstorrent's MLIR-based compiler frontend. Enables running AI workloads from PyTorch, ONN…

tt-buda official 315⭐

TT-BUDA: Tenstorrent's original Python compiler and runtime for AI workloads. Legacy stack…

tt-mlir official 294⭐

Tenstorrent MLIR compiler — the core compiler infrastructure shared by tt-forge and other …

riscv-ocelot official 259⭐

The Berkeley Out-of-Order Machine with V-EXT (RISC-V Vector Extension) support. Tenstorren…

ttsim official 138⭐

Fast full-system simulator of Tenstorrent Wormhole and Blackhole hardware. Runs TT-Metaliu…

riscv_arch_tests official 125⭐

RISC-V architectural self-checking directed tests — randomly-generated register operands a…

tt-isa-documentation official 121⭐

Low-level ISA and microarchitecture documentation for Tenstorrent AI architectures (Graysk…

whisper official 92⭐

RISC-V Instruction Set Simulator (ISS) used by Tenstorrent for processor verification. Pow…

tt-xla official 73⭐

PJRT device plugin for Tenstorrent hardware. Enables JAX, PyTorch/XLA, and other XLA-based…

tt-kmd official apt* 70⭐

Tenstorrent kernel module driver. The Linux kernel module required to interface with Tenst…

RiESCUE official 69⭐

RISC-V Directed Test Framework and Compliance Suite. Comprehensive test infrastructure for…

tt-inference-server official 66⭐

Production-ready model serving for Tenstorrent hardware with OpenAI-compatible REST API. S…

tt-forge-onnx official 65⭐

ONNX graph compiler for Tenstorrent hardware. Optimizes and transforms ONNX model graphs f…

tt-buda-demos official 64⭐

Repository of model demos using TT-Buda. The largest collection of pre-compiled model exam…

tt-smi official pipapt* 62⭐

Tenstorrent System Management Interface — monitor device telemetry, issue board-level rese…

tt-bh-linux official 58⭐

Linux demo for the Tenstorrent Blackhole P100/P150 card RISC-V cores. Boot a real Linux ke…

tt-lang official pippip 55⭐

Python-based DSL that sits between TT-NN and TT-Metalium — expresses custom fused kernels …

tt-llk official 54⭐

Tenstorrent Low-Level Kernels: the C++ library that directly programs the RISC-V cores ins…

Jun 5, 2025

ttnn-visualizer official pip 53⭐

Comprehensive tool for visualizing and analyzing model execution on Tenstorrent hardware. …

WallaBMC official 50⭐

Lightweight BMC (Baseboard Management Controller) for STM32 and similar MCUs, with Web UI,…

TT-Studio official 49⭐

Web-based GUI for deploying and chatting with AI models on Tenstorrent hardware. Handles a…

tt-umd official 45⭐

User-mode driver for Tenstorrent hardware. The userspace layer that sits between the kerne…

tt-system-firmware official 41⭐

System firmware for Tenstorrent hardware. Low-level system initialization and control firm…

polaris official 39⭐

A high-level AI simulator from Tenstorrent for modeling and exploring AI accelerator and w…

luwen official cargoapt* 34⭐

Tenstorrent system interface library written in Rust. Low-level Rust bindings for communic…

tt-tvm official 31⭐

TVM for Tenstorrent ASICs. Brings the Apache TVM compiler stack to Tenstorrent hardware, e…

tensix-isa-simulator official 29⭐

ISA-level simulator for the Tensix compute engine. Simulates the matrix, vector, and scala…

tt-torch official 26⭐

Frontend integration for PyTorch with tt-mlir. Compile PyTorch models directly to Tenstorr…

tt-firmware official 24⭐

Tenstorrent firmware repository. Board management and control firmware for Tenstorrent acc…

tt-installer official 24⭐

Install the complete Tenstorrent software stack with one command. Handles drivers, firmwar…

tt-exalens official pip 21⭐

Low-level hardware debugger for Tenstorrent devices. Inspect register state, memory conten…

tt-blacksmith official 16⭐

Optimized training recipes for a variety of ML models on Tenstorrent hardware, powered by …

tt-topology official pipapt* 16⭐

Configure Ethernet routing on multi-card Tenstorrent systems. Flash NB cards to use specif…

tt-npe official 15⭐

Network-on-chip Performance Estimator for Tenstorrent Tensix-based devices. Model and esti…

tt-flash official pipapt* 14⭐

Tenstorrent firmware update utility. Flash new firmware onto Tenstorrent accelerator cards…

SFPI official apt* 14⭐

Tenstorrent SFPU programming interface — TT-enhanced RISC-V GCC and binutils plus header f…

tt-example-apps official 13⭐

End-to-end AI applications running on Tenstorrent AI accelerators. Complete application ex…

tt-forge-models official 12⭐

A shared repository of model implementations used across TT-Forge frontends — a single sou…

tt-perf-report official 11⭐

Performance report analysis tool for Tenstorrent Metal operations — analyzes perf traces t…

tt-vscode-toolkit official 7⭐

48 interactive lessons covering the full Tenstorrent developer path — from hardware detect…

Dec 18, 2025

tt-tools-common official pipapt* 7⭐

Shared helper library of common utilities used across Tenstorrent system tools such as tt-…

tt-toplike official apt*apt* 6⭐

A vibrant htop-style visualizer for Tenstorrent hardware written in Rust. Real-time proces…

tt-system-tools official apt* 5⭐

System setup and support utilities for Tenstorrent hardware — hugepages-setup configures t…

tt-emule official 4⭐

A C++ software emulator of the Tenstorrent device-level kernel and host APIs. Run tt-metal…

tt-local-generator official 3⭐

Generate infinite videos and images (and imaginative prompts to inspire them) on Tenstorre…

tt-burnin official pipapt* 3⭐

Command-line utility that runs a high power-consumption workload on Tenstorrent devices — …

tt-animatediff official

Generates short, temporally coherent animated GIFs using the AnimateDiff model on Tenstorr…

Cloud-Native Support official

Official documentation hub for running Tenstorrent accelerators on Kubernetes. Centers on …

TT Console official

Browser-based cloud console for exploring AI on Tenstorrent hardware. Run LLM inference, i…

tt-operator official

Kubernetes operator that automates installation and lifecycle management of the full softw…

TT-QuietBox 2 Guide official

Official setup and onboarding guide for the TT-QuietBox 2 — a compact, liquid-cooled AI wo…

ttsim-qemu official

Tenstorrent's fork of QEMU that provides the full-system emulation layer behind ttsim. Mod…

tt-bio affiliated 113⭐

Boltz-2 biomolecular model for drug discovery on Tenstorrent Blackhole. Supports single-ca…

· Jan 31, 2026

grayskull-attention affiliated 38⭐

FlashAttention-style attention kernel implemented entirely in on-chip SRAM on the Tenstorr…

tt-atom affiliated 24⭐

Meta's UMA interatomic potential running on Tenstorrent Blackhole — energy, forces, and st…

tt-lang-models affiliated 7⭐

A growing collection of models that use tt-lang for some or all of their implementation. R…

tt-zork-and-more affiliated 2⭐

A Tenstorrent fork of Infocom's Zork I (and more!), running a Z-machine interpreter at lea…

tt-qb-lights affiliated 2⭐

Sync your Tenstorrent Quietbox's RGB lighting to accelerator utilization status. Visual fe…

diamond affiliated 1⭐

DIAMOND: Atari game-playing agent implemented on Tenstorrent hardware via tt-lang. Diffusi…

gemma4 affiliated 1⭐

Gemma 4 language model implemented in tt-lang (e4b variant) for direct execution on Tensto…

open-oasis affiliated 1⭐

tt-lang inference script for Oasis 500M — an interactive video world model running on Tens…

tt-model-runner affiliated 1⭐

Discover, load, and benchmark models with a GUI and TUI for tt-inference-server. Makes exp…

tt-claw affiliated

A Tenstorrent-powered claw machine that rewards players with real prizes. The QuietBox 2 r…

Local AI Agents on Tenstorrent affiliated

Three agentic projects running fully on-device: local AI agents on QuietBox 2, a coding as…

dflash affiliated

DFlash: Block Diffusion for Flash Speculative Decoding on Tenstorrent hardware using tt-la…

Engram affiliated

A Tenstorrent port of the DeepSeek Engram model using tt-lang. Brings DeepSeek's memory-ef…

Stable Diffusion XL on Tenstorrent affiliated

On-device image generation with Stable Diffusion XL running entirely on Tenstorrent hardwa…

Video Generation on Tenstorrent affiliated

Three lesson-projects covering on-device video synthesis: frame-by-frame diffusion with tt…

tt-forge-compiletron affiliated

Compile more than 100 models on tt-forge in a display format suitable for demos. Comprehen…

Image Classification with TT-Forge affiliated

End-to-end image classification project using TT-Forge — compile and run a PyTorch classif…

tensix-viz affiliated

Hardware topology visualizer for Tenstorrent chips — from individual chip to full cluster.…

tt-warp affiliated

Warp terminal plugin for Tenstorrent — integrates hardware status, model management, and d…

Tensix Grid Playground affiliated

Interactive browser-based visualizer of the Tenstorrent Tensix grid architecture. Explore …

Tenstorrent Cookbook: Conway's Game of Life affiliated

TT-Metalium implementation of Conway's Game of Life as a cookbook recipe. Each generation …

Tenstorrent Cookbook: Particle Life Simulator affiliated

Particle Life simulation on Tenstorrent hardware — an emergent-behavior N-body system wher…

CS Fundamentals on Tenstorrent Hardware affiliated

Seven-module computer science curriculum taught on real Tenstorrent hardware. Covers RISC-…

Custom Model Training on Tenstorrent affiliated

Eight-lesson series covering the full custom training workflow on TT hardware: dataset fun…

Tenstorrent Cookbook: Core Recipes affiliated

Three hands-on TT-Metalium kernel recipes: a Mandelbrot fractal explorer, real-time audio …

nvtop community 10846⭐

htop-style process monitor for GPUs and AI accelerators. Supports AMD, Apple, Huawei, Inte…

dstack community 2185⭐

Vendor-agnostic orchestration for training, inference, and agentic workloads across NVIDIA…

BarraCUDA community 1717⭐

Open-source CUDA compiler targeting multiple GPU architectures including Tenstorrent. Comp…

tt-tiny community 68⭐

Minimal Python code to access and program the Tenstorrent Blackhole chip directly — George…

zyx community 61⭐

A complete ML library and compiler in Rust — "from assembly to neural networks" — with a n…

· Sep 25, 2022

tt-twitch community 29⭐

A Tenstorrent Grayskull kernel written live on Twitch by George Hotz. 120-core grid demons…

koyeb/tenstorrent-examples community 19⭐

Example applications and deployment configurations for running AI workloads on Tenstorrent…

blackhole-py community 16⭐

Pure Python driver for Tenstorrent Blackhole cards providing direct low-level hardware acc…

tenstorrent-tiny-examples community 14⭐

Simple C++ kernel experiments on a GraySkull e75 chip. Hands-on examples for learning the …

ttnn-helloworld-cpp community 14⭐

Minimal working example of using Tenstorrent TTNN in C++. The simplest possible starting p…

tt-sim community 13⭐

Community-built Tenstorrent architecture simulator written in Python. Runs without hardwar…

tt-iree community 12⭐

IREE (Intermediate Representation Execution Environment) ML compiler ported to Tenstorrent…

TT-GoL community 12⭐

Conway's Game of Life implemented on Tenstorrent hardware using TT-Metal kernels.

triton-tenstorrent community 11⭐

OpenAI Triton compiler plugin for Tenstorrent hardware. Write Triton kernels and target Te…

tenstorrent.nix community 8⭐

Nix flake packaging the Tenstorrent software stack for NixOS and Nix users. Reproducible, …

ttMandelbrot community 7⭐

Mandelbrot Set fractal renderer running on Tenstorrent hardware. A classic demo showcasing…

TT-Metal Mini Template community 7⭐

Minimal working CMake project template for starting a new TT-Metal project from scratch. G…

tt-tutorial (HPC) community 7⭐

Tutorial on Tenstorrent hardware for HPC researchers from the RISC-V Testbed project at Ed…

ttPEAK community 6⭐

clpeak-style peak-performance benchmark for Tenstorrent devices using TT-Metalium. Measure…

current community 5⭐

High-level parallel programming framework for Tenstorrent accelerators, abstracting TT-Met…

ttVecAdd community 5⭐

Minimal vector-addition example on Tenstorrent devices using TT-Metalium. A clean hello-wo…

bhx community 5⭐

Boot stock Linux cloud images on the SiFive X280 RISC-V cores inside Tenstorrent Blackhole…

ttas community 4⭐

ttas is a hacker-friendly assembler/disassembler for Tensix on Wormhole. It turns assembly…

tt-tutorial (Korean) community 4⭐

Comprehensive tutorials for the Tenstorrent software stack in Korean. Jupyter notebooks co…

Collective Operations on Wormhole n150 (Sapienza University of Rome) community 4⭐

Master's thesis implementing and benchmarking five allreduce algorithms (Swing, Recursive …

libtt-metal-cxx community 3⭐

Rust crate that exposes the TT-Metal host API through a C++ bridge via cxx.rs — covering d…

tetsuh/tt-metal-community-distro-matrix community 2⭐

A compatibility guardrail that continuously monitors whether [tt-metal](https://github.com…

libtt community 1⭐

A Bazel-built PJRT plugin (libtt.so) providing an XLA backend for Tenstorrent devices. Bun…

tt-splat — matrix-native 3D Gaussian Splatting on Blackhole community 1⭐

3D Gaussian Splatting rewritten to run on the matrix engine: a polynomial splat and order-…

ttPseudoRowMajor community 1⭐

A small TTNN-facing C++ library (ttprm) for running view-shaped tensor work without first …

gsplat_tt community

Port of Gaussian Splatting (3D scene reconstruction from 2D images) to Tenstorrent hardwar…

A Gentle Guide: Tenstorrent Card on Arch Linux with Metalium community

Step-by-step guide to getting a Tenstorrent card running on Arch Linux with the full Metal…

· Jul 7, 2024

Thoughts and Logs After Messing with Tenstorrent Grayskull community

Honest field notes from getting a Grayskull card running and writing first Metalium kernel…

· Jun 2, 2024

Programming Tenstorrent Processors community

Deep-dive into the Tenstorrent architecture and Metalium programming model — circular buff…

· Apr 21, 2025

Tenstorrent Architecture — W&M CSCI654 Advanced Computer Architecture community

Lecture 20 from William & Mary's graduate Computer Architecture course. Frames Tenstorrent…

· Oct 9, 2024

Tenstorrent SFPU Kernel Series — Jason Davies community

Sponsored series of deep technical articles on implementing optimal SFPU kernels for the T…

· Nov 12, 2025

tt-rqm-kernels community

Structured quaternion, rotor, and phase-aware tensor kernels — operations on 3D rotation a…

Attention in SRAM on Tenstorrent Grayskull community

A fused kernel for the Grayskull architecture implementing Transformer self-attention enti…

· Jul 18, 2024

Exploring Fast Fourier Transforms on the Tenstorrent Wormhole community

Ports the Cooley-Tukey FFT algorithm to the Wormhole n300 RISC-V accelerator. The Wormhole…

· Jun 18, 2025

Assessing Tenstorrent Grayskull RISC-V MatMul Acceleration for LLMs community

Evaluates the Tenstorrent Grayskull e75 RISC-V accelerator for matrix multiplication at re…

· May 9, 2025

Porting Strategies for Gravitational N-Body Simulations on Tenstorrent Wormhole community

Evaluates three strategies for scaling an N-body code across multiple Tenstorrent Wormhole…

· May 4, 2026

Accelerating Gravitational N-Body Simulations on Tenstorrent Wormhole community

Accelerates an astrophysical N-body simulation on the Wormhole n300. Achieves 2× speedup a…

Nov 16, 2025

Numerical Kernels on a Spatial Accelerator: Tenstorrent Wormhole community

Implements three numerical kernels and composes them into a conjugate gradient solver on W…

Mar 24, 2026

Accelerating Stencils on the Tenstorrent Grayskull RISC-V Accelerator community

Explores stencil computation on the Grayskull PCIe RISC-V accelerator. Early academic work…

Sep 27, 2024

Stencil Computations on Tenstorrent Wormhole community

Maps 2D 5-point stencil computations onto the Tenstorrent Wormhole RISC-V AI dataflow acce…

May 8, 2026

SwiftNPU: Scalable Shape-Flexible Allocation for Inter-Core Connected NPUs community

Makes multi-tenant NPU sharing practical for Blackhole-class hardware using polynomial-tim…

Apr 27, 2026

TileLoom: Automatic Dataflow Planning for Spatial Dataflow Accelerators community

Compiler system that automatically generates efficient dataflow plans for tile-based langu…

· Dec 17, 2025

Rewriting TTS Inference Economics: Lightning V2 on Tenstorrent vs. NVIDIA L40S community

Shows that Text-to-Speech inference on Tenstorrent Lightning V2 achieves 4× lower cost tha…

· Mar 24, 2026

Tenstorrent Blackhole Architecture Guide community

A 6,500-word community deep dive into the Blackhole p100a architecture: the tile model (Te…

· Feb 28, 2026

🏷 Recent Releases

42 releases

tt-bio affiliated v0.3.2

2026-07-20T03:01:53Z

tt-toplike official v0.7.40

2026-07-16T21:41:41Z

tt-vscode-toolkit official v0.1.19

2026-07-16T21:17:06Z

tt-burnin official v0.4.3

2026-07-16T15:26:18Z

dstack community 0.20.28

2026-07-16T14:58:47Z

tt-system-firmware official v19.12.0

2026-07-16T14:11:49Z

tt-exalens official v0.3.28

2026-07-16T12:05:50Z

ttnn-visualizer official v0.94.0

2026-07-15T19:52:17Z

tt-smi official v6.0.0

2026-07-15T18:49:03Z

SFPI official 7.67.0

2026-07-15T12:23:00Z

ttsim official v1.9.6

2026-07-14T12:29:39Z

BarraCUDA community v5.01

2026-07-14T08:02:45Z

tt-metal official v0.74.0

2026-07-14T01:37:12Z

tt-kmd official ttkmd-2.10.0

2026-07-13T20:07:36Z

tt-installer official v3.4.0

2026-07-13T20:06:51Z

tt-atom affiliated v0.2.0

2026-07-11T07:47:26Z

tt-local-generator official v0.11.0

2026-07-10T16:47:57Z

tt-inference-server official v0.18.0

2026-07-10T15:21:25Z

tt-umd official v0.9.9

2026-07-09T17:03:46Z

TT-Studio official v2.8.0

2026-06-30T22:36:25Z

luwen official bh-mod-v1.0.0

2026-06-30T20:39:53Z

tensix-viz affiliated v1.1.2

2026-06-29T17:42:46Z

tt-forge official 1.3.0

2026-06-29T11:18:12Z

tt-xla official 1.3.0

2026-06-29T11:09:10Z

tt-forge-onnx official 1.3.0

2026-06-29T07:46:11Z

tt-flash official v3.10.0

2026-06-23T19:59:38Z

tt-animatediff official v0.9.0

2026-06-22T19:56:31Z

ttas community v0.1.0

2026-05-28T07:08:35Z

whisper official 1.861

2026-05-11T15:44:36Z

tt-sim community v1.0

2026-05-11T13:07:42Z

tt-bh-linux official v0.11

2026-04-13T15:10:59Z

tt-topology official v1.2.19

2026-02-26T21:14:41Z

tt-firmware official v19.6.0

2026-02-20T16:53:34Z

nvtop community 3.3.2

2026-02-08T17:57:16Z

tt-tools-common official v1.6.0

2025-12-23T21:02:08Z

tt-system-tools official v1.4.1

2025-12-08T17:23:48Z

RiESCUE official v1.7.0

2025-12-03T19:29:44Z

tt-torch official 0.4.0

2025-09-29T22:23:47Z

polaris official pre_perfmodel_merge

2025-09-19T18:16:27Z

riscv_arch_tests official v0.2.0+aligned-access

2025-01-23T17:16:16Z

tt-buda official v0.19.3

2024-09-24T21:01:08Z

zyx community v0.14.0

2024-09-22T13:54:32Z

Select an entry to see details

tt-metal

official

C++ · Apache-2.0 · 1592⭐ ·

TT-NN operator library and TT-Metalium low-level kernel programming model. The primary SDK for developing on Tenstorrent hardware — from high-level tensor ops to bare-metal RISC-V kernels.

Links

📦 Repo 🌐 Website

Releases

LATEST v0.74.0 2026-07-14T01:37:12Z Release notes ↗

🐧 tt-metalium-dev_0.74.0.ubuntu22.04_amd64.deb 🐧 tt-metalium-dev_0.74.0.ubuntu24.04_amd64.deb 🐧 tt-metalium-examples_0.74.0.ubuntu22.04_amd64.deb 🐧 tt-metalium-examples_0.74.0.ubuntu24.04_amd64.deb +15 more

🐧 apt install tt-metalium

⚙ Requires PPA — setup instructions ↗

🐧 apt install tt-nn

⚙ Requires PPA — setup instructions ↗

5 previous releases

v0.76.0-dev20260720pre 2026-07-20T04:46:30Z

v0.76.0-dev20260719pre 2026-07-19T02:29:55Z

v0.76.0-dev20260718pre 2026-07-18T02:21:00Z

v0.75.0-rc1pre 2026-07-17T17:49:13Z

v0.75.0-dev20260717pre 2026-07-17T03:20:46Z

See all releases on GitHub ↗

metalium ttnn sdk kernels core

Works on

grayskull wormhole blackhole ttsim

tt-forge

official

Python · Apache-2.0 · 329⭐ ·

Tenstorrent's MLIR-based compiler frontend. Enables running AI workloads from PyTorch, ONNX, and other frameworks on all Tenstorrent hardware configurations through an open-source, general, and performant compiler.

Links

📦 Repo 🌐 Website

Releases

LATEST 1.3.0 2026-06-29T11:18:12Z Release notes ↗

5 previous releases

1.4.0.dev20260720022630 2026-07-20T03:12:38Z

1.4.0.dev20260719001637 2026-07-19T01:16:38Z

1.4.0.dev20260718001745 2026-07-18T01:24:25Z

1.4.0.dev20260717001916 2026-07-17T01:04:50Z

1.4.0.dev20260716001804 2026-07-16T01:03:19Z

See all releases on GitHub ↗

mlir compiler pytorch onnx frontend

Works on

wormhole blackhole ttsim

tt-buda

official

Python · Apache-2.0 · 315⭐ ·

TT-BUDA: Tenstorrent's original Python compiler and runtime for AI workloads. Legacy stack — tt-forge is the recommended successor, but tt-buda has the largest model demo library.

Links

📦 Repo

Releases

LATEST v0.19.3 2024-09-24T21:01:08Z Release notes ↗

⬇ pybuda-gs-v0.19.3-ubuntu-20-04-amd64-python3.8.zip ⬇ pybuda-gs-v0.19.3-ubuntu-22-04-amd64-python3.10.zip ⬇ pybuda-wh.b0-v0.19.3-ubuntu-20-04-amd64-python3.8.zip ⬇ pybuda-wh.b0-v0.19.3-ubuntu-22-04-amd64-python3.10.zip

4 previous releases

v0.18.2 2024-07-18T15:58:39Z

v0.17.0-alpha 2024-06-05T20:07:29Z

v0.15.0-alpha 2024-05-23T19:53:00Z

v0.12.3 2024-05-10T22:25:40Z

See all releases on GitHub ↗

legacy compiler pytorch buda

Works on

grayskull wormhole

tt-mlir

official

C++ · Apache-2.0 · 294⭐ ·

Tenstorrent MLIR compiler — the core compiler infrastructure shared by tt-forge and other frontends. Handles graph optimization, lowering, and code generation for Tensix hardware.

Links

📦 Repo 🌐 Website

Releases

5 releases

0.9.0.dev20260221pre 2026-02-21T04:31:50Z

0.9.0.dev20260220pre 2026-02-20T04:34:35Z

0.9.0.dev20260219pre 2026-02-19T04:37:24Z

0.9.0.dev20260218pre 2026-02-18T04:38:21Z

0.9.0.dev20260217pre 2026-02-17T04:37:09Z

See all releases on GitHub ↗

mlir compiler backend optimization

Works on

wormhole blackhole

riscv-ocelot

official ⑂ riscv-boom/riscv-boom

SystemVerilog · Apache-2.0 · 259⭐ ·

The Berkeley Out-of-Order Machine with V-EXT (RISC-V Vector Extension) support. Tenstorrent's research-grade out-of-order RISC-V core with vector extension.

Links

📦 Repo

risc-v out-of-order vector-extension processor-design

ttsim

official

C++ · Apache-2.0 · 138⭐ ·

Fast full-system simulator of Tenstorrent Wormhole and Blackhole hardware. Runs TT-Metalium workloads on any Linux/x86_64 system without physical silicon. Bit-exact results relative to hardware.

Links

📦 Repo 📖 Lesson

Releases

LATEST v1.9.6 2026-07-14T12:29:39Z Release notes ↗

4 previous releases

v1.9.5 2026-07-10T22:33:57Z

v1.9.4 2026-07-09T00:42:47Z

v1.9.3 2026-07-02T20:15:30Z

v1.9.2 2026-06-26T22:11:00Z

See all releases on GitHub ↗

simulator no-hardware bit-exact wormhole blackhole

Works on

ttsim

riscv_arch_tests

official

Assembly · Apache-2.0 · 125⭐ ·

RISC-V architectural self-checking directed tests — randomly-generated register operands and data with low-level OS code for test scheduling and self-checking, runnable on a RISC-V design or an ISS such as Whisper or Spike. Generated by an internal Tenstorrent tool from the official RISC-V ISA spec.

Links

📦 Repo

Releases

LATEST v0.2.0+aligned-access 2025-01-23T17:16:16Z Release notes ↗

⬇ release.tar.zip

2 previous releases

v0.2.0 2024-10-03T22:08:11Z

v0.1.1 2024-09-28T03:35:57Z

See all releases on GitHub ↗

riscv testing verification isa architecture

tt-isa-documentation

official

121⭐ ·

Low-level ISA and microarchitecture documentation for Tenstorrent AI architectures (Grayskull, Wormhole, Blackhole) — the authoritative hardware reference beneath the tt-forge / tt-metal software stack.

Links

📦 Repo

isa architecture documentation tensix low-level

Works on

grayskull wormhole blackhole

whisper

official ⑂ chipsalliance/VeeR-ISS

C++ · Apache-2.0 · 92⭐ ·

RISC-V Instruction Set Simulator (ISS) used by Tenstorrent for processor verification. Powers the co-simulation architecture checker.

Links

📦 Repo

Releases

LATEST 1.861 2026-05-11T15:44:36Z Release notes ↗

See all releases on GitHub ↗

risc-v iss simulator verification

tt-xla

official

Python · Apache-2.0 · 73⭐ ·

PJRT device plugin for Tenstorrent hardware. Enables JAX, PyTorch/XLA, and other XLA-based frameworks to target TT accelerators.

Links

📦 Repo 📖 JAX and PyTorch/XLA on Tenstorrent 🌐 Website

Releases

LATEST 1.3.0 2026-06-29T11:09:10Z Release notes ↗

5 previous releases

1.4.0.dev20260720022630 2026-07-20T03:02:46Z

1.4.0.dev20260719001637 2026-07-19T01:04:33Z

1.4.0.dev20260718001745 2026-07-18T00:51:56Z

1.4.0.dev20260717001916 2026-07-17T00:54:10Z

1.4.0.dev20260716001804 2026-07-16T00:53:44Z

See all releases on GitHub ↗

xla pjrt jax pytorch

Works on

wormhole blackhole

tt-kmd

official

C · GPL-2.0 · 70⭐ ·

Tenstorrent kernel module driver. The Linux kernel module required to interface with Tenstorrent PCIe accelerator cards.

Links

📦 Repo

Releases

LATEST ttkmd-2.10.0 2026-07-13T20:07:36Z Release notes ↗

⬇ tenstorrent-dkms-2.10.0-1.noarch.rpm 🐧 tenstorrent-dkms_2.10.0_all.deb

🐧 apt install tenstorrent-dkms

⚙ Requires PPA — setup instructions ↗

4 previous releases

ttkmd-2.10.0-rc2pre 2026-07-10T19:55:51Z

ttkmd-2.10.0-rc1pre 2026-07-01T21:08:05Z

ttkmd-2.9.99-testingpre 2026-06-12T17:39:45Z

ttkmd-2.9.0 2026-06-09T13:25:19Z

See all releases on GitHub ↗

kernel-module driver linux pcie

Works on

grayskull wormhole blackhole

RiESCUE

official

Python · Apache-2.0 · 69⭐ ·

RISC-V Directed Test Framework and Compliance Suite. Comprehensive test infrastructure for verifying RISC-V processor implementations against the specification.

Links

📦 Repo 🌐 Website

Releases

LATEST v1.7.0 2025-12-03T19:29:44Z Release notes ↗

4 previous releases

v1.5.0 2025-11-17T21:58:14Z

v1.3.0 2025-11-06T20:12:13Z

v1.1.2 2025-10-16T17:21:43Z

v0.2.5 2025-07-10T00:59:12Z

See all releases on GitHub ↗

risc-v testing compliance verification

tt-inference-server

official

Python · Apache-2.0 · 66⭐ ·

Production-ready model serving for Tenstorrent hardware with OpenAI-compatible REST API. Supports continuous batching, multiple models, and all TT hardware configurations.

Links

📦 Repo 📖 Production Inference lesson (VSCode Toolkit)

Releases

LATEST v0.18.0 2026-07-10T15:21:25Z Release notes ↗

⬇ v0.18.0-release_artifacts.zip

4 previous releases

v0.17.0 2026-06-26T19:29:31Z

v0.16.0 2026-06-12T18:21:42Z

v0.15.0 2026-05-29T15:55:11Z

v0.14.0 2026-05-15T22:34:02Z

See all releases on GitHub ↗

serving openai-compatible production rest-api

Works on

wormhole blackhole quietbox galaxy

tt-forge-onnx

official

Python · Apache-2.0 · 65⭐ ·

ONNX graph compiler for Tenstorrent hardware. Optimizes and transforms ONNX model graphs for efficient execution on Tensix accelerators. Used as a backend by tt-forge for ONNX model ingestion.

Links

📦 Repo

Releases

LATEST 1.3.0 2026-06-29T07:46:11Z Release notes ↗

5 previous releases

1.4.0.dev20260720031243 2026-07-20T03:45:13Z

1.4.0.dev20260719010816 2026-07-19T02:52:49Z

1.4.0.dev20260718010048 2026-07-18T01:50:44Z

1.4.0.dev20260717012151 2026-07-17T01:59:54Z

1.4.0.dev20260716005809 2026-07-16T01:27:40Z

See all releases on GitHub ↗

onnx compiler graph-optimization mlir

Works on

wormhole blackhole

tt-buda-demos

official

Python · Apache-2.0 · 64⭐ ·

Repository of model demos using TT-Buda. The largest collection of pre-compiled model examples for Tenstorrent hardware — BERT, ResNet, YOLO, GPT-2, Whisper, and many more.

Links

📦 Repo

demos models bert resnet yolo gpt2

Works on

grayskull wormhole

tt-smi

official

Python · Apache-2.0 · 62⭐ ·

Tenstorrent System Management Interface — monitor device telemetry, issue board-level resets, and inspect hardware health. The nvidia-smi equivalent for Tenstorrent hardware.

Links

📦 Repo

Releases

LATEST v6.0.0 2026-07-15T18:49:03Z Release notes ↗

🐍 tt_smi-6.0.0-py3-none-any.whl

🐍 pip install tt-smi

🐧 apt install tt-smi

⚙ Requires PPA — setup instructions ↗

4 previous releases

v5.3.1 2026-07-02T21:24:03Z

v5.3.0 2026-06-12T15:35:05Z

v5.2.0 2026-05-14T17:26:26Z

v5.1.1 2026-05-12T22:18:05Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 3.0.26 - 29/07/25
- Added single tray galaxy reset option
- Bumped luwen from 0.7.5 -> 0.7.10
  - Chip detect now doesn't wait for eth to train for the 6U galaxy's, allowing multi tray resets to happen independently
- Updated readme with the new reset option

## 3.0.25 - 29/07/25
- Added packaging

## 3.0.24 - 04/07/25
- Now users have 2 galay reset modes available
  - glx_reset: resets the galaxy, informs users if there has been an eth failure
  - glx_reset_auto: resets the galaxy upto 3 times if eth failures are detected

## 3.0.23 - 03/07/25
- Bumped luwen 0.7.3 -> 0.7.5 to fix cargo lock compatibilty issue

## 3.0.22 - 02/07/25
- Bumped tt-tools-common 1.4.16 -> 1.4.17
- Bumped luwen 0.7.2 -> 0.7.3
- Bumped smi 3.0.21 -> 3.0.22

## 3.0.21 - 26/06/25

- Added option to not re-init chips after reset
- Updated galaxy 6u reset option from --ubb_reset to -glx_reset
- Removed the a3 arc message before doing a 6u reset, meaning we can reset even when chips are not pcie accessible
- Added eth link check and return failure if any of the eth links have a LINK_INACTIVE_FAIL_DUMMY_PACKET failure

## 3.0.20 - 04/06/25

- Chore - bumped tt-tools-common version to fix driver version check for compatability with tt-kmd 2.0.0

## 3.0.19 - 30/04/25

- Fixed an issue preventing the telemetry thread from being dispatched when the user clicked tab 2

## 3.0.18 - 22/05/25

- Added BH and WH UBB board type support
- Removed the dependency on tt-tools-common for this info

## 3.0.17 - 13/05/25

- Added proper telemetry heartbeat checks for Grayskull

## 3.0.16 - 12/05/25

- Used new ResetTypes from tools-common to simplify reset code
- Added a heartbeat spinner to the telemetry pane. We expect this spinner to update about twice per second. If the spinner is not moving, this indicates new telemetry is not being fetched.

## 3.0.15 - 24/04/25

- Patch for the ubb_reset to just discover local only post reset. Looks like eth port status 2 has been re-used to mean connected and pyluwen waits for it to clear, leading to eth timeout.

## 3.0.14 - 21/04/25

- Added wh ubb reset via command line `tt-smi --ubb_reset`. Intention is that this command line option will be removed and integrated into `tt-smi -r` after we update board detection with the correct external naming.
- Removed some unused imports and code - no functional changes

## 3.0.13 - 21/03/25

- Removed get\_sw\_versions

## 3.0.12 - 21/03/25

- Chore - bumped luwen version to include eth fw version check fix

## 3.0.11 - 13/03/25

- Chore - bumped luwen version to include enable chips with external connections but no routing

## 3.0.10 - 10/03/25

- Chore - bumped luwen version to include protoc lib detection check

## 3.0.9 - 07/03/25

- Chore - bumped luwen v

monitoring telemetry smi hardware-management

Works on

grayskull wormhole blackhole

tt-bh-linux

official★ featured

C · GPL-2.0 · 58⭐ ·

Linux demo for the Tenstorrent Blackhole P100/P150 card RISC-V cores. Boot a real Linux kernel on the 16 high-performance RISC-V cores built into the Blackhole chip.

Links

📦 Repo 🌐 Website

Releases

LATEST v0.11 2026-04-13T15:10:59Z Release notes ↗

⬇ tt-bh-disk-image.zip ⬇ tt-bh-linux.zip

4 previous releases

v0.10 2026-02-11T22:41:22Z

v0.9 2025-10-14T20:56:23Z

v0.5 2025-10-01T15:40:57Z

v0.4 2025-08-09T18:05:10Z

See all releases on GitHub ↗

linux risc-v blackhole bare-metal boot

Works on

blackhole

tt-lang

official

Python · Apache-2.0 · 55⭐ ·

Python-based DSL that sits between TT-NN and TT-Metalium — expresses custom fused kernels with progressive disclosure, compiling directly to Tensix. Ships an integrated functional simulator (no hardware needed), line-by-line performance metrics, and AI-agent-friendly tooling. Two packages: tt-lang (compiler + hardware, requires ttnn) and tt-lang-sim (simulator only, works on Linux/macOS without Tenstorrent hardware).

Links

📦 Repo 🌐 Website 📖 Introduction to tt-lang

📋 Changelog

# Changelog

All notable changes to TT-Lang will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Version 1.1.1

### Compiler

- Fix for live-interval boundary computation (issue [#536](../../issues/536))
- Fix for all-zero results in FP32 reductions (issue # [#533](../../issues/533))
- Fix for inferred `pop` and `push` (issues [#536](../../issues/536), [#554](../../issues/554))
- Fix for write pointer tracking on pipe sender accross iterations (issue [#578](../../issues/578))
- Fix to report data type mismatch error
- Fix to report DFB over allocation error (issue [#511](../../issues/511))
- Support for pipenet predicates `is_src`, `is_dst` and `is_active` (issue [#541](../../issues/541))
- Support for `ttl.math.typecast`

### Simulator

- Support for inferred `pop`, `push` and `copy`'s transfer handle `wait`
- Support for pipenet predicates `is_src`, `is_dst` and `is_active`
- Support `all_gather`
- Support `bfloat8_b`
- Improved/actionable error messages
- Improved performance by simulating math in FP32

### Infrastructure

- TT-Lang installable with `pip install tt-lang` for full installation and `pip install tt-lang-sim` for simulator only
- [Matmul benchmarks](benchmarks/matmul/README.md)

## Version 1.0.0

### Compiler

- Support `+=` syntax in conjunction with dot product (`@`) lowered to packer L1 accumulation
- Support implicit temporary compute-kernel-local DFBs
- Support `ttl.Pipenet`
- Support implicit `ttl.Block.push` and `ttl.Block.pop`
- Support implicit `ttl.Transfer.wait`
- Support for `expm1`, `exp2`, `ceil`, `sign`, `gelu`, `silu`, `hardsigmoid`, `square`, `softsign`, `signbit`, `frac`, `trunc` in `ttl.math`

### Simulator

- Support for `ttl.GroupTransfer`
- SPMD and mesh device simulation support
- Support for `ttnn.all_reduce` CCLs
- Use tracing to report statistics with `tt-lang-sim-stats`
- Remote L1 reads/writes statistics

### Examples and documentation
- Matmul tutorial

## Version 0.1.8

### Compiler

- Support for dot product operator (`@`) with lowering to [`ckernel::matmul_block`](https://docs.tenstorrent.com/tt-metal/v0.55.0/tt-metalium/tt_metal/apis/kernel_apis/compute/matmul_block.html)
- Support for fusing matmul and certain elementwise operations
- Support lowering to `pack_tile_block`
- Support for `ttl.math.fill`, `ttl.math.reduce_sum`, `ttl.math.reduce_max`, and `ttl.math.transpose`
- Support for arbitrary sub-blocking including dot product K-dimension to allow maximizing L1 usage and reuse
- Support for `sin`, `cos`, `tan`, `asin`, `acos`, `atan` in `ttl.math`
- Support for L1 sharded tensors
- Support for tensors with BF8 data type
- SPMD support (`ttnn.open_mesh_device`)

### Simulator

- Track L1 space and number of DFBs usage and warn when exceeded
- Support for tensors with row-major layout
- Support for L1 sharded tensors

### Examples and documentat

Install

🐍 pip install tt-lang 🐍 pip install tt-lang-sim

dsl python kernels tt-lang simulator kernel-fusion

Works on

wormhole blackhole ttsim

tt-llk

official

C++ · Apache-2.0 · 54⭐ · Jun 5, 2025

Tenstorrent Low-Level Kernels: the C++ library that directly programs the RISC-V cores inside each Tensix compute engine. TRISC0 (unpack), TRISC1 (math/FPU/SFPU), and TRISC2 (pack) are all programmed through this layer — it is the interface between TT-Metal kernel code and bare silicon.

Links

📦 Repo 📝 Top-level architecture overview

tensix risc-v llk trisc brisc ncrisc low-level compute-engine

Works on

grayskull wormhole blackhole

ttnn-visualizer

official

TypeScript · Apache-2.0 · 53⭐ ·

Comprehensive tool for visualizing and analyzing model execution on Tenstorrent hardware. Interactive graphs, memory plots, tensor details, buffer overviews, operation flow graphs, and multi-instance support.

Links

📦 Repo

Releases

LATEST v0.94.0 2026-07-15T19:52:17Z Release notes ↗

🐍 ttnn_visualizer-0.94.0-py3-none-any.whl

🐍 pip install ttnn-visualizer

4 previous releases

v0.93.0 2026-07-08T16:25:58Z

v0.92.0 2026-07-01T19:39:55Z

v0.91.0 2026-06-24T19:23:11Z

v0.90.0 2026-06-17T21:19:45Z

See all releases on GitHub ↗

visualization profiling memory operations graphs

Works on

wormhole blackhole

WallaBMC

official

C · Apache-2.0 · 50⭐ ·

Lightweight BMC (Baseboard Management Controller) for STM32 and similar MCUs, with Web UI, Redfish API, and HTTPS support. Built on Zephyr RTOS. Used in Tenstorrent systems.

Links

📦 Repo

bmc stm32 redfish zephyr embedded

TT-Studio

official

TypeScript · Apache-2.0 · 49⭐ ·

Web-based GUI for deploying and chatting with AI models on Tenstorrent hardware. Handles all technical setup automatically — deploy models, run inference, and explore capabilities through a simple browser interface.

Links

📦 Repo

Releases

LATEST v2.8.0 2026-06-30T22:36:25Z Release notes ↗

4 previous releases

v2.7.0 2026-06-16T14:57:48Z

v2.6.0 2026-05-20T17:04:32Z

v2.5.0 2026-04-20T17:03:48Z

v2.4.1 2026-03-24T15:09:57Z

See all releases on GitHub ↗

web-ui gui models chat deployment

Works on

wormhole blackhole quietbox

tt-umd

official

C++ · Apache-2.0 · 45⭐ ·

User-mode driver for Tenstorrent hardware. The userspace layer that sits between the kernel module and higher-level SDKs.

Links

📦 Repo

Releases

LATEST v0.9.9 2026-07-09T17:03:46Z Release notes ↗

🐍 tt_umd-0.9.9-cp310-cp310-manylinux_2_28_aarch64.whl 🐍 tt_umd-0.9.9-cp310-cp310-manylinux_2_28_x86_64.whl 🐍 tt_umd-0.9.9-cp311-cp311-manylinux_2_28_aarch64.whl 🐍 tt_umd-0.9.9-cp311-cp311-manylinux_2_28_x86_64.whl +12 more

4 previous releases

v0.9.8pre 2026-07-02T08:57:30Z

v0.9.7pre 2026-06-26T12:53:11Z

v0.9.6pre 2026-06-03T10:59:12Z

v0.9.5-dev.260424pre 2026-04-30T10:50:47Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

## [0.9.5] - 2026-05-12

### Changed

Hardware hang detection for NOC and PCIe.
Tracy profiler integration with instrumentation across TLB, PCIe and sysmem paths.
DeviceProtocol ported to TTDevice, including DMA migration.
SocDescriptor split into static (SocArchDescriptor) and runtime parts.
LITERAL coordinate system in CoreCoord.
Multicast to all TENSIX cores.
SMN support.
SWEmuleChip software emulation chip and Quasar simulation support (incl. 4GB TLB).
Unified UmdException/UMD_ASSERT/UMD_THROW error handling across the codebase.

## [0.9.4] - 2026-03-18

### Changed

TopologyDiscoveryOptions refactoring.
TopologyDiscoveryOption to retrain ETH links on 6u.
TLBs for TTsim.
DRAM retrain support.
DeviceProtocol changes.
Simulator in TTDevice changes.
ETH heartbeat check.

## [0.9.3] - 2026-02-24

### Changed

Sigbus safe read write API.
Remove 4U related code.
Implement BH SPI as well, so full SPI support.
P150 expects harvested cores.
TT_VISIBLE_DEVICES uses logical IDs.

## [0.9.2] - 2026-02-09

### Changed

SPI interface for Wormhole.
PCI BDF based sorting and filtering.
Multicast PCI DMA.
Support Blackhole loudbox.
Many code fixes and test enhancements.

## [0.9.1] - 2026-01-23

### Changed

Started publishing to pypi.

## [0.9.0] - 2026-01-23

### Changed

Warm reset notification and callback implementation.

## [0.8.6] - 2026-01-20

### Changed

Make predicting ETH FW from CMFW optional in TopologyDiscovery.

## [0.8.4] - 2026-01-16

### Changed

Use older manylinux image

## [0.8.3] - 2026-01-15

### Changed

Reverted remote discovery issue

## [0.8.2] - 2026-01-15

### Changed

Support warm reset without secondary bus reset.
Expose subsystem vendor id.

## [0.8.1] - 2026-01-15

### Changed

Support dma functions on TTDevice layer

## [0.8.0] - 2026-01-14

### Changed

Many functional fixes and minor changes.
Final fixes needed for integration into tt-smi.
Also contains adjustments needed for integration into exalens.

## [0.7.0] - 2025-11-29

### Changed

Changed to a more generic arc_msg API.

## [0.6.0] - 2025-11-24

### Changed

Change the usage of TLBs such that KMD is in control of TLB allocation instead of UMD.
TLBs are now allocated using KMD's dedicated API.

## [0.5.3] - 2025-11-14

### Changed

Added generation of .deb and .rpm packages.
Added three separate packages (runtime, development and python).

## [0.5.1] - 2025-11-12

### Changed

Manylinux builds and Pypi test publishing.
Many smaller fixes and improvements.

## [0.4.0] - 2025-10-18

### Changed

Removed old type names.

## [0.3.0] - 2025-10-17

### Changed

Many smaller fixes and improvements.
TTsim support improvements.
JTAG support improvement.
Fixing CMake install path.
Further work on integrating new KMD TLBs.

## [0.2.0] - 2025-09-15

### Changed

A couple of smaller fixes and improvements, including L2CPU harvesting, fixes for new FW. Better TTSim support. Further JTAG support.
Introduced new soft reset API.
Introduced lite fabric initial version.

user-mode-driver umd hardware-interface

Works on

grayskull wormhole blackhole

tt-system-firmware

official

C · Apache-2.0 · 41⭐ ·

System firmware for Tenstorrent hardware. Low-level system initialization and control firmware that runs on-device.

Links

📦 Repo 🌐 Website

Releases

LATEST v19.12.0 2026-07-16T14:11:49Z Release notes ↗

📦 fw-pack-v19.12.0-recovery.tar.gz 📦 manufacturing-artifacts-v19.12.0.tar.gz 🐧 tt-firmware_19.12.0-ubuntu.1_all.deb ⬇ tt-system-firmware-v19.12.0.zip

4 previous releases

v19.12.0-rc3pre 2026-07-14T19:12:20Z

v19.12.0-rc2pre 2026-07-10T20:43:32Z

v19.12.0-rc1pre 2026-06-26T23:40:38Z

v19.11.0 2026-06-11T14:56:55Z

See all releases on GitHub ↗

firmware system embedded

Works on

wormhole blackhole

polaris

official

Python · Apache-2.0 · 39⭐ ·

A high-level AI simulator from Tenstorrent for modeling and exploring AI accelerator and workload performance.

Links

📦 Repo

Releases

LATEST pre_perfmodel_merge 2025-09-19T18:16:27Z Release notes ↗

See all releases on GitHub ↗

simulator performance modeling architecture

luwen

official

Rust · Apache-2.0 · 34⭐ ·

Tenstorrent system interface library written in Rust. Low-level Rust bindings for communicating with and managing TT hardware.

Links

📦 Repo

Releases

LATEST bh-mod-v1.0.0 2026-06-30T20:39:53Z Release notes ↗

🦀 cargo add luwen

🐧 apt install python3-pyluwen

⚙ Requires PPA — setup instructions ↗

4 previous releases

v0.8.5 2026-03-30T21:03:56Z

v0.8.4 2026-03-26T19:34:59Z

v0.8.3 2026-03-26T16:02:34Z

v0.8.2 2026-03-23T18:58:20Z

See all releases on GitHub ↗

rust system-interface low-level bindings

Works on

grayskull wormhole blackhole

tt-tvm

official

Python · Apache-2.0 · 31⭐ ·

TVM for Tenstorrent ASICs. Brings the Apache TVM compiler stack to Tenstorrent hardware, enabling model compilation from TensorFlow, PyTorch, ONNX, and more.

Links

📦 Repo

tvm compiler tensorflow onnx

Works on

grayskull wormhole blackhole

tensix-isa-simulator

official

C++ · Apache-2.0 · 29⭐ ·

ISA-level simulator for the Tensix compute engine. Simulates the matrix, vector, and scalar units inside each Tensix core.

Links

📦 Repo

tensix isa simulator compute-engine

Works on

ttsim

tt-torch

official

Python · Apache-2.0 · 26⭐ ·

Frontend integration for PyTorch with tt-mlir. Compile PyTorch models directly to Tenstorrent hardware via torch.compile integration.

Links

📦 Repo 🌐 Website

Releases

LATEST 0.4.0 2025-09-29T22:23:47Z Release notes ↗

🐍 tt_torch-0.4.0-cp311-cp311-linux_x86_64.whl

5 previous releases

0.5.0.dev20251008pre 2025-10-08T05:36:07Z

0.5.0.dev20251007pre 2025-10-07T04:22:29Z

0.5.0.dev20251006pre 2025-10-06T04:21:23Z

0.5.0.dev20251005pre 2025-10-05T04:38:19Z

0.5.0.dev20251004pre 2025-10-04T04:22:15Z

See all releases on GitHub ↗

pytorch torch-compile frontend

Works on

wormhole blackhole

tt-firmware

official

Apache-2.0 · 24⭐ ·

Tenstorrent firmware repository. Board management and control firmware for Tenstorrent accelerator cards.

Links

📦 Repo

Releases

LATEST v19.6.0 2026-02-20T16:53:34Z Release notes ↗

4 previous releases

v19.5.0 2026-02-04T18:22:15Z

v19.4.2 2026-01-05T23:32:14Z

v19.4.1 2025-12-19T17:06:37Z

v19.4.0 2025-12-16T05:38:23Z

See all releases on GitHub ↗

firmware bmc board-management

Works on

wormhole blackhole

tt-installer

official

Shell · Apache-2.0 · 24⭐ ·

Install the complete Tenstorrent software stack with one command. Handles drivers, firmware, Python environment, and SDK setup automatically.

Links

📦 Repo 📖 Modern Setup lesson (VSCode Toolkit)

Releases

LATEST v3.4.0 2026-07-13T20:06:51Z Release notes ↗

4 previous releases

v3.3.0 2026-07-02T18:16:53Z

v3.2.0 2026-06-29T19:03:11Z

v3.1.0 2026-06-26T18:26:33Z

v3.0.0pre 2026-06-26T16:39:33Z

See all releases on GitHub ↗

installation setup one-command getting-started

Works on

wormhole blackhole

tt-exalens

official

Python · Apache-2.0 · 21⭐ ·

Low-level hardware debugger for Tenstorrent devices. Inspect register state, memory contents, and kernel execution at the hardware level.

Links

📦 Repo

Releases

LATEST v0.3.28 2026-07-16T12:05:50Z Release notes ↗

🐍 tt_exalens-0.3.28-cp310-cp310-manylinux_2_34_aarch64.whl 🐍 tt_exalens-0.3.28-cp310-cp310-manylinux_2_34_x86_64.whl 🐍 tt_exalens-0.3.28-cp311-cp311-manylinux_2_34_aarch64.whl 🐍 tt_exalens-0.3.28-cp311-cp311-manylinux_2_34_x86_64.whl +4 more

🐍 pip install tt-exalens

4 previous releases

v0.3.27pre 2026-07-14T09:36:00Z

v0.3.26pre 2026-07-13T09:11:09Z

v0.3.25pre 2026-07-02T12:02:46Z

v0.3.24pre 2026-06-19T14:17:19Z

See all releases on GitHub ↗

debugger low-level hardware registers

Works on

wormhole blackhole

tt-blacksmith

official

Python · Apache-2.0 · 16⭐ ·

Optimized training recipes for a variety of ML models on Tenstorrent hardware, powered by the TT-Forge compiler stack. Reference implementations for fine-tuning and training from scratch.

Links

📦 Repo 🌐 Website

training fine-tuning recipes pytorch

Works on

wormhole blackhole

tt-topology

official

Python · Apache-2.0 · 16⭐ ·

Configure Ethernet routing on multi-card Tenstorrent systems. Flash NB cards to use specific ETH routing configurations for scale-out deployments.

Links

📦 Repo

Releases

LATEST v1.2.19 2026-02-26T21:14:41Z Release notes ↗

🐧 tt-topology_1.2.19_all-ubuntu-22.04.deb 🐧 tt-topology_1.2.19_all-ubuntu-24.04.deb 🐧 tt-topology_1.2.19_all-ubuntu-latest.deb 🐍 tt_topology-1.2.19-py3-none-any.whl

🐍 pip install tt-topology

🐧 apt install tt-topology

⚙ Requires PPA — setup instructions ↗

4 previous releases

v1.2.18 2026-01-30T22:07:17Z

v1.2.17 2026-01-29T19:20:46Z

v1.2.16 2025-12-08T16:43:42Z

v1.2.15 2025-11-04T16:03:26Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 1.2.11 - 17/06/2025

### Updated

- Updated mesh coord generation to be connection type agnostic
- Added failure and exit if mesh type detected, but not enough connections
- Added warning in README about lack of supoort for BH and 6U boards

## 1.2.10 - 05/06/2025

### Updated

- Bumped tt-tools-common version to fix driver version check for compatability with tt-kmd 2.0.0

## 1.2.9 - 30/05/2025

### Updated

- Bug fix for https://github.com/tenstorrent/tt-topology/issues/39. Now the tool will use a DFS longest path to determine a linear layout if its not a fully connected graph.
- Updated initial device detection - now it needs full noc access for octopus and list options

## 1.2.8 - 08/05/2025

### Updated

- Fixed issue where tool would fail when PCI interfaces don't start from ID 0
- Now using actual PCI interface IDs from devices instead of assuming sequential numbering

## 1.2.7 - 07/05/2025

### Updated

- Use tools-common 1.4.15
- Use type checking in octopus reset

## 1.2.6 - 05/05/2025

### Updated

- Bug fix: added "ignore-eth" flag to first chip detect to avoid eth training loops forever and truly detect pcie only chips
- Chore: bumped luwen

## 1.2.5 - 15/04/2025

### Updated

- When flashing to isolated mode, we now flash the WH ethernet ports to a disabled state,
  in order to prevent their use.

## 1.2.4 - 02/04/2025

### Updated

- You can now run `tt-topology -l isolated` to flash cards to the default (non-connected) state
- Users are now warned about missing or loose cables

## 1.2.3 - 21/03/2025

### Fixed

- Bumped luwen (0.6.2 -> 0.6.3) to include eth version check bug for TG setup

## 1.2.2 - 13/03/2025

### Fixed

- Bumped luwen version to make it more robust against eth fw updates

## 1.2.1 - 13/03/2025

### Fixed

- Moved the spi reads after the reset to increase stability during M3 L2R copy
- Bumped luwen version

## 1.2.0 - 06/03/2025

### Fixed

- Updated how local eth board info is calculated to make it agnostic to eth fw version
- bumped tt-tools-common version
- Added traceback printing when catching exceptions in main.

## 1.1.5 - 14/05/2024

### Updated

- Bumped luwen (0.3.8) and tt_tools_common (1.4.3) lib versions
- Removed unused python libraries

## 1.1.4 - 25/03/2024

### Fixed
- Changed detect_chips with detect_chips_with_callback to enable detailed debug info.

## 1.1.3 - 22/03/2024

### Fixed
- Bumped tt-tools-common version to avoid pip discrepancy.

## 1.1.2 - 22/03/2024

### Fixed
- Fixed command line bug when no args are provided.

## 1.1.1 - 21/03/2024

### Fixed
- Fixed reference to pyluwen lib

## 1.1.0 - 12/03/2024

### Added
- Octopus Configuration (4 n150s connected to 1 galaxy)


## 1.0.2 - 12/03/2024

### Fixed
- Dependency bug with tt_tools

topology ethernet multi-card routing

Works on

wormhole blackhole

tt-npe

official

C++ · Apache-2.0 · 15⭐ ·

Network-on-chip Performance Estimator for Tenstorrent Tensix-based devices. Model and estimate NoC utilization before running kernels on hardware.

Links

📦 Repo

noc performance estimator profiling

Works on

wormhole blackhole

tt-flash

official

Python · Apache-2.0 · 14⭐ ·

Tenstorrent firmware update utility. Flash new firmware onto Tenstorrent accelerator cards from the command line.

Links

📦 Repo

Releases

LATEST v3.10.0 2026-06-23T19:59:38Z Release notes ↗

🐍 tt_flash-3.10.0-py3-none-any.whl

🐍 pip install tt-flash

🐧 apt install tt-flash

⚙ Requires PPA — setup instructions ↗

4 previous releases

v3.9.0 2026-06-17T07:52:40Z

v3.8.0 2026-06-01T18:04:27Z

v3.7.0 2026-05-15T19:32:29Z

v3.6.5 2026-04-16T19:43:11Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 3.4.0 - 30/07/25

- Bump pyyaml 6.0.1 -> 6.0.2
- Improve error message formatting
- No longer have to use --force for flashing BH cards

## 3.3.5 - 03/07/25

- Bump luwen 0.7.3 -> 0.7.5

## 3.3.4 - 02/07/25

- Bump tt-tools-common 1.4.16 -> 1.4.17
- Bump luwen 0.6.4 -> 0.7.3

## 3.3.3 - 05/06/2025

- Bumped tt-tools-common version to fix driver version check for compatability with tt-kmd 2.0.0

## 3.3.2 - 14/05/2025

- Bump tt-tools-common version to latest

## 3.2.0 - 12/03/2025

### Updated

- luwen version bump to bring inline with tt-smi; provides stability fixes

## 3.1.3 - 06/03/2025

### Added

- luwen version bump to include bh arc init checks

## 3.1.2 - 28/02/2025

### Added

- Support for more BH cards: p100a, p150, and p150c

## 3.1.1 - 06/01/2025

### Updated

- Bumped luwen version to accomodate Maturin updates

## 3.1.0 - 29/10/2024

### Added

- Support for flashing the BH tt-boot-fs file format
- Bumped luwen version to 0.4.6 to allow resets when chip is inaccessible

## 3.0.2 - 17/10/2024

### Fixed
- Unbound variable when exception is thrown when getting current fw-version

## 3.0.1 - 16/10/2024

### Changed
- Bumped luwen version to 0.4.5 to resolve false positives on bad chip detection

## 3.0.0 - 23/08/2024

- NO BREAKING CHANGES! Major version bump to signify new generation of product.
- Added support for p100

## 2.2.0 - 19/07/2024

### Updated
- Added support for an alternative spi flash configuration via a new version of luwen

## 2.0.8 - 14/05/2024

### Updated
- Bumped luwen (0.3.8) and tt_tools_common (1.4.3) lib versions

## 2.0.1 - 2.0.7
- Dependency updates

## 2.0.0
- WH flash release

## 1.0.0

- GS flash release

firmware-update flash utility

Works on

grayskull wormhole blackhole

SFPI

official

C++ · Apache-2.0 · 14⭐ ·

Tenstorrent SFPU programming interface — TT-enhanced RISC-V GCC and binutils plus header files for programming the Tensix SFPU (vector engine) from kernel code. The compiler toolchain underneath TT-Metalium's SFPU ops.

Links

📦 Repo

Releases

LATEST 7.67.0 2026-07-15T12:23:00Z Release notes ↗

🐧 sfpi_7.67.0_aarch64_debian.deb 🐧 sfpi_7.67.0_x86_64_debian.deb ⬇ sfpi_7.67.0_x86_64_fedora.rpm

🐧 apt install sfpi

⚙ Requires PPA — setup instructions ↗

4 previous releases

7.67.0-strength-49763 2026-07-16T15:51:33Z

7.67.0-combine-49822 2026-07-15T15:25:44Z

7.66.0 2026-07-07T19:10:40Z

7.65.0 2026-07-02T10:35:51Z

See all releases on GitHub ↗

sfpu compiler-toolchain gcc riscv

Works on

wormhole blackhole

tt-example-apps

official

Jupyter Notebook · Apache-2.0 · 13⭐ ·

End-to-end AI applications running on Tenstorrent AI accelerators. Complete application examples from retrieval-augmented generation to image generation pipelines.

Links

📦 Repo

rag applications end-to-end examples

Works on

wormhole blackhole

tt-forge-models

official

Python · Apache-2.0 · 12⭐ ·

A shared repository of model implementations used across TT-Forge frontends — a single source of truth for the models used in testing and benchmarking, rather than duplicating them across frontend repos.

Links

📦 Repo

tt-forge models benchmarking testing inference

tt-perf-report

official

Python · Apache-2.0 · 11⭐ ·

Performance report analysis tool for Tenstorrent Metal operations — analyzes perf traces to surface throughput, bottlenecks, and optimization opportunities.

Links

📦 Repo

performance profiling tt-metal analysis optimization

tt-vscode-toolkit

official

TypeScript · Apache-2.0 · 7⭐ · Dec 18, 2025

48 interactive lessons covering the full Tenstorrent developer path — from hardware detection to custom training — with click-to-run commands and hardware auto-detection. Available in VSCode and code-server.

Links

📦 Repo 📖 All 48 lessons 📖 RISC-V Programming Guide

Releases

LATEST v0.1.19 2026-07-16T21:17:06Z Release notes ↗

4 previous releases

v0.1.17 2026-07-13T15:57:03Z

v0.0.518 2026-06-30T17:57:43Z

v0.0.515 2026-06-23T21:23:15Z

v0.0.514 2026-06-23T20:15:47Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to the TT-VSCode-Toolkit will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

---

## [0.1.19] - 2026-07-16
### Added
- **New advanced lesson "Monkeypatching TT-NN"** (order 16, after "Exploring TT-Metalium") — upgrade-safe, smallest-trace patching of TT-NN / TT-Metalium organized by developer goal (observe; fix a bug early; change a default; add something new; and a last-resort source-diff escape hatch), for the TT-QuietBox 2 case where `ttnn` is an installed package with no `~/tt-metal` source tree. Covers the env + import-order rule, the "patch to change behavior, wrap to add behavior" principle (citing Martin Chang's non-invasive `ttPseudoRowMajor` and upstream-first ggml backend), and an AI-agent verification recipe. Validated on p300c — all 19 lesson patterns exercised against real `ttnn` on a TT-QuietBox 2.
- **New reusable template `content/templates/monkeypatch/tt_patches.py`** — a dependency-free patch harness (save/restore `wrap` and `set_default`, `patched` context manager, `version_at_most` guard that zero-pads unequal-length versions, and a `verify` probe helper) with fail-loud missing-target detection, shipped with a self-contained hardware-free `test_tt_patches.py` and a usage README. The lesson embeds the full harness source in a collapsible section for transparency.
- **`tenstorrent.monkeypatch.copyHarness` command** — copies the harness folder into `~/tt-scratchpad/monkeypatch/`, replacing any prior copy (with confirmation) so it mirrors the shipped template, and offers an "Open tt_patches.py" follow-up. Wired into the lesson as a click-to-copy button.
- **`check:monkeypatch-drift` script** (wired into the pre-commit hook) fails if the `tt_patches.py` source embedded in the lesson diverges from the template file.

### Changed
- Excluded `.superpowers/` from the packaged `.vsix` via `.vscodeignore` (was shipping ~1.9 MB of session artifacts).

## [0.1.18] - 2026-07-13
### Fixed
- **PRD-246 — Jeremy's QB2 testing feedback on the first-inference lesson flow:**
  - `download-model` — fixed the broken "Step 3: Download the Model" skip link. The anchor pointed to `#step-3-download-qwen3-0-6b`, but the "Step 3: Download Qwen3-0.6B" heading slugs to `#step-3-download-qwen3-06b` (the `.` in `0.6B` is dropped, not turned into a hyphen).
  - `download-model` — consolidated the repeated, scattered Hugging Face auth flow. Removed the standalone "Already Authenticated?" pre-check and folded the `hf auth whoami` check into Step 2, so authentication reads as a single sequence (set token → check → log in) instead of appearing in multiple places.
  - `hardware-detection` — marked "Check 4: Device Reset" as optional; it is a recovery action, not part of normal detection, and a healthy device never needs it.
  - `tt-installer` — removed the redundant `tt-smi` hardwar

vscode lessons interactive getting-started code-server

Works on

wormhole blackhole quietbox ttsim

tt-tools-common

official

Python · Apache-2.0 · 7⭐ ·

Shared helper library of common utilities used across Tenstorrent system tools such as tt-smi, tt-flash, and tt-topology. A dependency rather than a standalone tool.

Links

📦 Repo

Releases

LATEST v1.6.0 2025-12-23T21:02:08Z Release notes ↗

🐧 python3-tt-tools-common_1.6.0_all-ubuntu-22.04.deb 🐧 python3-tt-tools-common_1.6.0_all-ubuntu-24.04.deb 🐧 python3-tt-tools-common_1.6.0_all-ubuntu-latest.deb 🐍 tt_tools_common-1.6.0-py3-none-any.whl

🐍 pip install tt-tools-common

🐧 apt install python3-tt-tools-common

⚙ Requires PPA — setup instructions ↗

4 previous releases

v1.5.1 2025-12-23T19:43:35Z

v1.5.0 2025-12-18T19:53:01Z

v1.4.33 2025-12-03T20:23:54Z

v1.4.32 2025-10-30T22:01:40Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 1.4.17 - 02/07/2025

### Changed
- Loosened requirements on pyproject.toml to make it more compatible in different venvs

## 1.4.15 - 05/05/2025

### Changed
- parse\_reset\_json now returns a ResetInput with stricter typing

## 1.4.14 - 04/02/2025

### Added
- New flags in reset config file generation to disable sw\_version reporting

## 1.4.13 - 23/1/2025

- Removed nr\_hugepages count from compatibility, as hugepages allocation is tricky
  and deserves its own widget elsewhere.

## 1.4.12 - 16/1/2025

- Added TTHostCompatibilityMenu to replace Host Info and Compatibility boxes
- Added a count of nr\_hugepages to the TTHostCompatibilityMenu

## 1.4.11 - 30/12/2025

- Updated Luwen version to fix Maturin issue

## 1.4.10 - 16/12/2024

### Changed
- detect\_chips\_with\_callback now takes a print\_status arg

## 1.4.9 - 11/12/2024

### Changed
- A failed reset now results in a fail exit code on BH

## 1.4.8 - 11/10/2024

### Changed
- Updated reset completion logic to handle the case where the bmfw needs to upgrade itself

## 1.4.7 - 11/10/2024

### Added
- Implemented m3 reset option for Blackhole

### Fixed
- Fixed crash during driver version dection when the "extraversion" field is used
    - i.e. 1.28-bh

## 1.4.6 - 17/07/2024

### Added
- Reset support of Blackhole

## 1.4.5 - 11/07/2024

### Added
- Bump pyluwen library version (v0.3.8 -> v0.3.11)
- Moved pyluwen v0.3.11 to optional dependencies in pyproject.toml

## 1.4.4 - 21/06/2024

### Added
- Version bump of python dependencies in pyproject.toml (dependabot)
    - requests (2.31.0 -> 2.32.0)
    - tqdm (4.66.1 -> 4.66.3)
- Pydantic library version bump (1.* -> >=1.2) to resolve: [TT-SMI issue #27](https://github.com/tenstorrent/tt-smi/issues/27)

## 1.4.3 - 14/05/2024

### Added
- Arm platform check and warning for WH device resets in compatibility menu
- Added check for WH device init after reset and prompt user to reboot host if chips are still non recoverable
- Bumped textual (0.59.0) and luwen (0.3.8) lib versions

## 1.4.2 - 04/04/2024

### Added
- Added "silent" flag to WH and GS resets to make them more versatile for use in other tools

## 1.4.1 - 22/03/2024

### Fixed
- removed pyluwen version to avoid dependency issues in other repos

## 1.4.0 - 19/03/2024

### Added
- detect_device_fallible that will provide feedback about chip state during init

### Fixed
- Update min driver version to 1.26 to perform lds reset
- Reset config file uses dev/tenstorrent id
- Catch JSON errors in reset config parsing
- Make nested dirs when initializing reset config path

## 1.3.0 - 06/03/2024

### Added
- Migrated GS Tensix reset to tools_common
- Migrated all related GS data files
- Functions to fetch arc and eth fw versions from telemetry

library tooling shared-utilities

tt-toplike

official

Rust · Apache-2.0 · 6⭐ ·

A vibrant htop-style visualizer for Tenstorrent hardware written in Rust. Real-time process and utilization view for TT accelerators.

Links

📦 Repo 🌐 Website

Releases

LATEST v0.7.40 2026-07-16T21:41:41Z Release notes ↗

🐧 tt-toplike-app_0.7.40_amd64_jammy.deb 🐧 tt-toplike-app_0.7.40_amd64_noble.deb 📦 tt-toplike-tui-0.7.40-macos-universal.tar.gz ⬇ tt-toplike-tui-0.7.40-windows-x86_64.zip +2 more

🐧 apt install tt-toplike

⚙ Requires PPA — setup instructions ↗

🐧 apt install tt-toplike-app

⚙ Requires PPA — setup instructions ↗

4 previous releases

v0.7.33 2026-07-13T18:39:19Z

v0.7.30 2026-07-08T21:05:55Z

v0.7.0 2026-07-01T18:05:08Z

v0.6.3 2026-06-23T20:17:53Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

The **canonical, complete release log lives in [`debian/changelog`](debian/changelog)** —
that's the file the `.deb` packages are built from and where every release is
recorded in full. This file is a friendly pointer plus a summary of the most
recent releases; it deliberately does not duplicate the whole history.

To see everything:

```bash
less debian/changelog # full history
git tag # released versions
```

## Recent releases

### 0.7.33
- **[i] media/diffusion monitoring** — SkyReels / SDXL / z-image servers
(tt-media-inference-server) now show live telemetry instead of a blank panel.
They expose a `tt_media_server_*` Prometheus namespace (not `vllm:`), so a
dedicated parser reads completed generations, the **in-flight `jobs_in_progress`
gauge**, and per-generation timing. The Feeding snake is reused: headline is
generations/min + seconds-per-gen, the body tracks in-flight jobs, and the
panel shows in-flight/done + stage times (no tok/s). Verified against a live
0.15.0 SkyReels box; dedupes the duplicate series that build emits.
- Fix: the legend / help / explain overlay panel truncated its own text (fixed
42-col width vs 50–66-col content, and `Paragraph` clips instead of wrapping).
It now measures its widest line and sizes to fit, clamped to the terminal.

### 0.7.18
- **`--remote <host[:port]>`** — watch a remote QuietBox's telemetry over a
WebSocket stream (plaintext, unauthed: trusted-LAN only). Every visualization
runs against the remote chips; the process panel and `[i]` inference monitor
still describe the **local** machine.
- Remote hardening: the process panel no longer TT-filters local processes under
`--remote`, the Arcade `⚔` duel is suppressed under `--remote`, and backend
status reports last-frame age / flags a stale stream.
- Packaging: WS support is a default-on `remote` cargo feature (opt out with
`--no-default-features`).

### 0.7.17
- **Arcade duel** — the hero now duels the inference snake: a telemetry-true
tug-of-war strip when a model is serving, the `⚔` marker sliding toward
whichever side dominates (chip power/util vs tokens/s + queue depth).
- Per-device power/temp now shows once as a shared strip instead of per section.
- Memory Castle gains a compact 8-column tier before the fleet-grid fallback.
- 1990s BBS/demoscene ANSI chrome (`╔══[ SECTION ]══▓▒░`), themed under grayskull.

### 0.7.16
- **`/theme grayskull`** — an app-wide grayscale palette (a thousand shades of
grey, cyan/purple accents, hot pink as the only hot color). `/theme default`
restores full color; bare `/theme` toggles.

monitoring htop rust real-time

Works on

wormhole blackhole

tt-system-tools

official

Shell · Apache-2.0 · 5⭐ ·

System setup and support utilities for Tenstorrent hardware — hugepages-setup configures the 1GB hugepages TT ASICs need, and tt-oops collects diagnostic data for troubleshooting. Ships as the tenstorrent-tools deb/rpm.

Links

📦 Repo

Releases

LATEST v1.4.1 2025-12-08T17:23:48Z Release notes ↗

⬇ tenstorrent-tools-1.4.1-1.noarch.rpm 🐧 tenstorrent-tools_1.4.1_all.deb

🐧 apt install tenstorrent-tools

⚙ Requires PPA — setup instructions ↗

4 previous releases

v1.4.0 2025-09-04T20:40:58Z

v1.3.1 2025-05-02T18:17:15Z

upstream/v1.2 2025-04-04T21:10:41Z

upstream/1.1 2024-07-17T18:46:30Z

See all releases on GitHub ↗

hugepages system-setup diagnostics

Works on

grayskull wormhole blackhole

tt-emule

official

C++ · Apache-2.0 · 4⭐ ·

A C++ software emulator of the Tenstorrent device-level kernel and host APIs. Run tt-metal kernel and host code on a standard x86-64 Linux machine — no Tenstorrent hardware required.

Links

📦 Repo

emulator tt-metal no-hardware testing kernels

tt-local-generator

official

Python · Apache-2.0 · 3⭐ ·

Generate infinite videos and images (and imaginative prompts to inspire them) on Tenstorrent's Quietbox 2. Fully local generative media pipeline.

Links

📦 Repo 🌐 Website

Releases

LATEST v0.11.0 2026-07-10T16:47:57Z Release notes ↗

🐧 tt-local-generator-models-all_0.11.0_all.deb 🐧 tt-local-generator_0.11.0_amd64.deb 🐧 tt-model-animate_0.11.0_all.deb 🐧 tt-model-flux_0.11.0_all.deb +7 more

4 previous releases

v0.8.0 2026-06-23T20:13:42Z

v0.7.4 2026-06-09T21:59:39Z

v0.3.4 2026-05-26T23:52:15Z

v0.3.3 2026-05-26T17:04:35Z

See all releases on GitHub ↗

video-generation image-generation quietbox generative

Works on

quietbox

tt-burnin

official

Python · Apache-2.0 · 3⭐ ·

Command-line utility that runs a high power-consumption workload on Tenstorrent devices — used for chip testing, burn-in, and validating a system's power delivery and cooling under sustained load.

Links

📦 Repo

Releases

LATEST v0.4.3 2026-07-16T15:26:18Z Release notes ↗

🐧 tt-burnin_0.4.3_all-ubuntu-22.04.deb 🐧 tt-burnin_0.4.3_all-ubuntu-24.04.deb 🐧 tt-burnin_0.4.3_all-ubuntu-latest.deb 🐍 tt_burnin-0.4.3-py3-none-any.whl

🐍 pip install tt-burnin

🐧 apt install tt-burnin

⚙ Requires PPA — setup instructions ↗

4 previous releases

v0.4.2 2026-07-15T19:45:01Z

v0.4.1 2026-07-15T18:58:42Z

v0.4.0 2026-03-19T16:33:31Z

v0.3.0 2025-12-19T20:42:02Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 0.2.2 - 31/07/2024

### Added
- Added glx reset support
- Threaded start and end of burnin to increase burnin speed
- Added prints to indicate which chip we are currently running on
- Added support for bh harvesting

## 0.2.1 - 16/01/2024

### Bug fix
- Fix for https://github.com/tenstorrent/tt-burnin/issues/6
- BH reports asic temperature as a signed 16_16 int unlike GS and WH
- Added missing support to report BH asic temperatre

## 0.2.0 - 29/10/2024

### Added
- BH burnin support

## 0.1.1 - 14/05/2024

### Updated

- Bumped luwen (0.3.8) and tt_tools_common (1.4.3) lib versions

## 0.1.0 - 04/04/2024

First release of opensource tt-burnin

### Added
- GS and WH burnin support

burn-in stress-test power hardware-validation

Works on

wormhole blackhole

tt-animatediff

official

Python · Apache-2.0 ·

Generates short, temporally coherent animated GIFs using the AnimateDiff model on Tenstorrent hardware. Phase 1 runs the correct SD 1.4 + MotionAdapter architecture on CPU; Phase 2 accelerates spatial denoising on Blackhole using the TTNN UNet. Produces vibrant 8-frame animations in ~15 s/frame on a P300C.

Links

📦 Repo 📖 Native Video Animation with AnimateDiff (VSCode Toolkit)

Releases

LATEST v0.9.0 2026-06-22T19:56:31Z Release notes ↗

2 previous releases

v0.6.0 2026-06-10T22:16:43Z

v0.1.0 2026-06-04T22:31:14Z

See all releases on GitHub ↗

animatediff video-generation stable-diffusion diffusion gif blackhole

Works on

blackhole

Cloud-Native Support

official

Official documentation hub for running Tenstorrent accelerators on Kubernetes. Centers on tt-operator (the umbrella Helm chart) and covers Node Feature Discovery, kernel-mode driver (tt-kmd) management, firmware flashing, Prometheus telemetry, Fabric Manager topology resolution, Dynamic Resource Allocation, and multi-node scheduling via JobSet and PMIx.

Links

🌐 docs.tenstorrent.com

kubernetes cloud-native helm tt-operator orchestration documentation

Works on

wormhole blackhole

TT Console

official★ featured

Browser-based cloud console for exploring AI on Tenstorrent hardware. Run LLM inference, image and video generation, and browse the supported model catalog in-browser — backed by Tenstorrent accelerators. Cloud hardware access and advanced workflows (deployments, agents) available in staged rollout.

Links

🌐 console.tenstorrent.com

cloud console inference playground llm image-generation video-generation demo

Works on

wormhole blackhole

tt-operator

official

Python ·

Kubernetes operator that automates installation and lifecycle management of the full software stack needed to run Tenstorrent workloads on a cluster. Distributed as an umbrella Helm chart coordinating driver (tt-kmd) management, Node Feature Discovery, firmware flashing, Prometheus telemetry, fabric/topology resolution, and device allocation with multi-node scheduling (JobSet/PMIx).

Links

📦 Repo 🌐 Cloud-Native Support docs

kubernetes helm operator orchestration cloud-native tt-kmd device-plugin

Works on

wormhole blackhole

TT-QuietBox 2 Guide

official

Official setup and onboarding guide for the TT-QuietBox 2 — a compact, liquid-cooled AI workstation with four Blackhole accelerators, an AMD Ryzen CPU, 256GB RAM, and 4TB NVMe. Covers hardware specs, first-boot setup, and hands-on learning paths for running pre-loaded models like Qwen3-32B and serving text, image, video, and speech models via tt-inference-server.

Links

📦 Repo 🌐 TT-QuietBox 2 Guide

quietbox blackhole workstation setup getting-started documentation

Works on

blackhole quietbox

ttsim-qemu

official ⑂ qemu/qemu

C · GPL-2.0 ·

Tenstorrent's fork of QEMU that provides the full-system emulation layer behind ttsim. Models the RISC-V cores and system devices of Wormhole and Blackhole so TT-Metalium workloads can boot and run without physical silicon.

Links

📦 Repo

simulator qemu full-system emulation no-hardware wormhole blackhole

Works on

ttsim

tt-bio

affiliated★ featured

by moritztng · Python · MIT · 113⭐ · Jan 31, 2026

Boltz-2 biomolecular model for drug discovery on Tenstorrent Blackhole. Supports single-card and multi-card configurations — QuietBox (4×) and Galaxy (32×). Approaches physics-based FEP accuracy at 1000× the speed.

Links

📦 Repo 🎤 FOSDEM 2026 — Drug Discovery on Tenstorrent Hardware

Releases

LATEST v0.3.2 2026-07-20T03:01:53Z Release notes ↗

🐍 tt_bio-0.3.2-py3-none-any.whl 📦 tt_bio-0.3.2.tar.gz

4 previous releases

v0.3.1 2026-07-19T12:03:38Z

v0.3.0 2026-07-17T02:31:21Z

v0.2.5 2026-07-11T07:14:22Z

v0.2.4 2026-07-09T22:41:48Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to TT-Bio are recorded here. Versioning is [SemVer](https://semver.org);
releases are cut from a commit that has passed the on-hardware test suite (see `RELEASING.md`).

## [Unreleased]

## [0.3.2] - 2026-07-20

### Fixed
- **SaProt-1.3b config bug** — `CONFIGS["saprot-1.3b"]` carried a fabricated arch
  (hidden=2560 / n_heads=40 / n_layers=40 / intermediate=10240) that does not match
  the real `westlake-repl/SaProt_1.3B_AF2` checkpoint (1280 / 20 / 66 / 5120 — the
  650m width with double the layers). `load_state_dict(..., strict=False)` silently
  masked the mismatch, so the device ran with effectively untrained weights and the
  1.3b leg read as a parity failure. Config corrected; `Saprot.from_pretrained` now
  reads the checkpoint's `config.json` and refuses to build on an arch mismatch. With
  correct shapes saprot-1.3b reaches X_emb=0.99508 / X_logits=0.99895 (deterministic,
  qb1 card 1) — a near-pass; the per-residue embedding PCC lands just below the
  0.9987–0.9996 ESMC band (bf16 accumulation over 66 residual layers), so no clean
  PASS row is added to `docs/pharma-benchmark.md`. See `docs/saprot-parity.md`.
- **Perf-gate within-card-type false positives** — the perf-regression gate keyed
  baselines by card type only, so two machines with the same card type (pc vs qb1,
  both p150a) read as false ~30–36% regressions against each other. Added a machine-id
  layer under card type (`socket.gethostname()`), with backward-compatible fallback
  to the card-type block. `--update-baseline` now writes to the detected machine's
  block.

### Added
- **`tt-bio saprot --devices`** — multi-card data-parallel fanout for SaProt
  embeddings (one pinned worker per card, sequences sharded by length, results
  reassembled in input order), mirroring the ESMC `--devices` path. Row-independent:
  a sequence's output is identical to running it on one card.
- **esmc-300m and esmc-6b perf-gate baselines seeded** (esmc-300m 33.17 seq/s on
  p300c, esmc-6b 3.17 seq/s on p150a), activating the perf-regression legs specced
  in 0.3.1.
- **Release-gate perf + UX coverage for SaProt and Boltz-2 affinity** — both shipped
  in 0.3.1 with accuracy-leg coverage but no perf/UX gate legs; saprot-650m
  (222.69 seq/s, qb1 p150a) and boltz2-affinity (0.014319 affinities/s, p300c)
  baselines seeded.

### Removed
- **ProteinMPNN** — the `tt-bio design` inverse-folding port is dropped entirely.
  It ran CPU-only (dispatch-bound, no TT-card use), duplicated BoltzGen's
  inverse-fold capability, and reimplemented the mature upstream
  `dauparas/ProteinMPNN`. SaProt is untouched.

### Verify / benchmark hardening
- **Boltz-2 and Protenix-v2 ubiquitin flagship legs hardened 2+2 → 5+5 seeds**
  (seeds 0–4 both sides): R and D are now 10 pairwise distances each, so the parity
  verdict is a real statistical statement rather than a single-pair coincidence.
  Both PASS within floor on CA-RMSD and TM-score; CA-lDDT misses on Boltz-2 (a bf16
  narrow

drug-discovery blackhole inference biology multi-card

Works on

blackhole quietbox galaxy

grayskull-attention

affiliated

by moritztng · TeX · MIT · 38⭐ ·

FlashAttention-style attention kernel implemented entirely in on-chip SRAM on the Tenstorrent Grayskull chip using TT-Metalium. Pioneering work in low-level attention on TT hardware.

Links

📦 Repo

attention grayskull metalium sram kernel

Works on

grayskull

tt-atom

affiliated

by moritztng · Python · MIT · 24⭐ ·

Meta's UMA interatomic potential running on Tenstorrent Blackhole — energy, forces, and stress for molecules and periodic materials behind an ASE calculator. Its per-edge Wigner rotation runs as a custom tt-metal kernel for a highest-performance uma-s build.

Links

📦 Repo

Releases

LATEST v0.2.0 2026-07-11T07:47:26Z Release notes ↗

🐍 tt_atom-0.2.0-py3-none-any.whl 📦 tt_atom-0.2.0.tar.gz

1 previous release

v0.1.0 2026-07-08T19:08:02Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to TT-Atom are recorded here. Versioning is [SemVer](https://semver.org);
releases are cut only from a commit that has passed the on-hardware release gate — accuracy
parity, no OOM across the supported size range, and no perf regression (see `RELEASING.md`).

## [Unreleased]

### Fixed
- `tt_atom.batch.MultiCard` now builds the Orb-v3/OrbMol backbone when given Orb weights. The
  worker previously hardcoded the UMA path (`WeightBundle` + eSCN-MD `Backbone`), so pointing it
  at an Orb weights file built the wrong model silently. It now dispatches on the loaded bundle's
  `config` (the same UMA/Orb family split `tt_atom.auto` exposes by name) and runs the
  `Encoder`/`AttentionInteractionLayer`/`EnergyHead` forward for Orb. Verified bit-exact vs the
  single-card `OrbCalculator` on `orb-v3-conservative-inf-omat` (energy diff 0 eV on H2O /
  ethanol / benzene). `tests/test_multicard_orb.py` mirrors the UMA `test_multicard.py` sharded-vs-
  sequential parity shape (auto-skips below 2 cards).

### Notes
- The v0.2.0 scope note below flagged Orb multi-card as "not independently re-run — same
  scheduler as UMA". That understated the gap: at v0.2.0 the worker was UMA-only, so Orb
  multi-card did not work at all (not merely unmeasured). Fixed here.
- No real-weights multi-card *scaling* number is re-reported this pass: pc has a single
  Tenstorrent card, so N>1 scaling cannot be measured on it. The one honest datapoint measured
  here is the per-card baseline: Orb (`conservative-inf-omat`, real weights) on one card at
  ~128-atom Si supercells — 0.37 Medges/s. The earlier 2.95x@4cards figure (commit 43e981b) used
  the synthetic `examples/model_tiny_demo.npz` UMA bundle, not real weights and not Orb, so it is
  not a real-weights scaling number for either family.

## [0.2.0] - 2026-07-11

A second model family, additive to v0.1.0: **Orb-v3** (Orbital Materials) and **OrbMol**, its
charge/spin-conditioned molecular variant. UMA/eSEN code paths are untouched (byte-identical to
v0.1.0) — see full history and numbers in `docs/orb-port.md`.

### Added
- **Orb-v3** (`orb-v3-conservative-inf-omat`, `orb-v3-direct-20-omat`): a non-equivariant,
  attention-MPNN backbone, ported bottom-up (encoder, 5-layer backbone, energy/force/stress
  heads, ZBL pair repulsion, periodic images, disjoint-union batching). None of UMA's four custom
  kernels transfer (Orb has no equivariant hidden representation) — this path runs on stock `ttnn`
  ops only, no source tt-metal build required for Orb-only use.
- **OrbMol** (`orb-v3-conservative-omol`, `orb-v3-direct-omol`): the OMol25-trained, charge/spin-
  conditioned checkpoints. Reuses the Orb-v3 backbone unmodified plus a closed-form, node-only
  charge/spin embedding (zero learned matmuls) — no new forward/backward machinery.
- `OrbTracedEngine` (`tt_atom/orb_trace.py`): trace-capture for the Orb-v3 forward(+analytic-VJP
  backward), refreshing only the two pos-dependent device inputs per MD/

molecular-dynamics interatomic-potential mlip uma ase inference custom-kernel

Works on

blackhole

tt-lang-models

affiliated

by zoecarver · Python · 7⭐ ·

A growing collection of models that use tt-lang for some or all of their implementation. Reference implementations for bringing modern models to the tt-lang DSL.

Links

📦 Repo

tt-lang models dsl reference

tt-zork-and-more

affiliated ⑂ historicalsource/zork1★ featured

by tsingletaryTT · Python · 2⭐ ·

A Tenstorrent fork of Infocom's Zork I (and more!), running a Z-machine interpreter at least four different ways on TT hardware. The most fun you can have with an AI accelerator.

Links

📦 Repo 🌐 Website

zork z-machine interactive-fiction demo fun

tt-qb-lights

affiliated

by tsingletaryTT · Rust · 2⭐ ·

Sync your Tenstorrent Quietbox's RGB lighting to accelerator utilization status. Visual feedback for hardware activity in real time.

Links

📦 Repo

quietbox rgb hardware fun

Works on

quietbox

diamond

affiliated ⑂ eloialonso/diamond

by zoecarver · Python · 1⭐ ·

DIAMOND: Atari game-playing agent implemented on Tenstorrent hardware via tt-lang. Diffusion-based world model for reinforcement learning.

Links

📦 Repo 🌐 Website

atari reinforcement-learning world-model tt-lang

gemma4

affiliated

by zoecarver · Python · 1⭐ ·

Gemma 4 language model implemented in tt-lang (e4b variant) for direct execution on Tenstorrent hardware.

Links

📦 Repo

gemma llm tt-lang inference

Works on

blackhole

open-oasis

affiliated ⑂ etched-ai/open-oasis

by zoecarver · Python · 1⭐ ·

tt-lang inference script for Oasis 500M — an interactive video world model running on Tenstorrent hardware via the tt-lang DSL.

Links

📦 Repo

video world-model oasis tt-lang inference

Works on

blackhole

tt-model-runner

affiliated

by tsingletaryTT · Python · 1⭐ ·

Discover, load, and benchmark models with a GUI and TUI for tt-inference-server. Makes exploring available models on Tenstorrent hardware as easy as browsing a catalog.

Links

📦 Repo

gui tui models inference benchmark

Works on

wormhole blackhole quietbox

tt-claw

affiliated

by tsingletaryTT · Shell ·

A Tenstorrent-powered claw machine that rewards players with real prizes. The QuietBox 2 runs local AI inference to act as an agent controlling the claw hardware — the OpenClaw AI assistant lesson builds directly on this project.

Links

📦 Repo 📖 OpenClaw AI Assistant on QuietBox 2

claw-machine agents hardware quietbox physical on-device

Works on

quietbox

Local AI Agents on Tenstorrent

affiliated★ featured

by ·

Three agentic projects running fully on-device: local AI agents on QuietBox 2, a coding assistant powered by Aider against a local inference server, and the OpenClaw AI assistant on QuietBox 2. No cloud APIs — all inference runs on TT hardware.

Links

📖 Local AI Agents on QuietBox 2 📖 Coding Assistant with Aider

agents local-llm aider coding-assistant quietbox on-device

Works on

affiliated

by tsingletaryTT · Python ·

Compile more than 100 models on tt-forge in a display format suitable for demos. Comprehensive showcase of tt-forge model compatibility.

Links

📦 Repo

📋 Changelog

# Changelog

All notable changes to tt-forge-compiletron are documented here.

## [Unreleased]

### Added
- `docs/kv-cache-bench.md` — teaching companion for the StaticCache KV cache
  benchmark, explaining the two-graph pattern and why static shapes matter

---

## [1.6.0] — 2026-06-30

### Added
- **StaticCache KV cache decode benchmarking** — `bench_decode.py` now compiles
  a second forge graph for the decode step using `transformers.StaticCache`.
  The StaticCache is embedded in `KVDecodeWrapper` as a submodule so forge
  traces K/V tensors as model state and emits `FillCache`/`UpdateCache` ops.
  Falls back to full-recompute for models that don't support `cache_position`.
- `_try_kv_decode()` function — detects model dtype to avoid bfloat16/float32
  mismatches, resolves tokenizer from loader or AutoTokenizer, pre-fills cache
  on CPU before forge compilation.
- Bestiary `decode_note` field now records the method used per model
  ("StaticCache KV cache" vs "no KV cache — full recompute per step").

### Changed
- Decode results updated for all 5 stages — GPT-2 2.30→5.52 tok/s, OPT
  3.98→5.05 tok/s, Phi-2 1.48 tok/s (new), Falcon 3.30 tok/s (new),
  LLaMA-LoRA 2.86 tok/s (new), Gemma-LoRA 2.40 tok/s (new), and more.

---

## [1.5.0] — 2026-06-30

### Added
- **`scripts/bench_decode.py`** — dedicated LLM decode benchmark measuring
  TTFT, prefill tok/s, and decode tok/s for all compiled causal LMs.
  Subprocess isolation + tt-smi health check prevent hardware lockups.
- **Leaderboard columns** — TTFT, Prefill tok/s, Decode tok/s, Params (M)
  replace the old Infer p50 / Throughput columns in `docs/leaderboard.html`.
- 5 benchmark stages: Stage 1 (GPT-2, OPT), Stage 2 (Phi-2, BLOOM, CodeGen),
  Stage 3 (Falcon, Allam, LLaMA-LoRA, Gemma-LoRA), Stage 4 (Qwen 2.5,
  Phi-1 LoRA), Stage 5 (DeepCogito, DeepSeek Coder, frontier models).
- `params_m` field added to all benchmarked bestiary entries.
- `hf:` loader prefix for frontier HuggingFace models loaded without a
  tt-forge-models seed loader.

### Changed
- Bestiary `throughput_unit` relabeled from generic `tok/s` → `prefill_tok/s`
  for all 54 causal LM entries to prevent confusion with decode throughput.

---

## [1.4.0] — 2026-06-29

### Added
- **`scripts/install.sh`** — turn-key smart installer: hardware pre-check,
  hugepages, disk space, forge venv, XLA venv, mesh descriptor probe,
  tt-forge-models clone, stale-shm cleanup. Outputs color-coded summary table.
- **RAM/DRAM budget calculator** — skips models whose weights exceed available
  system RAM + per-chip DRAM; prevents OOM crashes at load time.
- **`scripts/setup-venvs.sh`** — minimal venv setup script for clean Ubuntu
  24.04 installs on Tenstorrent Blackhole hardware.
- Self-contained patches directory — tt-forge-models fixes applied at
  expedition startup without modifying upstream.
- `--ephemeral` / `--evict-failures` flags — evict HF weight cache after
  each model to reclaim disk space on small-storage machines.

### Changed

tt-forge models demo compilation

Image Classification with TT-Forge

affiliated

by ·

End-to-end image classification project using TT-Forge — compile and run a PyTorch classification model on Tenstorrent hardware with no kernel authoring required.

Links

📖 Image Classification with TT-Forge

forge image-classification pytorch compiler inference

Works on

wormhole blackhole

tensix-viz

affiliated★ featured

by tsingletaryTT · JavaScript ·

Hardware topology visualizer for Tenstorrent chips — from individual chip to full cluster. Interactive JavaScript visualization of Tensix core layout and NoC connections.

Links

📦 Repo 🌐 Website

Releases

LATEST v1.1.2 2026-06-29T17:42:46Z Release notes ↗

2 previous releases

v1.1.1 2026-06-26T13:32:52Z

v1.1.0 2026-06-09T22:19:42Z

See all releases on GitHub ↗

📋 Changelog

# Changelog

All notable changes to tensix-viz are documented here.

## [1.1.2] - 2026-06-29

### Fixed

- **`TensixViz.autoInit()` is idempotent for `.tensix-viz-container` elements** (`src/chip.js`)
  1.1.1 made the `[data-viz]` path idempotent but left the legacy single-chip path unguarded. When
  `autoInit()` ran twice (the bundle's self-init plus an explicit host-page call), each
  `.tensix-viz-container` canvas received a second `TensixViz` instance — two animation loops drawing
  on one canvas, which renders as a doubled/overlapping grid. `TensixViz.autoInit()` now skips any
  container already initialized (`container._tensixViz`) and records the instance on it.

### Added

- **Responsive multi-chip canvas** (`tensix-viz.css`)
  `.tv-chip-wrapper canvas { max-width: 100%; height: auto; }` — card/system canvases (created
  without the `.tensix-viz-canvas` class) now scale to fit a narrow column instead of being clipped
  by `.tv-card`'s overflow. Previously this rule had to be patched in by downstream consumers.

## [1.1.1] - 2026-06-25

### Fixed

- **Animation player accepts both script schemas** (`src/chip.js` `_execStep`)
  The player dispatched on `step.step` and read `step.cores` only, so scripts authored with the
  alternate `{ action, coords }` schema ran zero steps — the Play button (and auto-play) appeared
  dead. `_execStep` now dispatches on `step.step || step.action` and falls back `coords → cores`,
  so blocks written in either schema animate.

- **`autoInit()` is idempotent for `[data-viz]` elements** (`src/index.js`)
  `autoInit()` can run more than once (the bundle's self-init plus an explicit call). For `card`
  and `system` vizzes — which append their render into the host element — the second run appended
  a duplicate set of chips. `autoInit()` now skips any element already initialized (`el._tensixViz`).

## [1.1.0] - 2026-06-09

### Fixed

- **Heatmap: non-tensix cells no longer painted by heat overlay** (`src/chip.js` `_drawHeatmap`)
  Commit 76dca80 added `coreType !== 'tensix'` guards to the pre-built artifacts but never to
  the source. The guards are now in `src/chip.js` so the next build preserves them. Without this
  fix, DRAM (col 5 on Wormhole), ETH (row 6 on Wormhole), and PCIe (col 8 on Blackhole) cells
  were colored by the heatmap overlay and could inflate `maxVal`, compressing the visible range
  for all tensix cells.

- **Memory overlay: stale phase not rendered after `reset()` on `showMemory: true` instances**
  (`src/chip.js` `reset()` and constructor)
  After calling `viz.activate(mode)` followed by `viz.reset()` on a canvas created with
  `showMemory: true`, `_memPhase` retained the frozen `_mem` object from the animation closure.
  `reset()` calls `render()` at the end, which caused `_drawMemoryLayer()` to run with stale data,
  producing a faint DRAM glow and L1 fill bars on an otherwise blank chip. `reset()` now sets
  `this._memPhase = null`; the field is also explicitly initialized to `null` in t

visualization topology noc hardware

Works on

wormhole blackhole

Live chip topology

Blackhole · P100 / P150 / P300c · 140 Tensix cores

Wormhole · N150 / N300 · 64 Tensix cores

mode

tt-warp

affiliated

by tsingletaryTT · Python ·

Warp terminal plugin for Tenstorrent — integrates hardware status, model management, and developer workflows directly into the Warp terminal.

Links

📦 Repo

community★ featured

by Syllo · C · GPL-3.0 · 10846⭐ ·

htop-style process monitor for GPUs and AI accelerators. Supports AMD, Apple, Huawei, Intel, NVIDIA, Qualcomm — and Tenstorrent. Real-time utilization, memory, and process info in a terminal UI.

Links

📦 Repo

Releases

LATEST 3.3.2 2026-02-08T17:57:16Z Release notes ↗

⬇ nvtop-3.3.2-x86_64.AppImage

4 previous releases

3.3.1 2026-01-18T13:12:34Z

3.3.0 2026-01-16T13:28:09Z

3.2.0 2025-03-29T11:26:44Z

3.1.0 2024-02-23T15:04:44Z

See all releases on GitHub ↗

monitoring tui htop process-monitor terminal

Works on

wormhole blackhole

dstack

community★ featured

by dstackai · Python · MPL-2.0 · 2185⭐ ·

Vendor-agnostic orchestration for training, inference, and agentic workloads across NVIDIA, AMD, TPU, and Tenstorrent on clouds, Kubernetes, and bare metal.

Links

📦 Repo 🌐 Website

Releases

LATEST 0.20.28 2026-07-16T14:58:47Z Release notes ↗

4 previous releases

0.20.27 2026-07-09T13:10:13Z

0.20.26 2026-06-25T19:29:18Z

0.20.25 2026-06-18T11:12:02Z

0.20.25rc1pre 2026-06-12T15:38:06Z

See all releases on GitHub ↗

orchestration kubernetes cloud multi-vendor

BarraCUDA

community★ featured

by Zaneham · C · 1717⭐ ·

Open-source CUDA compiler targeting multiple GPU architectures including Tenstorrent. Compiles .cu files to run on AMD and Tenstorrent hardware without modification.

Links

📦 Repo

Releases

LATEST v5.01 2026-07-14T08:02:45Z Release notes ↗

1 previous release

v0.5.0 2026-05-29T04:30:28Z

See all releases on GitHub ↗

cuda compiler cross-platform blackhole

Works on

blackhole

tt-tiny

community★ featured

by geohot · Python · 68⭐ ·

Minimal Python code to access and program the Tenstorrent Blackhole chip directly — George Hotz's exploration of TT hardware programmability with pointed commentary on the architecture.

Links

📦 Repo

blackhole low-level exploration

Works on

blackhole

zyx

community

by zk4x · Rust · LGPL-3.0 · 61⭐ · Sep 25, 2022

A complete ML library and compiler in Rust — "from assembly to neural networks" — with a native Tenstorrent backend (src/backend/tenstorrent), autograd, custom kernels, multi-backend support, and Python bindings.

Links

📦 Repo 🌐 Website

Releases

LATEST v0.14.0 2024-09-22T13:54:32Z Release notes ↗

4 previous releases

v0.12.0 2024-03-10T09:41:35Z

v0.11.3 2023-10-08T12:12:13Z

v0.11.2 2023-10-08T11:55:14Z

v0.11.1 2023-09-28T14:06:33Z

See all releases on GitHub ↗

rust ml-compiler tensor-library autograd backend

tt-twitch

community

by geohot · C++ · 29⭐ ·

A Tenstorrent Grayskull kernel written live on Twitch by George Hotz. 120-core grid demonstration of live kernel programming.

Links

📦 Repo

grayskull kernel live-coding demo

Works on

community★ featured

by mesham · Python · 13⭐ ·

Community-built Tenstorrent architecture simulator written in Python. Runs without hardware — useful for researchers and developers exploring the Tensix architecture offline.

Links

📦 Repo

Releases

LATEST v1.0 2026-05-11T13:07:42Z Release notes ↗

See all releases on GitHub ↗

simulator architecture no-hardware research

tt-iree

community★ featured

by swote-git · C++ · Apache-2.0 · 12⭐ ·

IREE (Intermediate Representation Execution Environment) ML compiler ported to Tenstorrent AI accelerators. Brings the IREE compiler ecosystem to TT hardware.

Links

📦 Repo

iree compiler mlir inference

Works on

wormhole blackhole

TT-GoL

community

by JushBJJ · C++ · 12⭐ ·

Conway's Game of Life implemented on Tenstorrent hardware using TT-Metal kernels.

Links

📦 Repo

game-of-life demo kernels

triton-tenstorrent

community★ featured

by kernelize-ai · C++ · 11⭐ ·

OpenAI Triton compiler plugin for Tenstorrent hardware. Write Triton kernels and target Tensix cores — brings the Triton ML kernel ecosystem to TT devices.

Links

📦 Repo

triton openai-triton compiler kernels

Works on

wormhole blackhole

tenstorrent.nix

community

by RossComputerGuy · Nix · LGPL-2.1 · 8⭐ ·

Nix flake packaging the Tenstorrent software stack for NixOS and Nix users. Reproducible, declarative installation of TT drivers and tools.

Links

📦 Repo

nix nixos packaging flake reproducible

ttMandelbrot

community

by marty1885 · C · 0BSD · 7⭐ ·

Mandelbrot Set fractal renderer running on Tenstorrent hardware. A classic demo showcasing parallel compute on Tensix cores.

Links

📦 Repo

community

by marty1885 · C++ · ISC · 5⭐ ·

Minimal vector-addition example on Tenstorrent devices using TT-Metalium. A clean hello-world for the TT-Metal kernel programming model in C++.

Links

📦 Repo

vector-add example metalium hello-world

bhx

community★ featured

by olofj · Rust · 5⭐ ·

Boot stock Linux cloud images on the SiFive X280 RISC-V cores inside Tenstorrent Blackhole AI accelerators. Per-card Rust daemon with virtio-mmio block/net/console and U-Boot/EFI support.

Links

📦 Repo

📋 Changelog

# Changelog

Notable changes per release. Format loosely follows
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/);
this project does not yet promise SemVer compatibility on the RPC
wire format or library API surface (we're not 1.0).

## Unreleased

V2 virtio-dispatch redesign. The kick ring + completion ring + host-
side throttle that grew up around #184 are gone; in their place is a
per-(slot, queue) dirty bitmap in BRISC L1. The bitmap is level-
sensitive — guest QUEUE_NOTIFY storms coalesce into a single set
byte, so the dispatch path can't fall behind under any burst. Wire
incompatible with 0.9.0; `TENSIX_PROTOCOL_VERSION` bumped 4 → 5.

### Added

- **V2 dirty-bitmap dispatch** (`#187` / `#188` / `#189`). BRISC
  writes 1 to `CTRL_OFF_DIRTY[slot][queue]` on every guest
  QUEUE_NOTIFY; the daemon's `Dispatcher` clears the byte and
  dispatches each pass. Replaces V1's 2048-entry kick ring +
  daemon-side `consume_kick_ring_pass` consumer.
- **V2 processed-cursor table** at `CTRL_OFF_PROCESSED`. Daemon
  publishes `used.idx` after each successful dispatch so
  warm-resume reads cursors directly without re-probing guest
  DRAM.
- **`bhx_notify_events_total`, `bhx_dispatch_passes_total`,
  `bhx_dispatch_queues_drained`** Prometheus counters surface the
  new dispatch path. The burst regression test (`scripts/
  soak_virtio_burst.py`) asserts `dispatch_passes_total > 0` to
  confirm the workload reached the new path.
- **`scripts/soak_virtio_burst.py`** — multi-queue burst regression
  test. Sustains 16-job direct=1 fio randwrite + a tight
  `printf` loop to `/dev/console`, samples `/metrics` every 1 s,
  and verifies the daemon log contains zero
  `kick.*drop|rescue|throttle.*ENGAGE` matches.
- **`DaemonState.chip_reset_this_session`** flag — gates
  `maybe_opportunistic_reset_board` so 4-way parallel cold boots
  reset the chip exactly once, not once per L2CPU. Without this
  the second-and-later resets blip the chip while earlier-booted
  L2CPUs hold mmap pages, SIGBUSing their workers.
- **`Dispatcher` (was `KickPoller`)** with documented testability
  seam (`CtrlL1Access` trait); `drain_dirty_bitmap` is unit-tested
  against an in-memory L1 fake covering all five visit/clear
  semantics cases plus the address-formula pins.

### Changed

- **`KickPoller` → `Dispatcher`**, plus `kick_poller` → `dispatcher`
  field on `DaemonState`, `tensix-kick-poller` → `tensix-dispatcher`
  thread name, `[kick-poller]` → `[dispatcher]` log tag,
  `kicks_consumed` → `dispatches_total`,
  `last_kick_slot_queue` → `last_dispatch_slot_queue`. Pure
  rename; no behavior change. V1 vocabulary scrubbed throughout
  the codebase (firmware, daemon, scripts, docs).
- **`CTRL_SIZE` shrinks 36 KiB → 4 KiB**. V2 footprint is ~1.5 KiB;
  the rest is reserved for future fields.
- **Stats-page offsets repacked** — V1 `STATS_OFF_KICK_DROPS`,
  `STATS_OFF_COMPL_EVENTS`, `STATS_OFF_LAST_COMPL` retired with
  V1 (#190); deprecated PRECAP / BLINDCAP / POSTCAP slots dropp

blackhole risc-v linux boot virtio

Works on

blackhole

ttas

community

by Zaneham · C · Apache-2.0 · 4⭐ ·

ttas is a hacker-friendly assembler/disassembler for Tensix on Wormhole. It turns assembly into the exact 32-bit words the hardware runs, and turns binaries back into readable instructions using the same shared instruction table.

Links

📦 Repo

Releases

LATEST v0.1.0 2026-05-28T07:08:35Z Release notes ↗

1 previous release

v0.0.1 2026-05-27T15:19:11Z

See all releases on GitHub ↗

assembler

Works on

wormhole

tt-tutorial (Korean)

community

by changh95 · Jupyter Notebook · 4⭐ ·

Comprehensive tutorials for the Tenstorrent software stack in Korean. Jupyter notebooks covering the full developer path from hardware setup to model inference.

Links

📦 Repo

tutorial korean jupyter getting-started

Works on

wormhole

Collective Operations on Wormhole n150 (Sapienza University of Rome)

community

by Charles Heron (Sapienza University of Rome) · 4⭐ ·

Master's thesis implementing and benchmarking five allreduce algorithms (Swing, Recursive Doubling, Bandwidth Optimal, Latency Optimal, Shared Memory) on the Wormhole n150. Bandwidth Optimal achieved best performance, approaching within 2× of theoretical optimal.

Links

📦 Repo

allreduce collective-ops wormhole mpi bandwidth

Works on

wormhole

libtt-metal-cxx

community

by Knight-Ops · Rust · 3⭐ ·

Rust crate that exposes the TT-Metal host API through a C++ bridge via cxx.rs — covering device management, program/kernel creation (from source file or inline string), circular buffers, semaphores, runtime arguments, sharded buffers, and MeshDevice workflows, with hardware-backed integration tests.

Links

📦 Repo

rust bindings cxx tt-metal ffi host-api

Works on

wormhole blackhole

tetsuh/tt-metal-community-distro-matrix

community

by tetsuh · Python · Apache-2.0 · 2⭐ ·

A compatibility guardrail that continuously monitors whether [tt-metal](https://github.com/tenstorrent/tt-metal) and the official [tt-installer](https://github.com/tenstorrent/tt-installer) build successfully on community Linux distributions that are not part of Tenstorrent's official CI.

Links

📦 Repo

libtt

community

by Philipp Moritz · 1⭐ ·

A Bazel-built PJRT plugin (libtt.so) providing an XLA backend for Tenstorrent devices. Bundles the tt-xla PJRT implementation with tt-mlir and tt-metal into a single shared object so JAX code runs on Tenstorrent hardware, with patches so sglang-jax works out of the box.

Links

📦 Repo

xla pjrt jax bazel sglang

tt-splat — matrix-native 3D Gaussian Splatting on Blackhole

community

by kinginu · Python · Apache-2.0 · 1⭐ ·

tt-splat — matrix-native 3D Gaussian Splatting on Blackhole preview

3D Gaussian Splatting rewritten to run on the matrix engine: a polynomial splat and order-independent weighted-sum blending replace exp and depth-sorted alpha, so the pipeline becomes GEMM → activation → GEMM. Renderer + trainer, trained device-resident on a Blackhole p150a.

Links

📦 Repo

3d-gaussian-splatting 3dgs rendering matrix-engine weighted-sum-rendering

Works on

community★ featured

by · Feb 28, 2026

A 6,500-word community deep dive into the Blackhole p100a architecture: the tile model (Tensix, DRAM, SiFive x280 L2CPU, Ethernet, PCIe, NoC arc), firmware startup sequence, MOP micro-op processor, replay buffer, FPU/SFPU sync, and the anatomy of a kernel. From the author of blackhole-py.

Links

📝 anuraagw.me — February 2026

blackhole architecture tensix noc sifive-x280 firmware mop sfpu deep-dive blog

Works on

blackhole