Tenstorrent Lessons
Interactive guides for Tenstorrent hardware and software. Use the hardware filter to find lessons for your system.
Your First Inference
Modern Setup with tt-installer 2.0
The fastest way to get started with Tenstorrent! Use tt-installer 2.0 for one-command installation of the full stack including drivers, firmware, tt-metalium containers, and Python environment.
Hardware Detection
Scan for connected Tenstorrent devices and verify they're properly recognized by the system.
Verify Your Setup
Check that your Tenstorrent hardware, TTNN, and optional tt-metal source are ready before running your first model. A diagnostic checkpoint: you'll return here after any setup work.
Download Model and Run Inference
Download Qwen3-0.6B (the recommended model — no license gate, works on all hardware) from Hugging Face to run AI workloads on your Tenstorrent hardware. Optionally download Llama-3.1-8B-Instruct for N300+ hardware.
Interactive Chat with Direct API
Build a custom chat application using tt-metal's Generator API directly.
HTTP API Server with Direct API
Create a production-ready Flask API with the model loaded in memory.
Build tt-metal from Source
Clone and build tt-metal from source. Required for Direct API (Generator API) lessons and for running tt-metal examples directly. QB2 and pre-configured images do not ship with ~/tt-metal — start here if Check 3 in Verify Your Setup failed.
Serving & APIs
Production Inference with tt-inference-server
Deploy Llama-3.1-8B on any Tenstorrent hardware in minutes — N150, N300, T3K, P100, p300c, or QB2. tt-inference-server automates Docker image selection, model download, and server startup with a single command. OpenAI-compatible API ready immediately.
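Because the server speaks the OpenAI API, any standard client works against it. As a sketch (the base URL, port, and default model name below are assumptions; use whatever tt-inference-server actually reports on startup), a chat-completions request can be built with the standard library alone:

```python
import json
import urllib.request

def chat_request(prompt: str,
                 base_url: str = "http://localhost:8000/v1",  # assumed port
                 model: str = "meta-llama/Llama-3.1-8B-Instruct"):
    """Build an OpenAI-style /chat/completions request (not yet sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Once the server is up, sending it is one call:
# with urllib.request.urlopen(chat_request("Hello!")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```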
Production Inference with vLLM
Deploy with vLLM: OpenAI-compatible APIs, continuous batching, and enterprise features.
Image Generation with Stable Diffusion XL
Generate high-resolution 1024x1024 images using Stable Diffusion XL Base running natively on your Tenstorrent hardware!
Video Generation via Frame-by-Frame Diffusion
Create videos by generating frames with Stable Diffusion on Tenstorrent hardware. Demonstrates hardware scaling from N150 to T3K: same code, faster performance!
Applications
Coding Assistant with Aider
Run a real AI coding assistant (Aider) against your local Tenstorrent vLLM server. One install, one command — then pair-program with your own on-device LLM. Also covers prompt engineering to customize behavior.
Native Video Animation with AnimateDiff
Learn to build standalone packages outside tt-metal! Integrate AnimateDiff temporal attention to create animated videos. Master the complete model bring-up workflow, from research to production. Perfect for subtle motion effects like blinking eyes, screen flickers, and cinemagraphs.
OpenClaw AI Assistant on QuietBox 2
Deploy a local AI assistant with Tenstorrent expertise, memory search, and 70B reasoning powered by your QB2 hardware.
Generating Video on QuietBox 2
Go from a fresh QB2 to AI-generated video clips in one session — Wan2.2-T2V-A14B on 4× Blackhole chips with a GTK4 GUI and automated prompt generation.
Local AI Agents on QuietBox 2
Build real AI agents in pure Python — web research, codebase navigation, multi-agent pipelines, and stateful text adventures — all running locally on your QB2's 32B/70B models.
Custom Training
Understanding Custom Training
Learn the fundamentals of custom training on Tenstorrent hardware. Understand the difference between fine-tuning and training from scratch, explore the tt-train framework, and discover when to use each approach for building specialized AI models.
Dataset Fundamentals
Master dataset creation and validation for fine-tuning. Learn JSONL format, quality guidelines, tokenization concepts, and HuggingFace integration. Create high-quality training datasets that produce excellent model results.
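JSONL means exactly one JSON object per line. A minimal validator gives the flavor (the chat-style "messages" field below is illustrative; match the schema your trainer expects):

```python
import json

def validate_jsonl(lines):
    """Check that each non-empty line is valid JSON with a chat-style
    'messages' list. Field names here are illustrative, not a fixed schema."""
    records = []
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)  # raises ValueError on malformed JSON
        if not isinstance(record.get("messages"), list):
            raise ValueError(f"line {i}: missing 'messages' list")
        records.append(record)
    return records

sample = [
    '{"messages": [{"role": "user", "content": "What is TTNN?"}, '
    '{"role": "assistant", "content": "A tensor library for Tenstorrent hardware."}]}',
]
records = validate_jsonl(sample)
```

Running the validator over your whole file before training catches malformed lines early, when they are cheap to fix.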
Configuration Patterns
Learn YAML-driven training configuration using tt-blacksmith patterns. Master hyperparameters, device configuration, checkpointing strategies, and logging. Create reproducible, shareable training configurations.
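To give a feel for the shape of such a config, here is a hypothetical fragment (key names vary by framework; check the tt-blacksmith configs in the lesson for the real schema):

```yaml
# Illustrative only; not the actual tt-blacksmith schema.
training:
  learning_rate: 3.0e-4
  batch_size: 32
  max_steps: 5000
checkpointing:
  every_n_steps: 500
  output_dir: ./checkpoints
logging:
  backend: wandb
```

Keeping every hyperparameter in one versioned file is what makes a run reproducible and shareable.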
Fine-tuning Basics
Train a character-level language model from scratch on Tenstorrent hardware. Watch NanoGPT learn Shakespeare through progressive training stages. See hierarchical learning in action as models learn structure before vocabulary before fluency.
Multi-Device Training
Scale training to multiple Tenstorrent chips with Data Parallel (DDP). Learn device mesh configuration, gradient synchronization, and performance optimization. Achieve 2-8x speedup on N300, T3K, and Galaxy systems.
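The core of data parallelism is simple: each device computes gradients on its own shard of the batch, then the gradients are averaged (an all-reduce) so every replica applies the same update. A toy pure-Python stand-in for that averaging step:

```python
def allreduce_mean(per_device_grads):
    """Average gradients across devices, elementwise. Each device then applies
    the same averaged gradient, keeping the replicas in sync.
    (A toy stand-in for the all-reduce a real DDP framework performs.)"""
    n_devices = len(per_device_grads)
    n_params = len(per_device_grads[0])
    return [sum(g[i] for g in per_device_grads) / n_devices
            for i in range(n_params)]

# Two devices, each holding gradients for three parameters:
grads_dev0 = [0.2, -0.4, 1.0]
grads_dev1 = [0.4, -0.2, 3.0]
avg = allreduce_mean([grads_dev0, grads_dev1])
```

On real hardware the averaging happens over the chip-to-chip links, which is why mesh configuration and interconnect bandwidth matter for the achievable speedup.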
Experiment Tracking
Master experiment tracking with file-based logging and Weights & Biases (WandB) integration. Compare hyperparameter variations, visualize training curves, and manage experiments professionally. Make data-driven training decisions.
Model Architecture Basics
Understand transformer architecture components before training from scratch. Learn about embeddings, attention mechanisms, feed-forward networks, and how to design custom architectures. Prepare to build your own models.
Training from Scratch
Build and train a transformer from random initialization. Design nano-trickster (11M params), train on Shakespeare, and watch it learn language patterns from scratch. Compare to random baseline and understand scaling laws.
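A figure like "11M params" can be sanity-checked by hand. The counter below uses guessed dimensions (the real nano-trickster configuration lives in the lesson); with a character-level vocabulary and GPT-style dims it lands near 11M:

```python
def transformer_params(vocab, d_model, n_layers, d_ff, seq_len):
    """Rough decoder-only parameter count: weights only, untied embeddings,
    biases and norms omitted for simplicity. All dims are illustrative."""
    emb = vocab * d_model + seq_len * d_model  # token + learned positional
    attn = 4 * d_model * d_model               # Q, K, V, output projections
    ffn = 2 * d_model * d_ff                   # up- and down-projection
    head = d_model * vocab                     # LM head (untied)
    return emb + n_layers * (attn + ffn) + head

# Guessed char-level config: vocab 65 (Shakespeare), 6 layers, width 384.
n = transformer_params(vocab=65, d_model=384, n_layers=6, d_ff=1536, seq_len=256)
```

Doing this arithmetic before training is the fastest way to see where parameters live: at this scale almost everything sits in the attention and feed-forward blocks, not the embeddings.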
Cookbook
Tenstorrent Cookbook Overview
Welcome to the Tenstorrent Cookbook! Build 5 complete projects that teach fundamental TT-Metal techniques: Conway's Game of Life, Audio Signal Processing, Mandelbrot Fractals, Image Filters, and Particle Life. Each recipe is a standalone lesson with full source code and visual output.
Recipe 1: Conway's Game of Life
Build Conway's Game of Life using TTNN parallel tile computing. Learn convolution operations, cellular automata, and visual output generation. Includes classic patterns: gliders, blinkers, and the famous Gosper Glider Gun!
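The update rule the recipe expresses as a TTNN convolution can be sketched in plain Python: a 3x3 neighbor sum plays the role of the convolution kernel, and the birth/survival rule is applied per cell.

```python
def life_step(grid):
    """One Game of Life generation on a wrap-around grid. The neighbor sum
    here is what the TTNN recipe computes with a 3x3 convolution over tiles."""
    h, w = len(grid), len(grid[0])
    def neighbors(r, c):
        return sum(grid[(r + dr) % h][(c + dc) % w]
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0))
    return [[1 if (n := neighbors(r, c)) == 3 or (n == 2 and grid[r][c]) else 0
             for c in range(w)] for r in range(h)]

# A blinker: a vertical bar flips to horizontal and back every generation.
blinker = [[0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 0, 0, 0]]
```

Because every cell's update depends only on its local neighborhood, all cells can be computed in parallel, which is exactly what makes this a good first tile-computing exercise.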
Recipe 2: Audio Signal Processing
Build a real-time audio processing pipeline with TTNN. Compute mel-spectrograms, detect beats, extract pitch, and apply creative effects. Foundation for speech recognition models like Whisper!
Recipe 3: Mandelbrot Fractal Explorer
Render beautiful fractals with interactive zoom! Demonstrates GPU-style parallel computation and complex number operations. Perfect for understanding embarrassingly parallel workloads on TT hardware.
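The per-pixel kernel is the classic escape-time loop; every pixel runs it independently, which is what "embarrassingly parallel" means in practice:

```python
def escape_time(c: complex, max_iter: int = 100) -> int:
    """Iterations until |z| exceeds 2 for z -> z^2 + c.
    Returning max_iter means the point is treated as inside the set."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter
```

Mapping this function over a grid of complex values gives the fractal image; on TT hardware the same map is spread across the Tensix grid instead of a Python loop.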
Recipe 4: Custom Image Filters
Build a library of creative image filters using 2D convolution. From edge detection to artistic effects: learn the techniques used in ResNet50, MobileNetV2, and ViT models!
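The whole recipe rests on one operation. A plain-Python "valid" 2D convolution (strictly, cross-correlation, which is what most deep-learning frameworks call convolution) with a Sobel kernel shows the idea:

```python
def convolve2d(image, kernel):
    """'Valid' 2D cross-correlation over nested lists (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]

# Sobel-x responds to vertical edges:
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
# A hard vertical edge: dark left half, bright right half.
img = [[0, 0, 10, 10]] * 3
edges = convolve2d(img, sobel_x)
```

Swapping the kernel is all it takes to go from edge detection to blur or emboss, and a learned stack of exactly these kernels is what ResNet50 and MobileNetV2 are made of.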
Recipe 5: Particle Life Simulator
Simulate emergent complexity from simple particle interactions! Features N² force calculations, multi-species dynamics, and multi-device acceleration for QuietBox systems. Beautiful chaos from simple physics!
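"N² force calculations" means every particle feels every other particle each step. A tiny sketch (1-D positions and a single linear attraction rule here; the real recipe uses 2-D positions and per-species attraction matrices):

```python
def pairwise_forces(positions, attraction):
    """O(N^2) force pass: for each particle, sum a linear pull toward
    every other particle. Illustrative physics, not the recipe's exact rule."""
    forces = []
    for i, xi in enumerate(positions):
        f = 0.0
        for j, xj in enumerate(positions):
            if i != j:
                f += attraction * (xj - xi)  # pull toward the other particle
        forces.append(f)
    return forces

forces = pairwise_forces([0.0, 1.0, 2.0], attraction=0.5)
```

The quadratic cost of this double loop is why the simulator benefits so directly from multi-device acceleration: the outer loop splits cleanly across chips.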
Compilers & Frameworks
Image Classification with TT-Forge
Compile a PyTorch MobileNetV2 model for Tenstorrent hardware using forge.compile() — no build required. The forge env is pre-installed: one command to activate, then classify images on real TT silicon.
JAX and PyTorch/XLA on Tenstorrent
Run JAX and PyTorch/XLA computations directly on TT hardware — no install needed. venv-forge ships pjrt_plugin_tt, JAX 0.7.1, and torch-xla pre-installed. Activate, import, and start dispatching tensors to silicon.
Introduction to tt-lang
Write your first tt-lang kernel: a concurrent compute + data-movement program that runs on the Tensix grid. Try it live in the browser via ttlang-sim-lite.
CS Fundamentals
Module 1: RISC-V & Computer Architecture
Von Neumann architecture, fetch-decode-execute cycle, and RISC-V fundamentals. Understand how 880 RISC-V processors work by mastering one.
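The fetch-decode-execute cycle fits in a few lines. This toy interpreter (illustrative opcodes, not actual RISC-V encoding) makes the loop concrete before the module introduces the real ISA:

```python
def run(program, registers=None):
    """A toy fetch-decode-execute loop over a 3-register machine.
    Each tuple is one instruction; pc is the program counter.
    Opcodes are RISC-V-flavored but illustrative only."""
    regs = registers or {"x1": 0, "x2": 0, "x3": 0}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]          # fetch + decode
        if op == "li":                   # load immediate
            regs[args[0]] = args[1]
        elif op == "add":                # rd = rs1 + rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "beqz":               # branch to target if register is zero
            if regs[args[0]] == 0:
                pc = args[1]
                continue
        pc += 1                          # execute done; advance to next
    return regs

prog = [("li", "x1", 2), ("li", "x2", 3), ("add", "x3", "x1", "x2")]
regs = run(prog)
```

Every one of the 880 RISC-V cores runs exactly this kind of loop; the module's point is that once you understand one, you understand all of them.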
Module 2: The Memory Hierarchy
Cache locality, bandwidth tradeoffs, and near-memory compute. Experience the memory hierarchy from registers to DRAM and understand why memory is the bottleneck in modern computing.
Module 3: Parallel Computing
Amdahl's Law, SPMD patterns, and data parallelism. Scale from 1 to 880 cores and understand when parallelism helps (and when it doesn't).
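Amdahl's Law is a one-line formula, and plugging in the numbers shows why the serial fraction dominates long before 880 cores do:

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n),
    where p is the fraction of the work that parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)

# Even with 880 cores, a 5% serial portion caps speedup below 20x:
s = amdahl_speedup(0.95, 880)
```

This is the module's core lesson: shrinking the serial fraction usually buys more than adding cores.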
Module 4: Networks and Communication
Message passing, network topologies, and routing algorithms. Master the Network-on-Chip that connects 880 cores and understand distributed systems principles on a single chip.
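A common baseline for mesh networks-on-chip is dimension-ordered (XY) routing: travel fully along X, then along Y. It is deterministic and deadlock-free on a mesh, and small enough to sketch (whether the Tensix NoC uses exactly this scheme is not claimed here; it is the standard teaching example):

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) routing on a mesh: move along X until the
    destination column is reached, then along Y. Coordinates are (x, y)."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

path = xy_route((0, 0), (2, 1))
```

The hop count of the returned path is the Manhattan distance between the two cores, which is why core placement matters for communication-heavy kernels.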
Module 5: Synchronization
Race conditions, barriers, and coordination. Learn explicit synchronization on hardware without cache coherence and understand the challenges of concurrent programming at scale.
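The barrier pattern can be felt in ordinary Python threads before meeting it on hardware: each worker produces a value in phase 1, waits at a barrier, then safely consumes a neighbor's value in phase 2. Without the barrier, the read could race ahead of the neighbor's write.

```python
import threading

def phase_work(n_workers=4):
    """Two-phase computation with an explicit barrier between phases."""
    results = [0] * n_workers
    sums = [0] * n_workers
    barrier = threading.Barrier(n_workers)

    def worker(i):
        results[i] = i * 10                       # phase 1: produce
        barrier.wait()                            # all writes must land first
        sums[i] = results[(i + 1) % n_workers]    # phase 2: consume neighbor

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sums

sums = phase_work()
```

On hardware without cache coherence the barrier is not an optimization detail but the only thing standing between you and stale data, which is the module's central point.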
Module 6: Abstraction Layers
From Python to machine code. Understand the compilation pipeline, when abstractions help performance, and when they hurt. See the full stack from high-level frameworks to RISC-V silicon.
Module 7: Computational Complexity in Practice
Big-O meets real hardware. See why constants matter, how algorithm-hardware co-design achieves breakthrough performance, and why Flash Attention is "O(n)" in practice. The capstone that ties all modules together.
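One concrete piece of that story is the online (streaming) softmax at the heart of Flash Attention: instead of materializing every exponential before normalizing, a running max and running sum are maintained in one pass, which is what lets attention avoid storing the full score matrix. A minimal sketch of the trick:

```python
import math

def online_softmax(scores):
    """Streaming softmax: one pass with O(1) extra state (running max and sum),
    rescaling the accumulated sum whenever the max shifts. This running-max
    trick underlies Flash Attention's O(n)-memory behavior."""
    running_max = float("-inf")
    running_sum = 0.0
    for s in scores:
        new_max = max(running_max, s)
        # rescale the sum accumulated under the old max
        running_sum = running_sum * math.exp(running_max - new_max) \
                      + math.exp(s - new_max)
        running_max = new_max
    return [math.exp(s - running_max) / running_sum for s in scores]

probs = online_softmax([1.0, 2.0, 3.0])
```

The result matches the ordinary softmax exactly; only the memory traffic changes, which is precisely the "constants matter" lesson this capstone drives home.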
Advanced
Bounty Program: Model Bring-Up
Learn how to contribute to the Tenstorrent Bounty Program by bringing up new models. Master TT-Metal while becoming part of the open-source ecosystem. Uses the successful Phi-3 contribution as a case study.
Exploring TT-Metalium
Discover what's possible with TT-Metalium! Run TTNN operations immediately, explore the model zoo, and understand the architecture that powers Tenstorrent hardware — from first script to custom kernels.
Deployment
Deploy tt-vscode-toolkit to Koyeb
Deploy your own cloud-based VSCode IDE with the Tenstorrent extension pre-installed. Run on Koyeb with optional N300 hardware access.
Deploy Your Work to Koyeb
Deploy any Python application to Koyeb with Tenstorrent N300 hardware access. Learn production deployment patterns with vLLM and adapt for any application.