Version Compatibility Matrix
Last updated: January 2026
This guide documents validated combinations of hardware, software versions, and configurations for the Tenstorrent ecosystem. Use it to troubleshoot compatibility issues or to plan your development environment.
Quick Recommendations by Use Case
Just Starting Out (Lessons 1-5)
- Hardware: N150 (Wormhole single-chip)
- tt-metal: Latest from main branch
- Python: 3.10 (system default on Ubuntu 22.04)
- Model: Qwen3-0.6B (1.5GB, no HuggingFace token needed)
Production Inference (vLLM)
- Hardware: N150/N300/T3K/P100/P150
- Deployment: tt-inference-server Docker image (recommended)
- Alternative: Native installation requires careful version matching
Multi-Chip Development (TT-XLA)
- Hardware: N150/N300/T3K/Galaxy
- Python: 3.11
- Installation: Wheel-based (no source build required)
Experimental Compiler (TT-Forge)
- Hardware: N150 only (single-chip)
- Python: 3.11
- Build time: 45-60 minutes
- Requirements: clang-17
Hardware Configurations
Wormhole Architecture
N150 (Single Chip)
- DRAM: 12GB
- Tensix Cores: 80
- Best for: Development, small models (<2B parameters)
- Recommended models:
  - Qwen3-0.6B (0.6B params, 1.5GB) ✅ Primary recommendation
  - Gemma 3-1B-IT (1B params, 2GB)
  - Llama-3.1-8B-Instruct (8B params, 16GB) ⚠️ Tight fit, may exhaust DRAM
- Multi-chip support: No
N300 (Dual Chip)
- DRAM: 24GB (2x 12GB)
- Tensix Cores: 160
- Best for: Medium models (8B parameters)
- Recommended models:
  - Llama-3.1-8B-Instruct (8B params, 16GB) ✅ Comfortable
  - Qwen3-8B (8B params)
- Multi-chip support: Yes (2 chips)
T3K (8 Chips)
- DRAM: 96GB (8x 12GB)
- Tensix Cores: 640
- Best for: Large models (70B+ parameters)
- Recommended models:
  - Llama-3.1-70B
  - Large-scale inference workloads
- Multi-chip support: Yes (8 chips)
QuietBox (Wormhole-based)
- Architecture: Wormhole (not Blackhole)
- Configuration: Multi-chip Wormhole system
- Production validation: ✅ Validated for vLLM (batch 32: 22.1 tokens/s/user, 707.2 tokens/s aggregate)
- Best for: Production inference deployments
- Reference: Tenstorrent QuietBox
- vLLM compatibility: Fully validated (see README.md performance benchmarks)
Blackhole Architecture
P100 (Single Chip)
- DRAM: ~32GB
- Tensix Cores: 140 (14x10 grid, 13x10 available for compute)
- Best for: Next-generation single-chip performance
- Recommended models:
  - Qwen3-0.6B (0.6B params, 1.5GB) ✅ Works great
  - Llama-3.1-8B-Instruct (8B params, 16GB) ✅ Comfortable fit
- Enhanced features:
  - L1 data cache: 1464 KB with 4x16B cachelines (write-through)
  - Enhanced NoC: 64B reads (vs 32B on Wormhole)
  - Ethernet: 14 cores with 512KB L1, 2x RISC-V per core
  - DRAM: 8 banks with programmable RISC-V, 128KB L1 per bank
- Requirements:
export TT_METAL_ARCH_NAME=blackhole
- Multi-chip support: Single chip only
P150 (Configurable: 1, 2, 4, or 8 chips)
- DRAM per chip: ~32GB
- Total DRAM:
  - P150 x1: ~32GB (single chip)
  - P150 x2: ~64GB (2 chips)
  - P150 x4: ~128GB (4 chips)
  - P150 x8: ~256GB (8 chips)
- Tensix Cores per chip: 140 (14x10 grid, 13x10 available for compute)
- Best for: Scalable multi-chip deployments (70B+ models)
- Recommended models:
  - P150 x1: Llama-3.1-8B ✅
  - P150 x2: Llama-3.1-8B (fast), medium models ✅
  - P150 x4: Llama-3.1-70B ✅
  - P150 x8: 70B+ models, large-scale inference ✅
- Enhanced features: Same as P100
- Requirements:
export TT_METAL_ARCH_NAME=blackhole
- Multi-chip support: Yes (2, 4, or 8 chips via mesh topology)
Galaxy (Multi-Node)
- Configuration: Multiple T3K nodes
- Best for: Massive-scale distributed training/inference
- Multi-chip support: Yes (distributed)
Software Stack Versions
Core Components (Lessons 1-10)
| Component | Version | Python | Installation Method | Notes |
|---|---|---|---|---|
| tt-metal | Latest (main branch) | 3.10 | Source build | Core low-level API |
| TTNN | Bundled with tt-metal | 3.10 | Included | High-level neural network ops |
| OpenMPI ULFM | 5.0.7 | N/A | System package | Required for all hardware |
| PyTorch | 2.x | 3.10 | pip (in venv) | ML framework |
| Transformers | Latest | 3.10 | pip (in venv) | HuggingFace models |
Environment variables required:
export TT_METAL_HOME=~/tt-metal
export PYTHONPATH=$TT_METAL_HOME:$PYTHONPATH
export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH
export MESH_DEVICE=N150 # or N300, T3K, P100, P150, GALAXY
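A quick preflight check can catch an unset variable before a lesson fails with a less obvious error. The sketch below is a hypothetical helper, not part of tt-metal: the function name `check_tt_env` is an assumption, and it only tests that the four variables listed above are non-empty, not that their values are correct.

```shell
# Hypothetical preflight helper (not shipped with tt-metal).
# Prints each required variable that is unset or empty and
# returns 1 if any are missing, 0 otherwise.
check_tt_env() {
    local name value missing=0
    for name in TT_METAL_HOME PYTHONPATH LD_LIBRARY_PATH MESH_DEVICE; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_tt_env` in a fresh terminal before starting a lesson shows at a glance which exports still need to be run.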
vLLM Production Inference (Lesson 7)
| Deployment Method | Hardware | Status | Notes |
|---|---|---|---|
| tt-inference-server (Docker) | N150/N300/T3K/P100/P150 | ✅ Recommended | Pre-validated configurations |
| Native installation | N150/N300/T3K | ⚠️ Advanced | Version compatibility challenges |
Docker method (validated):
# Uses pre-built image with matched versions
# See Lesson 6 for tt-inference-server
# See Lesson 7 for manual vLLM setup
Native installation compatibility matrix:
| Hardware | tt-metal | vLLM | Status | Notes |
|---|---|---|---|---|
| N150 | Latest (main) | Docker image | ✅ Validated | Use Docker |
| N150 | Specific commits | Native build | ⚠️ Complex | Requires model_specs_output.json matching |
| N300+ | Latest (main) | Docker image | ✅ Validated | Use Docker |
Known issues with native installation:
- PyTorch type hint incompatibilities
- vLLM version mismatches with tt-metal changes
- Complex dependency chains
Recommendation: Use Docker unless you have specific requirements for native installation.
Environment variables (vLLM):
export VLLM_TARGET_DEVICE=tt
export VLLM_CONFIGURE_LOGGING=1
export VLLM_RPC_TIMEOUT=900000
# For Blackhole (P100/P150):
export TT_METAL_ARCH_NAME=blackhole
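The vLLM variables above can be grouped into one function so that the Blackhole setting is applied only where it is needed. This is a minimal sketch: `set_vllm_env` is a hypothetical name, and the hardware labels it matches are the ones used in this guide.

```shell
# Hypothetical convenience wrapper around the exports listed above.
# Usage: set_vllm_env <hardware>, e.g. set_vllm_env P150
set_vllm_env() {
    export VLLM_TARGET_DEVICE=tt
    export VLLM_CONFIGURE_LOGGING=1
    export VLLM_RPC_TIMEOUT=900000
    case "$1" in
        # Only the Blackhole boards need the architecture override.
        P100|P150) export TT_METAL_ARCH_NAME=blackhole ;;
    esac
}
```

On Wormhole hardware the function leaves `TT_METAL_ARCH_NAME` untouched, which matches the instructions above.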
TT-XLA JAX Compiler (Lesson 12)
| Component | Version | Python | Installation Method | Hardware Support |
|---|---|---|---|---|
| TT-XLA | Latest wheel | 3.11 | pip (wheel) | N150/N300/T3K/Galaxy |
| JAX | 0.7.1+ | 3.11 | pip | Required dependency |
| tt-forge | Cloned for demos | 3.11 | git clone | Demo code only |
Status: ✅ Production-ready for multi-chip
Installation:
# Python 3.11 required
python3.11 -m venv ~/tt-xla-venv
source ~/tt-xla-venv/bin/activate
pip install pjrt-plugin-tt --pre --upgrade --extra-index-url https://pypi.eng.aws.tenstorrent.com/
Environment isolation (CRITICAL):
# MUST unset tt-metal variables
unset TT_METAL_HOME
unset LD_LIBRARY_PATH
export PYTHONPATH=~/tt-forge:$PYTHONPATH # For demo imports only
Use the helper script:
source ~/tt-scratchpad/setup-tt-xla.sh
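If you switch between tt-metal and TT-XLA in the same terminal, one way to keep the two environments apart is to apply the isolation inside a subshell. The sketch below is an illustration only and is not the contents of setup-tt-xla.sh; the function name `run_isolated_xla` is an assumption.

```shell
# Hypothetical wrapper: runs one command with the tt-metal variables
# removed. The parentheses start a subshell, so the calling shell
# keeps its own TT_METAL_HOME and LD_LIBRARY_PATH afterwards.
run_isolated_xla() {
    (
        unset TT_METAL_HOME LD_LIBRARY_PATH
        # Prepend the tt-forge checkout for demo imports only.
        export PYTHONPATH="$HOME/tt-forge${PYTHONPATH:+:$PYTHONPATH}"
        "$@"
    )
}
```

For example, `run_isolated_xla python my_demo.py` (with `my_demo.py` standing in for your own script) runs the demo in the isolated environment while leaving the surrounding shell ready for tt-metal lessons.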
TT-Forge MLIR Compiler (Lesson 11)
| Component | Version | Python | Installation Method | Hardware Support |
|---|---|---|---|---|
| TT-Forge | Source build | 3.11 | cmake (45-60 min) | N150 only |
| clang | 17 | N/A | apt | Required compiler |
| LLVM | Built from submodule | N/A | cmake | 6719 targets (~40 min) |
| JAX | 0.7.1 | 3.11 | pip | Build dependency |
Status: ⚠️ Experimental (as of December 2025)
Build requirements:
# Install prerequisites
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install -y python3.11 python3.11-venv python3.11-dev clang-17
# Create compiler symlinks (CRITICAL)
sudo update-alternatives --install /usr/bin/clang clang /usr/bin/clang-17 100
sudo update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-17 100
Build time: 45-60 minutes (LLVM compilation is slow)
Environment setup (CRITICAL):
# MUST unset tt-metal variables
unset TT_METAL_HOME
unset TT_METAL_VERSION
# MUST use absolute paths (CMake doesn't expand ~)
export TTFORGE_TOOLCHAIN_DIR=/home/$USER/ttforge-toolchain
export TTMLIR_TOOLCHAIN_DIR=/home/$USER/ttmlir-toolchain
export TTFORGE_PYTHON_VERSION=python3.11
export CC=/usr/bin/clang-17
export CXX=/usr/bin/clang++-17
Use the helper script:
source ~/tt-scratchpad/setup-tt-forge.sh
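Because the toolchain directories must be absolute paths, it can help to verify them before starting a 45-60 minute build. The following is a hypothetical check, not part of TT-Forge or of setup-tt-forge.sh; `check_forge_paths` is an assumed name.

```shell
# Hypothetical pre-build check: reports toolchain variables that are
# unset or do not start with "/" (for example a literal "~/...").
check_forge_paths() {
    local name value status=0
    for name in TTFORGE_TOOLCHAIN_DIR TTMLIR_TOOLCHAIN_DIR; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        case "$value" in
            /*) ;;  # absolute path, nothing to report
            *)  echo "not absolute: $name=$value"; status=1 ;;
        esac
    done
    return "$status"
}
```

The check only inspects the strings; it does not confirm that the directories exist or contain a usable toolchain.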
Model support: 169 validated models in tt-forge-models repository
- MobileNetV1/V2/V3 (✅ Recommended starting point)
- ResNet variants (✅ Validated)
- Some BERT models (⚠️ Check repository)
Stable Diffusion 3.5 (Lesson 9)
| Component | Hardware | Status | Notes |
|---|---|---|---|
| SD 3.5 Large | N150/N300/T3K/P100 | ✅ Validated | 1024x1024 generation |
| Generation time | N150 | ~2-3 minutes | First run, includes model load |
| Environment | Standard tt-metal | ✅ Works | No special setup needed |
No special version requirements - uses standard tt-metal environment.
Environment Variable Reference
Always Required (Lessons 1-10)
# Point to tt-metal installation
export TT_METAL_HOME=~/tt-metal
# Add tt-metal to Python import path
export PYTHONPATH=$TT_METAL_HOME:$PYTHONPATH
# Add OpenMPI libraries (CRITICAL - #1 most common error)
export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH
# Specify hardware type
export MESH_DEVICE=N150 # or N300, T3K, P100, P150, GALAXY
Hardware-Specific
For Blackhole chips (P100/P150):
export TT_METAL_ARCH_NAME=blackhole
Application-Specific
vLLM:
export VLLM_TARGET_DEVICE=tt
export VLLM_CONFIGURE_LOGGING=1
export VLLM_RPC_TIMEOUT=900000
Stable Diffusion (non-interactive):
export NO_PROMPT=1
TT-XLA (isolation required):
unset TT_METAL_HOME
unset LD_LIBRARY_PATH
export PYTHONPATH=~/tt-forge:$PYTHONPATH
TT-Forge (isolation required):
unset TT_METAL_HOME
unset TT_METAL_VERSION
export TTFORGE_TOOLCHAIN_DIR=/home/$USER/ttforge-toolchain
export TTMLIR_TOOLCHAIN_DIR=/home/$USER/ttmlir-toolchain
export TTFORGE_PYTHON_VERSION=python3.11
export CC=/usr/bin/clang-17
export CXX=/usr/bin/clang++-17
Common Compatibility Issues
Issue 1: "undefined symbol: MPIX_Comm_revoke"
Error:
ImportError: /home/user/tt-metal/build/tt_metal/libtt_metal.so: undefined symbol: MPIX_Comm_revoke
Cause: OpenMPI library path not set
Fix:
export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH
Prevalence: The most common error in cloud environments
Make permanent:
echo 'export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
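Running the `echo ... >> ~/.bashrc` command more than once adds the same line repeatedly. A small guard avoids that; the sketch below uses a hypothetical helper name, `append_once`, and relies only on standard `grep` options.

```shell
# Hypothetical helper: append a line to a file only if that exact
# line is not already present.
# Usage: append_once '<line>' <file>
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -F: literal string, -x: match the whole line, -q: no output
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For this issue the call would be `append_once 'export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH' ~/.bashrc`, with single quotes so that `$LD_LIBRARY_PATH` is written literally and expanded at login.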
Issue 2: vLLM Version Mismatch
Error:
TypeError: block_size has unsupported type list[int]
Cause: PyTorch/vLLM type hint incompatibility
Fix: Use Docker image (validated configuration)
# See Lesson 6 for tt-inference-server Docker setup
# See Lesson 7 for manual Docker approach
Alternative: Match specific tt-metal and vLLM commits via model_specs_output.json (advanced)
Issue 3: TT-Forge Import Failure
Error:
ImportError: /path/to/libTTMLIRRuntime.so: undefined symbol: _ZN4ttnn...
Cause: Environment variable pollution (TT_METAL_HOME conflicts)
Fix:
source ~/tt-scratchpad/setup-tt-forge.sh
The script automatically unsets conflicting variables.
Issue 4: CMake Build Errors (TT-Forge)
Error:
CMake Error: CMAKE_C_COMPILER: clang not found
Cause: Compiler symlinks not created
Fix:
sudo update-alternatives --install /usr/bin/clang clang /usr/bin/clang-17 100
sudo update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-17 100
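After creating the symlinks, you can confirm that both compiler names resolve on `PATH` before re-running CMake. This is a generic sketch; `missing_tools` is a hypothetical helper name, and it checks only that the commands can be found, not which clang version they point to.

```shell
# Hypothetical helper: prints every command name that cannot be
# found on PATH and returns 1 if any are missing.
# Usage: missing_tools clang clang++
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found: $tool"
            status=1
        fi
    done
    return "$status"
}
```

If `missing_tools clang clang++` prints nothing, `clang --version` can then be used to check that the alternative resolves to clang 17.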
Issue 5: JAX 0.7.1 Not Found
Error:
ERROR: Could not find a version that satisfies the requirement jax==0.7.1
ERROR: Ignored the following versions that require a different python version: ... Requires-Python >=3.11
Cause: Python version mismatch (needs 3.11)
Fix:
# Install Python 3.11
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install -y python3.11 python3.11-venv python3.11-dev
# Create venv with 3.11
python3.11 -m venv ~/tt-xla-venv
source ~/tt-xla-venv/bin/activate
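To confirm that the active interpreter meets the 3.11 requirement before installing JAX, the version string can be compared numerically. The helper below is a hypothetical sketch (`py_at_least_311` is an assumed name) that takes a version string as its argument, so it can be checked without depending on which Python is installed.

```shell
# Hypothetical helper: succeeds if the given "major.minor[.patch]"
# version string is 3.11 or newer.
py_at_least_311() {
    local major rest minor
    major="${1%%.*}"     # text before the first dot
    rest="${1#*.}"       # text after the first dot
    minor="${rest%%.*}"  # text before the next dot, if any
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 11 ]; }
}
```

Inside the venv, one way to use it is `py_at_least_311 "$(python -c 'import platform; print(platform.python_version())')" || echo "Python 3.11+ required"`.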
Model Size vs Hardware Matrix
Recommended Model-Hardware Combinations
| Model | Parameters | Disk Size | N150 (12GB) | N300 (24GB) | T3K (96GB) |
|---|---|---|---|---|---|
| Qwen3-0.6B | 0.6B | 1.5GB | ✅ Perfect | ✅ Excellent | ✅ Excellent |
| Gemma 3-1B-IT | 1B | 2GB | ✅ Good | ✅ Excellent | ✅ Excellent |
| Llama-3.1-8B | 8B | 16GB | ⚠️ Tight | ✅ Good | ✅ Excellent |
| Qwen3-8B | 8B | 16GB | ⚠️ Tight | ✅ Good | ✅ Excellent |
| Llama-3.1-70B | 70B | 140GB | ❌ Too large | ❌ Too large | ✅ Good |
Legend:
- ✅ Perfect: Fast, reliable, recommended
- ✅ Good: Works well, stable
- ⚠️ Tight: May work but can exhaust DRAM under load
- ❌ Too large: Model won't fit
N150 recommendation: Start with Qwen3-0.6B (0.6B parameters, 1.5GB)
- Ultra-lightweight (13x smaller than Llama-3.1-8B)
- No HuggingFace token needed (ungated)
- Dual thinking modes (reasoning-capable)
- Perfect for learning and many production use cases
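The capacity figures in this guide follow simple arithmetic: total DRAM is the chip count times the per-chip DRAM, and the listed disk sizes for the 8B and 70B models match roughly 2 bytes per parameter. The sketch below reproduces that arithmetic with two hypothetical helpers. The 2-bytes-per-parameter rule is an assumption about 16-bit weights; the listed 1.5GB for Qwen3-0.6B is somewhat above this estimate, and the fit ratings in the table are the guide's validated results, not an output of this calculation.

```shell
# Hypothetical helpers for back-of-the-envelope sizing.

# total_dram_gb <chips> <gb_per_chip>
total_dram_gb() {
    echo $(( $1 * $2 ))
}

# bf16_weights_gb <parameters_in_billions>
# Assumes 2 bytes per parameter (16-bit weights); ignores activations,
# KV cache, and any quantization applied by the runtime.
bf16_weights_gb() {
    awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 2 }'
}
```

For example, `total_dram_gb 8 12` gives the T3K total and `bf16_weights_gb 8` gives the approximate weight size of an 8B model, which can then be compared against the table above.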
Learning Path Recommendations
Path 1: Beginner (First Time with Tenstorrent)
- Hardware: N150
- Start with: Lessons 1-5 (Direct tt-metal API)
- Model: Qwen3-0.6B
- Time to first inference: ~30 minutes
- Environment: Standard tt-metal (Python 3.10)
Path 2: Production Deployment
- Hardware: N150/N300/T3K depending on model size
- Start with: Lessons 1-5 (understand the stack)
- Then: Lesson 6 (tt-inference-server Docker)
- Model: Match to hardware capacity
- Environment: Docker (validated configurations)
Path 3: Model Developer
- Hardware: N150 (development), scale up for testing
- Start with: Lessons 1-5 (foundation)
- Then: Lesson 13 (Bounty Program contribution workflow)
- Model: Bring your own architecture
- Environment: Standard tt-metal + git workflow
Path 4: Compiler Explorer
- Hardware: N150 (single-chip)
- Start with: Lessons 1-5 (baseline understanding)
- Then: Lesson 12 (TT-XLA, production-ready)
- Optional: Lesson 11 (TT-Forge, experimental)
- Environment: Isolated (separate Python 3.11 venvs)
Validation Status by Lesson
| Lesson | Hardware Tested | Status | Notes |
|---|---|---|---|
| 1-5 | N150 | ✅ Validated | Zero issues after install_dependencies.sh |
| 6 | N150 | ✅ Validated | tt-inference-server Docker |
| 7 | N150 | ⚠️ Docker recommended | Native install has version challenges |
| 8 | N150 | ✅ Validated | VSCode chat integration |
| 9 | N150 | ✅ Validated | Stable Diffusion 3.5, ~2.5 min generation |
| 10 | N150 | ✅ Validated | Coding assistant |
| 11 | N150 | ⚠️ Experimental | TT-Forge 45-60 min build, limited model support |
| 12 | N150 | ✅ Validated | TT-XLA wheel install, GPT-2 XL working |
| 13-14 | N150 | Documentation | Bounty program, RISC-V programming |
| 15 | N150 | ✅ Validated | TT-Metalium cookbook projects |
Additional Resources
Official Documentation:
- tt-metal: https://github.com/tenstorrent/tt-metal
- tt-inference-server: https://github.com/tenstorrent/tt-inference-server
- TT-XLA: https://github.com/tenstorrent/tt-xla
- TT-Forge: https://github.com/tenstorrent/tt-forge-fe
Community:
- Discord: https://discord.gg/tenstorrent
- Model contributions: Lesson 13 (Bounty Program)
Troubleshooting:
- FAQ page (in this extension)
- Step Zero guide (comprehensive tech stack explanation)
- Lesson-specific debugging sections
Remember: When in doubt, start with the recommended path for your hardware. The most reliable configurations are thoroughly documented in Lessons 1-5, which work on all hardware.