Tenstorrent Developer Extension - FAQ
Frequently Asked Questions - Your quick reference for common questions, troubleshooting, and tips from all lessons.
Table of Contents
- Getting Started
- Environment Reference
- Remote Development & SSH
- Hardware & Detection
- Installation & Setup
- Models & Downloads
- Inference & Serving
- Custom Training
- Compilers & Tools
- Troubleshooting
- Performance & Optimization
- Community & Support
Getting Started
Q: Which lesson should I start with?
A: Start with tt-installer if you're on a fresh system, or Hardware Detection if drivers are already installed. Lessons are organized into 9 categories:
🚀 Your First Inference (7 lessons) tt-installer → Hardware Detection → Verify Installation → Download Model → Interactive Chat → API Server → Build tt-metal
🏭 Serving Models (4 lessons) Production servers (TT-Inference-Server, vLLM) and generation (Image, Video)
🎓 Custom Training (8 lessons) Fine-tune models or train from scratch with tt-train and NanoGPT
🎯 Applications (5 lessons) Coding Assistant, AnimateDiff, TT-QuietBox 2 OpenClaw Assistant, TT-QuietBox 2 Video, TT-QuietBox 2 Local Agents
👨‍🍳 Tenstorrent Cookbook (6 lessons) Game of Life, Audio, Mandelbrot, Image Filters, Particle Life + Overview
🔧 Compilers & Tools (3 lessons) TT-Forge™, TT-XLA, TT-Lang
🧠 CS Fundamentals (7 lessons) Computer Architecture, Memory, Parallelism, Networks, Synchronization, Abstraction, Complexity
🎓 Advanced Topics (2 lessons) Bounty Program, Explore Metalium
☁️ Deployment (2 lessons) Deploy to Koyeb, Deploy VSCode to Koyeb
Can I skip lessons? Yes — categories are independent. See Environment Reference if a command fails because something isn't set up.
Q: Do I need to complete lessons in order?
A: Not strictly, but:
- Hardware Detection, Verify Installation, and Download Model are foundational - most later lessons assume you've done these
- Interactive Chat through Image Generation build on each other but can be done selectively
- Advanced topics (compilers, RISC-V, bounty program) are more independent
Quick start for experienced users:
- Run Hardware Detection (2 minutes - verify hardware)
- Skip to Production Inference with vLLM (production serving)
- Explore advanced topics (compilers, RISC-V, bounty program)
Q: What's the difference between the different tools?
A: Tenstorrent has several tools serving different purposes:
| Tool | Purpose | When to Use | Maturity |
|---|---|---|---|
| TT-Metalium™ | Low-level framework | Custom kernels, maximum control | Stable |
| vLLM | LLM serving | Production LLM deployment | Production |
| TT-Forge | MLIR compiler | PyTorch models (experimental) | Beta |
| TT-XLA | XLA compiler | JAX/PyTorch (production) | Production |
Simple guide:
- Need to run LLMs? → Production Inference with vLLM
- Want to experiment with PyTorch? → Image Classification with TT-Forge
- Need JAX support? → JAX Inference with TT-XLA
- Building custom kernels? → TT-Metalium (Hardware Detection, Verify Installation, Download Model, RISC-V Programming)
Environment Reference
Jumped to a lesson directly and a command failed? This section maps out every path, venv, and environment variable the lessons assume. Bookmark it.
Q: What is ~/tt-scratchpad and do I need to create it?
A: ~/tt-scratchpad is a working directory the extension creates when you run commands inside VS Code. If you're following lessons on the website or running commands manually in a terminal, it won't exist yet. Create it yourself:
mkdir -p ~/tt-scratchpad
Most lessons that use it also create subdirectories (e.g. ~/tt-scratchpad/cookbook/mandelbrot/). The mkdir -p in each command handles that — so creating the top-level directory is enough.
Q: Which Python virtual environment do I activate for which lesson?
A: Three environments exist on a typical tt-installer system. Pick the one that matches what you're doing:
| What you're doing | Activate this |
|---|---|
| TT-NN / direct API / TT-Metalium examples / custom training | source ~/tt-metal/python_env/bin/activate |
| vLLM serving | source ~/tt-metal/build/python_env_vllm/bin/activate |
| TT-Forge / TT-XLA / JAX | source ~/tt-forge-venv/bin/activate |
TT-QuietBox 2 / tt-installer container environments: These may be pre-activated via /etc/profile.d/. Check what's active with which python3 before activating another venv.
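If you'd rather check from inside Python than with which python3, here is a minimal sketch. It relies only on the standard library: inside a venv, sys.prefix differs from sys.base_prefix. The helper name active_venv is made up for this example.

```python
import sys
from typing import Optional


def active_venv() -> Optional[str]:
    """Return the path of the active virtual environment, or None.

    Inside a venv, sys.prefix points at the venv directory while
    sys.base_prefix still points at the base interpreter.
    """
    if sys.prefix != sys.base_prefix:
        return sys.prefix
    return None


if __name__ == "__main__":
    venv = active_venv()
    print(f"Active venv: {venv}" if venv else "No venv active")
```

Run it with the same python3 your shell resolves; if it prints a path you did not expect, a profile script has probably pre-activated an environment.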
Can't find a venv?
# Check what exists
ls ~/tt-metal/python_env/bin/activate 2>/dev/null && echo "✓ tt-metal venv"
ls ~/tt-metal/build/python_env_vllm/bin/activate 2>/dev/null && echo "✓ vLLM venv"
ls ~/tt-forge-venv/bin/activate 2>/dev/null && echo "✓ Forge/XLA venv"
If ~/tt-metal/python_env doesn't exist, you need to build tt-metal first → Build tt-metal from Source.
If ~/tt-forge-venv doesn't exist, check /opt/venv-forge:
# /opt/venv-forge exists but ~/tt-forge-venv symlink is missing:
ln -s /opt/venv-forge ~/tt-forge-venv
Q: What is TT_METAL_HOME and when do I need it?
A: TT_METAL_HOME points to your tt-metal source checkout. It is only needed for the Direct API lessons (interactive-chat, api-server, custom training, video generation). It is not needed for vLLM, tt-inference-server, TT-Forge, or TT-XLA.
Set it once per terminal session:
export TT_METAL_HOME=~/tt-metal
export PYTHONPATH=$TT_METAL_HOME/build_Release:$PYTHONPATH
export LD_LIBRARY_PATH=$TT_METAL_HOME/build/lib:$LD_LIBRARY_PATH
Add to ~/.bashrc if you use Direct API regularly:
echo 'export TT_METAL_HOME=~/tt-metal' >> ~/.bashrc
TT-QuietBox 2 users: ~/tt-metal does not exist on TT-QuietBox 2 pre-configured images. Use TT-Inference-Server or vLLM instead. If you specifically need the Direct API, run Build TT-Metalium from Source first.
Forge/XLA users: Unset TT_METAL_HOME before activating venv-forge — leaving it set causes conflicts:
unset TT_METAL_HOME
source ~/tt-forge-venv/bin/activate
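For the Direct API case, a quick sanity check of the variables above can save a confusing import error later. This is a sketch, not an official tool: the function name direct_api_env_problems is hypothetical, and it only checks that TT_METAL_HOME is set, exists, and is referenced by PYTHONPATH.

```python
import os
from pathlib import Path


def direct_api_env_problems(env=None):
    """Return a list of problems; an empty list means the basics look fine."""
    env = os.environ if env is None else env
    problems = []
    home = env.get("TT_METAL_HOME")
    if not home:
        problems.append("TT_METAL_HOME is not set")
        return problems
    root = Path(home).expanduser()
    if not root.is_dir():
        problems.append(f"TT_METAL_HOME points to a missing directory: {root}")
    # The exports above put a path under TT_METAL_HOME on PYTHONPATH.
    if str(root) not in env.get("PYTHONPATH", ""):
        problems.append("PYTHONPATH does not reference TT_METAL_HOME")
    return problems


if __name__ == "__main__":
    for problem in direct_api_env_problems() or ["Direct API environment looks OK"]:
        print(problem)
```

Forge/XLA users should expect "TT_METAL_HOME is not set" here; for them that is the desired state.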
Q: Where do models live and why do lessons reference ~/models/?
A: ~/models/ is the conventional location all lessons use. It isn't created automatically — the hf download --local-dir flag creates it on first use.
# Standard layout assumed by all lessons:
~/models/
  Qwen3-0.6B/                # HuggingFace format (for vLLM, tt-inference-server)
  Qwen3-8B/
  Llama-3.1-8B-Instruct/     # HuggingFace format (for vLLM, tt-inference-server)
    original/                # Meta format subdirectory (for Direct API / Generator API lessons)
If your models are somewhere else, substitute your path in any --model or --local-dir flag. There is nothing special about ~/models/ — it is just a convention.
Check what you have:
ls ~/models/ 2>/dev/null || echo "No ~/models/ directory yet"
du -sh ~/models/* 2>/dev/null
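To see which format each downloaded model has, the layout above can be inspected with a few lines of Python. This sketch assumes a top-level config.json marks a HuggingFace checkout and an original/ subdirectory marks the Meta format; model_formats is a hypothetical helper name.

```python
from pathlib import Path


def model_formats(model_dir):
    """Report which of the two layouts a model directory appears to have."""
    model_dir = Path(model_dir).expanduser()
    return {
        # Assumption: HuggingFace checkouts carry a config.json at the top level.
        "huggingface": (model_dir / "config.json").is_file(),
        # Meta-format weights live in an original/ subdirectory.
        "meta": (model_dir / "original").is_dir(),
    }


if __name__ == "__main__":
    models_root = Path("~/models").expanduser()
    if models_root.is_dir():
        for entry in sorted(models_root.iterdir()):
            if entry.is_dir():
                print(entry.name, model_formats(entry))
    else:
        print("No ~/models/ directory yet")
```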
Q: What Ubuntu version do I need?
A:
| Version | Status |
|---|---|
| Ubuntu 22.04 LTS | ✅ Most tested — preferred by Tenstorrent for stability |
| Ubuntu 24.04 LTS | ✅ Supported — TT-QuietBox 2 ships with 24.04 |
| Ubuntu 20.04 LTS | ⚠️ Deprecated — Metalium cannot be installed |
Most Docker images in lessons are tagged ubuntu-22.04-amd64. They run fine on a 24.04 host — the Ubuntu version in the tag refers to the image, not your host OS.
Check your host:
lsb_release -rs
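If lsb_release is not installed, the same information is in /etc/os-release. The sketch below parses that file and looks the version up in the table above; the status strings are shortened from the table, and both helper names are made up for the example.

```python
from pathlib import Path

# Shortened from the support table above.
SUPPORT = {"22.04": "most tested", "24.04": "supported", "20.04": "deprecated"}


def parse_os_release(text):
    """Parse KEY=value lines (values may be double-quoted) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def ubuntu_status(version_id):
    """Map an Ubuntu VERSION_ID such as '22.04' to its status in the table."""
    return SUPPORT.get(version_id, "not listed")


if __name__ == "__main__":
    os_release = Path("/etc/os-release")
    if os_release.is_file():
        info = parse_os_release(os_release.read_text())
        version = info.get("VERSION_ID", "unknown")
        print(info.get("ID", "unknown"), version, "->", ubuntu_status(version))
```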
Q: A lesson says "from Lesson N" or "see Lesson 7" — what lesson is that?
A: Old numbered references map to these lesson IDs:
| "Lesson N" reference | Lesson ID |
|---|---|
| Lesson 1 | hardware-detection |
| Lesson 2 | verify-installation |
| Lesson 3 | download-model |
| Lesson 4 | interactive-chat |
| Lesson 5 | api-server |
| Lesson 6 | tt-inference-server |
| Lesson 7 | vllm-production |
| Lesson 8 | (VSCode Chat — retired) |
| Lesson 9 | image-generation |
Remote Development & SSH
Q: Can I use this extension from my Mac/Windows laptop to access remote Tenstorrent hardware?
A: Yes! Use VSCode's Remote-SSH extension - the industry-standard solution for remote development.
This is the recommended approach for:
- Developing on macOS/Windows while hardware is on Linux
- Working from laptop with hardware in datacenter/cloud
- Team development with shared hardware resources
Why Remote-SSH is perfect for this:
- ✅ Zero extension changes needed - Everything "just works"
- ✅ Transparent experience - Feels like local development
- ✅ All features work - Terminal commands, file operations, debugging
- ✅ Battle-tested - Used by millions of developers daily
Q: How do I set up Remote-SSH for Tenstorrent development?
A: Quick setup guide:
Step 1: Install Remote-SSH extension
- Open VSCode on your local machine (Mac/Windows)
- Open Extensions panel (Cmd+Shift+X or Ctrl+Shift+X)
- Search for "Remote - SSH"
- Install the official Microsoft extension
Step 2: Configure SSH connection
Add your Tenstorrent machine to SSH config:
# On your local machine, edit ~/.ssh/config
# (Cmd+Shift+P → "Remote-SSH: Open Configuration File")
Host tenstorrent-dev
    # Your hardware machine IP
    HostName 192.168.1.100
    # Your username
    User ubuntu
    # Your SSH key
    IdentityFile ~/.ssh/id_rsa
    # Optional: forward SSH agent
    ForwardAgent yes
Step 3: Connect to remote machine
- Cmd+Shift+P (or Ctrl+Shift+P) → "Remote-SSH: Connect to Host"
- Select "tenstorrent-dev"
- New VSCode window opens connected to remote machine
Step 4: Install Tenstorrent extension on remote
- In the remote VSCode window, go to Extensions
- Search for "Tenstorrent Developer Extension"
- Click "Install in SSH: tenstorrent-dev"
Step 5: Start using lessons!
- All terminal commands run on remote machine
- All file operations work on remote filesystem
- Hardware detection works automatically
- Models download to remote machine
Q: Do the lessons work through Remote-SSH?
A: Yes, perfectly! Remote-SSH makes everything transparent:
What works automatically:
- ✅ All terminal commands run on remote machine
- ✅ File operations (Read, Write, Edit) work on remote filesystem
- ✅ Hardware detection (tt-smi) works
- ✅ Model downloads go to remote machine
- ✅ Inference runs on remote hardware
- ✅ Port forwarding automatic (access servers on localhost)
Example workflow:
- Connect via Remote-SSH from your Mac
- Open Tenstorrent walkthrough (works like local)
- Run Hardware Detection → tt-smi runs on remote
- Download model → Saves to remote ~/models/
- Start vLLM server → Runs on remote, port auto-forwarded
- Test from local browser → http://localhost:8000 works!
No code changes needed - The extension doesn't know or care that you're remote!
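To confirm from your laptop that a forwarded port is actually reachable before opening the browser, a plain TCP connect is enough. A minimal sketch, assuming the server is forwarded to localhost:8000 as in the workflow above; port_open is a hypothetical helper name.

```python
import socket


def port_open(host="localhost", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if port_open() else "not reachable"
    print(f"localhost:8000 is {state}")
```

A successful connect only shows that something is listening; it does not prove the model has finished loading.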
Q: What about SSH without Remote-SSH extension?
A: Not recommended. Manual SSH has major problems:
❌ File operations break - Extension reads/writes local filesystem, not remote
❌ Path mismatches - ~/models/ on Mac ≠ ~/models/ on remote
❌ Complex escaping - Terminal commands get mangled through SSH
❌ No port forwarding - Can't access servers on localhost
❌ Poor UX - Feels disconnected, hard to debug
Example of problems:
If you manually SSH in terminal:
# This command in lesson creates file on your MAC, not remote!
cat > ~/tt-scratchpad/script.py << 'EOF'
...
EOF
Then this fails because the file is on the wrong machine:
ssh user@remote python3 ~/tt-scratchpad/script.py
With Remote-SSH: Both operations happen on remote automatically.
Q: Can multiple people share the same remote hardware?
A: Yes, but with considerations:
Shared hardware works best with:
- ✅ Resource coordination - Don't run multiple large models simultaneously
- ✅ User directories - Each user has own ~/models/, ~/tt-scratchpad/
- ✅ Port management - Use different ports (8000, 8001, 8002...)
- ✅ Communication - Team chat to coordinate who's using hardware
Limitations:
- ⚠️ Only one model can load on device at a time
- ⚠️ Large models need device reset between users
- ⚠️ /dev/shm shared memory might need cleanup
Best practice for teams:
# User 1
vllm ... --port 8001
# User 2
vllm ... --port 8002
# Each user accesses their own server
curl http://localhost:8001/... # User 1
curl http://localhost:8002/... # User 2
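Rather than agreeing on ports by chat, each user can probe for a free one before starting a server. The sketch below tries to bind each candidate in turn; first_free_port is a hypothetical helper, and the default candidates are just the ports mentioned above. Note the small race: another user could grab the port between the probe and your server start.

```python
import socket


def first_free_port(candidates=(8000, 8001, 8002), host="127.0.0.1"):
    """Return the first candidate port that can be bound, or None."""
    for port in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # Port is already in use; try the next one.
            return port
    return None


if __name__ == "__main__":
    print("Free port:", first_free_port())
```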
Q: What about Tenstorrent Cloud? Does Remote-SSH work?
A: Yes! Tenstorrent Cloud instances are perfect for Remote-SSH:
Typical setup:
- Get Tenstorrent Cloud instance (pre-configured with hardware)
- Receive SSH credentials
- Add to ~/.ssh/config on your laptop
- Connect via Remote-SSH
- Start developing!
Cloud benefits:
- ✅ Pre-installed TT-Metalium and drivers
- ✅ Pre-configured environment
- ✅ No hardware setup needed
- ✅ Access from anywhere
Example cloud SSH config:
Host tt-cloud
HostName cloud.instance.tenstorrent.com
User your-username
IdentityFile ~/.ssh/tt-cloud-key
ForwardAgent yes
Q: Are there performance considerations with Remote-SSH?
A: Remote-SSH is very efficient:
Fast operations (no noticeable latency):
- Terminal commands (SSH is fast)
- File editing (only changes sync)
- Running inference (happens on remote)
- Model downloads (direct from HuggingFace to remote)
What uses bandwidth:
- File tree indexing (one-time)
- Large file transfers (if you copy files between machines)
- Extension updates (rare)
Best practices:
- ✅ Use wired connection or good WiFi
- ✅ Keep large models on remote (don't transfer)
- ✅ Use compression in SSH config:
Compression yes
Real-world experience:
- Feels instant on good connection (10+ Mbps)
- Usable on moderate connection (1-5 Mbps)
- Not recommended on very slow connections (<1 Mbps)
Q: How do I disconnect from remote machine?
A: Several options:
Graceful disconnect:
- Close the remote VSCode window
- Connection closes, remote processes continue running
From command palette:
- Cmd+Shift+P → "Remote-SSH: Close Remote Connection"
Important: vLLM servers keep running after disconnect!
# Before disconnecting, you may want to:
docker ps # Note container IDs
docker stop <container-id> # Stop servers
# Or leave them running and reconnect later
Reconnecting:
- Just repeat: "Remote-SSH: Connect to Host" → Select your host
- Everything exactly as you left it
Hardware & Detection
Q: Can I try Tenstorrent development without hardware?
A: Yes! Use ttsim - Tenstorrent's full-system simulator.
What is ttsim:
- Virtual Wormhole™ or Blackhole® device that runs on any Linux/x86_64 system
- No physical hardware needed
- Slower than silicon but fast enough for learning and experimentation
- Perfect for exploring before purchasing hardware
Quick Start:
# Download the simulator (this URL always fetches the latest release)
mkdir -p ~/sim
cd ~/sim
wget https://github.com/tenstorrent/ttsim/releases/latest/download/libttsim_wh.so
# Copy SOC descriptor
cp $TT_METAL_HOME/tt_metal/soc_descriptors/wormhole_b0_80_arch.yaml ~/sim/soc_descriptor.yaml
# Set environment variable
export TT_METAL_SIMULATOR=~/sim/libttsim_wh.so
# Run in slow dispatch mode (required for simulator)
export TT_METAL_SLOW_DISPATCH_MODE=1
# Test it works
cd $TT_METAL_HOME
./build/programming_examples/metal_example_add_2_integers_in_riscv
What you CAN do with ttsim:
- ✅ Learn TT-Metalium programming model
- ✅ Run programming examples and tests
- ✅ Develop and debug kernels
- ✅ Test TT-NN™ operations
- ✅ Explore Tenstorrent architecture
What you CAN'T do (too slow):
- ❌ Run full model inference (vLLM, large models)
- ❌ Production workloads
- ❌ Performance benchmarking
- ❌ Real-time applications
Which lessons work with ttsim:
- Hardware Detection: Partial support - ttnn works, tt-smi won't detect simulated device
- Verify Installation: Yes - programming examples work great
- RISC-V Programming: Yes - perfect for learning low-level programming
- Model Inference lessons: No - too slow for practical use (Interactive Chat through Image Generation)
- Compiler lessons: Limited - depends on workload (TT-Forge, TT-XLA)
Resources:
- GitHub: https://github.com/tenstorrent/ttsim
- Releases: https://github.com/tenstorrent/ttsim/releases/latest
Tip: Use ttsim for learning and kernel development, then move to real hardware for model inference and production workloads.
Q: Which hardware do I have?
A: Run this command:
tt-smi -s | grep -o '"board_type": "[^"]*"'
Output tells you:
- n150 - Single Wormhole chip (development, 64K context)
- n300 - Dual Wormhole chips (128K context, TP=2)
- T3000 - Eight Wormhole chips (large models, TP=8)
- p100 - Single Blackhole chip (newer architecture)
- p150 - Dual Blackhole chips (TP=2)
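The same lookup can be scripted. This sketch pulls every board_type value out of the tt-smi -s text with the same pattern the grep uses, then maps it to the chip counts listed above. The helper names are hypothetical, and matching case-insensitively on a prefix is an assumption about how tt-smi spells board names.

```python
import re

# Chip counts taken from the list above.
CHIPS_PER_BOARD = {"n150": 1, "n300": 2, "t3000": 8, "p100": 1, "p150": 2}


def board_types(snapshot_text):
    """Extract every "board_type" value from tt-smi -s output."""
    return re.findall(r'"board_type":\s*"([^"]*)"', snapshot_text)


def chip_count(board_type):
    """Look up the chip count for a board, or None if it is not listed."""
    name = board_type.lower()
    # Assumption: the reported name starts with one of the listed keys.
    for key, chips in CHIPS_PER_BOARD.items():
        if name.startswith(key):
            return chips
    return None
```

Feed it real output with subprocess.run(["tt-smi", "-s"], capture_output=True, text=True).stdout.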
Q: tt-smi says "No devices found" - what do I do?
A: Try these steps in order:
1. Check PCIe detection:
lspci | grep -i tenstorrent
Should show: Processing accelerators: Tenstorrent Inc.
2. Try with sudo:
sudo tt-smi
If this works, you have a permissions issue.
3. Reset the device:
tt-smi -r
4. Full cleanup (if still failing):
sudo pkill -9 -f tt-metal
sudo pkill -9 -f vllm
sudo rm -rf /dev/shm/tenstorrent* /dev/shm/tt_*
tt-smi -r
Still not working? Check the Hardware Detection lesson troubleshooting section for detailed steps.
Q: What's the difference between Wormhole and Blackhole?
A:
- Wormhole (n150, n300, T3000) - 2nd generation, well-validated, most models tested
- Blackhole (p100, p150) - Latest generation, newer architecture, some experimental models
For production: Stick with Wormhole (n150/n300/T3000) - more models validated.
For experimentation: Blackhole offers newer features but check model compatibility.
Q: How do I know what my hardware can run?
A: Quick reference:
| Hardware | Max Model Size | Max Context | Multi-chip | Best For |
|---|---|---|---|---|
| n150, p100 | 8B | 64K | No (TP=1) | Development, prototyping |
| n300, p150 | 13B | 128K | Yes (TP=2) | Medium models, multi-user |
| T3000 | 70B+ | 128K | Yes (TP=8) | Large models, production |
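As a rough planning aid, the table can be turned into a lookup. This is only a sketch of the figures above: can_run is a hypothetical helper, sizes are in billions of parameters, context is in thousands of tokens, and T3000's "70B+" is treated as 70 even though the table suggests it can go higher.

```python
# (max model size in billions of parameters, max context in K tokens),
# copied from the quick-reference table.
LIMITS = {
    "n150": (8, 64),
    "p100": (8, 64),
    "n300": (13, 128),
    "p150": (13, 128),
    "t3000": (70, 128),  # Table says "70B+"; 70 is a conservative stand-in.
}


def can_run(board, model_params_b, context_k):
    """Return True if the model size and context fit the table's limits."""
    max_params_b, max_context_k = LIMITS[board.lower()]
    return model_params_b <= max_params_b and context_k <= max_context_k


if __name__ == "__main__":
    print("8B @ 64K on n150:", can_run("n150", 8, 64))
    print("13B @ 64K on n150:", can_run("n150", 13, 64))
```

Treat a True result as "worth trying", not a guarantee; actual fit also depends on the specific model and precision.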
Q: What happens to running jobs and hardware utilization when a system suspends?
A: When the system goes into suspend, all running jobs on Tenstorrent hardware are interrupted and effectively terminated, and hardware utilization drops to zero. On resume, the driver re-initializes the device (similar to a reset), so any workloads must be restarted. In normal cases you don't need a full reboot; if the device doesn't come back cleanly, run tt-smi -r (reset) or reboot the host.
Q: After a Linux kernel update, my Tenstorrent device is not detected or TT-Inference-Server reports a pre-release driver version.
A: The kernel module driver (tt-kmd) must be compiled specifically for each kernel version. This normally happens automatically via DKMS when a new kernel is installed, but can silently fail if there are orphaned DKMS entries left behind from old driver versions.
Quick fix:
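# Pick the newest registered driver version: the number right after the module name.
# (Kernel versions on the same line also look like X.Y.Z, so a bare number match can pick the wrong one.)
sudo dkms install tenstorrent/$(dkms status tenstorrent | sed -nE 's#^tenstorrent[/, ]+([0-9]+\.[0-9]+\.[0-9]+).*#\1#p' | sort -V | tail -1) -k $(uname -r)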
sudo modprobe tenstorrent
modinfo tenstorrent | grep version # Confirm correct version loaded
If dkms install fails with "Could not locate dkms.conf", you have orphaned entries from old driver versions that are blocking the auto-build. Clean them up:
# Identify broken entries (any version that prints an error instead of a status line)
dkms status
# Manually remove each broken version
sudo rm -rf /var/lib/dkms/tenstorrent/<broken-version>
# Then retry the install above
Why this happens: When old tt-kmd versions are superseded, their DKMS source directories are sometimes removed without de-registering them from the DKMS registry. These orphaned entries cause dkms autoinstall to abort with an error before it reaches the valid driver version, so the new kernel boots without the module.
After cleanup, kernel upgrades will rebuild the module automatically — no manual intervention needed on future upgrades.
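The version-selection step in the quick fix is the fiddly part, so here is a sketch of the same idea in Python. It takes dkms status output as text, keeps only the version that directly follows the module name, and returns the highest one. The function name is hypothetical and the exact line format varies between dkms releases, so the pattern accepts both "tenstorrent/X.Y.Z" and "tenstorrent, X.Y.Z".

```python
import re


def latest_driver_version(status_text, module="tenstorrent"):
    """Return the highest module version found in dkms status output."""
    pattern = re.compile(
        rf"^{re.escape(module)}[/, ]+(\d+)\.(\d+)\.(\d+)", re.MULTILINE
    )
    # Compare versions numerically so that 1.34.0 sorts above 1.9.0.
    versions = [
        tuple(int(part) for part in match.groups())
        for match in pattern.finditer(status_text)
    ]
    if not versions:
        return None
    return ".".join(str(part) for part in max(versions))
```

Anchoring on the module name matters because the kernel version on the same line (for example 6.8.0) would otherwise be mistaken for a driver version.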
Installation & Setup
Q: How do I verify TT-Metalium is working?
A: Run this quick test:
python3 -c "import ttnn; print('✓ tt-metal ready')"
If it fails:
- Check PYTHONPATH includes TT-Metalium directory
- Verify TT-Metalium is built: ls ~/tt-metal/build/lib
- Rebuild if needed: cd ~/tt-metal && ./build_metal.sh
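If the import fails and you want to know whether Python can even locate the package, importlib can answer that without loading it. A minimal sketch; module_available is a hypothetical helper name.

```python
import importlib.util


def module_available(name="ttnn"):
    """Return True if Python can locate the module, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("ttnn"):
        print("ttnn is on the import path")
    else:
        print("ttnn not found: check the active venv and PYTHONPATH")
```

If this reports the module as found but import ttnn still fails, the problem is more likely a library or build mismatch than a path issue.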
Q: Which Python version do I need?
A:
- Minimum: Python 3.9
- Recommended: Python 3.10+
- For TT-Forge: Python 3.11+ (requirement)
Check your version:
python3 --version
Q: Where should models be installed?
A: Standard locations:
Recommended: ~/models/[model-name]/
- Example: ~/models/Llama-3.1-8B-Instruct/
- Used by most lessons
HuggingFace cache: ~/.cache/huggingface/hub/
- Automatic when using hf download
- Takes more disk space (keeps multiple versions)
Both formats needed for some lessons:
- Meta format: ~/models/[model]/original/ (for Lessons 3-5)
- HuggingFace format: ~/models/[model]/ (for Lessons 6-9)
Q: How much disk space do I need?
A: Plan for:
- TT-Metalium: ~5GB (source + build artifacts)
- vLLM: ~20GB (including dependencies)
- Per model:
- Small models (1-3B): 10-15GB
- Medium models (7-8B): 30-40GB
- Large models (70B): 140GB+
Minimum for this extension: 100GB free space
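To check free space before starting a large download, the standard library is enough. This sketch compares free space on a path against the 100GB figure above; the helper names are made up, and GB here means 10^9 bytes.

```python
import shutil


def free_gb(path="."):
    """Free space on the filesystem containing path, in GB (10^9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path=".", required_gb=100):
    """Return True if at least required_gb is free at path."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    print(f"{free_gb():.1f} GB free; enough for the extension: {has_room()}")
```

Point path at the filesystem that will hold ~/models/, since that is where most of the space goes.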
Models & Downloads
Q: Which model should I download first?
A: Qwen3-0.6B — works on all hardware including N150, no license gate, reasoning-capable.
hf download Qwen/Qwen3-0.6B --local-dir ~/models/Qwen3-0.6B
For n300 / p100 / T3000 / TT-QuietBox 2 (more DRAM): Llama-3.1-8B-Instruct is a good next step (requires HuggingFace account and license acceptance):
hf download meta-llama/Llama-3.1-8B-Instruct \
--local-dir ~/models/Llama-3.1-8B-Instruct
⚠️ N150 note: Llama-3.1-8B-Instruct exhausts DRAM on N150. Start with Qwen3-0.6B.
Q: How do I handle HuggingFace authentication?
A: Three options:
Option 1: Environment variable (recommended for scripts)
export HF_TOKEN=your_token_from_huggingface
hf download meta-llama/Llama-3.1-8B-Instruct --local-dir ~/models/Llama-3.1-8B-Instruct
Option 2: Interactive login (recommended for manual use)
hf auth login
# Paste your token when prompted
Option 3: In code
from huggingface_hub import login
login(token="your_token_from_huggingface")
Get a token: https://huggingface.co/settings/tokens
Q: Download failed with "repository not found" - why?
A: Gated models require access request:
- Go to model page on HuggingFace
- Click "Request access" button
- Wait for approval (usually instant for Llama)
- Ensure you're authenticated (see question above)
For Llama models: Must accept Meta's license agreement.
Q: Can I use models from other sources?
A: Yes, but:
- HuggingFace format required for vLLM (Production Inference lessons)
- Meta checkpoint format required for Direct API (Interactive Chat, API Server)
- ONNX/PyTorch format for TT-Forge (Image Classification)
Recommendation: Stick with HuggingFace - most compatible.
Inference & Serving
Q: Which inference method should I use?
A: Depends on your goal:
| Method | Lesson | Best For | Speed (after load) |
|---|---|---|---|
| One-shot demo | Download Model | Testing, verification | 2-5 min per query |
| Interactive chat | Interactive Chat | Learning, prototyping | 1-3 sec per query |
| Flask API | API Server | Simple custom APIs | 1-3 sec per query |
| vLLM | Production Inference | Production serving | 1-3 sec per query |
Quick guide:
- Just testing? → Download Model (one-shot demo)
- Learning/experimenting? → Interactive Chat (interactive)
- Building custom app? → API Server (Flask API)
- Production deployment? → Production Inference with vLLM (vLLM)
Q: Why does first load take 2-5 minutes?
A: Model initialization involves:
- Loading weights from disk (~16GB for Llama-8B)
- Converting to TT-Metalium format
- Distributing to hardware cores
- JIT compilation of kernels
This is normal and only happens once.
Subsequent queries are fast (1-3 seconds) because model stays in memory.
Q: Can I run multiple models simultaneously?
A: On same hardware: No (one model at a time per device)
Workarounds:
- Use model switching (stop one, start another)
- Use multiple hardware devices
- Use different hardware for different models (n150 for model A, n300 for model B)
Q: What does "context length" mean and why does it matter?
A:
- Context length = Maximum tokens (words/subwords) model can process at once
- Includes both input (prompt) + output (response)
Hardware limits:
- n150/p100: 64K tokens (~48K words)
- n300/T3000: 128K tokens (~96K words)
Exceeding context?
RuntimeError: Input sequence length exceeds maximum
Solutions:
- Shorten your prompts
- Use summarization for long documents
- Switch to hardware with larger context support
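Because the limit covers prompt plus response, it helps to budget both before sending a request. The sketch below does that arithmetic; the function names are hypothetical, and the words-to-tokens estimate simply reuses the ratio implied above (64K tokens for roughly 48K words, so about 4 tokens per 3 words). A real tokenizer will give different counts.

```python
def estimate_tokens(word_count):
    """Rough token estimate: about 4 tokens per 3 words, rounded up."""
    return (word_count * 4 + 2) // 3


def fits_context(prompt_tokens, max_new_tokens, limit_tokens):
    """Return True if the prompt plus the requested output fits the limit."""
    return prompt_tokens + max_new_tokens <= limit_tokens


if __name__ == "__main__":
    prompt = estimate_tokens(30_000)  # a 30K-word document
    print("Estimated prompt tokens:", prompt)
    print("Fits 64K with 2K of output:", fits_context(prompt, 2_000, 64_000))
```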
Q: Getting PyTorch dataclass errors with vLLM - how do I fix them?
A: This error (TypeError: must be called with a dataclass type or instance) is caused by PyTorch version mismatches.
Error looks like:
TypeError: must be called with a dataclass type or instance
# ... torch/_inductor/runtime/hints.py errors
Root cause: vLLM on Tenstorrent hardware requires PyTorch 2.5.0+cpu specifically. Other versions (2.4.x, 2.7.x) cause compatibility issues.
Solution: Recreate your vLLM environment
bash ~/tt-scratchpad/setup-vllm-env.sh
This automated script:
- ✅ Creates environment at correct location (~/tt-metal/build/python_env_vllm)
- ✅ Installs PyTorch 2.5.0+cpu (exact version)
- ✅ Installs all required dependencies
- ✅ Validates installation before completion
Verify your environment:
source ~/activate-vllm-env.sh
python3 -c "import torch; print('PyTorch version:', torch.__version__)"
# Should print: PyTorch version: 2.5.0+cpu
Why the specific version? TT-Metalium hardware drivers are built against PyTorch 2.5.0+cpu APIs. Other versions have incompatible dataclass implementations.
Custom Training
Q: Can I train models on Tenstorrent hardware?
A: Yes! The extension now includes 8 complete Custom Training lessons (CT1-CT8) that are fully validated on hardware.
What's working:
- ✅ From-scratch training: NanoGPT (11M params) - 136 steps in 76 seconds on n150
- ✅ Fine-tuning: Train custom models on your own datasets
- ✅ Complete toolkit: Setup scripts, validation, and tested templates
- ✅ Production-ready: Both training workflows validated end-to-end
Recommended version: TT-Metalium v0.66.0-rc7 (fully tested)
Q: What hardware do I need for training?
A: Training requirements depend on model size:
n150 (Wormhole single-chip):
- ✅ Perfect for NanoGPT (11M params, 6 layers, 384 dim)
- ✅ From-scratch training on Shakespeare, custom datasets
- ❌ TinyLlama-1.1B OOM (needs 2GB DRAM, only 1GB available)
n300+ (Wormhole dual-chip or higher):
- ✅ Everything n150 can do
- ✅ TinyLlama-1.1B fine-tuning (2GB+ DRAM available)
- ✅ Larger models and batch sizes
Recommendation: Start with n150 and NanoGPT to learn the workflow!
Q: What's the difference between fine-tuning and training from scratch?
A:
Fine-tuning (CT4):
- Start with pre-trained model (e.g., TinyLlama-1.1B)
- Train on small custom dataset (50-1000 examples)
- Adapts model to your specific task
- Faster (minutes to hours)
- Good for: Q&A bots, domain-specific assistants
Training from Scratch (CT8):
- Build model from random initialization
- Train on large dataset (Shakespeare, your own data)
- Learn patterns from ground up
- Slower (hours to days)
- Good for: Understanding training deeply, custom architectures
Which should I start with? CT8 (from-scratch) - it's faster on n150 with NanoGPT and teaches fundamentals!
Q: What TT-Metalium version do I need for training?
A: Training requires v0.66.0-rc5 or later.
Why:
- v0.64.5 and earlier: C++ tt-train only ❌
- v0.66.0-rc5+: Python ttml module available ✅
- v0.66.0-rc7: Fully validated and recommended ✅
Check your version:
cd $TT_METAL_HOME && git describe --tags
See CT4 and CT8 lessons for complete setup instructions!
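To compare the git describe output against the v0.66.0-rc5 minimum without eyeballing it, a small parser helps. This is a sketch: the function names are hypothetical, it assumes tags look like vMAJOR.MINOR.PATCH with an optional -rcN suffix, and it treats a tag without an rc suffix as newer than any release candidate of the same version.

```python
import re

# v0.66.0-rc5, the minimum stated above.
MIN_TRAINING = (0, 66, 0, 5)


def parse_tag(tag):
    """Parse 'v0.66.0-rc7' (extra git describe suffixes allowed) into a tuple."""
    match = re.match(r"v(\d+)\.(\d+)\.(\d+)(?:-rc(\d+))?", tag)
    if not match:
        return None
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    # Assumption: no rc suffix means a final release, which sorts after any rc.
    rc = int(match.group(4)) if match.group(4) else float("inf")
    return (major, minor, patch, rc)


def supports_python_training(tag):
    """Return True if the tag is v0.66.0-rc5 or later."""
    parsed = parse_tag(tag)
    return parsed is not None and parsed >= MIN_TRAINING


if __name__ == "__main__":
    for tag in ("v0.64.5", "v0.66.0-rc5", "v0.66.0-rc7"):
        print(tag, supports_python_training(tag))
```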
Compilers & Tools
Q: What's the difference between TT-Forge and TT-XLA?
A:
| Feature | TT-Forge | TT-XLA |
|---|---|---|
| Status | Experimental | Production-ready |
| Multi-chip | Single only | Yes (TP/DP) |
| Frameworks | PyTorch, ONNX | JAX, PyTorch/XLA |
| Model support | Limited (169 validated) | Broader |
| Installation | Complex (build from source) | Simple (pip) |
When to use TT-Forge:
- Experimenting with PyTorch models
- Learning MLIR compilation
- Working with validated models list
When to use TT-XLA:
- Production multi-chip workloads
- JAX workflows
- Need stability and support
Q: Why did my model fail to compile in TT-Forge?
A: TT-Forge is experimental. Common reasons:
Unsupported operators
- Not all PyTorch ops implemented
- Check tt-forge-models for validated examples
Model architecture
- Very new architectures may not work
- Dynamic shapes not supported
- Control flow limited
Environment variable pollution (most common!)
unset TT_METAL_HOME
unset TT_METAL_VERSION
# Then try again
Recommendation: Start with MobileNetV2 (Image Classification with TT-Forge default) - known to work.
Q: How do I know if my model is supported?
A:
For TT-Forge:
- Check tt-forge-models repository
- 169 validated models listed
- Start with these before trying others
For vLLM:
- Llama family well-supported (2, 3, 3.1, 3.2)
- Mistral supported
- Qwen supported (needs n300+ for larger models)
- Check documentation for your specific model
For TT-XLA:
- Most JAX/Flax models work
- PyTorch/XLA support growing
- GPT-2 demo included (JAX Inference with TT-XLA)
Troubleshooting
Q: Command failed with "ImportError: undefined symbol"
A: This is almost always environment variable pollution.
Fix:
unset TT_METAL_HOME
unset TT_METAL_VERSION
# Retry your command
Make permanent:
Add to ~/.bashrc:
# Prevent TT-Metalium environment pollution
unset TT_METAL_HOME
unset TT_METAL_VERSION
Why this happens: These variables override the build paths, so libraries from a different TT-Metalium version get loaded alongside the ones your tool was built against.
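If you would rather not unset the variables globally (other tools may need them), one option is to drop them only for the command that fails. A minimal sketch using the Python standard library; the helper names are illustrative:

```python
import os
import subprocess

# Variables the fix above unsets
POLLUTING_VARS = ("TT_METAL_HOME", "TT_METAL_VERSION")


def cleaned_env(env=None, drop=POLLUTING_VARS):
    """Return a copy of env without the variables named in drop."""
    source = os.environ if env is None else env
    return {k: v for k, v in source.items() if k not in drop}


def run_clean(cmd):
    """Run cmd (a list of arguments) with the polluting variables removed."""
    return subprocess.run(cmd, env=cleaned_env(), check=False)
```

Unlike editing ~/.bashrc, this only affects the child process, so your current shell keeps its settings.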
Q: vLLM server won't start - what do I check?
A: Systematic debugging:
1. Check environment variables:
echo $TT_METAL_HOME # Should be ~/tt-metal
echo $MESH_DEVICE # Should match your hardware (N150, etc.)
echo $PYTHONPATH # Should include $TT_METAL_HOME
2. Verify model path:
ls ~/models/Llama-3.1-8B-Instruct/config.json
3. Check for other processes:
ps aux | grep -E "tt-metal|vllm"
# Kill if needed:
# pkill -9 -f vllm
4. Verify vLLM installation:
source ~/tt-vllm-venv/bin/activate
python3 -c "import vllm; print(vllm.__version__)"
5. Check device availability:
tt-smi
# Should show your device
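Steps 1 and 2 can be bundled into a small preflight script. This is a sketch under the assumptions stated in this answer (the three variables listed in step 1 and a config.json in the model directory); the function name is hypothetical, and it does not replace checking tt-smi or the vLLM install.

```python
import os
from pathlib import Path


def vllm_preflight(env, model_dir):
    """Return a list of human-readable problems; empty means checks passed."""
    problems = []
    for name in ("TT_METAL_HOME", "MESH_DEVICE", "PYTHONPATH"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    tt_home = env.get("TT_METAL_HOME", "")
    # Exact string match only; differently spelled but equivalent paths are missed
    if tt_home and tt_home not in env.get("PYTHONPATH", "").split(os.pathsep):
        problems.append("PYTHONPATH does not include TT_METAL_HOME")
    if not (Path(model_dir).expanduser() / "config.json").is_file():
        problems.append(f"no config.json under {model_dir}")
    return problems


if __name__ == "__main__":
    issues = vllm_preflight(os.environ, "~/models/Llama-3.1-8B-Instruct")
    print("\n".join(issues) or "environment and model path look OK")
```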
Q: "Out of memory" errors - what can I do?
A: Several strategies:
1. Reduce context length:
# Instead of:
--max-model-len 65536
# Try:
--max-model-len 32768
2. Reduce batch size:
# Instead of:
--max-num-seqs 32
# Try:
--max-num-seqs 16
3. Use smaller model:
- 8B → 3B (Llama-3.2-3B)
- 8B → 1B (Llama-3.2-1B)
4. Clear device state:
sudo pkill -9 -f tt-metal
sudo rm -rf /dev/shm/tenstorrent* /dev/shm/tt_*
tt-smi -r
Q: Build failed - where do I look?
A:
TT-Metalium build issues:
cd ~/tt-metal
./build_metal.sh 2>&1 | tee build.log
# Check build.log for errors
Common build failures:
- Missing dependencies: sudo apt-get install build-essential cmake
- Python version: Need 3.9+ (check with python3 --version)
- Disk space: Need 10GB+ free
- Memory: Need 16GB+ RAM for building
TT-Forge build issues:
- Python 3.11 required: Can't use older Python
- clang-17 required: sudo apt-get install clang-17
- Environment variables: Must unset TT_METAL_HOME first
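Several of the prerequisites above can be checked before starting a long build. The sketch below covers the Python version, free disk space, and whether build tools are on PATH; the tool names passed to missing_tools are examples, and the RAM requirement is not checked.

```python
import shutil
import sys


def python_ok(version=None, minimum=(3, 9)):
    """True if the interpreter version meets the minimum (major, minor)."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


def free_gb(path="."):
    """Free space on the filesystem containing path, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def missing_tools(tools=("cmake", "gcc")):
    """Names from tools that are not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    print("Python 3.9+:", python_ok())
    print(f"Free disk: {free_gb():.1f} GiB (10GB+ recommended above)")
    print("Missing tools:", missing_tools() or "none")
```

For a TT-Forge build, the same helpers could be called as python_ok(minimum=(3, 11)) and missing_tools(("clang-17",)).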
Q: TT-NN import errors or symbol undefined errors in cloud environments - how do I fix them?
A: After rolling back or updating TT-Metalium versions, TT-NN bindings may become incompatible.
Symptoms:
- ImportError: undefined symbol: _ZN2tt9tt_fabric15SetFabricConfigENS0...
- ImportError: undefined symbol: MPIX_Comm_revoke
- TT-NN examples that previously worked now fail
Common Cause: Rolling back or updating TT-Metalium versions (for example, to match specific vLLM compatibility) can break TT-NN bindings.
Solution - Clean Rebuild to Known-Good Version:
1. Note your original working commit:
cd ~/tt-metal
git log --oneline | head -5
# Save the commit hash that was working
2. Check out the known-good version:
cd ~/tt-metal
git checkout 5143b856eb # Replace with your working commit
git submodule update --init --recursive
3. Complete clean rebuild:
cd ~/tt-metal
# Clean all build artifacts
rm -rf build build_Release
# Reinstall dependencies
sudo ./install_dependencies.sh
# Rebuild from scratch
./build_metal.sh
4. Test TT-NN:
source ~/tt-metal/python_env/bin/activate
export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH
export PYTHONPATH=~/tt-metal:$PYTHONPATH
python3 -m ttnn.examples.usage.run_op_on_device
Important Notes:
- The original/untouched TT-Metalium version is often the most stable
- Rolling back to older commits can create incompatible bindings
- Always do a complete clean rebuild after changing commits
- OpenMPI library path is required: /opt/openmpi-v5.0.7-ulfm/lib
Known-Good Commit (as of Dec 2024):
5143b856eb (Oct 28, 2024) - Stable TT-NN, validated on n150
Q: Getting OpenMPI errors - how do I fix them?
A: OpenMPI library path errors are common and easy to fix.
Symptoms:
- Errors mentioning "libmpi.so" or "OpenMPI"
- "ImportError: cannot open shared object file"
- Commands fail with MPI-related errors
Fix:
export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH
Make permanent:
Add to ~/.bashrc:
# OpenMPI library path for Tenstorrent
export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH
Then reload:
source ~/.bashrc
Why this happens: The OpenMPI library installation isn't in the system's default library search path, so you need to explicitly tell the dynamic linker where to find it.
Alternative OpenMPI paths: If the above doesn't work, try:
# Find your OpenMPI installation
find /opt -name "libmpi.so*" 2>/dev/null
# Use the directory containing the .so files
export LD_LIBRARY_PATH=/path/to/openmpi/lib:$LD_LIBRARY_PATH
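The find-then-export steps above can also be scripted. The sketch below searches a root directory for libmpi.so* and builds a new LD_LIBRARY_PATH value without duplicate or empty entries; both function names are illustrative. Note that it only computes the value: the variable still has to be exported in the shell (or passed to a child process) before the program that needs it starts.

```python
from pathlib import Path


def find_libmpi_dirs(root="/opt"):
    """Directories under root that contain a libmpi.so* file."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted({str(p.parent) for p in base.rglob("libmpi.so*")})


def prepend_library_path(current, new_dir):
    """Put new_dir first in a colon-separated path list, without duplicates."""
    parts = [p for p in current.split(":") if p and p != new_dir]
    return ":".join([new_dir, *parts])


if __name__ == "__main__":
    import os

    for lib_dir in find_libmpi_dirs():
        value = prepend_library_path(os.environ.get("LD_LIBRARY_PATH", ""), lib_dir)
        print(f"export LD_LIBRARY_PATH={value}")
```

Dropping empty entries is a small design choice: when LD_LIBRARY_PATH starts out unset, the plain shell form leaves a trailing colon, which the dynamic linker treats as the current directory.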
Q: Downloads are slow or failing
A:
Slow downloads:
- HuggingFace throttles anonymous requests
- Solution: Log in with hf auth login
- Consider downloading overnight for large models
Failing downloads:
- Check internet connection
- Verify HF authentication (see authentication question above)
- Check disk space: df -h ~
- Try resuming:
hf download meta-llama/Llama-3.1-8B-Instruct \
  --local-dir ~/models/Llama-3.1-8B-Instruct \
  --resume-download
Performance & Optimization
Q: How can I speed up inference?
A:
After first load (model in memory):
- Already fast: 1-3 seconds per query typical
- Can't improve much: Hardware-optimized already
For batch processing:
- Use vLLM's batching: --max-num-seqs 32
- Process multiple requests together
- 3-5x throughput improvement
For lower latency:
- Reduce the max_tokens parameter (shorter responses = faster)
- Use smaller model (8B → 3B)
- Consider hardware upgrade (n150 → n300)
Q: What are good vLLM server parameters?
A: Recommended by hardware:
n150 (single chip):
--max-model-len 65536 # Full 64K context
--max-num-seqs 16 # Moderate batching
--block-size 64 # Standard
n300 (dual chip):
--max-model-len 131072 # Full 128K context
--max-num-seqs 32 # Higher batching
--block-size 64
--tensor-parallel-size 2 # Use both chips
T3000 (8 chips):
--max-model-len 131072
--max-num-seqs 64 # High batching
--block-size 64
--tensor-parallel-size 8 # Use all chips
Conservative (if OOM errors):
- Reduce max-model-len by 50%
- Reduce max-num-seqs by 50%
- Test incrementally
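One way to make "test incrementally" concrete is to generate the sequence of settings to try, halving one value per step. This is only a sketch: the alternating order and the floor values are assumptions chosen for illustration, not recommendations from vLLM or Tenstorrent.

```python
def conservative_steps(max_model_len, max_num_seqs, floor_len=4096, floor_seqs=1):
    """Yield (max_model_len, max_num_seqs) pairs, halving one value at a time.

    Context length is halved first, then batch size, alternating until
    both reach their floors. The floors are illustrative defaults.
    """
    length, seqs = max_model_len, max_num_seqs
    halve_len = True
    while length > floor_len or seqs > floor_seqs:
        if halve_len and length > floor_len:
            length = max(floor_len, length // 2)
        elif seqs > floor_seqs:
            seqs = max(floor_seqs, seqs // 2)
        else:
            length = max(floor_len, length // 2)
        halve_len = not halve_len
        yield length, seqs


if __name__ == "__main__":
    # Starting from the n150 settings above
    for length, seqs in conservative_steps(65536, 16, floor_len=16384, floor_seqs=4):
        print(f"--max-model-len {length} --max-num-seqs {seqs}")
```

Restart the server with each pair in turn and stop at the first one that runs without OOM errors.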
Q: How do I monitor performance?
A:
Token generation speed:
# In vLLM output, look for:
"Generated 150 tokens in 2.5 seconds (60 tokens/sec)"
Server metrics:
# vLLM exposes Prometheus metrics:
curl http://localhost:8000/metrics
System monitoring:
# GPU-like monitoring for TT:
watch -n 1 tt-smi
Load testing:
# Install hey:
go install github.com/rakyll/hey@latest
# Test throughput:
hey -n 100 -c 10 -m POST \
-H "Content-Type: application/json" \
-d '{"model": "...", "messages": [...]}' \
http://localhost:8000/v1/chat/completions
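The /metrics endpoint mentioned above returns the Prometheus text format, which is easy to read from a script. A minimal sketch follows; the metric names in SAMPLE are made up for illustration, so check your server's actual output for the real names.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text-format samples into {name_with_labels: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. Optional
    trailing timestamps are not handled, which is enough for a quick look.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # not a "name value" sample line
    return samples


def fetch_metrics(url="http://localhost:8000/metrics", timeout=5):
    """Download and parse metrics from a running server."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hypothetical sample payload, not real vLLM output
SAMPLE = """\
# HELP example_requests_running Requests currently running
# TYPE example_requests_running gauge
example_requests_running{model="demo"} 2.0
example_tokens_total 150
"""

if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE).items():
        print(name, value)
```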
Q: How can I visualize my hardware usage?
A: Four layers of observability, from quick to deep:
1. tt-smi — live telemetry
tt-smi # interactive TUI with chip temp, DRAM usage, firmware version
tt-smi -s # structured JSON snapshot (scriptable)
watch -n 1 'tt-smi -s | python3 -c "import sys,json; d=json.load(sys.stdin); print(d)"' # quote the pipeline so watch reruns all of it
2. tt-toplike — htop-style real-time view
An htop-inspired Rust TUI that shows per-chip utilization, process list, temperature, and power draw in real time. Install via pip install tt-toplike or the system package.
tt-toplike
3. ttnn-visualizer — model execution analysis
A web-based tool that loads a TT-Metalium performance trace and renders interactive graphs: operation timelines, memory usage over time, tensor shapes, buffer allocation maps, and the full operation flow graph. Run after a profiled inference pass to understand exactly where time is spent.
4. tensix-viz — chip topology education
An interactive JavaScript canvas visualizer showing the actual Tensix grid layout — which cores are compute vs DRAM vs ETH — and animating what different workload types look like. Useful for building the mental model before you profile.
Community & Support
Q: Where can I get help?
A:
Official channels:
- Discord: https://discord.gg/tenstorrent (most active)
- GitHub Issues:
- Documentation: https://docs.tenstorrent.com
When asking for help, include:
- Hardware type (n150/n300/T3000/p100)
- Error message (full text)
- Command you ran
- Output of tt-smi
- Which lesson you're on
Q: How do I report a bug?
A:
Before reporting:
- Search existing issues on GitHub
- Verify hardware works (tt-smi)
- Try reset (tt-smi -r)
- Check you're on latest TT-Metalium/vLLM
When reporting, include:
Hardware: n150
OS: Ubuntu 22.04
TT-Metalium version: [git rev-parse HEAD output]
vLLM version: [pip show vllm]
Error: [paste full error]
Steps to reproduce: [numbered list]
Good issue = faster fix!
Q: Can I contribute?
A: Yes! Several ways:
1. Bounty Program
- Bring up new models
- Earn rewards
- Official contribution path
2. Documentation
- Fix typos/errors
- Add examples
- Improve tutorials
3. Code contributions
- Bug fixes
- Performance improvements
- New features
Start here:
- Join Discord #contributing channel for guidance
- Ask about "good first issue" opportunities
- Review documentation at https://docs.tenstorrent.com
Q: Is this production-ready?
A: Depends on component:
Production-ready (✅):
- TT-Metalium - Stable, tested
- vLLM - Production-grade serving
- TT-XLA - Production compiler
Experimental (⚠️):
- TT-Forge - Beta, limited model support
- Some models - Check validation status
Recommendation:
- For production: Stick with vLLM + validated models
- For experimentation: Try TT-Forge, new models
- Always test thoroughly before production deployment
Quick Reference
Essential Commands
# Hardware
tt-smi # Check hardware
tt-smi -s # Structured output
tt-smi -r # Reset device
# Model info
ls ~/models/ # List installed models
du -sh ~/models/* # Check model sizes
# Environment
python3 -c "import ttnn; print('✓')" # Test TT-Metalium
hf --version # Check HF CLI
# vLLM
source ~/tt-vllm-venv/bin/activate # Activate venv
curl http://localhost:8000/health # Check server
curl http://localhost:8000/metrics # Get metrics
# Cleanup
sudo pkill -9 -f "tt-metal|vllm" # Kill processes
sudo rm -rf /dev/shm/tt_* # Clear shared memory
tt-smi -r # Reset hardware
Quick Diagnostic
Run this to check your setup:
#!/bin/bash
echo "=== Tenstorrent Diagnostic ==="
echo ""
echo "Hardware:"
tt-smi -s 2>&1 | grep -o '"board_type": "[^"]*"' || echo "❌ No hardware detected"
echo ""
echo "TT-Metalium:"
python3 -c "import ttnn; print('✓ Working')" 2>&1 || echo "❌ Not working"
echo ""
echo "Models:"
{ ls ~/models/ 2>/dev/null || echo "❌ No models found"; } | head -3
echo ""
echo "Disk space:"
df -h ~ | grep -v Filesystem
echo ""
echo "Python:"
python3 --version
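The hardware check in the script above greps the tt-smi -s output for board_type. If you prefer to parse the JSON properly, a Python equivalent might look like the sketch below. It deliberately makes no assumption about where board_type sits in the snapshot and simply searches the whole structure; the function names are illustrative.

```python
import json
import subprocess


def find_key(node, key):
    """Collect every value stored under key anywhere in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_key(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_key(item, key))
    return found


def board_types():
    """Run `tt-smi -s` and return all board_type values in its JSON output."""
    out = subprocess.run(["tt-smi", "-s"], capture_output=True, text=True, check=True)
    return find_key(json.loads(out.stdout), "board_type")


if __name__ == "__main__":
    try:
        print("Boards:", board_types() or "none reported")
    except (OSError, subprocess.CalledProcessError, json.JSONDecodeError) as exc:
        print("No hardware detected:", exc)
```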
Advanced Learning Resources
Q: Where can I learn about low-level RISC-V programming on Tenstorrent hardware?
A: Check out the CS Fundamentals series - Module 1 covers RISC-V & Computer Architecture!
Each Tensix core contains five RISC-V processors (RV32IM ISA):
- BRISC (RISCV_0) - Primary data movement
- NCRISC (RISCV_1) - Network operations
- TRISC0/1/2 - Compute pipeline (unpack, math, pack)
With 176 Tensix cores on Wormhole, that's 880 RISC-V cores you can program directly!
What Module 1 includes:
- ✅ Von Neumann architecture & fetch-decode-execute cycle
- ✅ RISC-V ISA fundamentals
- ✅ Hands-on example: Add two integers in RISC-V assembly
- ✅ Build and run TT-Metalium programming examples
- ✅ Explore kernel source code
- ✅ Comprehensive exploration guide (60+ pages)
Topics covered across 7 CS Fundamentals modules:
- RISC-V architecture and memory maps
- Memory hierarchy and cache locality
- Parallel computing (scale from 1 to 880 cores!)
- NoC (Network-on-Chip) programming
- Synchronization and barriers
- Abstraction layers and compilation
- Computational complexity in practice
Access the series:
- From Welcome page → CS Fundamentals section
- Or start with Module 1: RISC-V & Computer Architecture
View the full guide:
Open RISC-V Exploration Guide - Comprehensive deep-dive documentation
Perfect for:
- Developers who want to understand the hardware at the lowest level
- Embedded systems programmers exploring RISC-V at scale
- Computer architecture enthusiasts
- Anyone optimizing kernel performance
Still Have Questions?
Check:
- Specific lesson troubleshooting sections
- CLAUDE.md for detailed technical info
- Discord #help channel
Remember: Most issues are:
- Environment variables (unset TT_METAL_HOME)
- Permissions (try sudo or add to tenstorrent group)
- Device state (reset with tt-smi -r)
When in doubt:
tt-smi -r
sudo rm -rf /dev/shm/tt_*
# Then retry
Last updated: May 2026
Extension version: 0.0.438
Found an error in this FAQ? Please report it on GitHub or Discord!