Tenstorrent Developer Extension - FAQ
Frequently Asked Questions - Your quick reference for common questions, troubleshooting, and tips from all 48 lessons.
Table of Contents
- Getting Started
- Remote Development & SSH
- Hardware & Detection
- Installation & Setup
- Models & Downloads
- Inference & Serving
- Custom Training
- Compilers & Tools
- Troubleshooting
- Performance & Optimization
- Community & Support
Getting Started
Q: Which lesson should I start with?
A: Start with Hardware Detection if you're brand new. The 48 lessons are organized into 9 categories:
π Your First Inference (5 lessons)
- Hardware Detection β Verify Installation β Download Model β Interactive Chat β API Server
π Serving Models (4 lessons) Production servers (tt-inference-server, vLLM) and generation (Image, Video)
π Custom Training (8 lessons) β NEW! Fine-tune models or train from scratch - validated on hardware with both workflows working!
π― Applications (2 lessons) Coding Assistant, AnimateDiff Video Generation
π¨βπ³ Tenstorrent Cookbook (6 lessons) Game of Life, Audio, Mandelbrot, Image Filters, Particle Life + Overview
π§ Compilers & Tools (2 lessons) TT-Forge, TT-XLA
π§ CS Fundamentals (7 lessons) Computer Architecture, Memory, Parallelism, Networks, Synchronization, Abstraction, Complexity
π Advanced Topics (5 lessons) tt-installer, Bounty Program, Explore Metalium, Koyeb Deployment (2)
Can I skip lessons? Yes! Categories are independent - jump to what interests you.
Q: Do I need to complete lessons in order?
A: Not strictly, but:
- Hardware Detection, Verify Installation, and Download Model are foundational - most later lessons assume you've done these
- Interactive Chat through Image Generation build on each other but can be done selectively
- Advanced topics (compilers, RISC-V, bounty program) are more independent
Quick start for experienced users:
- Run Hardware Detection (2 minutes - verify hardware)
- Skip to Production Inference with vLLM (production serving)
- Explore advanced topics (compilers, RISC-V, bounty program)
Q: What's the difference between the different tools?
A: Tenstorrent has several tools serving different purposes:
| Tool | Purpose | When to Use | Maturity |
|---|---|---|---|
| tt-metal | Low-level framework | Custom kernels, maximum control | Stable |
| vLLM | LLM serving | Production LLM deployment | Production |
| TT-Forge | MLIR compiler | PyTorch models (experimental) | Beta |
| TT-XLA | XLA compiler | JAX/PyTorch (production) | Production |
Simple guide:
- Need to run LLMs? β Production Inference with vLLM
- Want to experiment with PyTorch? β Image Classification with TT-Forge
- Need JAX support? β JAX Inference with TT-XLA
- Building custom kernels? β tt-metal (Hardware Detection, Verify Installation, Download Model, RISC-V Programming)
Remote Development & SSH
Q: Can I use this extension from my Mac/Windows laptop to access remote Tenstorrent hardware?
A: Yes! Use VSCode's Remote-SSH extension - the industry-standard solution for remote development.
This is the recommended approach for:
- Developing on macOS/Windows while hardware is on Linux
- Working from laptop with hardware in datacenter/cloud
- Team development with shared hardware resources
Why Remote-SSH is perfect for this:
- β Zero extension changes needed - Everything "just works"
- β Transparent experience - Feels like local development
- β All features work - Terminal commands, file operations, debugging
- β Battle-tested - Used by millions of developers daily
Q: How do I set up Remote-SSH for Tenstorrent development?
A: Quick setup guide:
Step 1: Install Remote-SSH extension
- Open VSCode on your local machine (Mac/Windows)
- Open Extensions panel (
Cmd+Shift+XorCtrl+Shift+X) - Search for "Remote - SSH"
- Install the official Microsoft extension
Step 2: Configure SSH connection
Add your Tenstorrent machine to SSH config:
# On your local machine, edit ~/.ssh/config
# (Cmd+Shift+P β "Remote-SSH: Open Configuration File")
Host tenstorrent-dev
HostName 192.168.1.100 # Your hardware machine IP
User ubuntu # Your username
IdentityFile ~/.ssh/id_rsa # Your SSH key
ForwardAgent yes # Optional: Forward SSH agent
Step 3: Connect to remote machine
Cmd+Shift+P(orCtrl+Shift+P) β "Remote-SSH: Connect to Host"- Select "tenstorrent-dev"
- New VSCode window opens connected to remote machine
Step 4: Install Tenstorrent extension on remote
- In the remote VSCode window, go to Extensions
- Search for "Tenstorrent Developer Extension"
- Click "Install in SSH: tenstorrent-dev"
Step 5: Start using lessons!
- All terminal commands run on remote machine
- All file operations work on remote filesystem
- Hardware detection works automatically
- Models download to remote machine
Q: Do the lessons work through Remote-SSH?
A: Yes, perfectly! Remote-SSH makes everything transparent:
What works automatically:
- β All terminal commands run on remote machine
- β
File operations (
Read,Write,Edit) work on remote filesystem - β
Hardware detection (
tt-smi) works - β Model downloads go to remote machine
- β Inference runs on remote hardware
- β Port forwarding automatic (access servers on localhost)
Example workflow:
- Connect via Remote-SSH from your Mac
- Open Tenstorrent walkthrough (works like local)
- Run Hardware Detection β
tt-smiruns on remote - Download model β Saves to remote
~/models/ - Start vLLM server β Runs on remote, port auto-forwarded
- Test from local browser β
http://localhost:8000works!
No code changes needed - The extension doesn't know or care that you're remote!
Q: What about SSH without Remote-SSH extension?
A: Not recommended. Manual SSH has major problems:
β File operations break - Extension reads/writes local filesystem, not remote
β Path mismatches - ~/models/ on Mac β ~/models/ on remote
β Complex escaping - Terminal commands get mangled through SSH
β No port forwarding - Can't access servers on localhost
β Poor UX - Feels disconnected, hard to debug
Example of problems:
If you manually SSH in terminal:
# This command in lesson creates file on your MAC, not remote!
cat > ~/tt-scratchpad/script.py << 'EOF'
...
EOF
Then this fails because the file is on the wrong machine:
ssh user@remote python3 ~/tt-scratchpad/script.py
With Remote-SSH: Both operations happen on remote automatically.
Q: Can multiple people share the same remote hardware?
A: Yes, but with considerations:
Shared hardware works best with:
- β Resource coordination - Don't run multiple large models simultaneously
- β
User directories - Each user has own
~/models/,~/tt-scratchpad/ - β Port management - Use different ports (8000, 8001, 8002...)
- β Communication - Team chat to coordinate who's using hardware
Limitations:
- β οΈ Only one model can load on device at a time
- β οΈ Large models need device reset between users
- β οΈ
/dev/shmshared memory might need cleanup
Best practice for teams:
# User 1
vllm ... --port 8001
# User 2
vllm ... --port 8002
# Each user accesses their own server
curl http://localhost:8001/... # User 1
curl http://localhost:8002/... # User 2
Q: What about Tenstorrent Cloud? Does Remote-SSH work?
A: Yes! Tenstorrent Cloud instances are perfect for Remote-SSH:
Typical setup:
- Get Tenstorrent Cloud instance (pre-configured with hardware)
- Receive SSH credentials
- Add to
~/.ssh/configon your laptop - Connect via Remote-SSH
- Start developing!
Cloud benefits:
- β Pre-installed tt-metal and drivers
- β Pre-configured environment
- β No hardware setup needed
- β Access from anywhere
Example cloud SSH config:
Host tt-cloud
HostName cloud.instance.tenstorrent.com
User your-username
IdentityFile ~/.ssh/tt-cloud-key
ForwardAgent yes
Q: Are there performance considerations with Remote-SSH?
A: Remote-SSH is very efficient:
Fast operations (no noticeable latency):
- Terminal commands (SSH is fast)
- File editing (only changes sync)
- Running inference (happens on remote)
- Model downloads (direct from HuggingFace to remote)
What uses bandwidth:
- File tree indexing (one-time)
- Large file transfers (if you copy files between machines)
- Extension updates (rare)
Best practices:
- β Use wired connection or good WiFi
- β Keep large models on remote (don't transfer)
- β
Use compression in SSH config:
Compression yes
Real-world experience:
- Feels instant on good connection (10+ Mbps)
- Usable on moderate connection (1-5 Mbps)
- Not recommended on very slow connections (<1 Mbps)
Q: How do I disconnect from remote machine?
A: Several options:
Graceful disconnect:
- Close the remote VSCode window
- Connection closes, remote processes continue running
From command palette:
Cmd+Shift+Pβ "Remote-SSH: Close Remote Connection"
Important: vLLM servers keep running after disconnect!
# Before disconnecting, you may want to:
docker ps # Note container IDs
docker stop <container-id> # Stop servers
# Or leave them running and reconnect later
Reconnecting:
- Just repeat: "Remote-SSH: Connect to Host" β Select your host
- Everything exactly as you left it
Hardware & Detection
Q: Can I try Tenstorrent development without hardware?
A: Yes! Use ttsim - Tenstorrent's full-system simulator.
What is ttsim:
- Virtual Wormhole or Blackhole device that runs on any Linux/x86_64 system
- No physical hardware needed
- Slower than silicon but fast enough for learning and experimentation
- Perfect for exploring before purchasing hardware
Quick Start:
# Download simulator (replace vX.Y with latest version)
mkdir -p ~/sim
cd ~/sim
wget https://github.com/tenstorrent/ttsim/releases/latest/download/libttsim_wh.so
# Copy SOC descriptor
cp $TT_METAL_HOME/tt_metal/soc_descriptors/wormhole_b0_80_arch.yaml ~/sim/soc_descriptor.yaml
# Set environment variable
export TT_METAL_SIMULATOR=~/sim/libttsim_wh.so
# Run in slow dispatch mode (required for simulator)
export TT_METAL_SLOW_DISPATCH_MODE=1
# Test it works
cd $TT_METAL_HOME
./build/programming_examples/metal_example_add_2_integers_in_riscv
What you CAN do with ttsim:
- β Learn TT-Metal programming model
- β Run programming examples and tests
- β Develop and debug kernels
- β Test TTNN operations
- β Explore Tenstorrent architecture
What you CAN'T do (too slow):
- β Run full model inference (vLLM, large models)
- β Production workloads
- β Performance benchmarking
- β Real-time applications
Which lessons work with ttsim:
- Hardware Detection: Partial support -
ttnnworks,tt-smiwon't detect simulated device - Verify Installation: Yes - programming examples work great
- RISC-V Programming: Yes - perfect for learning low-level programming
- Model Inference lessons: No - too slow for practical use (Interactive Chat through Image Generation)
- Compiler lessons: Limited - depends on workload (TT-Forge, TT-XLA)
Resources:
- GitHub: https://github.com/tenstorrent/ttsim
- Releases: https://github.com/tenstorrent/ttsim/releases/latest
Tip: Use ttsim for learning and kernel development, then move to real hardware for model inference and production workloads.
Q: Which hardware do I have?
A: Run this command:
tt-smi -s | grep -o '"board_type": "[^"]*"'
Output tells you:
- N150 - Single Wormhole chip (development, 64K context)
- N300 - Dual Wormhole chips (128K context, TP=2)
- T3K - Eight Wormhole chips (large models, TP=8)
- P100 - Single Blackhole chip (newer architecture)
- P150 - Dual Blackhole chips (TP=2)
Q: tt-smi says "No devices found" - what do I do?
A: Try these steps in order:
Check PCIe detection:
lspci | grep -i tenstorrentShould show:
Processing accelerators: Tenstorrent Inc.Try with sudo:
sudo tt-smiIf this works, you have a permissions issue.
Reset the device:
tt-smi -rFull cleanup (if still failing):
sudo pkill -9 -f tt-metal sudo pkill -9 -f vllm sudo rm -rf /dev/shm/tenstorrent* /dev/shm/tt_* tt-smi -r
Still not working? Check the Hardware Detection lesson troubleshooting section for detailed steps.
Q: What's the difference between Wormhole and Blackhole?
A:
- Wormhole (N150, N300, T3K) - 2nd generation, well-validated, most models tested
- Blackhole (P100, P150) - Latest generation, newer architecture, some experimental models
For production: Stick with Wormhole (N150/N300/T3K) - more models validated.
For experimentation: Blackhole offers newer features but check model compatibility.
Q: How do I know what my hardware can run?
A: Quick reference:
| Hardware | Max Model Size | Max Context | Multi-chip | Best For |
|---|---|---|---|---|
| N150, P100 | 8B | 64K | No (TP=1) | Development, prototyping |
| N300, P150 | 13B | 128K | Yes (TP=2) | Medium models, multi-user |
| T3K | 70B+ | 128K | Yes (TP=8) | Large models, production |
Q: What happens to running jobs and hardware utilization when a system suspends?
A: When the system goes into suspend, all running jobs on Tenstorrent hardware are interrupted and effectively terminated, and hardware utilization drops to zero. On resume, the driver re-initializes the device (similar to a reset), so any workloads must be restarted. In normal cases you don't need a full reboot; if the device doesn't come back cleanly, run tt-smi -r (reset) or reboot the host.
Q: After a Linux kernel update, my Tenstorrent device is not detected or tt-inference-server reports a pre-release driver version.
A: The kernel module driver (tt-kmd) must be compiled specifically for each kernel version. This normally happens automatically via DKMS when a new kernel is installed, but can silently fail if there are orphaned DKMS entries left behind from old driver versions.
Quick fix:
sudo dkms install tenstorrent/$(dkms status tenstorrent | grep -o '[0-9]\+\.[0-9]\+\.[0-9]\+' | sort -V | tail -1) -k $(uname -r)
sudo modprobe tenstorrent
modinfo tenstorrent | grep version # Confirm correct version loaded
If dkms install fails with "Could not locate dkms.conf", you have orphaned entries from old driver versions that are blocking the auto-build. Clean them up:
# Identify broken entries (any version that prints an error instead of a status line)
dkms status
# Manually remove each broken version
sudo rm -rf /var/lib/dkms/tenstorrent/<broken-version>
# Then retry the install above
Why this happens: When old tt-kmd versions are superseded, their DKMS source directories are sometimes removed without de-registering them from the DKMS registry. These orphaned entries cause dkms autoinstall to abort with an error before it reaches the valid driver version, so the new kernel boots without the module.
After cleanup, kernel upgrades will rebuild the module automatically β no manual intervention needed on future upgrades.
Installation & Setup
Q: How do I verify tt-metal is working?
A: Run this quick test:
python3 -c "import ttnn; print('β tt-metal ready')"
If it fails:
- Check
PYTHONPATHincludes tt-metal directory - Verify tt-metal is built:
ls ~/tt-metal/build/lib - Rebuild if needed:
cd ~/tt-metal && ./build_metal.sh
Q: Which Python version do I need?
A:
- Minimum: Python 3.9
- Recommended: Python 3.10+
- For TT-Forge: Python 3.11+ (requirement)
Check your version:
python3 --version
Q: Where should models be installed?
A: Standard locations:
Recommended:
~/models/[model-name]/- Example:
~/models/Llama-3.1-8B-Instruct/ - Used by most lessons
- Example:
HuggingFace cache:
~/.cache/huggingface/hub/- Automatic when using
hf download - Takes more disk space (keeps multiple versions)
- Automatic when using
Both formats needed for some lessons:
- Meta format:
~/models/[model]/original/(for Lessons 3-5) - HuggingFace format:
~/models/[model]/(for Lessons 6-9)
Q: How much disk space do I need?
A: Plan for:
- tt-metal: ~5GB (source + build artifacts)
- vLLM: ~20GB (including dependencies)
- Per model:
- Small models (1-3B): 10-15GB
- Medium models (7-8B): 30-40GB
- Large models (70B): 140GB+
Minimum for this extension: 100GB free space
Models & Downloads
Q: Which model should I download first?
A: Llama-3.1-8B-Instruct - covered in Download Model.
Why this model:
- β Works on N150 (most common hardware)
- β Good performance for 8B size
- β Supports all lessons (4-9)
- β Well-tested and documented
Download command:
hf download meta-llama/Llama-3.1-8B-Instruct \
--local-dir ~/models/Llama-3.1-8B-Instruct
Q: How do I handle HuggingFace authentication?
A: Three options:
Option 1: Environment variable (recommended for scripts)
export HF_TOKEN=your_token_from_huggingface
hf download meta-llama/Llama-3.1-8B-Instruct --local-dir ~/models/Llama-3.1-8B-Instruct
Option 2: Interactive login (recommended for manual use)
hf auth login
# Paste your token when prompted
Option 3: In code
from huggingface_hub import login
login(token="your_token_from_huggingface")
Get a token: https://huggingface.co/settings/tokens
Q: Download failed with "repository not found" - why?
A: Gated models require access request:
- Go to model page on HuggingFace
- Click "Request access" button
- Wait for approval (usually instant for Llama)
- Ensure you're authenticated (see question above)
For Llama models: Must accept Meta's license agreement.
Q: Can I use models from other sources?
A: Yes, but:
- HuggingFace format required for vLLM (Production Inference lessons)
- Meta checkpoint format required for Direct API (Interactive Chat, API Server)
- ONNX/PyTorch format for TT-Forge (Image Classification)
Recommendation: Stick with HuggingFace - most compatible.
Inference & Serving
Q: Which inference method should I use?
A: Depends on your goal:
| Method | Lesson | Best For | Speed (after load) |
|---|---|---|---|
| One-shot demo | Download Model | Testing, verification | 2-5 min per query |
| Interactive chat | Interactive Chat | Learning, prototyping | 1-3 sec per query |
| Flask API | API Server | Simple custom APIs | 1-3 sec per query |
| vLLM | Production Inference | Production serving | 1-3 sec per query |
Quick guide:
- Just testing? β Download Model (one-shot demo)
- Learning/experimenting? β Interactive Chat (interactive)
- Building custom app? β API Server (Flask API)
- Production deployment? β Production Inference with vLLM (vLLM)
Q: Why does first load take 2-5 minutes?
A: Model initialization involves:
- Loading weights from disk (~16GB for Llama-8B)
- Converting to TT-Metal format
- Distributing to hardware cores
- JIT compilation of kernels
This is normal and only happens once.
Subsequent queries are fast (1-3 seconds) because model stays in memory.
Q: Can I run multiple models simultaneously?
A: On same hardware: No (one model at a time per device)
Workarounds:
- Use model switching (stop one, start another)
- Use multiple hardware devices
- Use different hardware for different models (N150 for model A, N300 for model B)
Q: What does "context length" mean and why does it matter?
A:
- Context length = Maximum tokens (words/subwords) model can process at once
- Includes both input (prompt) + output (response)
Hardware limits:
- N150/P100: 64K tokens (~48K words)
- N300/T3K: 128K tokens (~96K words)
Exceeding context?
RuntimeError: Input sequence length exceeds maximum
Solutions:
- Shorten your prompts
- Use summarization for long documents
- Switch to hardware with larger context support
Q: Getting PyTorch dataclass errors with vLLM - how do I fix them?
A: This error (TypeError: must be called with a dataclass type or instance) is caused by PyTorch version mismatches.
Error looks like:
TypeError: must be called with a dataclass type or instance
# ... torch/_inductor/runtime/hints.py errors
Root cause: vLLM on Tenstorrent hardware requires PyTorch 2.5.0+cpu specifically. Other versions (2.4.x, 2.7.x) cause compatibility issues.
Solution: Recreate your vLLM environment
bash ~/tt-scratchpad/setup-vllm-env.sh
This automated script:
- β
Creates environment at correct location (
~/tt-metal/build/python_env_vllm) - β Installs PyTorch 2.5.0+cpu (exact version)
- β Installs all required dependencies
- β Validates installation before completion
Verify your environment:
source ~/activate-vllm-env.sh
python3 -c "import torch; print('PyTorch version:', torch.__version__)"
# Should print: PyTorch version: 2.5.0+cpu
Why the specific version? TT-Metal hardware drivers are built against PyTorch 2.5.0+cpu APIs. Other versions have incompatible dataclass implementations.
Custom Training
Q: Can I train models on Tenstorrent hardware?
A: Yes! The extension now includes 8 complete Custom Training lessons (CT1-CT8) that are fully validated on hardware.
What's working:
- β From-scratch training: NanoGPT (11M params) - 136 steps in 76 seconds on N150
- β Fine-tuning: Train custom models on your own datasets
- β Complete toolkit: Setup scripts, validation, and tested templates
- β Production-ready: Both training workflows validated end-to-end
Recommended version: tt-metal v0.66.0-rc7 (fully tested)
Q: What hardware do I need for training?
A: Training requirements depend on model size:
N150 (Wormhole single-chip):
- β Perfect for NanoGPT (11M params, 6 layers, 384 dim)
- β From-scratch training on Shakespeare, custom datasets
- β TinyLlama-1.1B OOM (needs 2GB DRAM, only 1GB available)
N300+ (Wormhole dual-chip or higher):
- β Everything N150 can do
- β TinyLlama-1.1B fine-tuning (2GB+ DRAM available)
- β Larger models and batch sizes
Recommendation: Start with N150 and NanoGPT to learn the workflow!
Q: What's the difference between fine-tuning and training from scratch?
A:
Fine-tuning (CT4):
- Start with pre-trained model (e.g., TinyLlama-1.1B)
- Train on small custom dataset (50-1000 examples)
- Adapts model to your specific task
- Faster (minutes to hours)
- Good for: Q&A bots, domain-specific assistants
Training from Scratch (CT8):
- Build model from random initialization
- Train on large dataset (Shakespeare, your own data)
- Learn patterns from ground up
- Slower (hours to days)
- Good for: Understanding training deeply, custom architectures
Which should I start with? CT8 (from-scratch) - it's faster on N150 with NanoGPT and teaches fundamentals!
Q: What tt-metal version do I need for training?
A: Training requires v0.66.0-rc5 or later
Why:
- v0.64.5 and earlier: C++ tt-train only β
- v0.66.0-rc5+: Python ttml module available β
- v0.66.0-rc7: Fully validated and recommended β
Check your version:
cd $TT_METAL_HOME && git describe --tags
See CT4 and CT8 lessons for complete setup instructions!
Compilers & Tools
Q: What's the difference between TT-Forge and TT-XLA?
A:
| Feature | TT-Forge | TT-XLA |
|---|---|---|
| Status | Experimental | Production-ready |
| Multi-chip | Single only | Yes (TP/DP) |
| Frameworks | PyTorch, ONNX | JAX, PyTorch/XLA |
| Model support | Limited (169 validated) | Broader |
| Installation | Complex (build from source) | Simple (pip) |
When to use TT-Forge:
- Experimenting with PyTorch models
- Learning MLIR compilation
- Working with validated models list
When to use TT-XLA:
- Production multi-chip workloads
- JAX workflows
- Need stability and support
Q: Why did my model fail to compile in TT-Forge?
A: TT-Forge is experimental. Common reasons:
Unsupported operators
- Not all PyTorch ops implemented
- Check tt-forge-models for validated examples
Model architecture
- Very new architectures may not work
- Dynamic shapes not supported
- Control flow limited
Environment variable pollution (most common!)
unset TT_METAL_HOME unset TT_METAL_VERSION # Then try again
Recommendation: Start with MobileNetV2 (Image Classification with TT-Forge default) - known to work.
Q: How do I know if my model is supported?
A:
For TT-Forge:
- Check tt-forge-models repository
- 169 validated models listed
- Start with these before trying others
For vLLM:
- Llama family well-supported (2, 3, 3.1, 3.2)
- Mistral supported
- Qwen supported (needs N300+ for larger models)
- Check documentation for your specific model
For TT-XLA:
- Most JAX/Flax models work
- PyTorch/XLA support growing
- GPT-2 demo included (JAX Inference with TT-XLA)
Troubleshooting
Q: Command failed with "ImportError: undefined symbol"
A: This is almost always environment variable pollution.
Fix:
unset TT_METAL_HOME
unset TT_METAL_VERSION
# Retry your command
Make permanent:
Add to ~/.bashrc:
# Prevent TT-Metal environment pollution
unset TT_METAL_HOME
unset TT_METAL_VERSION
Why this happens: Different versions of libraries loaded due to environment variables overriding build paths.
Q: vLLM server won't start - what do I check?
A: Systematic debugging:
1. Check environment variables:
echo $TT_METAL_HOME # Should be ~/tt-metal
echo $MESH_DEVICE # Should match your hardware (N150, etc.)
echo $PYTHONPATH # Should include $TT_METAL_HOME
2. Verify model path:
ls ~/models/Llama-3.1-8B-Instruct/config.json
3. Check for other processes:
ps aux | grep -E "tt-metal|vllm"
# Kill if needed:
# pkill -9 -f vllm
4. Verify vLLM installation:
source ~/tt-vllm-venv/bin/activate
python3 -c "import vllm; print(vllm.__version__)"
5. Check device availability:
tt-smi
# Should show your device
Q: "Out of memory" errors - what can I do?
A: Several strategies:
1. Reduce context length:
# Instead of:
--max-model-len 65536
# Try:
--max-model-len 32768
2. Reduce batch size:
# Instead of:
--max-num-seqs 32
# Try:
--max-num-seqs 16
3. Use smaller model:
- 8B β 3B (Llama-3.2-3B)
- 8B β 1B (Llama-3.2-1B)
4. Clear device state:
sudo pkill -9 -f tt-metal
sudo rm -rf /dev/shm/tenstorrent* /dev/shm/tt_*
tt-smi -r
Q: Build failed - where do I look?
A:
tt-metal build issues:
cd ~/tt-metal
./build_metal.sh 2>&1 | tee build.log
# Check build.log for errors
Common build failures:
- Missing dependencies:
sudo apt-get install build-essential cmake - Python version: Need 3.9+ (check with
python3 --version) - Disk space: Need 10GB+ free
- Memory: Need 16GB+ RAM for building
TT-Forge build issues:
- Python 3.11 required: Can't use older Python
- clang-17 required:
sudo apt-get install clang-17 - Environment variables: Must unset TT_METAL_HOME first
Q: TTNN import errors or symbol undefined errors in cloud environments - how do I fix them?
A: After rolling back or updating tt-metal versions, TTNN bindings may become incompatible.
Symptoms:
ImportError: undefined symbol: _ZN2tt9tt_fabric15SetFabricConfigENS0...ImportError: undefined symbol: MPIX_Comm_revoke- TTNN examples that previously worked now fail
Common Cause: Rolling back or updating tt-metal versions (for example, to match specific vLLM compatibility) can break TTNN bindings.
Solution - Clean Rebuild to Known-Good Version:
Note your original working commit:
cd ~/tt-metal git log --oneline | head -5 # Save the commit hash that was workingCheckout the known-good version:
cd ~/tt-metal git checkout 5143b856eb # Replace with your working commit git submodule update --init --recursiveComplete clean rebuild:
cd ~/tt-metal # Clean all build artifacts rm -rf build build_Release # Reinstall dependencies sudo ./install_dependencies.sh # Rebuild from scratch ./build_metal.shTest TTNN:
source ~/tt-metal/python_env/bin/activate export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH export PYTHONPATH=~/tt-metal:$PYTHONPATH python3 -m ttnn.examples.usage.run_op_on_device
Important Notes:
- The original/untouched tt-metal version is often the most stable
- Rolling back to older commits can create incompatible bindings
- Always do a complete clean rebuild after changing commits
- OpenMPI library path is required:
/opt/openmpi-v5.0.7-ulfm/lib
Known-Good Commit (as of Dec 2024):
5143b856eb(Oct 28, 2024) - Stable TTNN, validated on N150
Q: Getting OpenMPI errors - how do I fix them?
A: OpenMPI library path errors are common and easy to fix.
Symptoms:
- Errors mentioning "libmpi.so" or "OpenMPI"
- "ImportError: cannot open shared object file"
- Commands fail with MPI-related errors
Fix:
export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH
Make permanent:
Add to ~/.bashrc:
# OpenMPI library path for Tenstorrent
export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH
Then reload:
source ~/.bashrc
Why this happens: The OpenMPI library installation isn't in the system's default library search path, so you need to explicitly tell the dynamic linker where to find it.
Alternative OpenMPI paths: If the above doesn't work, try:
# Find your OpenMPI installation
find /opt -name "libmpi.so*" 2>/dev/null
# Use the directory containing the .so files
export LD_LIBRARY_PATH=/path/to/openmpi/lib:$LD_LIBRARY_PATH
Q: Downloads are slow or failing
A:
Slow downloads:
- HuggingFace throttles anonymous requests
- Solution: Login with
hf auth login - Consider downloading overnight for large models
Failing downloads:
- Check internet connection
- Verify HF authentication (see authentication question above)
- Check disk space:
df -h ~ - Try resuming:
hf download meta-llama/Llama-3.1-8B-Instruct \ --local-dir ~/models/Llama-3.1-8B-Instruct \ --resume-download
Performance & Optimization
Q: How can I speed up inference?
A:
After first load (model in memory):
- Already fast: 1-3 seconds per query typical
- Can't improve much: Hardware-optimized already
For batch processing:
- Use vLLM's batching:
--max-num-seqs 32 - Process multiple requests together
- 3-5x throughput improvement
For lower latency:
- Reduce
max_tokensparameter (shorter responses = faster) - Use smaller model (8B β 3B)
- Consider hardware upgrade (N150 β N300)
Q: What are good vLLM server parameters?
A: Recommended by hardware:
N150 (single chip):
--max-model-len 65536 # Full 64K context
--max-num-seqs 16 # Moderate batching
--block-size 64 # Standard
N300 (dual chip):
--max-model-len 131072 # Full 128K context
--max-num-seqs 32 # Higher batching
--block-size 64
--tensor-parallel-size 2 # Use both chips
T3K (8 chips):
--max-model-len 131072
--max-num-seqs 64 # High batching
--block-size 64
--tensor-parallel-size 8 # Use all chips
Conservative (if OOM errors):
- Reduce
max-model-lenby 50% - Reduce
max-num-seqsby 50% - Test incrementally
Q: How do I monitor performance?
A:
Token generation speed:
# In vLLM output, look for:
"Generated 150 tokens in 2.5 seconds (60 tokens/sec)"
Server metrics:
# vLLM exposes Prometheus metrics:
curl http://localhost:8000/metrics
System monitoring:
# GPU-like monitoring for TT:
watch -n 1 tt-smi
Load testing:
# Install hey:
go install github.com/rakyll/hey@latest
# Test throughput:
hey -n 100 -c 10 -m POST \
-H "Content-Type: application/json" \
-d '{"model": "...", "messages": [...]}' \
http://localhost:8000/v1/chat/completions
Q: How can I visualize my hardware usage?
A: Four layers of observability, from quick to deep:
1. tt-smi β live telemetry
tt-smi # interactive TUI with chip temp, DRAM usage, firmware version
tt-smi -s # structured JSON snapshot (scriptable)
watch -n 1 tt-smi -s | python3 -c "import sys,json; d=json.load(sys.stdin); print(d)"
2. tt-toplike β htop-style real-time view
An htop-inspired Rust TUI that shows per-chip utilization, process list, temperature, and power draw in real time. Install via pip install tt-toplike or the system package.
tt-toplike
3. ttnn-visualizer β model execution analysis
A web-based tool that loads a tt-metal performance trace and renders interactive graphs: operation timelines, memory usage over time, tensor shapes, buffer allocation maps, and the full operation flow graph. Run after a profiled inference pass to understand exactly where time is spent.
4. tensix-viz β chip topology education
An interactive JavaScript canvas visualizer showing the actual Tensix grid layout β which cores are compute vs DRAM vs ETH β and animating what different workload types look like. Useful for building the mental model before you profile.
Community & Support
Q: Where can I get help?
A:
Official channels:
- Discord: https://discord.gg/tenstorrent (most active)
- GitHub Issues:
- Documentation: https://docs.tenstorrent.com
When asking for help, include:
- Hardware type (N150/N300/T3K/P100)
- Error message (full text)
- Command you ran
- Output of
tt-smi - Which lesson you're on
Q: How do I report a bug?
A:
Before reporting:
- Search existing issues on GitHub
- Verify hardware works (
tt-smi) - Try reset (
tt-smi -r) - Check you're on latest tt-metal/vLLM
When reporting, include:
Hardware: N150
OS: Ubuntu 22.04
tt-metal version: [git rev-parse HEAD output]
vLLM version: [pip show vllm]
Error: [paste full error]
Steps to reproduce: [numbered list]
Good issue = faster fix!
Q: Can I contribute?
A: Yes! Several ways:
- Bring up new models
- Earn rewards
- Official contribution path
2. Documentation
- Fix typos/errors
- Add examples
- Improve tutorials
3. Code contributions
- Bug fixes
- Performance improvements
- New features
Start here:
- Join Discord #contributing channel for guidance
- Ask about "good first issue" opportunities
- Review documentation at https://docs.tenstorrent.com
Q: Is this production-ready?
A: Depends on component:
Production-ready (β ):
- tt-metal - Stable, tested
- vLLM - Production-grade serving
- TT-XLA - Production compiler
Experimental (β οΈ):
- TT-Forge - Beta, limited model support
- Some models - Check validation status
Recommendation:
- For production: Stick with vLLM + validated models
- For experimentation: Try TT-Forge, new models
- Always test thoroughly before production deployment
Quick Reference
Essential Commands
# Hardware
tt-smi # Check hardware
tt-smi -s # Structured output
tt-smi -r # Reset device
# Model info
ls ~/models/ # List installed models
du -sh ~/models/* # Check model sizes
# Environment
python3 -c "import ttnn; print('β')" # Test tt-metal
hf --version # Check HF CLI
# vLLM
source ~/tt-vllm-venv/bin/activate # Activate venv
curl http://localhost:8000/health # Check server
curl http://localhost:8000/metrics # Get metrics
# Cleanup
sudo pkill -9 -f "tt-metal|vllm" # Kill processes
sudo rm -rf /dev/shm/tt_* # Clear shared memory
tt-smi -r # Reset hardware
Quick Diagnostic
Run this to check your setup:
#!/bin/bash
echo "=== Tenstorrent Diagnostic ==="
echo ""
echo "Hardware:"
tt-smi -s 2>&1 | grep -o '"board_type": "[^"]*"' || echo "β No hardware detected"
echo ""
echo "tt-metal:"
python3 -c "import ttnn; print('β Working')" 2>&1 || echo "β Not working"
echo ""
echo "Models:"
ls ~/models/ 2>/dev/null | head -3 || echo "β No models found"
echo ""
echo "Disk space:"
df -h ~ | grep -v Filesystem
echo ""
echo "Python:"
python3 --version
Advanced Learning Resources
Q: Where can I learn about low-level RISC-V programming on Tenstorrent hardware?
A: Check out the CS Fundamentals series - Module 1 covers RISC-V & Computer Architecture!
Each Tensix core contains five RISC-V processors (RV32IM ISA):
- BRISC (RISCV_0) - Primary data movement
- NCRISC (RISCV_1) - Network operations
- TRISC0/1/2 - Compute pipeline (unpack, math, pack)
With 176 Tensix cores on Wormhole, that's 880 RISC-V cores you can program directly!
What Module 1 includes:
- β Von Neumann architecture & fetch-decode-execute cycle
- β RISC-V ISA fundamentals
- β Hands-on example: Add two integers in RISC-V assembly
- β Build and run tt-metal programming examples
- β Explore kernel source code
- β Comprehensive exploration guide (60+ pages)
Topics covered across 7 CS Fundamentals modules:
- RISC-V architecture and memory maps
- Memory hierarchy and cache locality
- Parallel computing (scale from 1 to 880 cores!)
- NoC (Network-on-Chip) programming
- Synchronization and barriers
- Abstraction layers and compilation
- Computational complexity in practice
Access the series:
- From Welcome page β CS Fundamentals section
- Or start with Module 1: RISC-V & Computer Architecture
View the full guide:
Open RISC-V Exploration Guide- Comprehensive deep-dive documentation
Perfect for:
- Developers who want to understand the hardware at the lowest level
- Embedded systems programmers exploring RISC-V at scale
- Computer architecture enthusiasts
- Anyone optimizing kernel performance
Still Have Questions?
Check:
- Specific lesson troubleshooting sections
- CLAUDE.md for detailed technical info
- Discord #help channel
Remember: Most issues are:
- Environment variables (unset TT_METAL_HOME)
- Permissions (try sudo or add to tenstorrent group)
- Device state (reset with tt-smi -r)
When in doubt:
tt-smi -r
sudo rm -rf /dev/shm/tt_*
# Then retry
Last updated: May 2026 Extension version: 0.0.438
Found an error in this FAQ? Please report it on GitHub or Discord!