Tenstorrent Developer Extension - FAQ

Frequently Asked Questions - Your quick reference for common questions, troubleshooting, and tips from all 48 lessons.


Table of Contents


Getting Started

Q: Which lesson should I start with?

A: Start with Hardware Detection if you're brand new. The 48 lessons are organized into 9 categories:

🚀 Your First Inference (5 lessons)

  1. Hardware Detection → Verify Installation → Download Model → Interactive Chat → API Server

🏭 Serving Models (4 lessons) Production servers (tt-inference-server, vLLM) and generation (Image, Video)

🎓 Custom Training (8 lessons) ⭐ NEW! Fine-tune models or train from scratch - both workflows validated and working on hardware!

🎯 Applications (2 lessons) Coding Assistant, AnimateDiff Video Generation

👨‍🍳 Tenstorrent Cookbook (6 lessons) Game of Life, Audio, Mandelbrot, Image Filters, Particle Life + Overview

🔧 Compilers & Tools (2 lessons) TT-Forge, TT-XLA

🧠 CS Fundamentals (7 lessons) Computer Architecture, Memory, Parallelism, Networks, Synchronization, Abstraction, Complexity

🎓 Advanced Topics (5 lessons) tt-installer, Bounty Program, Explore Metalium, Koyeb Deployment (2)

Can I skip lessons? Yes! Categories are independent - jump to what interests you.

Q: Do I need to complete lessons in order?

A: Not strictly, but:

Quick start for experienced users:

  1. Run Hardware Detection (2 minutes - verify hardware)
  2. Skip to Production Inference with vLLM (production serving)
  3. Explore advanced topics (compilers, RISC-V, bounty program)

Q: What's the difference between the various Tenstorrent tools?

A: Tenstorrent has several tools serving different purposes:

Tool       Purpose               When to Use                       Maturity
tt-metal   Low-level framework   Custom kernels, maximum control   Stable
vLLM       LLM serving           Production LLM deployment         Production
TT-Forge   MLIR compiler         PyTorch models (experimental)     Beta
TT-XLA     XLA compiler          JAX/PyTorch (production)          Production

Simple guide:


Remote Development & SSH

Q: Can I use this extension from my Mac/Windows laptop to access remote Tenstorrent hardware?

A: Yes! Use VSCode's Remote-SSH extension - the industry-standard solution for remote development.

This is the recommended approach for:

Why Remote-SSH is perfect for this:

Q: How do I set up Remote-SSH for Tenstorrent development?

A: Quick setup guide:

Step 1: Install Remote-SSH extension

  1. Open VSCode on your local machine (Mac/Windows)
  2. Open Extensions panel (Cmd+Shift+X or Ctrl+Shift+X)
  3. Search for "Remote - SSH"
  4. Install the official Microsoft extension

Step 2: Configure SSH connection

Add your Tenstorrent machine to SSH config:

# On your local machine, edit ~/.ssh/config
# (Cmd+Shift+P → "Remote-SSH: Open Configuration File")

Host tenstorrent-dev
  HostName 192.168.1.100        # Your hardware machine IP
  User ubuntu                   # Your username
  IdentityFile ~/.ssh/id_rsa    # Your SSH key
  ForwardAgent yes              # Optional: Forward SSH agent

Step 3: Connect to remote machine

  1. Cmd+Shift+P (or Ctrl+Shift+P) → "Remote-SSH: Connect to Host"
  2. Select "tenstorrent-dev"
  3. New VSCode window opens connected to remote machine

Step 4: Install Tenstorrent extension on remote

  1. In the remote VSCode window, go to Extensions
  2. Search for "Tenstorrent Developer Extension"
  3. Click "Install in SSH: tenstorrent-dev"

Step 5: Start using lessons!

Q: Do the lessons work through Remote-SSH?

A: Yes, perfectly! Remote-SSH makes everything transparent:

What works automatically:

Example workflow:

  1. Connect via Remote-SSH from your Mac
  2. Open Tenstorrent walkthrough (works like local)
  3. Run Hardware Detection → tt-smi runs on remote
  4. Download model → Saves to remote ~/models/
  5. Start vLLM server → Runs on remote, port auto-forwarded
  6. Test from local browser → http://localhost:8000 works!

No code changes needed - The extension doesn't know or care that you're remote!
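
Once connected, the forwarded port means your local tooling can talk to the remote server as if it were local. Here's a minimal sketch run from your laptop (assumes the requests package is installed locally, the vLLM server from the lessons is running on the remote with its OpenAI-compatible API on port 8000, and the model name matches what the server loaded):

import requests

# The forwarded port makes the remote vLLM server look like localhost.
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # adjust to the model your server loaded
        "messages": [{"role": "user", "content": "Say hello from the remote machine!"}],
        "max_tokens": 64,
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])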

Q: What about SSH without Remote-SSH extension?

A: Not recommended. Manual SSH has major problems:

❌ File operations break - Extension reads/writes the local filesystem, not the remote one
❌ Path mismatches - ~/models/ on your Mac ≠ ~/models/ on the remote
❌ Complex escaping - Terminal commands get mangled through SSH
❌ No port forwarding - Can't access servers on localhost
❌ Poor UX - Feels disconnected, hard to debug

Example of problems:

If you manually SSH in terminal:

# This command in lesson creates file on your MAC, not remote!
cat > ~/tt-scratchpad/script.py << 'EOF'
...
EOF

Then this fails because the file is on the wrong machine:

ssh user@remote python3 ~/tt-scratchpad/script.py

With Remote-SSH: Both operations happen on the remote machine automatically.

Q: Can multiple people share the same remote hardware?

A: Yes, but with considerations:

Shared hardware works best with:

Limitations:

Best practice for teams:

# User 1
vllm ... --port 8001

# User 2
vllm ... --port 8002

# Each user accesses their own server
curl http://localhost:8001/...  # User 1
curl http://localhost:8002/...  # User 2
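
In Python, the only thing each user changes is the base URL. Here's a small sketch using the openai client (an assumption - any OpenAI-compatible client works; the api_key value is a placeholder since vLLM doesn't check it unless you configure one):

from openai import OpenAI

# User 1 points at port 8001; User 2 would use 8002.
client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",   # whichever model this server is serving
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=16,
)
print(reply.choices[0].message.content)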

Q: What about Tenstorrent Cloud? Does Remote-SSH work?

A: Yes! Tenstorrent Cloud instances are perfect for Remote-SSH:

Typical setup:

  1. Get Tenstorrent Cloud instance (pre-configured with hardware)
  2. Receive SSH credentials
  3. Add to ~/.ssh/config on your laptop
  4. Connect via Remote-SSH
  5. Start developing!

Cloud benefits:

Example cloud SSH config:

Host tt-cloud
  HostName cloud.instance.tenstorrent.com
  User your-username
  IdentityFile ~/.ssh/tt-cloud-key
  ForwardAgent yes

Q: Are there performance considerations with Remote-SSH?

A: Remote-SSH is very efficient:

Fast operations (no noticeable latency):

What uses bandwidth:

Best practices:

Real-world experience:

Q: How do I disconnect from remote machine?

A: Several options:

Graceful disconnect:

From command palette:

Important: vLLM servers keep running after disconnect!

# Before disconnecting, you may want to:
docker ps                    # Note container IDs
docker stop <container-id>   # Stop servers

# Or leave them running and reconnect later

Reconnecting:


Hardware & Detection

Q: Can I try Tenstorrent development without hardware?

A: Yes! Use ttsim - Tenstorrent's full-system simulator.

What is ttsim:

Quick Start:

# Download the simulator (latest release)
mkdir -p ~/sim
cd ~/sim
wget https://github.com/tenstorrent/ttsim/releases/latest/download/libttsim_wh.so

# Copy SOC descriptor
cp $TT_METAL_HOME/tt_metal/soc_descriptors/wormhole_b0_80_arch.yaml ~/sim/soc_descriptor.yaml

# Set environment variable
export TT_METAL_SIMULATOR=~/sim/libttsim_wh.so

# Run in slow dispatch mode (required for simulator)
export TT_METAL_SLOW_DISPATCH_MODE=1

# Test it works
cd $TT_METAL_HOME
./build/programming_examples/metal_example_add_2_integers_in_riscv

What you CAN do with ttsim:

What you CAN'T do (too slow):

Which lessons work with ttsim:

Resources:

Tip: Use ttsim for learning and kernel development, then move to real hardware for model inference and production workloads.

Q: Which hardware do I have?

A: Run this command:

tt-smi -s | grep -o '"board_type": "[^"]*"'

Output tells you:

Q: tt-smi says "No devices found" - what do I do?

A: Try these steps in order:

  1. Check PCIe detection:

    lspci | grep -i tenstorrent
    

    Should show: Processing accelerators: Tenstorrent Inc.

  2. Try with sudo:

    sudo tt-smi
    

    If this works, you have a permissions issue.

  3. Reset the device:

    tt-smi -r
    
  4. Full cleanup (if still failing):

    sudo pkill -9 -f tt-metal
    sudo pkill -9 -f vllm
    sudo rm -rf /dev/shm/tenstorrent* /dev/shm/tt_*
    tt-smi -r
    

Still not working? Check the Hardware Detection lesson troubleshooting section for detailed steps.

Q: What's the difference between Wormhole and Blackhole?

A:

For production: Stick with Wormhole (N150/N300/T3K) - more models validated.

For experimentation: Blackhole offers newer features but check model compatibility.

Q: How do I know what my hardware can run?

A: Quick reference:

Hardware     Max Model Size   Max Context   Multi-chip   Best For
N150, P100   8B               64K           No (TP=1)    Development, prototyping
N300, P150   13B              128K          Yes (TP=2)   Medium models, multi-user
T3K          70B+             128K          Yes (TP=8)   Large models, production

Q: What happens to running jobs and hardware utilization when a system suspends?

A: When the system goes into suspend, all running jobs on Tenstorrent hardware are interrupted and effectively terminated, and hardware utilization drops to zero. On resume, the driver re-initializes the device (similar to a reset), so any workloads must be restarted. In normal cases you don't need a full reboot; if the device doesn't come back cleanly, run tt-smi -r (reset) or reboot the host.


Installation & Setup

Q: How do I verify tt-metal is working?

A: Run this quick test:

python3 -c "import ttnn; print('✓ tt-metal ready')"

If it fails:

Q: Which Python version do I need?

A:

Check your version:

python3 --version

Q: Where should models be installed?

A: Standard locations:

Both formats needed for some lessons:

Q: How much disk space do I need?

A: Plan for:

Minimum for this extension: 100GB free space


Models & Downloads

Q: Which model should I download first?

A: Llama-3.1-8B-Instruct - covered in Download Model.

Why this model:

Download command:

huggingface-cli download meta-llama/Llama-3.1-8B-Instruct \
  --local-dir ~/models/Llama-3.1-8B-Instruct

Q: How do I handle HuggingFace authentication?

A: Three options:

Option 1: Environment variable (recommended for scripts)

export HF_TOKEN=your_token_from_huggingface
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --local-dir ~/models/Llama-3.1-8B-Instruct

Option 2: Interactive login (recommended for manual use)

huggingface-cli login
# Paste your token when prompted

Option 3: In code

from huggingface_hub import login
login(token="your_token_from_huggingface")

Get a token: https://huggingface.co/settings/tokens
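
For scripted downloads, the same token can be passed straight to huggingface_hub from Python - a sketch equivalent to the CLI command above (assumes HF_TOKEN is exported as in Option 1):

import os
from huggingface_hub import snapshot_download

# Python equivalent of the huggingface-cli download command above.
snapshot_download(
    repo_id="meta-llama/Llama-3.1-8B-Instruct",
    local_dir=os.path.expanduser("~/models/Llama-3.1-8B-Instruct"),
    token=os.environ.get("HF_TOKEN"),   # falls back to your cached login if unset
)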

Q: Download failed with "repository not found" - why?

A: Gated models require an access request:

  1. Go to model page on HuggingFace
  2. Click "Request access" button
  3. Wait for approval (usually instant for Llama)
  4. Ensure you're authenticated (see question above)

For Llama models: Must accept Meta's license agreement.

Q: Can I use models from other sources?

A: Yes, but:

Recommendation: Stick with HuggingFace - most compatible.


Inference & Serving

Q: Which inference method should I use?

A: Depends on your goal:

Method             Lesson                 Best For                Speed (after load)
One-shot demo      Download Model         Testing, verification   2-5 min per query
Interactive chat   Interactive Chat       Learning, prototyping   1-3 sec per query
Flask API          API Server             Simple custom APIs      1-3 sec per query
vLLM               Production Inference   Production serving      1-3 sec per query

Quick guide:

Q: Why does first load take 2-5 minutes?

A: Model initialization involves:

  1. Loading weights from disk (~16GB for Llama-8B)
  2. Converting to TT-Metal format
  3. Distributing to hardware cores
  4. JIT compilation of kernels

This is normal and only happens once.

Subsequent queries are fast (1-3 seconds) because model stays in memory.

Q: Can I run multiple models simultaneously?

A: On same hardware: No (one model at a time per device)

Workarounds:

Q: What does "context length" mean and why does it matter?

A:

Hardware limits:

Exceeding context?

RuntimeError: Input sequence length exceeds maximum

Solutions:
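
A practical safeguard is to count the prompt's tokens before sending it. A minimal sketch (assumes the transformers package and that the model's tokenizer files are in the local model directory; leave headroom for the tokens you want generated):

import os
from transformers import AutoTokenizer

MAX_MODEL_LEN = 65536   # match the server's --max-model-len

tokenizer = AutoTokenizer.from_pretrained(
    os.path.expanduser("~/models/Llama-3.1-8B-Instruct")
)

prompt = "..."   # your actual prompt
n_tokens = len(tokenizer.encode(prompt))

if n_tokens >= MAX_MODEL_LEN:
    print(f"Prompt is {n_tokens} tokens - trim it or raise --max-model-len")
else:
    print(f"Prompt fits: {n_tokens} of {MAX_MODEL_LEN} tokens (leave room for the reply)")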

Q: Getting PyTorch dataclass errors with vLLM - how do I fix them?

A: This error (TypeError: must be called with a dataclass type or instance) is caused by PyTorch version mismatches.

Error looks like:

TypeError: must be called with a dataclass type or instance
# ... torch/_inductor/runtime/hints.py errors

Root cause: vLLM on Tenstorrent hardware requires PyTorch 2.5.0+cpu specifically. Other versions (2.4.x, 2.7.x) cause compatibility issues.

Solution: Recreate your vLLM environment

bash ~/tt-scratchpad/setup-vllm-env.sh

This automated script:

Verify your environment:

source ~/activate-vllm-env.sh
python3 -c "import torch; print('PyTorch version:', torch.__version__)"
# Should print: PyTorch version: 2.5.0+cpu

Why the specific version? TT-Metal hardware drivers are built against PyTorch 2.5.0+cpu APIs. Other versions have incompatible dataclass implementations.


Custom Training

Q: Can I train models on Tenstorrent hardware?

A: Yes! The extension now includes 8 complete Custom Training lessons (CT1-CT8) that are fully validated on hardware.

What's working:

Recommended version: tt-metal v0.66.0-rc7 (fully tested)

Q: What hardware do I need for training?

A: Training requirements depend on model size:

N150 (Wormhole single-chip):

N300+ (Wormhole dual-chip or higher):

Recommendation: Start with N150 and NanoGPT to learn the workflow!

Q: What's the difference between fine-tuning and training from scratch?

A:

Fine-tuning (CT4):

Training from Scratch (CT8):

Which should I start with? CT8 (from-scratch) - it's faster on N150 with NanoGPT and teaches fundamentals!

Q: What tt-metal version do I need for training?

A: Training requires v0.66.0-rc5 or later

Why:

Check your version:

cd $TT_METAL_HOME && git describe --tags

See CT4 and CT8 lessons for complete setup instructions!


Compilers & Tools

Q: What's the difference between TT-Forge and TT-XLA?

A:

Feature         TT-Forge                      TT-XLA
Status          Experimental                  Production-ready
Multi-chip      Single only                   Yes (TP/DP)
Frameworks      PyTorch, ONNX                 JAX, PyTorch/XLA
Model support   Limited (169 validated)       Broader
Installation    Complex (build from source)   Simple (pip)

When to use TT-Forge:

When to use TT-XLA:

Q: Why did my model fail to compile in TT-Forge?

A: TT-Forge is experimental. Common reasons:

  1. Unsupported operators

    • Not all PyTorch ops implemented
    • Check tt-forge-models for validated examples
  2. Model architecture

    • Very new architectures may not work
    • Dynamic shapes not supported
    • Control flow limited
  3. Environment variable pollution (most common!)

    unset TT_METAL_HOME
    unset TT_METAL_VERSION
    # Then try again
    

Recommendation: Start with MobileNetV2 (the default in the Image Classification with TT-Forge lesson) - known to work.

Q: How do I know if my model is supported?

A:

For TT-Forge:

For vLLM:

For TT-XLA:


Troubleshooting

Q: Command failed with "ImportError: undefined symbol"

A: This is almost always environment variable pollution.

Fix:

unset TT_METAL_HOME
unset TT_METAL_VERSION
# Retry your command

Make permanent: Add to ~/.bashrc:

# Prevent TT-Metal environment pollution
unset TT_METAL_HOME
unset TT_METAL_VERSION

Why this happens: Stale environment variables override the build paths, so mismatched library versions get loaded.

Q: vLLM server won't start - what do I check?

A: Systematic debugging:

1. Check environment variables:

echo $TT_METAL_HOME    # Should be ~/tt-metal
echo $MESH_DEVICE      # Should match your hardware (N150, etc.)
echo $PYTHONPATH       # Should include $TT_METAL_HOME

2. Verify model path:

ls ~/models/Llama-3.1-8B-Instruct/config.json

3. Check for other processes:

ps aux | grep -E "tt-metal|vllm"
# Kill if needed:
# pkill -9 -f vllm

4. Verify vLLM installation:

source ~/tt-vllm-venv/bin/activate
python3 -c "import vllm; print(vllm.__version__)"

5. Check device availability:

tt-smi
# Should show your device
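
If you'd rather run the same checks in one place, here's a rough Python sketch (assumes the same paths as above - adjust the model directory and environment variable names for your setup):

import os
import shutil
import subprocess

checks = {
    "TT_METAL_HOME set": bool(os.environ.get("TT_METAL_HOME")),
    "MESH_DEVICE set": bool(os.environ.get("MESH_DEVICE")),
    "model config present": os.path.exists(
        os.path.expanduser("~/models/Llama-3.1-8B-Instruct/config.json")
    ),
    "tt-smi on PATH": shutil.which("tt-smi") is not None,
}

for name, ok in checks.items():
    print(("✓" if ok else "✗"), name)

# Stray tt-metal/vllm processes can hold the device; list them (if any).
leftover = subprocess.run(
    ["pgrep", "-af", "tt-metal|vllm"], capture_output=True, text=True
).stdout
print(leftover or "no leftover tt-metal/vllm processes")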

Q: "Out of memory" errors - what can I do?

A: Several strategies:

1. Reduce context length:

# Instead of:
--max-model-len 65536

# Try:
--max-model-len 32768

2. Reduce batch size:

# Instead of:
--max-num-seqs 32

# Try:
--max-num-seqs 16

3. Use smaller model:

4. Clear device state:

sudo pkill -9 -f tt-metal
sudo rm -rf /dev/shm/tenstorrent* /dev/shm/tt_*
tt-smi -r

Q: Build failed - where do I look?

A:

tt-metal build issues:

cd ~/tt-metal
./build_metal.sh 2>&1 | tee build.log
# Check build.log for errors

Common build failures:

TT-Forge build issues:

Q: TTNN import errors or symbol undefined errors in cloud environments - how do I fix them?

A: After rolling back or updating tt-metal versions, TTNN bindings may become incompatible.

Symptoms:

Common Cause: Rolling back or updating tt-metal versions (for example, to match specific vLLM compatibility) can break TTNN bindings.

Solution - Clean Rebuild to Known-Good Version:

  1. Note your original working commit:

    cd ~/tt-metal
    git log --oneline | head -5
    # Save the commit hash that was working
    
  2. Checkout the known-good version:

    cd ~/tt-metal
    git checkout 5143b856eb  # Replace with your working commit
    git submodule update --init --recursive
    
  3. Complete clean rebuild:

    cd ~/tt-metal
    # Clean all build artifacts
    rm -rf build build_Release
    
    # Reinstall dependencies
    sudo ./install_dependencies.sh
    
    # Rebuild from scratch
    ./build_metal.sh
    
  4. Test TTNN:

    source ~/tt-metal/python_env/bin/activate
    export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH
    export PYTHONPATH=~/tt-metal:$PYTHONPATH
    python3 -m ttnn.examples.usage.run_op_on_device
    

Important Notes:

Known-Good Commit (as of Dec 2024):

Q: Getting OpenMPI errors - how do I fix them?

A: OpenMPI library path errors are common and easy to fix.

Symptoms:

Fix:

export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH

Make permanent: Add to ~/.bashrc:

# OpenMPI library path for Tenstorrent
export LD_LIBRARY_PATH=/opt/openmpi-v5.0.7-ulfm/lib:$LD_LIBRARY_PATH

Then reload:

source ~/.bashrc

Why this happens: The OpenMPI library installation isn't in the system's default library search path, so you need to explicitly tell the dynamic linker where to find it.

Alternative OpenMPI paths: If the above doesn't work, try:

# Find your OpenMPI installation
find /opt -name "libmpi.so*" 2>/dev/null

# Use the directory containing the .so files
export LD_LIBRARY_PATH=/path/to/openmpi/lib:$LD_LIBRARY_PATH

Q: Downloads are slow or failing

A:

Slow downloads:

Failing downloads:

  1. Check internet connection
  2. Verify HF authentication (see authentication question above)
  3. Check disk space: df -h ~
  4. Try resuming:
    huggingface-cli download meta-llama/Llama-3.1-8B-Instruct \
      --local-dir ~/models/Llama-3.1-8B-Instruct \
      --resume-download
    

Performance & Optimization

Q: How can I speed up inference?

A:

After first load (model in memory):

For batch processing:

For lower latency:

Q: What are good vLLM server parameters?

A: Recommended by hardware:

N150 (single chip):

--max-model-len 65536   # Full 64K context
--max-num-seqs 16       # Moderate batching
--block-size 64         # Standard

N300 (dual chip):

--max-model-len 131072  # Full 128K context
--max-num-seqs 32       # Higher batching
--block-size 64
--tensor-parallel-size 2  # Use both chips

T3K (8 chips):

--max-model-len 131072
--max-num-seqs 64       # High batching
--block-size 64
--tensor-parallel-size 8  # Use all chips

Conservative (if OOM errors):

Q: How do I monitor performance?

A:

Token generation speed:

# In vLLM output, look for:
"Generated 150 tokens in 2.5 seconds (60 tokens/sec)"

Server metrics:

# vLLM exposes Prometheus metrics:
curl http://localhost:8000/metrics
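
To pull just the vLLM counters out of that endpoint programmatically, a small sketch (assumes the requests package; exact metric names vary between vLLM versions, so filter on the prefix rather than exact names):

import requests

text = requests.get("http://localhost:8000/metrics", timeout=10).text

# Print only the vLLM metrics, skipping Prometheus comment lines.
for line in text.splitlines():
    if line.startswith("vllm"):
        print(line)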

System monitoring:

# GPU-like monitoring for TT:
watch -n 1 tt-smi

Load testing:

# Install hey:
go install github.com/rakyll/hey@latest

# Test throughput:
hey -n 100 -c 10 -m POST \
  -H "Content-Type: application/json" \
  -d '{"model": "...", "messages": [...]}' \
  http://localhost:8000/v1/chat/completions
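
If you'd rather not install a Go tool, here's a rough Python equivalent of hey -n 100 -c 10 using a thread pool (a sketch - assumes the requests package and an OpenAI-compatible endpoint; tune the request count and concurrency for your hardware):

import time
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8000/v1/chat/completions"
PAYLOAD = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",   # match the model your server loaded
    "messages": [{"role": "user", "content": "Say hi in five words."}],
    "max_tokens": 32,
}

def one_request(_):
    start = time.time()
    requests.post(URL, json=PAYLOAD, timeout=300).raise_for_status()
    return time.time() - start

# 100 requests total, 10 in flight at a time.
with ThreadPoolExecutor(max_workers=10) as pool:
    latencies = list(pool.map(one_request, range(100)))

print(f"requests: {len(latencies)}, mean latency: {sum(latencies) / len(latencies):.2f}s")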

Community & Support

Q: Where can I get help?

A:

Official channels:

When asking for help, include:

  1. Hardware type (N150/N300/T3K/P100)
  2. Error message (full text)
  3. Command you ran
  4. Output of tt-smi
  5. Which lesson you're on

Q: How do I report a bug?

A:

Before reporting:

  1. Search existing issues on GitHub
  2. Verify hardware works (tt-smi)
  3. Try reset (tt-smi -r)
  4. Check you're on latest tt-metal/vLLM

When reporting, include:

Hardware: N150
OS: Ubuntu 22.04
tt-metal version: [git rev-parse HEAD output]
vLLM version: [pip show vllm]
Error: [paste full error]
Steps to reproduce: [numbered list]

Good issue = faster fix!

Q: Can I contribute?

A: Yes! Several ways:

1. Bounty Program

2. Documentation

3. Code contributions

Start here:

Q: Is this production-ready?

A: Depends on component:

Production-ready (✅):

Experimental (⚠️):

Recommendation:


Quick Reference

Essential Commands

# Hardware
tt-smi                                    # Check hardware
tt-smi -s                                # Structured output
tt-smi -r                                # Reset device

# Model info
ls ~/models/                            # List installed models
du -sh ~/models/*                       # Check model sizes

# Environment
python3 -c "import ttnn; print('✓')"   # Test tt-metal
which huggingface-cli                   # Check HF CLI

# vLLM
source ~/tt-vllm-venv/bin/activate      # Activate venv
curl http://localhost:8000/health       # Check server
curl http://localhost:8000/metrics      # Get metrics

# Cleanup
sudo pkill -9 -f "tt-metal|vllm"       # Kill processes
sudo rm -rf /dev/shm/tt_*              # Clear shared memory
tt-smi -r                               # Reset hardware

Quick Diagnostic

Run this to check your setup:

#!/bin/bash
echo "=== Tenstorrent Diagnostic ==="
echo ""
echo "Hardware:"
tt-smi -s 2>&1 | grep -o '"board_type": "[^"]*"' || echo "❌ No hardware detected"
echo ""
echo "tt-metal:"
python3 -c "import ttnn; print('✓ Working')" 2>&1 || echo "❌ Not working"
echo ""
echo "Models:"
ls ~/models/ 2>/dev/null | head -3 || echo "❌ No models found"
echo ""
echo "Disk space:"
df -h ~ | grep -v Filesystem
echo ""
echo "Python:"
python3 --version

Advanced Learning Resources

Q: Where can I learn about low-level RISC-V programming on Tenstorrent hardware?

A: Check out the CS Fundamentals series - Module 1 covers RISC-V & Computer Architecture!

Each Tensix core contains five RISC-V processors (RV32IM ISA):

With 176 Tensix cores on Wormhole, that's 880 RISC-V cores you can program directly!

What Module 1 includes:

Topics covered across 7 CS Fundamentals modules:

Access the series:

View the full guide:

Perfect for:


Still Have Questions?

Check:

  1. Specific lesson troubleshooting sections
  2. CLAUDE.md for detailed technical info
  3. Discord #help channel

Remember: Most issues are:

When in doubt:

tt-smi -r
sudo rm -rf /dev/shm/tt_*
# Then retry

Last updated: January 2026 · Extension version: 0.0.283

Found an error in this FAQ? Please report it on GitHub or Discord!