Hardware: N150, N300, T3K, P100, P300c | Time: 60 min | Status: Draft

Video Generation via Frame-by-Frame Diffusion

The Idea

Generate video one frame at a time, then stitch with ffmpeg.

Rather than relying on a native video generation model, this lesson uses the proven Stable Diffusion 1.4 model, which runs on every Tenstorrent configuration from a single N150 to an 8-chip T3K. Each frame is a text-to-image generation pass. A handful of carefully worded prompts becomes a short film.
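Condensed, the whole pipeline is three steps. The sketch below abbreviates paths and options; Steps 3, 4, and 6 give the full commands.

# 1. Write one text prompt per frame into a JSON file (Step 3)
cat prompts.json          # [{"prompt": "..."}, {"prompt": "..."}, ...]

# 2. Render one 512x512 PNG per prompt with the Stable Diffusion demo (Step 4, run from ~/tt-metal)
pytest --input-path="prompts.json" \
  models/demos/wormhole/stable_diffusion/demo/demo.py::test_demo

# 3. Stitch the PNGs into an MP4 (Step 6)
ffmpeg -framerate 2 -pattern_type glob -i 'frames/frame_*.png' \
  -vf format=yuv420p -c:v libx264 video.mp4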

Hardware this works on: N150, N300, T3K, P100, and P300c.

Requires ~/tt-metal built from source. If you don't have that yet, see Build tt-metal from Source first.


What We'll Build

A short video showcasing "Tenstorrent at the 1964–1965 World's Fair" using:

  • Stable Diffusion 1.4 on a Tenstorrent device to render one 512×512 frame per prompt
  • a prompts.json file holding ten scene descriptions
  • ffmpeg to stitch the frames into an MP4


Prerequisites

Install ffmpeg if needed:

sudo apt-get install -y ffmpeg
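A quick check that it installed correctly:

ffmpeg -version | head -1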

Step 1: Set Up Environment

source ~/tt-metal/python_env/bin/activate
export TT_METAL_HOME=~/tt-metal
export PYTHONPATH=$TT_METAL_HOME:$PYTHONPATH
cd ~/tt-metal

For P100 / P300c (Blackhole):

export TT_METAL_ARCH_NAME=blackhole

Step 2: Authenticate with HuggingFace

The demo auto-downloads CompVis/stable-diffusion-v1-4 on first run, so you need a HuggingFace account and must be logged in:

hf auth login
# Enter your HuggingFace token when prompted

Verify:

hf auth whoami

Step 3: Create Your Video Prompts File

Create a JSON file describing your 10 frames. Each object has a single "prompt" key.

mkdir -p ~/tt-scratchpad/worldsfair-video
cat > ~/tt-scratchpad/worldsfair-video/prompts.json << 'EOF'
[
  {"prompt": "Tenstorrent pavilion at 1964 World's Fair, futuristic dome architecture, orange and white corporate colors, crowds in 1960s attire, Kodachrome photo"},
  {"prompt": "vintage 1964 corporate display, Tenstorrent AI accelerator prototype, blinking lights, orange circuit boards, businessmen in suits examining technology, documentary photography"},
  {"prompt": "1960s scientist demonstrating Tenstorrent neural network computer, mainframe-style cabinet with orange panels, oscilloscope displays, amazed visitors, retro-futurism"},
  {"prompt": "1964 Tenstorrent brochure design, geometric mid-century modern graphics, orange and teal color scheme, optimistic corporate advertising aesthetic"},
  {"prompt": "Tenstorrent executives presenting at 1964 World's Fair press conference, vintage microphones, presentation boards, journalists with cameras"},
  {"prompt": "children and families interacting with Tenstorrent AI demonstration, 1960s interactive console, colorful buttons and displays, educational exhibit"},
  {"prompt": "Tenstorrent computing center at World's Fair, rows of AI accelerator cabinets, operators in white coats, blinking lights, 1960s corporate technology photography"},
  {"prompt": "Tenstorrent pavilion at night, illuminated dome, World's Fair Unisphere in background, neon signs, vintage night photography"},
  {"prompt": "futuristic prediction display, 1960s interpretation of future technology, retro-futuristic artwork, optimistic mid-century illustration style"},
  {"prompt": "thank you for visiting Tenstorrent, 1964 corporate signage, World's Fair closing ceremony, nostalgic vintage photograph, orange sunset lighting"}
]
EOF

Feel free to replace these with your own theme!
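Before launching a long generation run, it's worth confirming the file parses as valid JSON, using Python's built-in json.tool:

python3 -m json.tool ~/tt-scratchpad/worldsfair-video/prompts.json > /dev/null \
  && echo "prompts.json is valid"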


Step 4: Generate Frames

Run the batch demo with your prompts file. On the first run the demo downloads CompVis/stable-diffusion-v1-4 (a few hundred MB) and compiles kernels before generating the first image.

cd ~/tt-metal
pytest --disable-warnings \
  --input-path="$HOME/tt-scratchpad/worldsfair-video/prompts.json" \
  models/demos/wormhole/stable_diffusion/demo/demo.py::test_demo

What happens:

  1. Model weights download (first run only, ~2 min)
  2. Kernel compilation runs (first run only, ~5 min)
  3. Each frame generates in sequence
  4. Images save to the current directory as:
    • input_data_0_512x512_ttnn.png
    • input_data_1_512x512_ttnn.png
    • ...
    • input_data_9_512x512_ttnn.png

Note: The default output path is the directory where you run pytest. Run cd ~/tt-metal first so images land there, then move them afterward.
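A quick count confirms all ten frames landed in the current directory before you move on (assuming the default output filenames listed above):

ls input_data_*_512x512_ttnn.png | wc -l   # expect 10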

Expected generation time per frame (after compilation) depends on your hardware tier; see the scaling table under Understanding Hardware Scaling below.

The first frame is always slower: kernel compilation adds 3–5 minutes on the initial run. Subsequent runs reuse the cached compiled kernels and are much faster.


Step 5: Collect Your Frames

Move the generated images to your video directory and rename them sequentially:

mkdir -p ~/tt-scratchpad/worldsfair-video/frames
cd ~/tt-metal

for i in $(seq 0 9); do
  if [ -f "input_data_${i}_512x512_ttnn.png" ]; then
    mv "input_data_${i}_512x512_ttnn.png" \
       ~/tt-scratchpad/worldsfair-video/frames/frame_$(printf "%03d" $i).png
    echo "Moved frame $i"
  fi
done

ls ~/tt-scratchpad/worldsfair-video/frames/

Step 6: Stitch Frames into Video

cd ~/tt-scratchpad/worldsfair-video

# 2 fps = each frame shows for 0.5 seconds → 10 frames = 5 second video
ffmpeg -framerate 2 \
  -pattern_type glob -i 'frames/frame_*.png' \
  -vf 'format=yuv420p,scale=512:512' \
  -c:v libx264 -crf 18 \
  tenstorrent_worldsfair_1964.mp4

echo "Done! Video saved as tenstorrent_worldsfair_1964.mp4"

ffmpeg parameters:

  • -framerate 2: input frame rate; each frame is held for 0.5 seconds
  • -pattern_type glob -i 'frames/frame_*.png': read every matching PNG in sorted order
  • -vf 'format=yuv420p,scale=512:512': convert to a pixel format most players support and force 512×512 output
  • -c:v libx264 -crf 18: H.264 encoding at high quality (lower CRF means higher quality and a larger file)

Try -framerate 4 or -framerate 1 for different pacing.
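To sanity-check the result, ffprobe (installed alongside ffmpeg) can report the duration; at 2 fps, 10 frames should come out to roughly 5 seconds:

ffprobe -v error -show_entries format=duration \
  -of default=noprint_wrappers=1:nokey=1 tenstorrent_worldsfair_1964.mp4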


Step 7: Try the Interactive Mode

For a more exploratory workflow — type a prompt, see the image immediately:

cd ~/tt-metal
pytest models/demos/wormhole/stable_diffusion/demo/demo.py::test_interactive_demo

The model stays loaded between prompts. Type a prompt, press Enter, and interactive_512x512_ttnn.png (or interactive_256x256_ttnn.png) appears in the current directory. Type q to exit.

Use this to iterate on your prompt wording before committing to a full 10-frame batch.
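The interactive output filename appears to be fixed, so each new prompt will overwrite the previous image. If you want to keep candidates while you iterate, copy each one out with a timestamp first (the keepers/ directory here is just an example):

mkdir -p ~/tt-scratchpad/worldsfair-video/keepers
cp interactive_512x512_ttnn.png \
   ~/tt-scratchpad/worldsfair-video/keepers/candidate_$(date +%H%M%S).png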


Understanding Hardware Scaling

The same demo code runs across hardware tiers — the difference is parallelism:

Hardware     | Chips         | Relative Speed | Use Case
N150         | 1             | 1× (baseline)  | Development, testing
N300         | 2             | ~2× faster     | Faster iteration
T3K          | 8             | ~6× faster     | Production video
P100 / P300c | 1 (Blackhole) | ~1×            | Blackhole validation

As a benchmark, time a full 10-frame 512×512 batch on your own tier and compare it against the table above.

This is the TT hardware advantage: write for N150, scale to T3K with zero code changes.
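Not sure which tier you're on? The same tt-smi snapshot used in the troubleshooting section below tells you:

tt-smi -s | grep board_type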


Customize Your Video

Adjust Frame Timing (ffmpeg)

# Slower: 1 fps (1 second per frame)
ffmpeg -framerate 1 -pattern_type glob -i 'frames/frame_*.png' \
  -vf 'format=yuv420p' -c:v libx264 -crf 18 output_slow.mp4

# Faster: 4 fps
ffmpeg -framerate 4 -pattern_type glob -i 'frames/frame_*.png' \
  -vf 'format=yuv420p' -c:v libx264 -crf 18 output_fast.mp4

Smooth the Transitions with Motion Interpolation

# Use the minterpolate filter to synthesize intermediate frames between stills
ffmpeg -framerate 2 -pattern_type glob -i 'frames/frame_*.png' \
  -vf 'minterpolate=fps=24:mi_mode=mci,format=yuv420p' \
  -c:v libx264 -crf 18 output_smooth.mp4

Different Themes

Replace the prompts JSON with any theme you like; a minimal example of an alternate theme file is sketched below.
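This three-prompt "lighthouse" file is purely illustrative (kept short for brevity; the batch above uses ten). Point --input-path at whichever file you create:

mkdir -p ~/tt-scratchpad/lighthouse-video
cat > ~/tt-scratchpad/lighthouse-video/prompts.json << 'EOF'
[
  {"prompt": "a lighthouse on a rocky coast at dawn, dramatic clouds, oil painting style"},
  {"prompt": "the same lighthouse at midday, fishing boats in the harbor, oil painting style"},
  {"prompt": "the lighthouse beam sweeping across a stormy night sea, oil painting style"}
]
EOF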


Troubleshooting

"Generation is very slow (>5 min per frame after warmup)"

Likely running on CPU, not TT hardware:

  1. Check TT_METAL_HOME is set: echo $TT_METAL_HOME
  2. Verify venv is activated: which python3 should show tt-metal/python_env/
  3. Check device: tt-smi -s | grep board_type

"Device in bad state after a killed process"

tt-smi -r   # Reset device
# Wait ~30 seconds
# Re-run pytest

"Module not found" errors

source ~/tt-metal/python_env/bin/activate
export PYTHONPATH=$TT_METAL_HOME:$PYTHONPATH

"huggingface-hub not authenticated"

hf auth login

"P300c / QB2: ROW dispatch error"

The SD demo uses ttnn.open_device() without hardcoding dispatch axis, so it should be Blackhole-safe. If you see dispatch errors, ensure TT_METAL_ARCH_NAME=blackhole is set.

"Ran out of frames to resume from"

If a run was interrupted, re-running the demo will overwrite the existing output files. Move or rename already-completed frames before re-running to preserve them, as in the sketch below.
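One simple approach, assuming the default input_data_*.png output names, is to stash everything that finished into a timestamped directory before the re-run:

cd ~/tt-metal
backup=~/tt-scratchpad/worldsfair-video/partial_$(date +%Y%m%d_%H%M%S)
mkdir -p "$backup"
mv input_data_*_512x512_ttnn.png "$backup"/ 2>/dev/null
ls "$backup"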


Key Takeaways

  • A text-to-image model plus ffmpeg is enough to make a short video; no native video generation model is needed.
  • The Stable Diffusion 1.4 demo runs unchanged from a single N150 up to an 8-chip T3K; more chips means faster frames, not different code.
  • The first run is dominated by weight download and kernel compilation; cached kernels make every later run much faster.
  • Use interactive mode to tune prompt wording before committing to a full batch.

What's Next?