
Understanding Custom Training

Welcome to the Custom Training series! This lesson provides a conceptual foundation for understanding how to build and customize AI models on Tenstorrent hardware.

What You'll Learn

Time: 10-15 minutes | Prerequisites: Basic understanding of machine learning concepts


Custom Training vs Inference

So far in this extension, you've learned how to run pre-trained models (inference). Now you'll learn how to create your own models (training).

Inference (What You've Done)

Training (What We'll Build)

Key insight: Training is where the magic happens. A model is just a collection of numbers (weights) until training teaches it what those numbers should be.


Two Paths to Custom Models

Path 1: Fine-Tuning (Lessons CT-2 through CT-6)

Start with a pre-trained model, teach it something new.

When to use:

Example: Take TinyLlama (general language model) and fine-tune it to explain machine learning concepts in creative ways.

Analogy: Like hiring an experienced developer and training them on your company's codebase.

Path 2: Training from Scratch (Lessons CT-7 and CT-8)

Build a model from the ground up.

When to use:

Example: Build a tiny transformer (10-20M parameters) that learns language patterns from scratch.

Analogy: Like teaching yourself programming from first principles.


The Training Framework Ecosystem

Tenstorrent's training ecosystem is designed around clarity and modularity. Here's how the pieces fit together:

tt-metal (Foundation)

tt-train (Training Framework)

tt-blacksmith (Development Patterns)

How they work together:

graph TD
    A[Your Training Script] --> B[tt-train API<br/>High-level Training Interface]
    B --> C[tt-metal SDK<br/>Hardware Operations]
    C --> D[Tenstorrent Hardware<br/>N150/N300/T3K/P100/Galaxy]

    E[tt-blacksmith Patterns] -.->|Best Practices<br/>Config Organization| A

    style A fill:#4A90E2,stroke:#333,stroke-width:2px
    style B fill:#7B68EE,stroke:#333,stroke-width:2px
    style C fill:#7B68EE,stroke:#333,stroke-width:2px
    style D fill:#50C878,stroke:#333,stroke-width:2px
    style E fill:#6C757D,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5

Think of it like web development:


The tt-blacksmith Philosophy

tt-blacksmith isn't just a collection of bounty scripts; it's a framework for getting training code running on Tenstorrent hardware. Here are its key patterns:

1. Configuration-Driven Everything

Instead of hardcoding values, use YAML configs:

training_config:
  batch_size: 8
  learning_rate: 1e-4
  num_epochs: 3

device_config:
  enable_ddp: false    # Single device
  mesh_shape: [1, 1]

logging_config:
  use_wandb: false     # Optional experiment tracking
  log_level: "INFO"

Why: Easy to experiment, reproduce, and share configurations.
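
To make this concrete, here's a minimal sketch of loading such a config from a Python training script. It assumes the PyYAML package and a hypothetical file name, train_config.yaml:

import yaml  # PyYAML: pip install pyyaml

# Hypothetical file name; adjust to your project layout.
with open("train_config.yaml") as f:
    config = yaml.safe_load(f)

# Values come from the file, not from hardcoded constants.
batch_size = config["training_config"]["batch_size"]
# Cast defensively: PyYAML can parse "1e-4" as a string rather than a float.
learning_rate = float(config["training_config"]["learning_rate"])
use_wandb = config["logging_config"]["use_wandb"]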

2. Modular Organization

Separate concerns into focused components:

Why: Easier to debug, test, and reuse code.
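
One hypothetical layout that follows this pattern (file names are illustrative, not a tt-blacksmith requirement):

my_training_project/
├── configs/
│   └── train_config.yaml   # All tunable values live here
├── data/
│   └── train.jsonl         # Prepared dataset
├── dataset.py              # Loading, tokenization, batching
├── model.py                # Model definition or loading
├── train.py                # The training loop itself
└── evaluate.py             # Sample generation and quality checks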

3. Progressive Enhancement

Start simple, add complexity when needed:

  1. File-based logging → WandB integration
  2. Single device → Multi-device DDP
  3. Fine-tuning → Training from scratch

Why: Learn incrementally, avoid over-engineering.


Understanding the Training Process

Training a model is like teaching through repetition: show examples, measure mistakes, make corrections, repeat. Here's the complete flow:

graph TD
    A[Raw Data<br/>Text files, datasets] --> B[Prepare Data<br/>JSONL format]
    B --> C[Initialize Model<br/>Pre-trained OR random weights]

    C --> D{Training Loop<br/>Multiple epochs}

    D --> E[Get Batch<br/>8-32 examples]
    E --> F[Forward Pass<br/>Model makes predictions]
    F --> G[Compute Loss<br/>How wrong?]
    G --> H[Backward Pass<br/>Calculate gradients]
    H --> I[Update Weights<br/>Optimizer step]

    I --> J{More Batches?}
    J -->|Yes| E
    J -->|No| K[Evaluation<br/>Generate samples, check quality]

    K --> L[Save Checkpoint<br/>Model weights + optimizer state]

    L --> M{Continue Training?}
    M -->|Yes, more epochs| D
    M -->|No, training complete| N[Deployment<br/>Use with vLLM for inference]

    style A fill:#4A90E2,stroke:#333,stroke-width:2px
    style B fill:#7B68EE,stroke:#333,stroke-width:2px
    style C fill:#7B68EE,stroke:#333,stroke-width:2px
    style D fill:#E85D75,stroke:#333,stroke-width:3px
    style E fill:#7B68EE,stroke:#333,stroke-width:2px
    style F fill:#7B68EE,stroke:#333,stroke-width:2px
    style G fill:#7B68EE,stroke:#333,stroke-width:2px
    style H fill:#7B68EE,stroke:#333,stroke-width:2px
    style I fill:#7B68EE,stroke:#333,stroke-width:2px
    style K fill:#7B68EE,stroke:#333,stroke-width:2px
    style L fill:#E85D75,stroke:#333,stroke-width:2px
    style N fill:#50C878,stroke:#333,stroke-width:2px

What each step does:

Step 1: Prepare Data

Transform raw text into training format (JSONL with prompt/response pairs). Quality matters more than quantity here.
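
As a sketch of the format: each line of a JSONL file is one standalone JSON object. The prompt/response field names and the examples below are hypothetical; match whatever your training pipeline expects:

import json

# Hypothetical examples; field names depend on your training setup.
examples = [
    {"prompt": "What is a gradient?",
     "response": "A gradient tells you how the loss changes as each weight changes."},
    {"prompt": "What does an optimizer do?",
     "response": "It uses gradients to decide how to update the model's weights."},
]

# JSONL = one JSON object per line.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")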

Step 2: Initialize Model

Either load pre-trained weights (fine-tuning) or start from random numbers (training from scratch). Most of the time, you'll fine-tune.
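
For the fine-tuning path, one common way to load pretrained weights is through Hugging Face transformers; tt-train's own loading API may differ, so treat this as a generic sketch:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Fine-tuning: start from pretrained weights.
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Training from scratch: the same architecture with random weights, e.g. via
# AutoModelForCausalLM.from_config(AutoConfig.from_pretrained(...)).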

Step 3: Training Loop (The Core)

This is where learning happens:

  1. Get Batch - Load 8-32 examples from your dataset
  2. Forward Pass - Model makes predictions based on current weights
  3. Compute Loss - Measure how far predictions are from correct answers
  4. Backward Pass - Calculate which direction to adjust each weight
  5. Update Weights - Actually change the model's parameters
  6. Repeat - Do this thousands of times

Think of loss as: A score that goes down as the model gets better. Loss of 2.5 → 1.2 → 0.5 means it's learning.
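
Here's the same loop as a minimal, self-contained PyTorch sketch. tt-train exposes analogous steps; this shows the generic pattern, not tt-train's API, with a tiny linear model and random data standing in for a real model and dataset:

import torch
import torch.nn as nn

# Stand-ins for a real model and dataset.
model = nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(1000):                      # 6. Repeat, thousands of times
    inputs = torch.randn(8, 16)               # 1. Get a batch (here: batch_size=8)
    targets = torch.randint(0, 4, (8,))
    logits = model(inputs)                    # 2. Forward pass: make predictions
    loss = loss_fn(logits, targets)           # 3. Compute loss: how wrong?
    optimizer.zero_grad()
    loss.backward()                           # 4. Backward pass: calculate gradients
    optimizer.step()                          # 5. Update weights
    if step % 100 == 0:
        print(f"step {step}: loss {loss.item():.3f}")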

Step 4: Evaluation

Generate sample outputs to see if the model is improving. This happens every few hundred steps, not every step.
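
Continuing the PyTorch sketch from Step 3, the usual gating pattern looks like this; for a language model you'd generate sample text here rather than computing a toy validation loss:

# Inside the training loop from the sketch above:
if step % 200 == 0:                           # every few hundred steps, not every step
    model.eval()                              # switch off training-only behavior (e.g. dropout)
    with torch.no_grad():                     # no gradients needed for evaluation
        val_inputs = torch.randn(8, 16)       # stand-in for a held-out validation batch
        val_targets = torch.randint(0, 4, (8,))
        val_loss = loss_fn(model(val_inputs), val_targets)
    print(f"step {step}: val loss {val_loss.item():.3f}")
    model.train()                             # back to training mode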

Step 5: Save Checkpoint

Store model weights and training state so you can resume if interrupted or pick the best version later.
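
Continuing the same sketch, a checkpoint bundles the weights and the optimizer state so resuming picks up exactly where training stopped (the file name is illustrative):

# Save: model weights plus optimizer state.
torch.save({
    "step": step,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}, "checkpoint_step1000.pt")

# Resume later:
ckpt = torch.load("checkpoint_step1000.pt")
model.load_state_dict(ckpt["model_state"])
optimizer.load_state_dict(ckpt["optimizer_state"])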

Step 6: Deployment

Once training is complete, use your fine-tuned model for inference. Integrate with vLLM (from Lesson 7: vLLM Production) for production serving.
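
A minimal sketch of loading the result with vLLM's offline API, assuming the fine-tuned model has been exported to a directory vLLM can read (the path is a placeholder):

from vllm import LLM, SamplingParams

# Placeholder path: a fine-tuned model exported in Hugging Face format.
llm = LLM(model="./my-finetuned-tinyllama")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain overfitting in one sentence."], params)
print(outputs[0].outputs[0].text)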


Hardware Considerations

N150 (Single Wormhole Chip)

N300 (Dual Wormhole Chips)

T3K / Blackhole / Galaxy (Advanced)

For this series: We'll focus on N150 so everyone can follow along, with N300 examples where multi-device scaling applies.


Training Examples Throughout This Series

This series uses concrete examples to teach transferable principles:

CT-4 (Fine-tuning Basics):

CT-7 and CT-8 (Architecture & Training from Scratch):

Why these examples:

The goal: Learn principles you can apply to your custom models and domains.


What You'll Build (Series Overview)

Lessons CT-2 and CT-3: Preparation

Lesson CT-4: Your First Training Run

Lessons CT-5 and CT-6: Scaling Up

Lessons CT-7 and CT-8: Advanced Topics


Common Questions

"Should I fine-tune or train from scratch?"

99% of the time: fine-tune.

Fine-tuning is:

Train from scratch when:

"How much data do I need?"

For fine-tuning:

For training from scratch:

Quality > Quantity: 200 high-quality examples beat 10,000 mediocre ones.

"Will fine-tuning erase what the model learned?"

No, if done correctly. A low learning rate and a small number of epochs nudge the existing weights rather than overwriting them; overly aggressive settings are what cause catastrophic forgetting.

Think of it as: Teaching a PhD new skills, not wiping their memory.

"Can I use this for commercial projects?"

Yes, with caveats:

Always verify licenses for your specific use case.


Beyond This Lesson: The Custom AI Landscape

You're about to learn how to train custom models - but what will you build with this power? Let's explore the possibilities.

What Developers Have Built on Tenstorrent

Real-world custom models running on TT hardware:

🎯 Domain-Specific Coding Assistants

📚 Knowledge Specialists

🎨 Creative Applications

🔬 Research & Experimentation

Working Within Constraints (N150 Can Do This!)

You don't need massive infrastructure to build something meaningful:

The magic is in the data and the task definition, not the hardware scale.

Imagine: Your Custom Model Journey

Month 1 (Starting Today):

Month 2-3:

Month 6+:

From Learning to Leading

This series teaches you:

But more importantly, it empowers you to:

The question isn't "Can I train a custom model on Tenstorrent hardware?"

The question is "What will I build first?"


Key Takeaways

Training creates models, inference uses them

Fine-tuning is usually the right choice for custom models

tt-train provides the framework for training on TT hardware

tt-blacksmith shows the patterns for organizing training code

Start with N150, scale to N300+ when needed

Focus on data quality over quantity

Examples in this series teach transferable principles


Next Steps

Lesson CT-2: Dataset Fundamentals

Now that you understand the concepts, it's time to get hands-on. In the next lesson, you'll:

  1. Create your first training dataset (JSONL format)
  2. Validate dataset format
  3. Understand tokenization and batching
  4. See how data flows through training

Estimated time: 15 minutes | Prerequisites: This lesson (CT-1)


Additional Resources

Official Documentation

Community


Ready to build your first dataset? Continue to Lesson CT-2: Dataset Fundamentals