Introduction

The tt-blacksmith project contains optimized training recipes for a variety of machine learning models on Tenstorrent hardware, powered by the tt-forge compiler stack. The stack's flexibility lets popular AI frameworks such as PyTorch and JAX be used for training workflows.

The main goal of this project is:

  • Demonstrations: Practical examples and workflows showcasing how to train various ML models on Tenstorrent hardware.

Getting Started with tt-blacksmith

Setup

To run experiments on Tenstorrent hardware, users must first build and activate either the TT-Forge-FE (for PyTorch) or tt-xla (for JAX) frontend environment using the provided scripts.

Build Frontend environment

TT-Forge-FE

To build the TT-Forge-FE frontend, run:

./scripts/build_frontends.sh --ffe

tt-xla

tt-xla depends on the MLIR environment. You can set TTMLIR_TOOLCHAIN_DIR to point to your toolchain directory; if not specified, it defaults to:

/opt/ttmlir-toolchain
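For example, the variable can be exported before invoking the build script (the path shown is the documented default; substitute your own toolchain location if it differs):

```shell
# Point the tt-xla build at the MLIR toolchain directory; if this
# variable is unset, the build falls back to /opt/ttmlir-toolchain.
export TTMLIR_TOOLCHAIN_DIR=/opt/ttmlir-toolchain
echo "$TTMLIR_TOOLCHAIN_DIR"
```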

If you're setting up for the first time (or don't have the MLIR environment installed), do a full build:

./scripts/build_frontends.sh --xla --full

For subsequent builds, a regular rebuild is enough:

./scripts/build_frontends.sh --xla

Activating Frontend Environment

To activate the Python environment for a specific frontend:

For TT-Forge-FE:

source ./scripts/activate_frontend.sh --ffe

For tt-xla:

source ./scripts/activate_frontend.sh --xla
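After sourcing the activation script, a quick generic sanity check (not part of the repo's scripts) confirms which Python environment the shell is using:

```shell
# Print the active Python environment's prefix; after activation it
# should point inside the frontend's virtual environment.
python3 -c "import sys; print(sys.prefix)"
```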

Running Experiments

This section guides you through the process of running experiments included in this project, allowing you to reproduce results and explore different configurations.

  • Explore Available Experiments: Browse the experiments documentation to find a list of all available experiments.
  • Understand Experiment Details: Before running an experiment, review its dedicated README file for a high-level description and specific instructions.
  • Execute the Experiment: Follow the detailed steps outlined in the experiment's README file to run it successfully.
  • Experiment with Configurations: Feel free to modify the experiment configurations (e.g., parameters) as described in the README to observe their impact on the results.
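As a sketch of the last step, a configuration value can be edited before a run. The file and field names below are illustrative only, not taken from an actual experiment config:

```shell
# Write a small stand-in config file (field names are hypothetical)
# and change one parameter in place, as you might edit an
# experiment's YAML config before a run.
cat > /tmp/demo_config.yml <<'EOF'
batch_size: 64
learning_rate: 0.001
EOF
sed -i 's/^batch_size: .*/batch_size: 128/' /tmp/demo_config.yml
cat /tmp/demo_config.yml
```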

Visual Demo: 3D Reconstruction with NeRF

[NeRF demo]

Experiments

This page provides an overview of the experiments included in this repository, detailing their organization.

Available Experiments

The following table provides an overview of different model and dataset combinations within various frameworks explored in this project.

  Framework   Model   Dataset   Devices   Details
  Lightning   MLP     MNIST     TT        README
  JAX         MLP     MNIST     TT        README
  Lightning   NeRF    Blender   TT        README
  PyTorch     Llama   SST-2     GPU       README

Within this repository, you'll find the following structure to help you navigate the experimental setup:

  • datasets/: The dataset loaders for specific model training are defined in this directory and organized by the framework they utilize. For example, the loader for the MNIST dataset can be found at datasets/mnist/.

  • models/: This directory is organized by framework. Within it, you'll find subdirectories (e.g., jax/, pytorch/) containing the model implementations or loader scripts specific to that framework. For instance, the JAX implementation of a model for MNIST training would typically be located in models/jax/mnist/.

  • experiments/: Experiments are organized first by the framework they utilize, and then by the specific model or task. For example, the JAX-based MNIST experiment can be found under blacksmith/experiments/jax/mnist/. Within each experiment directory, you will typically find the following files:

    • A Python file defining the configuration structure for the experiment (e.g. configs.py).
    • A YAML file containing the specific configuration parameters for a particular run of the experiment (e.g. test_jax_mnist.yml).
    • The Python script responsible for running the experiment using the defined configurations (e.g. test_pure_jax_mnist.py).
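As a rough sketch of how these three pieces fit together: the configuration module typically defines a structure that the YAML parameters are loaded into before the run script uses them. The field names below are hypothetical, not the actual contents of configs.py:

```python
from dataclasses import dataclass

# Hypothetical config structure; the real fields live in the
# experiment's configs.py.
@dataclass
class ExperimentConfig:
    batch_size: int = 64
    learning_rate: float = 1e-3
    num_epochs: int = 10

# Values that would normally be parsed from the YAML file
# (e.g. test_jax_mnist.yml) before being handed to the run script.
raw_params = {"batch_size": 128, "learning_rate": 5e-4, "num_epochs": 3}
config = ExperimentConfig(**raw_params)
print(config.batch_size)  # 128
```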