Getting Started

This document walks you through how to set up TT-XLA. TT-XLA is a front end for TT-Forge that primarily ingests JAX models via JIT compilation, producing a StableHLO (SHLO) graph for the TT-MLIR compiler. TT-XLA leverages PJRT to integrate JAX, tt-mlir, and Tenstorrent hardware. Please see this blog post for more information about the PJRT project. This project is a fork of iree-pjrt.
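
To make that flow concrete, the following minimal sketch (runnable with stock JAX, no Tenstorrent hardware required) lowers a jitted function and prints its StableHLO module - the form TT-XLA hands to the TT-MLIR compiler. The function f here is just an arbitrary example:

# Minimal sketch: lowering a jitted JAX function to StableHLO.
import jax
import jax.numpy as jnp

def f(x):
    return jnp.tanh(x) * 2.0

# jit tracing produces a lowered module; as_text() renders it as StableHLO/MLIR text.
lowered = jax.jit(f).lower(jnp.ones((4, 4), dtype=jnp.float32))
print(lowered.as_text())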

NOTE: Currently, only Tenstorrent Nebula boards are supported.

This is the main Getting Started page. There are two additional Getting Started pages, depending on what you want to do. All of them are described here, with links provided to each.

The following topics are covered:

  Setup Options
  Configuring Hardware
  Installing a Wheel and Running an Example
  Other Setup Options
  Where to Go Next

NOTE: If you encounter issues, please request assistance on the TT-XLA Issues page.

Setup Options

TT-XLA can be used to run JAX models on Tenstorrent's AI hardware. Because TT-XLA is open source, you can also develop and add features to it. Setup instructions differ based on the task. You have the following options, listed in order of difficulty:

  1. Installing a wheel and running an example - this is the quickest way to run a model.
  2. Setting up in a Docker container - this keeps your environment completely separate.
  3. Building from source - choose this option if you want to develop TT-XLA further.

Configuring Hardware

Before you can set up TT-XLA, you must configure your hardware. You can skip this section if you already completed the configuration steps. Otherwise, this section shows you how to do a quick setup using TT-Installer.

  1. Configure your hardware with TT-Installer using the Quick Installation section here.

  2. Reboot your machine.

  3. After you run the script and reboot, make sure you activate the virtual environment it sets up - source ~/.tenstorrent-venv/bin/activate.

  4. After your environment is running, to check that everything is configured, type the following:

tt-smi

You should see the Tenstorrent System Management Interface. It allows you to view real-time stats, diagnostics, and health info about your Tenstorrent device.

[Screenshot: TT-SMI interface]

Installing a Wheel and Running an Example

This section walks you through downloading and installing a wheel. If you just want to run models, you can install the wheel in any environment you like.

  1. Make sure you are in an active virtual environment. This walkthrough uses the same environment you activated to look at TT-SMI in the Configuring Hardware section. If you are using multiple TT-Forge front ends to run models, you may want to set up a separate virtual environment instead. For example:
python3 -m venv .xla-venv
source .xla-venv/bin/activate
  2. Install the wheel in your active virtual environment:
pip install pjrt-plugin-tt --extra-index-url https://pypi.eng.aws.tenstorrent.com/
  3. You are now ready to try running a model. Navigate to the section of the TT-Forge repo that contains TT-XLA demos.

  4. For this walkthrough, the demo in the gpt2 folder is used. The requirements.txt file in the gpt2 folder shows that flax and transformers are necessary to run the demo. Install them:

pip install flax transformers
  5. Download the demo.py file from the gpt2 folder inside your activated virtual environment, placing it somewhere you can run it. The demo you are about to run takes a piece of text and tries to predict the next word that logically follows. (A condensed sketch of what such a demo does is shown after this list.)

  6. Run the model:

python demo.py
  7. If all goes well, you should see the prompt "The capital of France is", the predicted next token, the probability it will occur, and a list of other ranked options that could follow instead.
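
For reference, here is a condensed sketch of what a GPT-2 next-token demo like this one does. It is not the repo's demo.py verbatim; the prompt handling and output formatting are illustrative assumptions built on the flax and transformers packages installed above:

# Condensed sketch of a GPT-2 next-token demo (illustrative, not the repo's demo.py).
import jax
import jax.numpy as jnp
from transformers import AutoTokenizer, FlaxGPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = FlaxGPT2LMHeadModel.from_pretrained("gpt2")

# Tokenize the prompt and take the logits at the next-token position.
inputs = tokenizer("The capital of France is", return_tensors="np")
logits = model(**inputs).logits[0, -1, :]

# Convert logits to probabilities and print the five most likely next tokens.
probs = jax.nn.softmax(logits)
top_ids = jnp.argsort(probs)[::-1][:5]
for token_id in top_ids:
    print(repr(tokenizer.decode(int(token_id))), float(probs[token_id]))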

Other Setup Options

If you want to keep your environment completely separate in a Docker container, or if you want to develop TT-XLA further, see the pages for those options:

  Getting Started with Docker
  Getting Started with Building from Source

Where to Go Next

Now that you have set up the TT-XLA wheel, you can compile and run other demos. See the TT-XLA folder in the TT-Forge repo for more demos to try.
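
As a quick sanity check before trying more demos, you can confirm that JAX sees your Tenstorrent device. This is a minimal sketch; the exact device name that prints depends on the plugin version:

# Sanity check: list the devices JAX can see after installing pjrt-plugin-tt.
# The Tenstorrent device name shown may vary by plugin version.
import jax

print(jax.devices())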