# Getting Started
This document walks you through setting up TT-XLA. TT-XLA is a front end for TT-Forge that ingests JAX models through JIT compilation and PyTorch models through [torch-xla](https://github.com/pytorch/xla), and provides StableHLO (SHLO) graphs to the TT-MLIR compiler. TT-XLA uses [PJRT](https://github.com/openxla/xla/tree/main/xla/pjrt/c#pjrt---uniform-device-api) to integrate JAX, [TT-MLIR](https://github.com/tenstorrent/tt-mlir), and Tenstorrent hardware. For more information about the PJRT project, see the [PJRT blog post](https://opensource.googleblog.com/2023/05/pjrt-simplifying-ml-hardware-and-framework-integration.html).

> **NOTE:** If you encounter issues, please request assistance on the
> [TT-XLA Issues](https://github.com/tenstorrent/tt-xla/issues) page.

## Prerequisites

### 1. Set Up the Hardware
- Follow the [Hardware Setup](https://docs.tenstorrent.com) instructions for the Tenstorrent device you are using.

### 2. Install Software (choose one)
- **Option 1: Quick path:** Run TT-Installer by following [Software Installation](https://docs.tenstorrent.com/getting-started/README.html#software-installation).

- **Option 2: Manual path:** For more control, follow the [manual software dependencies installation guide](https://docs.tenstorrent.com/getting-started/manual-software-install.html).

## TT-XLA Installation Options

- [Option 1: Installing a Wheel and Running an Example](#installing-a-wheel-and-running-an-example)

   Choose this option if you want to run models.

- [Option 2: Using a Docker Container to Run an Example](#using-a-docker-container-to-run-an-example)

   Choose this option if you want to keep the environment for running models separate from your existing environment.

- [Option 3: Building from Source](#building-from-source)

   This option is best if you want to develop TT-XLA itself. The process is more complex, and you are unlikely to need it if you only want to run models.

---

### Installing a Wheel and Running an Example

To install a wheel and run an example model, do the following:

#### Step 1: Install the Latest Wheel

Install the latest wheel with pip:
```bash
pip install pjrt-plugin-tt --extra-index-url https://pypi.eng.aws.tenstorrent.com/
```

Run the `tt-forge-install` script to install missing system dependencies:
```bash
tt-forge-install
```

#### Step 2: Run Some Models

Use `wget` to fetch each demo script into your current directory, and install packages with pip when noted.

**MNIST (small CNN)**

The [**mnist.py**](https://github.com/tenstorrent/tt-xla/blob/main/examples/pytorch/mnist.py) example runs a simple CNN on a Tenstorrent device and compares the output against a CPU reference.

```bash
wget https://raw.githubusercontent.com/tenstorrent/tt-xla/main/examples/pytorch/mnist.py
python mnist.py
```

You should see the model output and a PCC (Pearson Correlation Coefficient) check confirming the TT device output matches the CPU reference.
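
For readers unfamiliar with the metric, the following is a minimal sketch of how a PCC check can be computed in plain Python. The function name `pcc`, the sample values, and the `0.99` threshold are illustrative assumptions; they are not taken from `mnist.py`, which may perform the comparison differently.

```python
import math

def pcc(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for constant inputs")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical flattened outputs from a CPU reference and a device run.
cpu_out = [0.10, 0.70, 0.05, 0.15]
device_out = [0.11, 0.69, 0.05, 0.15]

score = pcc(cpu_out, device_out)
print(f"PCC = {score:.6f}")
assert score > 0.99  # illustrative threshold, not the one used by the example
```

A PCC close to 1.0 means the two outputs vary together almost perfectly, which is why it is a convenient check when small numerical differences between devices are expected.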

**Tiny Llama (Hugging Face `transformers`)**

The [**tiny_llama_demo.py**](https://github.com/tenstorrent/tt-forge/blob/main/demos/tt-xla/nlp/pytorch/tiny_llama_demo.py) example in the TT-Forge repo loads a small LLM from Hugging Face, compiles it with `torch.compile(..., backend="tt")`, and prints top-token predictions. You must download the script and install [`transformers`](https://pypi.org/project/transformers/) (and its dependencies); the wheel install in Step 1 does not include them. The first run also downloads model weights from Hugging Face over the network.

```bash
wget https://raw.githubusercontent.com/tenstorrent/tt-forge/main/demos/tt-xla/nlp/pytorch/tiny_llama_demo.py
pip install transformers
python tiny_llama_demo.py
```

You should see the prompt "The capital of France is", the predicted next token, the probability it will occur, and a list of other ranked options that could follow instead, for example:
```text
Prompt: `The capital of France is`
Top prediction: `Paris`

Rank   Token           Probability
-----------------------------------
1      'Paris'         36.9141%
2      'located'       10.0098%
3      'the'           8.8867%
4      'a'             4.2480%
5      'in'            2.5391%
```
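
Probabilities like these are typically obtained by applying softmax to the model's logits for the next-token position and ranking the result. The following is a minimal sketch in plain Python, assuming a hypothetical five-token vocabulary with made-up logits; the names `top_k_tokens`, `vocab`, and `logits` are illustrative and are not taken from `tiny_llama_demo.py`.

```python
import math

def top_k_tokens(logits, vocab, k=3):
    """Return the k most likely (token, probability) pairs via softmax."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Made-up values for illustration only.
vocab = ["Paris", "located", "the", "a", "in"]
logits = [4.0, 2.7, 2.6, 1.8, 1.3]

for rank, (token, prob) in enumerate(top_k_tokens(logits, vocab), start=1):
    print(f"{rank}      {token!r:<15} {prob:.4%}")
```

In a real model the vocabulary has tens of thousands of entries, so the probabilities of the top few tokens do not sum to 100%.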

---

### Using a Docker Container to Run an Example

This section walks through the steps for running an example in a Docker container.
- Prerequisite: Docker must be installed. See the [official Docker installation guide](https://docs.docker.com/engine/install/ubuntu/) if needed.

#### Step 1: Run the Docker Container

   ```bash
   docker run -it --rm \
   --device /dev/tenstorrent \
   -v /dev/hugepages-1G:/dev/hugepages-1G \
   ghcr.io/tenstorrent/tt-xla-slim:latest
   ```

   > **NOTE:** Devices cannot be isolated in containers. You must pass through all devices, even if you use only one, by passing `--device /dev/tenstorrent`. Do not pass `--device /dev/tenstorrent/1` or similar; this kind of device-in-container isolation causes fatal errors later during execution.

- To check that the container is running, open a new tab with the **Same Command** option and run the following:

   ```bash
   docker ps
   ```

#### Step 2: Running Models in Docker

Use `wget` to fetch each demo script into your current directory, and install packages with pip when noted.

**MNIST (small CNN)**

The [**mnist.py**](https://github.com/tenstorrent/tt-xla/blob/main/examples/pytorch/mnist.py) example runs a simple CNN on a Tenstorrent device and compares the output against a CPU reference.

```bash
wget https://raw.githubusercontent.com/tenstorrent/tt-xla/main/examples/pytorch/mnist.py
python mnist.py
```

You should see the model output and a PCC (Pearson Correlation Coefficient) check confirming the TT device output matches the CPU reference.

**Tiny Llama (Hugging Face `transformers`)**

The [**tiny_llama_demo.py**](https://github.com/tenstorrent/tt-forge/blob/main/demos/tt-xla/nlp/pytorch/tiny_llama_demo.py) example in the TT-Forge repo loads a small LLM from Hugging Face, compiles it with `torch.compile(..., backend="tt")`, and prints top-token predictions. You must download the script and install [`transformers`](https://pypi.org/project/transformers/) (and its dependencies); the slim image does not include them. The first run also downloads model weights from Hugging Face over the network.

```bash
wget https://raw.githubusercontent.com/tenstorrent/tt-forge/main/demos/tt-xla/nlp/pytorch/tiny_llama_demo.py
pip install transformers
python tiny_llama_demo.py
```

You should see the prompt "The capital of France is", the predicted next token, the probability it will occur, and a list of other ranked options that could follow instead, for example:
```text
Prompt: `The capital of France is`
Top prediction: `Paris`

Rank   Token           Probability
-----------------------------------
1      'Paris'         36.9141%
2      'located'       10.0098%
3      'the'           8.8867%
4      'a'             4.2480%
5      'in'            2.5391%
```

---

### Building from Source

Build from source if you want to develop TT-XLA itself.

#### Step 1: Prerequisites

- TT-XLA has the following system dependencies:
   * Ubuntu 24.04
   * Python 3.12
   * python3.12-venv
   * Clang 20
   * GCC 13
   * Ninja
   * CMake 4.0.3

- TT-XLA additionally requires the following libraries:

   ```bash
   sudo apt install protobuf-compiler libprotobuf-dev
   sudo apt install ccache
   sudo apt install libnuma-dev
   sudo apt install libhwloc-dev
   sudo apt install libboost-all-dev
   sudo apt install libnsl-dev
   ```
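
Before starting the build, it can help to confirm that the listed tools are on your `PATH`. The following is a minimal sketch in Python; the helper `missing_tools` is hypothetical, and the executable names in `REQUIRED_TOOLS` are assumptions based on common Ubuntu packaging, so adjust them to match your system. The sketch checks presence only, not versions.

```python
import shutil

# Executable names are assumptions based on common Ubuntu packaging.
REQUIRED_TOOLS = ["python3.12", "clang-20", "gcc-13", "ninja", "cmake"]

def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]

missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```

The `which` parameter exists only so that the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.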

#### Step 2: Building the TT-MLIR Toolchain

- Before compiling TT-XLA, the TT-MLIR toolchain needs to be built:
   - Clone the [tt-mlir](https://github.com/tenstorrent/tt-mlir) repo.
   - Follow the TT-MLIR [build instructions](https://docs.tenstorrent.com/tt-mlir/getting-started.html#setting-up-the-environment-manually) to set up the environment and build the toolchain.

- After building the toolchain, set the following environment variables:

| Variable | Required | Description |
|----------|----------|-------------|
| `TTMLIR_TOOLCHAIN_DIR` | Yes | Path to TT-MLIR toolchain (e.g., `/opt/ttmlir-toolchain/`) |
| `TTXLA_LOGGER_LEVEL` | No | Set to `DEBUG` or `VERBOSE` for detailed logs |
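
The following is a minimal sketch of a pre-build check for the variables in the table above. The helper name `check_build_env` is hypothetical and is not part of TT-XLA. It only confirms that `TTMLIR_TOOLCHAIN_DIR` is set and points to an existing directory, and it reports the optional logger level without validating it.

```python
import os

def check_build_env(env=None):
    """Return a list of problems found in the TT-XLA build environment variables."""
    env = os.environ if env is None else env
    problems = []

    # Required: must be set and must point at an existing directory.
    toolchain = env.get("TTMLIR_TOOLCHAIN_DIR")
    if not toolchain:
        problems.append("TTMLIR_TOOLCHAIN_DIR is not set (required)")
    elif not os.path.isdir(toolchain):
        problems.append(f"TTMLIR_TOOLCHAIN_DIR is not a directory: {toolchain}")
    return problems

problems = check_build_env()
for problem in problems:
    print("warning:", problem)

# Optional: only reported, never treated as an error.
print("TTXLA_LOGGER_LEVEL =", os.environ.get("TTXLA_LOGGER_LEVEL", "(unset)"))
```

Passing a dictionary as `env` makes the check easy to exercise without changing the real environment.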

#### Step 3: Building TT-XLA

Before you begin, make sure you are not inside the TT-MLIR build directory and that you are in the directory where you want to place the TT-XLA checkout.

1. Clone TT-XLA:

   ```bash
   git clone https://github.com/tenstorrent/tt-xla.git
   ```

2. Navigate into the TT-XLA folder:
   ```bash
   cd tt-xla
   ```

3. Initialize third-party submodules:

   ```bash
   git submodule update --init --recursive
   ```

4. Run the following set of commands to build TT-XLA (this will build the PJRT plugin and install it into `venv`):

   ```bash
   source venv/activate
   cmake -G Ninja -B build # add -DCMAKE_BUILD_TYPE=Debug for a debug build
   cmake --build build
   ```

5. To verify that everything is working correctly, run the following command:

   ```bash
   python -c "import jax; print(jax.devices('tt'))"
   ```

   The command should print all available TT devices, for example `[TTDevice(id=0, arch=Wormhole_b0)]`.

6. (Optional) To build the TT-XLA wheel, run the following commands:

   ```bash
   cd python_package
   python setup.py bdist_wheel
   ```

   The commands above produce a self-contained wheel at `python_package/dist/pjrt_plugin_tt*.whl`. To install it, run:

   ```bash
   pip install dist/pjrt_plugin_tt*.whl
   ```

   The wheel has the following structure:
   ```text
   pjrt_plugin_tt/                     # PJRT plugin package
      |-- __init__.py
      |-- pjrt_plugin_tt.so               # PJRT plugin binary
      |-- tt-metal/                       # tt-metal runtime dependencies (kernels, riscv compiler/linker, etc.)
      `-- lib/                            # shared library dependencies (tt-mlir, tt-metal)
   jax_plugin_tt/                      # Thin JAX wrapper
      `-- __init__.py                     # imports and sets up pjrt_plugin_tt for XLA
   torch_plugin_tt/                    # Thin PyTorch/XLA wrapper
      `-- __init__.py                     # imports and sets up pjrt_plugin_tt for PyTorch/XLA
   ```

   It contains a custom Tenstorrent PJRT plugin (`pjrt_plugin_tt.so`) and its dependencies (`tt-mlir` and `tt-metal`). Additionally, there are thin wrappers for JAX (`jax_plugin_tt`) and PyTorch/XLA (`torch_plugin_tt`) that import the PJRT plugin and set it up for use with the respective frameworks.
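
The device check from step 5 can also be wrapped in a small function that degrades gracefully in scripts. This is a minimal sketch: `list_tt_devices` is a hypothetical helper, not part of TT-XLA, and it assumes that a missing JAX install surfaces as `ImportError` and an unavailable `tt` backend as `RuntimeError`.

```python
def list_tt_devices():
    """Return the TT devices JAX can see, or an empty list if none are usable."""
    try:
        import jax
        return list(jax.devices("tt"))
    except (ImportError, RuntimeError) as err:
        # ImportError: JAX is not installed in this environment.
        # RuntimeError: JAX does not know the "tt" backend, or it failed to initialize.
        print(f"TT backend unavailable: {err}")
        return []

devices = list_tt_devices()
print(f"Found {len(devices)} TT device(s): {devices}")
```

An empty result usually means the virtual environment is not activated or the plugin was not installed into it.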

## Testing
The TT-XLA repo contains various tests in the **tests** directory. To run an individual test, `pytest -svv` is recommended so that all output and potential error messages are shown. Multi-chip tests can run only on specific Tenstorrent hardware, so they are organized in folders named after the Tenstorrent cards or systems they can run on. For example, you can run `pytest -v tests/jax/multi_chip/n300` only on a system with an n300 Tenstorrent card. Single-chip tests can run on any system with the command `pytest -v tests/jax/single_chip`.
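
As a minimal sketch of this layout, the hypothetical helper below (not part of the TT-XLA repo) builds the `pytest` command line for either the single-chip suite or the multi-chip suite of a given card. It assumes the folder name matches the card name, as with `n300`; other folder names are not confirmed here.

```python
import shlex

def pytest_command(card=None):
    """Build a pytest command for single-chip tests, or multi-chip tests for `card`."""
    if card is None:
        target = "tests/jax/single_chip"
    else:
        target = f"tests/jax/multi_chip/{card}"
    return ["pytest", "-svv", target]

print(shlex.join(pytest_command()))
print(shlex.join(pytest_command("n300")))
```

The returned list can be passed directly to `subprocess.run` from the root of the repository.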

## Common Build Errors
- Building TT-XLA requires `clang-20`. Make sure that `clang-20` is installed on the system and that `clang` and `clang++` link to the correct version of the respective tools.
- Please also see the TT-MLIR [docs](https://github.com/tenstorrent/tt-mlir/blob/main/docs/src/getting-started.md#common-build-errors) for common build errors.

## Pre-commit
Pre-commit installs a git hook in the local repository so that linting is checked and applied on every `git commit`. Install it from the root of the repository using:

```bash
source venv/activate
pre-commit install
```

If you have already committed something locally before installing the pre-commit hooks, you can run this command to check all files:

```bash
pre-commit run --all-files
```

For more information, visit [pre-commit](https://pre-commit.com/).

## Where to Go Next

- Try more examples in the [TT-XLA examples directory](https://github.com/tenstorrent/tt-xla/tree/main/examples)
- Learn about [Improving Model Performance](./performance.md)
- Explore [Code Generation](./getting_started_codegen.md) to convert models into standalone code