Functional Simulator
TT-Lang includes a functional simulator that runs operations as pure Python, without requiring Tenstorrent hardware or the full compiler stack. Use it to validate kernel logic and iterate quickly during development.
The simulator typically supports more language features than the compiler at any given point — see the functionality matrix for current coverage.
Setup
The recommended path is to install the simulator from PyPI:
python3 -m venv --prompt ttlang ttlang-venv
source ttlang-venv/bin/activate
pip install tt-lang-sim
tt-lang-setup
See Getting Started — Install from PyPI
for details. tt-lang-sim runs on Linux and macOS and does not require
Tenstorrent hardware. That install adds tt-lang-sim and the trace post-processor
tt-lang-sim-stats to your PATH. There is no separate PyPI package for
statistics; tt-lang-sim-stats ships only as a console entry point with the
simulator distributions (tt-lang-sim, or full tt-lang, which includes the same
simulator).
To run the simulator from a source checkout instead (without building the
compiler), configure with -DTTLANG_SIM_ONLY=ON to create just the Python
environment:
cmake -G Ninja -B build -DTTLANG_SIM_ONLY=ON
cmake --build build
source build/env/activate
This skips the LLVM, tt-mlir, and tt-metal builds entirely and only sets up the Python venv with runtime dependencies.
If you have already built the full TT-Lang compiler (source build/env/activate), the simulator works without any additional setup.
Running
tt-lang-sim examples/eltwise_add.py
Run the simulator test suite:
python -m pytest test/sim/
Some tests are marked slow and skipped by default. Pass --run-slow to
include them (the hardware CI always does; the GitHub-hosted sim CI does not):
python -m pytest test/sim/ --run-slow
Float32 Promotion
By default the simulator promotes all floating-point dtypes narrower than float32 to float32 before any computation:
| Declared dtype | Simulator dtype |
|---|---|
| Any floating-point dtype narrower than float32 (e.g. bfloat16, float16) | float32 |
This makes the simulator work correctly on host architectures that lack native support for narrow float types (e.g. Apple Silicon has no hardware bfloat16 or float16 support, so using those types natively would be slow or incorrect).
Disabling promotion
Pass --no-float32-promotion to tt-lang-sim to run with the dtypes declared
in the source file:
tt-lang-sim --no-float32-promotion examples/matmul_1d.py
When to disable promotion
Correctness checks calibrated for the original dtype. Examples that use
ULP-based assertions (assert_with_ulp) with tolerances chosen for bfloat16
precision will fail when run in float32, because the same absolute numerical
difference corresponds to more ULPs in float32 (which has a smaller ULP than
bfloat16). Run these with --no-float32-promotion:
examples/matmul_1d.py
examples/matmul_1d_mcast.py
examples/metal_examples/single_node_matmul/ttlang/single_node_matmul.py
examples/metal_examples/multinode_matmul/ttlang/multinode_matmul.py
L1 memory budget. The simulator uses the declared dtype for all
DataflowBuffer capacity accounting so the reported footprint always matches
what the hardware would allocate, regardless of whether float32 promotion is
active. If the total buffer capacity for a core exceeds the L1 limit, the
simulator issues a warning:
UserWarning: Total DataflowBuffer capacity per core (N bytes) exceeds the L1 memory limit of M bytes.
Memory is accounted using declared dtypes, so this reflects the on-hardware footprint of the kernel.
This warning does not abort execution, but it indicates that the kernel would not fit in hardware L1.
Dtype-specific behavior. If a kernel explicitly tests dtype identity, overflow behavior, or precision characteristics of a specific narrow type, disable promotion so the script runs with the declared dtype.
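The ULP argument in the correctness-check case above can be made concrete with a small sketch. It is not the implementation of assert_with_ulp; the helper names `ulp` and `ulps_apart` are hypothetical, and the only facts assumed are the standard mantissa widths (7 explicit bits for bfloat16, 23 for float32). It shows that one fixed absolute difference is 1 ULP in bfloat16 but 65536 ULPs in float32.

```python
import math

# Explicit mantissa (fraction) bits of each format.
MANTISSA_BITS = {"bfloat16": 7, "float32": 23}


def ulp(x: float, dtype: str) -> float:
    """Spacing between adjacent representable values near x.

    Hypothetical helper: valid for nonzero x in the normal range only.
    """
    # frexp gives x = m * 2**e with 0.5 <= |m| < 1, so x lies in [2**(e-1), 2**e).
    _, e = math.frexp(x)
    return 2.0 ** (e - 1 - MANTISSA_BITS[dtype])


def ulps_apart(a: float, b: float, dtype: str) -> float:
    """Absolute difference between a and b, measured in ULPs of the given dtype."""
    return abs(a - b) / ulp(max(abs(a), abs(b)), dtype)


expected = 1.0
actual = 1.0 + 2.0 ** -7  # off by exactly one bfloat16 ULP at 1.0

print(ulps_apart(expected, actual, "bfloat16"))  # 1.0
print(ulps_apart(expected, actual, "float32"))   # 65536.0
```

A tolerance of a few ULPs that is reasonable for bfloat16 is therefore far too strict once the same values are compared as float32.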
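The L1 budget rule above (capacity is accounted with the declared dtype, not the promoted one) can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the 32x32 tile shape, the `tiles_per_block` and `block_count` parameters, the `buffer_bytes` and `check_l1` helpers, and the placeholder limit are not taken from the simulator source.

```python
import warnings

# Bytes per element for each declared dtype (standard widths).
ITEMSIZE = {"bfloat16": 2, "float16": 2, "float32": 4}
TILE_ELEMENTS = 32 * 32     # assumed tile shape
L1_LIMIT_BYTES = 1_000_000  # placeholder value, not the real hardware limit


def buffer_bytes(declared_dtype: str, tiles_per_block: int, block_count: int) -> int:
    """Footprint of one buffer, sized by its declared dtype (hypothetical formula)."""
    return ITEMSIZE[declared_dtype] * TILE_ELEMENTS * tiles_per_block * block_count


def check_l1(buffers, limit: int = L1_LIMIT_BYTES) -> int:
    """Sum per-core buffer footprints and warn, without aborting, if over the limit."""
    total = sum(buffer_bytes(*b) for b in buffers)
    if total > limit:
        warnings.warn(
            f"Total DataflowBuffer capacity per core ({total} bytes) "
            f"exceeds the L1 memory limit of {limit} bytes.",
            UserWarning,
        )
    return total


# Three bfloat16 buffers, each 8 tiles per block and 2 blocks deep.
buffers = [("bfloat16", 8, 2)] * 3
print(check_l1(buffers))                    # 98304
print(check_l1(buffers, limit=64 * 1024))   # 98304, and emits the warning
```

Had the accounting used the promoted float32 width instead, the same buffers would report 196608 bytes, which would overstate the on-hardware footprint.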
Simulator statistics (tt-lang-sim-stats)
tt-lang-sim does not print tensor, pipe, or dataflow-buffer statistics
itself. Instead, record a JSON Lines trace by passing --trace to tt-lang-sim
(after the script path), then pass that file to tt-lang-sim-stats to print
the summary tables. Working from a trace file lets you share, diff, or
inspect a run without re-executing the kernel. The
tt-lang-sim-stats command is installed together with tt-lang-sim (or
with full tt-lang); it is not distributed or installed on its own.
From a repository checkout, run ./bin/tt-lang-sim-stats (repo root). After
pip install tt-lang-sim (or pip install tt-lang), or source build/env/activate
from a CMake build, tt-lang-sim-stats is on your PATH. The
underlying entry point is python -m sim_stats; override the interpreter
with PYTHON if needed (for example
PYTHON=python3.12 ./bin/tt-lang-sim-stats trace.jsonl).
Record a JSON Lines trace while simulating (the path is optional; the default file name is
trace.jsonl):
./bin/tt-lang-sim examples/eltwise_add.py --trace /tmp/my_run.jsonl
Print statistics from that file:
./bin/tt-lang-sim-stats /tmp/my_run.jsonl
Statistics are derived from trace events such as copy_end, pipe_send,
pipe_recv, dfb_reserve_end, and dfb_wait_end. If the trace was recorded
with a restricted event set, some tables may be empty. Regenerate the trace
with tt-lang-sim SCRIPT.py --trace and the default categories, or enable the relevant
groups via --trace-events (see the tracing guide in docs/TRACING.md in the
repository). For full CLI details:
./bin/tt-lang-sim-stats --help
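For a quick look at which event types a trace contains (for example, to see why a statistics table came out empty), a few lines of Python are enough. This is a minimal sketch, not a replacement for tt-lang-sim-stats: it assumes each line is a JSON object whose event name is stored under an `"event"` key, which is a guess about the trace schema; check docs/TRACING.md for the actual field names. The `count_events` helper is hypothetical.

```python
import json
from collections import Counter


def count_events(lines, key="event"):
    """Count records per event name in an iterable of JSON Lines strings.

    The field name in `key` is an assumption about the trace schema.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record.get(key, "<unknown>")] += 1
    return counts


# In-memory sample standing in for a real trace file.
sample = [
    '{"event": "copy_end"}',
    '{"event": "pipe_send"}',
    '{"event": "copy_end"}',
    '{"event": "dfb_wait_end"}',
]
for name, n in sorted(count_events(sample).items()):
    print(name, n)
```

To run it on a recorded trace, pass an open file instead of the sample list, for example `count_events(open("/tmp/my_run.jsonl"))`.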
Debugging
The simulator runs as standard Python code, so any Python debugger works with it.
VSCode
Create a debug configuration in .vscode/launch.json:
{
    "name": "Debug TT-Lang Simulator",
    "type": "debugpy",
    "request": "launch",
    "module": "ttl.sim.ttlang_sim",
    "args": ["${file}"],
    "console": "integratedTerminal",
    "justMyCode": false,
    "cwd": "${workspaceFolder}"
}
Open a TT-NN program file in VSCode (e.g., examples/eltwise_add.py)
Set breakpoints in your program code
Press F5 or select “Debug TT-Lang Simulator” from the Run menu
The debugger stops at breakpoints, allowing variable inspection and step-through execution