Tracy Profiler
Note
Tools are only fully supported on source builds.
Profiling is an essential part of software development that helps developers gain insight into how their code executes, where time is spent, and which parts of the system may be causing performance bottlenecks. By collecting and analyzing detailed runtime data, profiling tools enable developers to identify inefficient code paths, understand resource usage, and make informed decisions about where to focus optimization efforts. Effective profiling is key to improving performance.
Overview
Tracy is an opens-source C++ profiling tool with sampling and code instrumentation profiling capabilities. Metalium uses a fork for Tracy adapted to the the Tensix Processors as it’s primary profiling tool.
Detais documentation about Tracy itself can be found here: https://github.com/wolfpld/tracy/releases/latest/download/tracy.pdf. Reading the Quick-start guide
section can help with the rest of this documentation.
All host-side code, including python code in tt_metal
can be profiled using Tracy.
Enabling Tracy
To use Tracy for profiling, you need to build Metalium with profiler support enabled. This is done by setting the ENABLE_TRACY=ON
option when configuring CMake. Or with the --enable-profiler
flag in the build script. Profiling features are only available in builds where this option is set.
You can enable the profiler by running:
# Via build script
./build_metal.sh --enable-profiler
# Via CMake flags
cmake . -DENABLE_TRACY=ON
ninja
ninja install
Note that Tracy support is not included in prebuilt binaries or release builds by default. You must build from source with the profiler enabled to use these features.
GUI
Tracy provides a GUI application for viewing profiling results. You can open saved profiling sessions or connect to a remote machine to view real-time profiling data, as long as network access to the remote system is available.
Installing for Mac users
Mac users can install Tracy using Homebrew. Open a terminal and run:
brew uninstall tracy # Remove any old version of Tracy
wget -P ~/ --no-check-certificate --no-cache --no-cookies https://raw.githubusercontent.com/tenstorrent-metal/tracy/master/tracy.rb
brew install ~/tracy.rb
rm ~/tracy.rb
After installation, start the Tracy GUI with:
tracy
Building for Linux users
For Linux users, you need to build the Tracy GUI from source. First, clone the Tracy repository.
git clone https://github.com/tenstorrent/tracy.git
cd tracy/profiler/build/unix
make -j8
A Tracy-release
binary will be generated in the current directory after the build completes. You can run it directly from there or copy it to a directory in your PATH for easier access
./Tracy-release
Starting the GUI
The application will start showing a window similar to the image below.

Capturing Profiling Data
Set the client address to the IP address of the remote machine and port 8086 (e.g. 172.27.28.132:8086), then click connect.
A “Waiting for connection …” dialog will appear after clicking connect.

When the host machine starts running a tracy-enabled application, the GUI will automatically collect profiling data and display it in real time.
Counterintuitively, the Tracy GUI connects as a TCP server, while the profiled application runs as a TCP client, usually connecting to port 8086. If your application host is on a different network than the Tracy GUI, you may need to set up port forwarding or a VPN connection. SSH port forwarding is a common solution:
ssh -NL 8086:127.0.0.1:8086 user@remote-machine
Capturing via Command Line
Alternatively, use the capture-release
CLI tool built under tt-metal when Tracy is enabled, under tt-metal/build/tools/profiler/bin
when Tracy is enabled. This tool acts as a client that saves the profile to disk, which can then be copied and loaded into the GUI later. To use it, run the following command before starting the application:
./capture-release -o output_file_name.tracy
Note
The output of capture-release
is quite compressible. For large profile files, it is recommended to compress them before transferring over the network. You can use the -z
option with rsync, -C
with scp, or standalone tools like gzip or zstd.
Profiling Host Code
C++
With Tracy enabled in the Metalium build, all C++ marked zones will be profiled. Zones in Tracy are marked sections of code that users are interested in profiling. Tracy provides macros such as ZoneScoped;
to accomplish this.
Please refer to section 3 of Tracy’s documentation for further information on zones and available macros.
The following image is a snapshot of Tracy C++ profiling:

For example, the Device
constructor shown above is instrumented as follows:
Device::Device(chip_id_t device_id, const std::vector<uint32_t>& l1_bank_remap) : id_(device_id)
{
ZoneScoped;
this->initialize(l1_bank_remap);
}
Python
Python provides the standard sys.setprofile
and sys.settrace
functions for tracing and profiling Python code. These are used to integrate Python profiling with Tracy.
There are several ways to profile Python code with Tracy in Metalium projects.
Python Scripts
To profile an entire Python script, run your program using the tracy
module as follows:
python -m tracy {test_script}.py
Pytest Sessions
For pytest-based tests, import pytest as a module and pass its arguments as needed. For example, to profile a BERT unit test, run:
python -m tracy -m pytest models/experimental/bert_large_performant/unit_tests/test_bert_large_split_and_transform_qkv_heads.py::test_split_query_key_value_and_split_heads_with_program_cache

Instrumenting Python Code
Instrumentation can also be done without using the pytest fixture.
The following shows how to profile a function called function_under_test
and all of its child python function calls by manually enabling tracy profiler.
def function_under_test():
child_function_1()
child_function_2()
from tracy import Profiler
profiler = Profiler()
profiler.enable()
function_under_test()
profiler.disable()
Similar to the pytest setup, calling the parent script with -p
option will profile the region where profiler is enabled.
Note, it is recommended to sandwich the function call between the enable and disable calls rather than having them as first and last calls in the function being profiled. As settrace
and setprofile
trigger on more relevant events when the setup is done previous to the function call.
Signposts in Python Code
signpost(header, message)
from the tracy
module can be placed anywhere in the code path for your test. This call will produce a row in the op report CSV and a message in the tracy run.
from tracy import signpost
signpost(header="Run number 5", message="This is the run after 5 warmup runs")
run_inference()
signpost(header="Run result post proc")
post_proc()
The above example will show up as follows.
Op report CSV

Tracy run

Line-level Profiling
In some cases, significant duration of a function, does not get broken down to smaller child calls with explainable durations. This is usually either due to inline work that is
not wrapped inside a function or a call to a function that is defined as part of a shared object. For example, pytorch
function calls do not come in as native python calls and will not generate python call events.
Line-level profiling is only provided with partial profiling because it produces substantially more data.
Add -l
option to enable line-level profiling:
python -m tracy -p -l -m pytest models/experimental/bert_large_performant/unit_tests/test_bert_large_split_and_transform_qkv_heads.py::test_split_query_key_value_and_split_heads_with_program_cache
Profiling Device Code
The version of Tracy used in Metalium supports profiling device-side code, including individual Baby RISC-V cores on each Tensix and other tiles on the NoC.
For more details on device-side profiling with Tracy, see Device Program Profiler.