ttrt

This tool is intended to be a Swiss Army knife for working with flatbuffers generated by the compiler. Its primary role is to inspect and run flatbuffer files, which it can do without a front-end runtime.

Building

  1. Build ttmlir
  2. Build ttrt:
source env/activate
cmake --build build -- ttrt
ttrt --help

Building runtime mode

Add the following flag when building the compiler:

-DTTMLIR_ENABLE_RUNTIME=ON

Building perf mode

Add the following flags when building the compiler:

-DTTMLIR_ENABLE_RUNTIME=ON
-DTT_RUNTIME_ENABLE_PERF_TRACE=ON

LOGGER Levels

ttrt supports logging at different logger levels. Set the TTRT_LOGGER_LEVEL environment variable on the command line or in a Python script (see the sketch after the list below). By default, it is set to INFO.

TTRT_LOGGER_LEVEL=INFO
TTRT_LOGGER_LEVEL=CRITICAL
TTRT_LOGGER_LEVEL=ERROR
TTRT_LOGGER_LEVEL=WARNING
TTRT_LOGGER_LEVEL=DEBUG
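
For example, a minimal sketch of setting the level from a Python script, assuming ttrt reads the variable when it initializes its logger:

import os

# Assumption: TTRT_LOGGER_LEVEL is read when ttrt initializes, so set it
# before importing or invoking ttrt.
os.environ["TTRT_LOGGER_LEVEL"] = "DEBUG"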

tt-metal logging

The ttrt runtime uses tt-metal for op execution and device interfacing. For more detailed logs, which can help in troubleshooting build or runtime issues, set the TT_METAL_LOGGER_LEVEL environment variable. By default, it is set to FATAL.

export TT_METAL_LOGGER_LEVEL=DEBUG

Installing ttrt as a Python wheel

Every time ttrt is built, it creates a wheel (.whl) file in build/runtime/tools/ttrt/build, for example ttrt-0.0.235-cp310-cp310-linux_x86_64.whl. You can take this wheel and install it in any Docker container or venv outside of ttmlir, after which all of the following functionality works the same.

  1. Download the wheel
  2. Create a Python venv
python -m venv ttrt_env
source ttrt_env/bin/activate
  3. Install the wheel (replace with your version of the wheel)
pip install build/runtime/tools/ttrt/build/ttrt-0.0.235-cp310-cp310-linux_x86_64.whl

Generating a flatbuffer

tt-mlir exposes a few ways to generate flatbuffers.

Generate a flatbuffer file from ttir-builder

ttir-builder is a tool for creating TTIR ops, converting them into MLIR modules, running passes to lower modules into backends, and translating to flatbuffers. See the ttir-builder documentation for further instructions.

Generate a flatbuffer file from compiler

The compiler supports a pass that loads a system descriptor to compile against. You can run this pass via ttmlir-opt.

  1. Build ttmlir
  2. Build ttrt (see building section on this page)
  3. Generate a ttsys file from the system you want to compile for using ttrt. This will create a system_desc.ttsys file under the ttrt-artifacts folder.
ttrt query --save-artifacts
  4. Use the ttmlir-opt tool in the compiler to feed in the system descriptor. See the ttmlir-opt documentation for more information on how to generate .mlir files.
./build/bin/ttmlir-opt --ttcore-register-device="system-desc-path=/path/to/system_desc.ttsys" --ttir-to-ttnn-backend-pipeline test/ttmlir/Dialect/TTNN/simple_subtract.mlir -o ttnn.mlir
or (pass the path directly to ttir-to-ttnn-backend-pipeline)
./build/bin/ttmlir-opt --ttir-to-ttnn-backend-pipeline="system-desc-path=/path/to/system_desc.ttsys" test/ttmlir/Dialect/TTNN/simple_subtract_to_add.mlir -o ttnn.mlir
  5. Use the ttmlir-translate tool in the compiler to generate the flatbuffer executable. See the ttmlir-translate documentation for more information on how to generate flatbuffer files.
./build/bin/ttmlir-translate --ttnn-to-flatbuffer ttnn.mlir -o out.ttnn
  6. Run your test cases using ttrt
ttrt run /path/to/out.ttnn

Generate flatbuffer files using llvm-lit

There are existing .mlir test cases under test/ttmlir/Silicon. You can use the llvm-lit tool to generate the corresponding ttnn and ttm files.

  1. Build ttmlir
  2. Build ttrt (see building section on this page)
  3. Generate a ttsys file from the system you want to compile for using ttrt. This will create a system_desc.ttsys file under the ttrt-artifacts folder.
ttrt query --save-artifacts
  4. Export this file in your environment using export SYSTEM_DESC_PATH=/path/to/system_desc.ttsys. When llvm-lit runs, it will query this variable and generate the ttnn and ttm files for this system. Optionally, you can also provide this manually when running llvm-lit.
  5. Generate your test cases. This will generate all your ttnn and ttm files under build/test/ttmlir/Silicon. ttnn files have a .ttnn extension and ttmetal files have a .ttm extension.
cmake --build build -- check-ttmlir
  6. (Optional) If you have a single .mlir file (or a directory of custom .mlir files) that you created using the compiler and want to generate the corresponding ttnn and ttm files for it, you can run llvm-lit standalone on the path of your .mlir file or directory to generate the flatbuffer executables. Make sure you add the correct llvm-lit configs to your .mlir file; see the section on adding llvm-lit config options inside a .mlir file to create flatbuffer binaries for more info. Your .mlir test must also be found within the test/ttmlir/Silicon folder (and point lit to the build folder).
llvm-lit -v ./build/test/ttmlir/Silicon
or
SYSTEM_DESC_PATH=/path/to/system_desc.ttsys llvm-lit -v ./build/test/ttmlir/Silicon
  7. Run your test cases using ttrt
ttrt run /path/to/test.ttnn
ttrt run /path/to/dir/of/flatbuffers

Adding llvm-lit config options inside a .mlir file to create flatbuffer binaries

Inside your .mlir file, you can add certain config options that llvm-lit will use when running that test case. For the purpose of generating flatbuffer executables, you can add --ttcore-register-device="system-desc-path=%system_desc_path%", which tells llvm-lit to parse the system desc found from the environment variable set by export SYSTEM_DESC_PATH=/path/to/system_desc.ttsys. You can also provide a custom path to a system desc file.

// RUN: ttmlir-opt --ttcore-register-device="system-desc-path=%system_desc_path%" --ttnn-layout --convert-ttir-to-ttnn %s  > %t.mlir
// RUN: FileCheck %s --input-file=%t.mlir
// RUN: ttmlir-translate --ttnn-to-flatbuffer %t.mlir > %t.ttnn

Adding new mlir test cases

You can copy your .mlir test file (with the appropriate llvm-lit config options for generating flatbuffer binaries) into test/ttmlir/Silicon. Then follow the generating flatbuffer files using llvm-lit section to generate the executables to run!

Versioning

ttrt and flatbuffers have a strict versioning check. When running a flatbuffer against ttrt, you have to make sure the flatbuffer was generated using the same version as ttrt (or vice versa). Major and minor versions are set manually using GitHub tags when releases are made. The patch version is the number of commits since the last major/minor tag.

vmajor.minor.patch

The flag --ignore-version can be used to bypass versioning checks. Use at your own risk; it can cause unpredictable errors.

Application APIs

ttrt --help
ttrt read
ttrt run
ttrt query
ttrt perf
ttrt check

Command line usage

There are different ways you can use the APIs under ttrt. The first is via the command line, as follows. All artifacts are saved under the ttrt-artifacts folder under the TT_MLIR_HOME environment variable. By default, all logging is printed to the terminal, but you can specify a log file to dump output to.

read

Read sections of a binary file

ttrt read --help
ttrt read --section version out.ttnn
ttrt read --section system_desc out.ttnn
ttrt read --section mlir out.ttnn
ttrt read --section cpp out.ttnn
ttrt read --section inputs out.ttnn
ttrt read --section outputs out.ttnn
ttrt read --section op_stats out.ttnn
ttrt read --section mesh_shape out.ttnn
ttrt read --section all out.ttnn --clean-artifacts
ttrt read --section all out.ttnn --save-artifacts
ttrt read --section all /dir/of/flatbuffers
ttrt read system_desc.ttsys
ttrt read --section system_desc system_desc.ttsys
ttrt read system_desc.ttsys --log-file ttrt.log
ttrt read out.ttnn --save-artifacts --artifact-dir /path/to/some/dir
ttrt read out.ttnn --result-file result.json

run

Run a binary file or a directory of binary files. Note: you must be on a system with silicon and have a runtime-enabled build (-DTTMLIR_ENABLE_RUNTIME=ON).

ttrt run --help
ttrt run out.ttnn
ttrt run out.ttnn --seed 0
ttrt run out.ttnn --init arange
ttrt run out.ttnn --identity
ttrt run out.ttnn --identity --rtol 1 --atol 1
ttrt run out.ttnn --clean-artifacts
ttrt run out.ttnn --save-artifacts
ttrt run out.ttnn --loops 10
ttrt run --program-index all out.ttnn
ttrt run --program-index 0 out.ttnn
ttrt run /dir/of/flatbuffers
ttrt run /dir/of/flatbuffers --loops 10
ttrt run /dir/of/flatbuffers --log-file ttrt.log
ttrt run out.ttnn --save-artifacts --artifact-dir /path/to/some/dir
ttrt run out.ttnn --load-kernels-from-disk
ttrt run out.ttnn --result-file result.json
ttrt run out.ttnn --disable-golden
ttrt run out.ttnn --save-golden-tensors
ttrt run out.ttnn --print-input-output-tensors
ttrt run out.ttnn --debugger
ttrt run out.ttnn --memory --save-artifacts
ttrt run out.ttnn --memory --check-memory-leak

Run results

The run API saves a run_results.json file that records information about the run, including any errors that were thrown and the location of other saved run data. A parsing sketch follows the example below.

[
  {
    "file_path": "ttnn/test_tan[f32-shape0]_ttnn.mlir.ttnn",
    "result": "pass",
    "exception": "",
    "log_file": "ttrt.log",
    "artifacts": "/home/$USER/tt-mlir/ttrt-artifacts",
    "program_index": "all",
    "program_results": {
      "program_index_0": {
        "loop_0": {
          "total_duration_ns": 3269341588,
          "total_ttnn_api_duration_ns": null,
          "total_device_kernel_duration_ns": null
        }
      }
    }
  }
]
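
As mentioned above, a minimal sketch of post-processing this report, assuming run_results.json is in the current working directory and follows the shape shown above:

import json

# Hypothetical path; the report is saved by the run API.
with open("run_results.json", "r") as f:
    results = json.load(f)

# Print any flatbuffers that did not pass, along with their recorded exception.
for entry in results:
    if entry.get("result") != "pass":
        print(f"{entry['file_path']}: {entry['result']} - {entry.get('exception')}")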

Golden checks

Golden checks are used to verify runtime op accuracy. They run by default in the golden callback unless the --disable-golden flag is used. If the --save-artifacts flag is used, a golden results report is saved under the artifacts directory; a parsing sketch follows the example below.

{
    "loc(\"/home/$USER/tt-mlir/test/python/golden/test_ttir_ops.py:74:id(0)\")": {
        "expected_pcc": 0.99,
        "actual_pcc": 0.0015917614829425491,
        "atol": 1e-08,
        "rtol": 1e-05,
        "allclose": false,
        "max": 8529.765625,
        "mean_absolute_error": 6.644593238830566,
        "root_mean_square_error": 100.30211639404297,
        "cosine_similarity": 0.0016297339461743832
    }
}
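
A minimal sketch of scanning such a report for ops whose measured PCC falls below the expected threshold, assuming the report has been saved to a file named golden_results.json (the filename and path are assumptions):

import json

# Hypothetical filename; the golden report is written under the artifacts
# directory when --save-artifacts is used.
with open("golden_results.json", "r") as f:
    golden = json.load(f)

# Flag any op location whose actual PCC is below its expected PCC.
for loc, stats in golden.items():
    if stats["actual_pcc"] < stats["expected_pcc"]:
        print(f"PCC check failed at {loc}: "
              f"actual={stats['actual_pcc']:.4f}, expected={stats['expected_pcc']}")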

Memory

Memory callback functions run when the --memory flag is used. A memory report containing information on op memory usage is written under the artifacts directory; a summarizing sketch follows the example below.

{
    "0": {
        "loc": "loc(\"/home/$USER/tt-mlir/test/python/golden/test_ttir_ops.py:74:id(0)\")",
        "debug_str": "%0 = \"ttnn.tan\"(%arg0) : (tensor<128x128xf32, #ttnn.ttnn_layout<(d0, d1) -> (d0, d1), <1x1>, memref<4x4x!ttcore.tile<32x32, f32>, #ttnn.buffer_type<dram>>, <interleaved>>>) -> tensor<128x128xf32, #ttnn.ttnn_layout<(d0, d1) -> (d0, d1), <1x1>, memref<4x4x!ttcore.tile<32x32, f32>, #ttnn.buffer_type<dram>>, <interleaved>>> loc(\"/home/$USER/tt-mlir/test/python/golden/test_ttir_ops.py:74:id(0)\")",
        "dram": {
            "num_banks": 12,
            "total_bytes_per_bank": 1071181792,
            "total_bytes_allocated_per_bank": 16384,
            "total_bytes_free_per_bank": 1071167456,
            "largest_contiguous_bytes_free_per_bank": 1071165408,
            "block_table": [
                {
                    "allocated": "yes",
                    "nextID": "1",
                    "prevID": "-1",
                    "size": "8192",
                    "address": "0",
                    "blockID": "0"
                },
                {
                    "allocated": "yes",
                    "nextID": "3",
                    "prevID": "0",
                    "size": "8192",
                    "address": "8192",
                    "blockID": "1"
                },
                {
                    "allocated": "no",
                    "nextID": "-1",
                    "prevID": "1",
                    "size": "1071165408",
                    "address": "16384",
                    "blockID": "3"
                }
            ]
        },
        "l1": {
            "num_banks": 64,
            "total_bytes_per_bank": 1369120,
            "total_bytes_allocated_per_bank": 0,
            "total_bytes_free_per_bank": 1369120,
            "largest_contiguous_bytes_free_per_bank": 1369120,
            "block_table": [
                {
                    "allocated": "no",
                    "nextID": "-1",
                    "prevID": "-1",
                    "size": "1369120",
                    "address": "0",
                    "blockID": "0"
                }
            ]
        },
        "l1_small": {
            "num_banks": 64,
            "total_bytes_per_bank": 32768,
            "total_bytes_allocated_per_bank": 0,
            "total_bytes_free_per_bank": 32768,
            "largest_contiguous_bytes_free_per_bank": 32768,
            "block_table": [
                {
                    "allocated": "no",
                    "nextID": "-1",
                    "prevID": "-1",
                    "size": "32768",
                    "address": "0",
                    "blockID": "0"
                }
            ]
        },
        "trace": {
            "num_banks": 12,
            "total_bytes_per_bank": 0,
            "total_bytes_allocated_per_bank": 0,
            "total_bytes_free_per_bank": 0,
            "largest_contiguous_bytes_free_per_bank": 0,
            "block_table": [
                {
                    "allocated": "no",
                    "nextID": "-1",
                    "prevID": "-1",
                    "size": "0",
                    "address": "0",
                    "blockID": "0"
                }
            ]
        }
    }
}
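
A minimal sketch of summarizing per-op DRAM and L1 usage from such a report, assuming it has been saved to a file named memory_results.json (the filename and path are assumptions):

import json

# Hypothetical filename; the memory report is written under the artifacts
# directory when --memory is used together with --save-artifacts.
with open("memory_results.json", "r") as f:
    memory = json.load(f)

# Report allocated bytes per bank for DRAM and L1 for each op entry.
for op_id, info in memory.items():
    dram_bytes = info["dram"]["total_bytes_allocated_per_bank"]
    l1_bytes = info["l1"]["total_bytes_allocated_per_bank"]
    print(f"op {op_id}: dram={dram_bytes} B/bank, l1={l1_bytes} B/bank")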

Debugger

Enabling the --debugger flag sets a pdb trace to run after each op during the callback hook.

query

Query the system to obtain the system desc file (optionally storing it to disk). Note: you must be on a system with silicon and have a runtime-enabled build (-DTTMLIR_ENABLE_RUNTIME=ON).

ttrt query --help
ttrt query
ttrt query --quiet
ttrt query --save-artifacts
ttrt query --clean-artifacts
ttrt query --save-artifacts --log-file ttrt.log
ttrt query --save-artifacts --artifact-dir /path/to/some/dir
ttrt query --result-file result.json

perf

Run performance mode on a binary file or a directory of binary files. Note: you must be on a system with silicon and have a runtime-enabled build (-DTTMLIR_ENABLE_RUNTIME=ON) as well as a perf-enabled build (-DTT_RUNTIME_ENABLE_PERF_TRACE=ON). You can collect host-only performance data via the --host-only flag; by default, both host and device side performance data are collected. If the save-artifacts flag is provided, perf mode will dump the following files in the artifacts directory:

ops_perf_results.csv : compiled op performance results
OP CODE,OP TYPE,GLOBAL CALL COUNT,DEVICE ID,ATTRIBUTES,MATH FIDELITY,CORE COUNT,PARALLELIZATION STRATEGY,HOST START TS,HOST END TS,HOST DURATION [ns],DEVICE FW START CYCLE,DEVICE FW END CYCLE,OP TO OP LATENCY [ns],OP TO OP LATENCY BR/NRISC START [ns],DEVICE FW DURATION [ns],DEVICE KERNEL DURATION [ns],DEVICE KERNEL DURATION DM START [ns],DEVICE KERNEL DURATION PER CORE MIN [ns],DEVICE KERNEL DURATION PER CORE MAX [ns],DEVICE KERNEL DURATION PER CORE AVG [ns],DEVICE KERNEL FIRST TO LAST START [ns],DEVICE BRISC KERNEL DURATION [ns],DEVICE NCRISC KERNEL DURATION [ns],DEVICE TRISC0 KERNEL DURATION [ns],DEVICE TRISC1 KERNEL DURATION [ns],DEVICE TRISC2 KERNEL DURATION [ns],DEVICE ERISC KERNEL DURATION [ns],DEVICE COMPUTE CB WAIT FRONT [ns],DEVICE COMPUTE CB RESERVE BACK [ns],DISPATCH TOTAL CQ CMD OP TIME [ns],DISPATCH GO SEND WAIT TIME [ns],INPUT_0_W,INPUT_0_Z,INPUT_0_Y,INPUT_0_X,INPUT_0_LAYOUT,INPUT_0_DATATYPE,INPUT_0_MEMORY,OUTPUT_0_W,OUTPUT_0_Z,OUTPUT_0_Y,OUTPUT_0_X,OUTPUT_0_LAYOUT,OUTPUT_0_DATATYPE,OUTPUT_0_MEMORY,METAL TRACE ID,METAL TRACE REPLAY SESSION ID,COMPUTE KERNEL SOURCE,COMPUTE KERNEL HASH,DATA MOVEMENT KERNEL SOURCE,DATA MOVEMENT KERNEL HASH,BRISC MAX KERNEL SIZE [B],NCRISC MAX KERNEL SIZE [B],TRISC 0 MAX KERNEL SIZE [B],TRISC 1 MAX KERNEL SIZE [B],TRISC 2 MAX KERNEL SIZE [B],ERISC MAX KERNEL SIZE [B],PM IDEAL [ns],PM COMPUTE [ns],PM BANDWIDTH [ns],PM REQ I BW,PM REQ O BW,PM FPU UTIL (%),NOC UTIL (%),DRAM BW UTIL (%),NPE CONG IMPACT (%),LOC,CONST_EVAL_OP,PROGRAM_METADATA
UnaryDeviceOperation,tt_dnn_device,1024,0,{'bfp8_pack_precise': 'false'; 'fp32_dest_acc_en': 'true'; 'op_chain': '{UnaryWithParam(op_type=UnaryOpType::TAN;param={})}'; 'output_dtype': 'DataType::FLOAT32'; 'output_memory_config': 'MemoryConfig(memory_layout=TensorMemoryLayout::INTERLEAVED;buffer_type=BufferType::DRAM;shard_spec=std::nullopt;nd_shard_spec=std::nullopt;created_with_nd_shard_spec=0)'; 'preserve_fp32_precision': 'true'},HiFi4,16,,4556959654,4557518500,558846,9815181939513,9815181946491,0,0,6978,6314,6126,4982,6216,5652,335,6087,1375,1656,4957,465,,,,,,1,1,128,128,TILE,FLOAT32,DEV_1_DRAM_INTERLEAVED,1,1,128,128,TILE,FLOAT32,DEV_1_DRAM_INTERLEAVED,,,['ttnn/cpp/ttnn/operations/eltwise/unary/device/kernels/compute//eltwise_sfpu.cpp'],['eltwise_sfpu/3265258334475852953/'],['ttnn/cpp/ttnn/operations/eltwise/unary/device/kernels/dataflow/reader_unary_interleaved_start_id.cpp'; 'ttnn/cpp/ttnn/operations/eltwise/unary/device/kernels/dataflow/writer_unary_interleaved_start_id.cpp'],['reader_unary_interleaved_start_id/1146610629329498539/'; 'writer_unary_interleaved_start_id/1727642094059197364/'],708,736,1344,1568,1380,0,1,1,1,[],[],0.016,,,,"loc(""/home/$USER/tt-mlir/test/python/golden/test_ttir_ops.py:74:id(0)"")",false,"{'loop_number': 0, 'program_index': 0, 'disable_eth_dispatch': False, 'enable_program_cache': False, 'dump_device_rate': 1000}"
profile_log_device.csv : dump of all device side profiled results
tracy_ops_data.csv : op data results dumped in a readable format
tracy_ops_times.csv : op time results dumped in a readable format
tracy_profile_log_host.tracy : tracy profiled results file; this file can be fed into the tracy GUI
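
A minimal sketch of aggregating device kernel time from ops_perf_results.csv, assuming the column names shown above and that the file is in the current working directory:

import csv

# Sum the device kernel duration across ops, skipping rows where the column is empty.
total_ns = 0.0
with open("ops_perf_results.csv", newline="") as f:
    for row in csv.DictReader(f):
        value = row.get("DEVICE KERNEL DURATION [ns]", "")
        if value:
            total_ns += float(value)

print(f"total device kernel duration: {total_ns:.0f} ns")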

check

Check a binary file or a directory of binary files against a system desc (by default, the host machine's). Note: you must be on a system with silicon and have a runtime-enabled build (-DTTMLIR_ENABLE_RUNTIME=ON).

ttrt check --help
ttrt check out.ttnn
ttrt check out.ttnn --system-desc /path/to/system_desc.ttsys
ttrt check out.ttnn --clean-artifacts
ttrt check out.ttnn --save-artifacts
ttrt check out.ttnn --log-file ttrt.log
ttrt check /dir/of/flatbuffers --system-desc /dir/of/system_desc
ttrt check --save-artifacts --artifact-dir /path/to/some/dir out.ttnn
ttrt check out.ttnn --result-file result.json

gdb

You can relaunch ttrt inside of gdb, which can be useful for debugging C++ runtime components.

ttrt --gdb run ...
ttrt --gdb perf ...

Using as a python package

The other way to use the APIs under ttrt is to import it as a library, which allows you to use it in custom scripts.

Import ttrt as a python package

from ttrt.common.api import API

Setup API and register all features

API.initialize_apis()

Setup arguments

You can specify certain arguments to pass to each API, or use the default arguments provided.

Args

This can be a dictionary of values to set inside your API instance. These are the same options found via the command line. You can get the full list of supported arguments via the ttrt --help command. Any argument not provided will be set to its default.

custom_args = {}
custom_args["--clean-artifacts"] = True
query_instance = API.Query(args=custom_args)

Logging

You can specify a specific logging module to set inside your API instance. The rationale behind this is to allow different instances of different APIs to each log to a different file. You can also customize the level of detail your log file contains.

from ttrt.common.util import Logger
import os

os.environ["LOGGER_LEVEL"] = "DEBUG"
log_file_name = "some_file_name.log"
custom_logger = Logger(log_file_name)
read_instance = API.Read(logger=custom_logger)

Artifacts

You can specify a specific artifacts directory to store all the metadata generated during the execution of any API run. This lets you use different artifact directories for different API instances if you wish.

from ttrt.common.util import Artifacts

log_file_name = "some_file_name.log"
artifacts_folder_path = "/opt/folder"
custom_logger = Logger(log_file_name)
custom_artifacts = Artifacts(logger=custom_logger, artifacts_folder_path=artifacts_folder_path)
run_instance = API.Run(artifacts=custom_artifacts)

Execute API

Once all the arguments are set up, you can run your API instance with all your provided arguments. Note that APIs are stateless, so subsequent calls to the same API instance will not preserve previous call artifacts. If you want to call the APIs multiple times, you can, for example, generate a new artifacts directory for each run (see the sketch after the calls below).

result_code, results = query_instance()
result_code, results = read_instance()
result_code, results = run_instance()
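
For example, a minimal sketch of calling the run API twice with a separate artifacts directory per call so each invocation keeps its own metadata (the paths and log file names are illustrative):

from ttrt.common.api import API
from ttrt.common.util import Artifacts, Logger

API.initialize_apis()

# Illustrative directories; give each invocation its own artifacts folder.
for idx, folder in enumerate(["/opt/run_0_artifacts", "/opt/run_1_artifacts"]):
    logger = Logger(f"run_{idx}.log")
    artifacts = Artifacts(logger=logger, artifacts_folder_path=folder)
    run_instance = API.Run(artifacts=artifacts)
    result_code, results = run_instance()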

Putting it all together

You can combine all of the above features in a custom Python script.

from ttrt.common.api import API
from ttrt.common.util import Logger
from ttrt.common.util import Artifacts

API.initialize_apis()

custom_args = {}
custom_args["--clean-artifacts"] = True
custom_args["--save-artifacts"] = True
custom_args["--loops"] = 10
custom_args["--init"] = "randn"
custom_args["binary"] = "/path/to/subtract.ttnn"

log_file_name = "some_file_name.log"
custom_logger = Logger(log_file_name)

artifacts_folder_path = "/opt/folder"
custom_artifacts = Artifacts(logger=custom_logger, artifacts_folder_path=artifacts_folder_path)

run_instance = API.Run(args=custom_args, logger=custom_logger, artifacts=custom_artifacts)
result_code, results = run_instance()

Runtime integration

The full set of APIs and types exposed by ttrt.runtime can be found in runtime/python/runtime/runtime.cpp; however, only the ones intended for runtime customization through callback hooks are outlined here.

Callback hooks

MLIR Runtime exposes a feature to register Python callback functions. Any two Python functions can be provided: the first function is executed before every op in MLIR Runtime, the second after every op. The following steps describe how to extend your application to register Python functions. Callback functions are already implemented by default for the pdb debugger and for gathering memory and golden check data, as outlined in the run API section.

  1. Pybind the DebugHooks C++ class, specifically tt::runtime::debug::Hooks::get. See runtime/python/runtime/runtime.cpp for an example of how ttrt pybinds it.
tt::runtime::debug::Hooks
tt::runtime::debug::Hooks::get
  2. Register callback functions in your Python script. The following registers the two callback functions written in runtime/tools/ttrt/ttrt/common/callback.py. The DebugHooks get function has been pybound to ttrt.runtime.DebugHooks.get.
import ttrt.runtime

callback_env = ttrt.runtime.DebugHooks.get(pre_op_callback_runtime_config, post_op_callback_runtime_config)
  3. Each callback function has a particular signature, which looks like the following:
def pre_op_callback_runtime_config(binary, program_context, op_context):

binary: reference to the binary you are currently running (ttrt.binary Binary object)
program_context: reference to the program currently running (ttrt.runtime ProgramContext object)
op_context: reference to the op that is currently running (ttrt.runtime OpContext object)

  4. Each of these parameters has certain runtime APIs exposed that can only be called within the callback functions, since they rely on the op_context variable that is only available from the runtime during callbacks. See the sketch after the list below.
import ttrt.runtime

loc = ttrt.runtime.get_op_loc_info(op_context) : get the location of the op as a string which is used as the key when indexing the golden tensors stored in the flatbuffer
op_debug_str = ttrt.runtime.get_op_debug_str(op_context) : get the op debug str (contains op metadata including op type, attributes, input tensor shapes and dtypes, memref with layout and buffer type, and loc)
op_golden_tensor = ttrt.runtime.get_debug_info_golden(binary, loc) : get the golden tensor from the binary as a ttrt.binary GoldenTensor object
op_output_tensor = ttrt.runtime.get_op_output_tensor(op_context, program_context) : get the currently running op's output tensor from the device as a ttrt.runtime Tensor object; if this is called in a pre-op function or the op does not output a tensor, an empty tensor is returned.
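
Putting these together, the following is a minimal sketch of a pair of callbacks that log each op's location and fetch its output tensor (the function names are illustrative; registration mirrors the DebugHooks example above):

import ttrt.runtime

def pre_op_callback(binary, program_context, op_context):
    # Illustrative no-op pre-hook.
    pass

def post_op_callback(binary, program_context, op_context):
    # Location string; also the key for golden tensors stored in the flatbuffer.
    loc = ttrt.runtime.get_op_loc_info(op_context)
    # Op metadata (type, attributes, shapes, dtypes, layout, loc).
    debug_str = ttrt.runtime.get_op_debug_str(op_context)
    # Golden tensor embedded in the binary (if any) and the device output tensor.
    golden_tensor = ttrt.runtime.get_debug_info_golden(binary, loc)
    output_tensor = ttrt.runtime.get_op_output_tensor(op_context, program_context)
    print(f"finished op at {loc}")

callback_env = ttrt.runtime.DebugHooks.get(pre_op_callback, post_op_callback)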

Note: ttrt is not needed to implement this callback feature; it merely provides an example of how the feature can be implemented for a golden application.

FAQ

Flatbuffer version does not match ttrt version!

ttrt and flatbuffers have strict versioning that is checked during ttrt execution. You will have to generate the flatbuffer using the same version as ttrt (or vice versa). This means you might have to build on the same branch on which the flatbuffer was generated, or regenerate the flatbuffer using your current build.

System desc does not match flatbuffer!

Flatbuffers are compiled using a specific system desc (or default values if no system desc is provided). During runtime, the flatbuffer system desc is checked against the current system to ensure the system being run on supports the flatbuffer that was compiled. If you get this error, you will have to regenerate the flatbuffer using the system you want to run on. See the generate a flatbuffer file from compiler section for how to do this.

I just want to test and push my commit! What do I do!

Follow these steps (on n150, n300, and llmbox):

  1. Build ttmlir (sample instructions - subject to change)
source env/activate
cmake -G Ninja -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang-17 -DCMAKE_CXX_COMPILER=clang++-17 -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DTTMLIR_ENABLE_RUNTIME=ON -DTT_RUNTIME_ENABLE_PERF_TRACE=ON
cmake --build build
  2. Build ttrt (sample instructions - subject to change)
cmake --build build -- ttrt
  3. Query system
ttrt query --save-artifacts
  4. Export system desc file
export SYSTEM_DESC_PATH=/path/to/system_desc.ttsys (path dumped in previous command)
  5. Generate test cases
cmake --build build -- check-ttmlir
  6. Run test cases
ttrt run build/test/ttmlir/Silicon
  7. (Optional) Run perf test cases
ttrt perf build/test/ttmlir/Silicon

TTRT yields an ambiguous segmentation fault!

The ttrt toolchain has specific behaviors and requirements that can lead to build and runtime issues, particularly when dealing with version mismatches or out-of-sync dependencies.

Version Mismatch Due to Local Commits

The ttrt toolchain verifies whether the current system configuration matches the model's compilation environment. This verification involves tracking the number of commits since the last synchronization. When local commits are made in your branch, this may trigger a version mismatch between the compiled model and the current environment. The mismatch may not be handled properly by the runtime, leading to potential issues.

To resolve issues stemming from these synchronization problems, follow this workflow:

  1. Incremental build
# make some changes
# commit
cmake --build build
cmake --build build -- ttrt
# note you need to generate system_desc and flatbuffer again once you do this

This incremental build should be sufficient. If it does not resolve the error, please file an issue and proceed with the following steps for now.

  2. Clear the existing build and dependencies:
rm -rf build third_party/tt-metal

This ensures that all previous build artifacts and dependencies are removed, preventing conflicts or stale files from affecting the new build.

  3. Rebuild from scratch: After clearing the build directories, rebuild the project from the ground up. This ensures that the build process incorporates all the necessary components without any remnants of previous builds. See Build Instructions.

  4. Switch build configurations: If switching from a Debug to a Release build (or vice versa), ensure that you clean the build environment before transitioning. This avoids inconsistencies between build configurations and potential issues with optimization levels or debugging symbols.

  5. Re-acquire the IRD: By relinquishing and re-acquiring the IRD, you ensure that the correct toolchain is used for the new build. This step ensures synchronization between the model and the toolchain.

  6. Enable Debug Logging for tt-metal: To gain more insight into potential issues, enable debugging by setting TT_METAL_LOGGER_LEVEL to DEBUG. This will provide detailed logs, which can help in troubleshooting build or runtime issues.

export TT_METAL_LOGGER_LEVEL=DEBUG