ttrt
This tool is intended to be a Swiss Army knife for working with flatbuffers generated by the compiler. Its primary role is to inspect and run flatbuffer files, enabling flatbuffers to be run without a front-end runtime.
Building
- Build ttmlir
- Build ttrt:
source env/activate
cmake --build build -- ttrt
ttrt --help
Building runtime mode
Add the following flag when building the compiler
-DTTMLIR_ENABLE_RUNTIME=ON
Building perf mode
Add the following flags when building the compiler
-DTTMLIR_ENABLE_RUNTIME=ON
-DTT_RUNTIME_ENABLE_PERF_TRACE=ON
LOGGER Levels
ttrt supports logging at different logger levels. You will need to set the env var TTRT_LOGGER_LEVEL on the command line or in a python script. By default, it is set to INFO.
TTRT_LOGGER_LEVEL=INFO
TTRT_LOGGER_LEVEL=CRITICAL
TTRT_LOGGER_LEVEL=ERROR
TTRT_LOGGER_LEVEL=WARNING
TTRT_LOGGER_LEVEL=DEBUG
tt-metal logging
ttrt runtime uses tt-metal for op execution and device interfacing. For more detailed logs, which can help in troubleshooting build or runtime issues, set the env var TT_METAL_LOGGER_LEVEL. By default, it is set to FATAL.
export TT_METAL_LOGGER_LEVEL=DEBUG
Installing ttrt as a python whl
Every time ttrt is built, it creates a whl file in build/runtime/tools/ttrt/build. Example filename: ttrt-0.0.235-cp310-cp310-linux_x86_64.whl. You can take this whl file and install it in any docker container or venv outside of ttmlir, after which all of the following functionality works the same.
- Download the whl
- Create a python venv
python -m venv ttrt_env
source ttrt_env/bin/activate
- Install the whl (replace with your version of the whl)
pip install build/runtime/tools/ttrt/build/ttrt-0.0.235-cp310-cp310-linux_x86_64.whl
Generating a flatbuffer
tt-mlir exposes a few ways to generate flatbuffers.
Generate a flatbuffer file from ttir-builder
ttir-builder is a tool for creating TTIR ops, converting them into MLIR modules, running passes to lower modules into backends, and translating to flatbuffers. See its documentation for further instructions.
Generate a flatbuffer file from compiler
The compiler supports a pass to load a system descriptor to compile against. You can feed this pass into ttmlir-opt.
- Build ttmlir
- Build ttrt (see building section on this page)
- Generate a ttsys file from the system you want to compile for using ttrt. This will create a system_desc.ttsys file under the ttrt-artifacts folder.
ttrt query --save-artifacts
- Use the ttmlir-opt tool in the compiler to feed in the system descriptor. See the ttmlir-opt documentation for more information on how to generate .mlir files.
./build/bin/ttmlir-opt --ttcore-register-device="system-desc-path=/path/to/system_desc.ttsys" --ttir-to-ttnn-backend-pipeline test/ttmlir/Dialect/TTNN/simple_subtract.mlir -o ttnn.mlir
or (pipe path directly into ttir-to-ttnn-backend-pipeline)
./build/bin/ttmlir-opt --ttir-to-ttnn-backend-pipeline="system-desc-path=/path/to/system_desc.ttsys" test/ttmlir/Dialect/TTNN/simple_subtract_to_add.mlir -o ttnn.mlir
- Use the ttmlir-translate tool in the compiler to generate the flatbuffer executable. See the ttmlir-translate documentation for more information on how to generate flatbuffer files.
./build/bin/ttmlir-translate --ttnn-to-flatbuffer ttnn.mlir -o out.ttnn
- Run your test cases using ttrt
ttrt run /path/to/out.ttnn
Generate flatbuffer files using llvm-lit
There are existing .mlir test cases under test/ttmlir/Silicon. You can use the llvm-lit tool to generate the corresponding ttnn and ttm files.
- Build ttmlir
- Build ttrt (see building section on this page)
- Generate a ttsys file from the system you want to compile for using ttrt. This will create a system_desc.ttsys file under the ttrt-artifacts folder.
ttrt query --save-artifacts
- Export this file in your environment using export SYSTEM_DESC_PATH=/path/to/system_desc.ttsys. When llvm-lit is run, it will query this variable and generate the ttnn and ttm files using this system. Optionally, you can also provide this manually when running llvm-lit.
- Generate your test cases. This will generate all your ttnn and ttm files under build/test/ttmlir/Silicon. ttnn files have a .ttnn file extension and ttmetal files have a .ttm extension.
cmake --build build -- check-ttmlir
- (Optional) If you have a single .mlir file (or a directory of custom .mlir files) that you created using the compiler and you want to generate the corresponding ttnn and ttm files for it, you can run llvm-lit standalone on the path of your .mlir file or directory of .mlir files to generate the flatbuffer executables. You will have to make sure you add the correct llvm-lit configs to your .mlir file. See the section on adding llvm-lit config options inside a .mlir file to create flatbuffer binaries for more info. You must also make sure your .mlir test is found within the test/ttmlir/Silicon folder (and point lit to the build folder).
llvm-lit -v ./build/test/ttmlir/Silicon
or
SYSTEM_DESC_PATH=/path/to/system_desc.ttsys llvm-lit -v ./build/test/ttmlir/Silicon
- Run your test cases using ttrt
ttrt run /path/to/test.ttnn
ttrt run /path/to/dir/of/flatbuffers
Adding llvm-lit config options inside a .mlir file to create flatbuffer binaries
Inside your .mlir file, you can add certain config options that llvm-lit will use when running against that test case. For the purpose of generating flatbuffer executables, you can add --ttcore-register-device="system-desc-path=%system_desc_path%", which tells llvm-lit to use the system desc found from the environment variable set by export SYSTEM_DESC_PATH=/path/to/system_desc.ttsys. You can also provide a custom path to a system desc file.
// RUN: ttmlir-opt --ttcore-register-device="system-desc-path=%system_desc_path%" --ttnn-layout --convert-ttir-to-ttnn %s > %t.mlir
// RUN: FileCheck %s --input-file=%t.mlir
// RUN: ttmlir-translate --ttnn-to-flatbuffer %t.mlir > %t.ttnn
Adding new mlir test cases
You can copy your .mlir test file (with the appropriate llvm-lit config options for generating flatbuffer binaries) into test/ttmlir/Silicon. Then, follow the Generating flatbuffer files using llvm-lit section to generate the executables to run.
Versioning
ttrt and flatbuffers have a strict versioning check. When running a flatbuffer against ttrt, you have to make sure the flatbuffer was generated using the same version as ttrt (or vice versa). Major and minor versions are manually set using github tags when releases are made. The patch version is the number of commits since the last major/minor tag.
vmajor.minor.patch
The flag --ignore-version can be used to bypass versioning checks. Use at your own risk; it can cause unpredictable errors.
Application APIs
ttrt --help
ttrt read
ttrt run
ttrt query
ttrt perf
ttrt check
Command line usage
There are different ways you can use the APIs under ttrt. The first is via the command line, as follows. All artifacts are saved under the ttrt-artifacts folder under the TT_MLIR_HOME environment variable. By default, all logging is printed to the terminal. You can specify a log file to dump output to.
read
Read sections of a binary file
ttrt read --help
ttrt read --section version out.ttnn
ttrt read --section system_desc out.ttnn
ttrt read --section mlir out.ttnn
ttrt read --section cpp out.ttnn
ttrt read --section inputs out.ttnn
ttrt read --section outputs out.ttnn
ttrt read --section op_stats out.ttnn
ttrt read --section mesh_shape out.ttnn
ttrt read --section all out.ttnn --clean-artifacts
ttrt read --section all out.ttnn --save-artifacts
ttrt read --section all /dir/of/flatbuffers
ttrt read system_desc.ttsys
ttrt read --section system_desc system_desc.ttsys
ttrt read system_desc.ttsys --log-file ttrt.log
ttrt read out.ttnn --save-artifacts --artifact-dir /path/to/some/dir
ttrt read out.ttnn --result-file result.json
run
Run a binary file or a directory of binary files
Note: It's required to be on a system with silicon and to have a runtime-enabled build (-DTTMLIR_ENABLE_RUNTIME=ON).
ttrt run --help
ttrt run out.ttnn
ttrt run out.ttnn --seed 0
ttrt run out.ttnn --init arange
ttrt run out.ttnn --identity
ttrt run out.ttnn --identity --rtol 1 --atol 1
ttrt run out.ttnn --clean-artifacts
ttrt run out.ttnn --save-artifacts
ttrt run out.ttnn --loops 10
ttrt run --program-index all out.ttnn
ttrt run --program-index 0 out.ttnn
ttrt run /dir/of/flatbuffers
ttrt run /dir/of/flatbuffers --loops 10
ttrt run /dir/of/flatbuffers --log-file ttrt.log
ttrt run out.ttnn --save-artifacts --artifact-dir /path/to/some/dir
ttrt run out.ttnn --load-kernels-from-disk
ttrt run out.ttnn --result-file result.json
ttrt run out.ttnn --disable-golden
ttrt run out.ttnn --save-golden-tensors
ttrt run out.ttnn --print-input-output-tensors
ttrt run out.ttnn --debugger
ttrt run out.ttnn --memory --save-artifacts
ttrt run out.ttnn --memory --check-memory-leak
Run results
The run API saves a run_results.json file that records information about the run, including any errors that were thrown and the location of other saved run data.
[
{
"file_path": "ttnn/test_tan[f32-shape0]_ttnn.mlir.ttnn",
"result": "pass",
"exception": "",
"log_file": "ttrt.log",
"artifacts": "/home/$USER/tt-mlir/ttrt-artifacts",
"program_index": "all",
"program_results": {
"program_index_0": {
"loop_0": {
"total_duration_ns": 3269341588,
"total_ttnn_api_duration_ns": null,
"total_device_kernel_duration_ns": null
}
}
}
}
]
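For illustration only (this script is not part of ttrt), a report with the layout shown above can be summarized with a few lines of python. It assumes run_results.json is a list of entries shaped like the example:
import json

with open("run_results.json") as f:
    run_results = json.load(f)

for entry in run_results:
    print(f"{entry['file_path']}: {entry['result']}")
    for program, loops in entry.get("program_results", {}).items():
        for loop, stats in loops.items():
            # total_duration_ns may be the only populated duration field
            print(f"  {program}/{loop}: {stats['total_duration_ns'] / 1e6:.2f} ms")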
Golden checks
Golden checks are used to verify runtime op accuracy. They are run by default during the golden callback unless the --disable-golden flag is used. If the --save-artifacts flag is used, a golden results report will be saved under the artifacts directory.
{
"loc(\"/home/$USER/tt-mlir/test/python/golden/test_ttir_ops.py:74:id(0)\")": {
"expected_pcc": 0.99,
"actual_pcc": 0.0015917614829425491,
"atol": 1e-08,
"rtol": 1e-05,
"allclose": false,
"max": 8529.765625,
"mean_absolute_error": 6.644593238830566,
"root_mean_square_error": 100.30211639404297,
"cosine_similarity": 0.0016297339461743832
}
}
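The metrics in this report are standard tensor-comparison metrics. The sketch below is illustrative only (it is not the actual ttrt golden implementation) and shows how most of these fields can be computed with numpy, given a golden tensor and a device output tensor as arrays:
import numpy as np

def golden_metrics(golden, output, atol=1e-08, rtol=1e-05):
    g = np.asarray(golden, dtype=np.float64).flatten()
    o = np.asarray(output, dtype=np.float64).flatten()
    return {
        "actual_pcc": float(np.corrcoef(g, o)[0, 1]),                # Pearson correlation coefficient
        "allclose": bool(np.allclose(o, g, rtol=rtol, atol=atol)),   # elementwise closeness under atol/rtol
        "mean_absolute_error": float(np.abs(o - g).mean()),
        "root_mean_square_error": float(np.sqrt(((o - g) ** 2).mean())),
        "cosine_similarity": float(np.dot(g, o) / (np.linalg.norm(g) * np.linalg.norm(o))),
    }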
Memory
Memory callback functions are run when the --memory flag is used. A memory report containing information on op memory usage will be written under the artifacts directory.
{
"0": {
"loc": "loc(\"/home/$USER/tt-mlir/test/python/golden/test_ttir_ops.py:74:id(0)\")",
"debug_str": "%0 = \"ttnn.tan\"(%arg0) : (tensor<128x128xf32, #ttnn.ttnn_layout<(d0, d1) -> (d0, d1), <1x1>, memref<4x4x!ttcore.tile<32x32, f32>, #ttnn.buffer_type<dram>>, <interleaved>>>) -> tensor<128x128xf32, #ttnn.ttnn_layout<(d0, d1) -> (d0, d1), <1x1>, memref<4x4x!ttcore.tile<32x32, f32>, #ttnn.buffer_type<dram>>, <interleaved>>> loc(\"/home/$USER/tt-mlir/test/python/golden/test_ttir_ops.py:74:id(0)\")",
"dram": {
"num_banks": 12,
"total_bytes_per_bank": 1071181792,
"total_bytes_allocated_per_bank": 16384,
"total_bytes_free_per_bank": 1071167456,
"largest_contiguous_bytes_free_per_bank": 1071165408,
"block_table": [
{
"allocated": "yes",
"nextID": "1",
"prevID": "-1",
"size": "8192",
"address": "0",
"blockID": "0"
},
{
"allocated": "yes",
"nextID": "3",
"prevID": "0",
"size": "8192",
"address": "8192",
"blockID": "1"
},
{
"allocated": "no",
"nextID": "-1",
"prevID": "1",
"size": "1071165408",
"address": "16384",
"blockID": "3"
}
]
},
"l1": {
"num_banks": 64,
"total_bytes_per_bank": 1369120,
"total_bytes_allocated_per_bank": 0,
"total_bytes_free_per_bank": 1369120,
"largest_contiguous_bytes_free_per_bank": 1369120,
"block_table": [
{
"allocated": "no",
"nextID": "-1",
"prevID": "-1",
"size": "1369120",
"address": "0",
"blockID": "0"
}
]
},
"l1_small": {
"num_banks": 64,
"total_bytes_per_bank": 32768,
"total_bytes_allocated_per_bank": 0,
"total_bytes_free_per_bank": 32768,
"largest_contiguous_bytes_free_per_bank": 32768,
"block_table": [
{
"allocated": "no",
"nextID": "-1",
"prevID": "-1",
"size": "32768",
"address": "0",
"blockID": "0"
}
]
},
"trace": {
"num_banks": 12,
"total_bytes_per_bank": 0,
"total_bytes_allocated_per_bank": 0,
"total_bytes_free_per_bank": 0,
"largest_contiguous_bytes_free_per_bank": 0,
"block_table": [
{
"allocated": "no",
"nextID": "-1",
"prevID": "-1",
"size": "0",
"address": "0",
"blockID": "0"
}
]
}
}
}
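As an illustrative sketch only (the report file name below is a placeholder, not a documented ttrt artifact name), a memory report shaped like the example above can be scanned for blocks that remain allocated after an op:
import json

def allocated_blocks(memory_report):
    # yield every block still marked allocated in any memory region of any op
    for op_id, op in memory_report.items():
        for region in ("dram", "l1", "l1_small", "trace"):
            for block in op.get(region, {}).get("block_table", []):
                if block["allocated"] == "yes":
                    yield op_id, region, int(block["size"]), int(block["address"])

with open("memory_report.json") as f:  # placeholder path under the artifacts directory
    for op_id, region, size, address in allocated_blocks(json.load(f)):
        print(f"op {op_id}: {size} bytes allocated in {region} at address {address}")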
Debugger
Enabling the --debugger flag sets a pdb trace to run after each op during the callback hook.
query
Query the system to obtain the system desc file (optionally store it to disk)
Note: It's required to be on a system with silicon and to have a runtime-enabled build (-DTTMLIR_ENABLE_RUNTIME=ON).
ttrt query --help
ttrt query
ttrt query --quiet
ttrt query --save-artifacts
ttrt query --clean-artifacts
ttrt query --save-artifacts --log-file ttrt.log
ttrt query --save-artifacts --artifact-dir /path/to/some/dir
ttrt query --result-file result.json
perf
Run performance mode of a binary file or a directory of binary files
Note: It's required to be on a system with silicon and to have a runtime-enabled build (-DTTMLIR_ENABLE_RUNTIME=ON). A perf-enabled build (-DTT_RUNTIME_ENABLE_PERF_TRACE=ON) is also needed.
Note: You can collect host-only performance data via the --host-only flag. By default, both host- and device-side performance data are collected.
If the --save-artifacts flag is provided, perf mode will dump the following files in the artifacts directory:
ops_perf_results.csv : compiled op performance results
OP CODE,OP TYPE,GLOBAL CALL COUNT,DEVICE ID,ATTRIBUTES,MATH FIDELITY,CORE COUNT,PARALLELIZATION STRATEGY,HOST START TS,HOST END TS,HOST DURATION [ns],DEVICE FW START CYCLE,DEVICE FW END CYCLE,OP TO OP LATENCY [ns],OP TO OP LATENCY BR/NRISC START [ns],DEVICE FW DURATION [ns],DEVICE KERNEL DURATION [ns],DEVICE KERNEL DURATION DM START [ns],DEVICE KERNEL DURATION PER CORE MIN [ns],DEVICE KERNEL DURATION PER CORE MAX [ns],DEVICE KERNEL DURATION PER CORE AVG [ns],DEVICE KERNEL FIRST TO LAST START [ns],DEVICE BRISC KERNEL DURATION [ns],DEVICE NCRISC KERNEL DURATION [ns],DEVICE TRISC0 KERNEL DURATION [ns],DEVICE TRISC1 KERNEL DURATION [ns],DEVICE TRISC2 KERNEL DURATION [ns],DEVICE ERISC KERNEL DURATION [ns],DEVICE COMPUTE CB WAIT FRONT [ns],DEVICE COMPUTE CB RESERVE BACK [ns],DISPATCH TOTAL CQ CMD OP TIME [ns],DISPATCH GO SEND WAIT TIME [ns],INPUT_0_W,INPUT_0_Z,INPUT_0_Y,INPUT_0_X,INPUT_0_LAYOUT,INPUT_0_DATATYPE,INPUT_0_MEMORY,OUTPUT_0_W,OUTPUT_0_Z,OUTPUT_0_Y,OUTPUT_0_X,OUTPUT_0_LAYOUT,OUTPUT_0_DATATYPE,OUTPUT_0_MEMORY,METAL TRACE ID,METAL TRACE REPLAY SESSION ID,COMPUTE KERNEL SOURCE,COMPUTE KERNEL HASH,DATA MOVEMENT KERNEL SOURCE,DATA MOVEMENT KERNEL HASH,BRISC MAX KERNEL SIZE [B],NCRISC MAX KERNEL SIZE [B],TRISC 0 MAX KERNEL SIZE [B],TRISC 1 MAX KERNEL SIZE [B],TRISC 2 MAX KERNEL SIZE [B],ERISC MAX KERNEL SIZE [B],PM IDEAL [ns],PM COMPUTE [ns],PM BANDWIDTH [ns],PM REQ I BW,PM REQ O BW,PM FPU UTIL (%),NOC UTIL (%),DRAM BW UTIL (%),NPE CONG IMPACT (%),LOC,CONST_EVAL_OP,PROGRAM_METADATA
UnaryDeviceOperation,tt_dnn_device,1024,0,{'bfp8_pack_precise': 'false'; 'fp32_dest_acc_en': 'true'; 'op_chain': '{UnaryWithParam(op_type=UnaryOpType::TAN;param={})}'; 'output_dtype': 'DataType::FLOAT32'; 'output_memory_config': 'MemoryConfig(memory_layout=TensorMemoryLayout::INTERLEAVED;buffer_type=BufferType::DRAM;shard_spec=std::nullopt;nd_shard_spec=std::nullopt;created_with_nd_shard_spec=0)'; 'preserve_fp32_precision': 'true'},HiFi4,16,,4556959654,4557518500,558846,9815181939513,9815181946491,0,0,6978,6314,6126,4982,6216,5652,335,6087,1375,1656,4957,465,,,,,,1,1,128,128,TILE,FLOAT32,DEV_1_DRAM_INTERLEAVED,1,1,128,128,TILE,FLOAT32,DEV_1_DRAM_INTERLEAVED,,,['ttnn/cpp/ttnn/operations/eltwise/unary/device/kernels/compute//eltwise_sfpu.cpp'],['eltwise_sfpu/3265258334475852953/'],['ttnn/cpp/ttnn/operations/eltwise/unary/device/kernels/dataflow/reader_unary_interleaved_start_id.cpp'; 'ttnn/cpp/ttnn/operations/eltwise/unary/device/kernels/dataflow/writer_unary_interleaved_start_id.cpp'],['reader_unary_interleaved_start_id/1146610629329498539/'; 'writer_unary_interleaved_start_id/1727642094059197364/'],708,736,1344,1568,1380,0,1,1,1,[],[],0.016,,,,"loc(""/home/$USER/tt-mlir/test/python/golden/test_ttir_ops.py:74:id(0)"")",false,"{'loop_number': 0, 'program_index': 0, 'disable_eth_dispatch': False, 'enable_program_cache': False, 'dump_device_rate': 1000}"
profile_log_device.csv : dump of all device side profiled results
tracy_ops_data.csv : op data results dumped in a readable format
tracy_ops_times.csv : op time results dumped in a readable format
tracy_profile_log_host.tracy : tracy profiled results file, this file can be fed into the tracy GUI
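As an illustration (not part of ttrt), the ops_perf_results.csv report can be post-processed using the column names shown above, for example to rank op codes by total device kernel time. This sketch assumes pandas is installed:
import pandas as pd

df = pd.read_csv("ops_perf_results.csv")
per_op = (
    df.groupby("OP CODE")["DEVICE KERNEL DURATION [ns]"]
    .sum()
    .sort_values(ascending=False)
)
print(per_op.head(10))  # top 10 op codes by total device kernel time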
check
Check a binary file or a directory of binary files against a system desc (by default, uses the host machine)
Note: It's required to be on a system with silicon and to have a runtime-enabled build (-DTTMLIR_ENABLE_RUNTIME=ON).
ttrt check --help
ttrt check out.ttnn
ttrt check out.ttnn --system-desc /path/to/system_desc.ttsys
ttrt check out.ttnn --clean-artifacts
ttrt check out.ttnn --save-artifacts
ttrt check out.ttnn --log-file ttrt.log
ttrt check /dir/of/flatbuffers --system-desc /dir/of/system_desc
ttrt check --save-artifacts --artifact-dir /path/to/some/dir out.ttnn
ttrt check out.ttnn --result-file result.json
gdb
You can relaunch ttrt inside of gdb, which can be useful for debugging C++ runtime components.
ttrt --gdb run ...
ttrt --gdb perf ...
Using as a python package
The other way to use the APIs under ttrt is to import it as a library. This allows you to use it in custom scripts.
Import ttrt as a python package
from ttrt.common.api import API
Setup API and register all features
API.initialize_apis()
Setup arguments
You can specify certain arguments to pass to each API, or use the default arguments provided
Args
This can be a dictionary of values to set inside your API instance. These are the same options found via the command line. You can get the full list of supported arguments via the ttrt --help command. Any argument not provided will be set to its default.
custom_args = {}
custom_args["--clean-artifacts"] = True
query_instance = API.Query(args=custom_args)
Logging
You can specify a logging module to set inside your API instance. This makes it possible for different API instances to log to different files. You can also customize the level of detail your log file contains.
from ttrt.common.util import Logger
import os
os.environ["LOGGER_LEVEL"] = "DEBUG"
log_file_name = "some_file_name.log"
custom_logger = Logger(log_file_name)
read_instance = API.Read(logger=custom_logger)
Artifacts
You can specify an artifacts directory to store all the metadata generated during the execution of any API run. This allows you to use different artifact directories for different API instances if you wish.
from ttrt.common.util import Artifacts
log_file_name = "some_file_name.log"
artifacts_folder_path = "/opt/folder"
custom_logger = Logger(log_file_name)
custom_artifacts = Artifacts(logger=custom_logger, artifacts_folder_path=artifacts_folder_path)
run_instance = API.Run(artifacts=custom_artifacts)
Execute API
Once all the arguments are set up, you can run your API instance with your provided arguments. Note: APIs are stateless, so subsequent calls to the same API instance will not preserve artifacts from previous calls. If you wish to call the APIs multiple times, you can, for example, generate a new artifacts directory for each run.
result_code, results = query_instance()
result_code, results = read_instance()
result_code, results = run_instance()
Putting it all together
You can do interesting things by combining all of the above features in your python script.
from ttrt.common.api import API
from ttrt.common.util import Logger
from ttrt.common.util import Artifacts
API.initialize_apis()
custom_args = {}
custom_args["--clean-artifacts"] = True
custom_args["--save-artifacts"] = True
custom_args["--loops"] = 10
custom_args["--init"] = "randn"
custom_args["binary"] = "/path/to/subtract.ttnn"
log_file_name = "some_file_name.log"
custom_logger = Logger(log_file_name)
artifacts_folder_path = "/opt/folder"
custom_artifacts = Artifacts(logger=custom_logger, artifacts_folder_path=artifacts_folder_path)
run_instance = API.Run(args=custom_args, logger=custom_logger, artifacts=custom_artifacts)
result_code, results = run_instance()
Runtime integration
The full set of APIs and types exposed by ttrt.runtime can be found in runtime/python/runtime/runtime.cpp; however, only the ones intended for runtime customization through callback hooks are outlined here.
Callback hooks
MLIR Runtime exposes a feature to register python callback functions. Any two python functions can be provided: the first will be executed before every op in MLIR Runtime, and the second after every op. The following steps describe how to extend your application to register python functions. Callback functions are already implemented by default for the pdb debugger and for gathering memory and golden check data, as outlined in the run API section.
- Pybind the DebugHooks C++ class, specifically tt::runtime::debug::Hooks::get. See runtime/python/runtime/runtime.cpp for an example of how ttrt pybinds it.
tt::runtime::debug::Hooks
tt::runtime::debug::Hooks::get
- Register the callback functions in your python script. The following registers the two callback functions written in runtime/tools/ttrt/ttrt/common/callback.py. The DebugHooks get function has been pybinded to ttrt.runtime.DebugHooks.get.
import ttrt.runtime
callback_env = ttrt.runtime.DebugHooks.get(pre_op_callback_runtime_config, post_op_callback_runtime_config)
- The callback function has a particular function signature, which looks like the following
def pre_op_callback_runtime_config(binary, program_context, op_context):
binary: reference to the binary you are currently running (ttrt.binary Binary object)
program_context: reference to the program currently running (ttrt.runtime ProgramContext object)
op_context: reference to the op that is currently running (ttrt.runtime OpContext object)
- Each of these parameters has certain runtime APIs exposed which can only be called within the callback functions, since they rely on the op_context variable that is only available from the runtime during callbacks.
import ttrt.runtime
loc = ttrt.runtime.get_op_loc_info(op_context) : get the location of the op as a string which is used as the key when indexing the golden tensors stored in the flatbuffer
op_debug_str = ttrt.runtime.get_op_debug_str(op_context) : get the op debug str (contains op metadata including op type, attributes, input tensor shapes and dtypes, memref with layout and buffer type, and loc)
op_golden_tensor = ttrt.runtime.get_debug_info_golden(binary, loc) : get the golden tensor from the binary as a ttrt.binary GoldenTensor object
op_output_tensor = ttrt.runtime.get_op_output_tensor(op_context, program_context) : get the currently running output tensor from device as a ttrt.runtime Tensor object, if this is called in a preOp function or the op doesn't output a tensor, an empty tensor will be returned.
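Putting these together, a minimal callback pair might look like the following. This is an illustrative sketch (not the built-in callback in callback.py); it only uses the APIs listed above:
import ttrt.runtime

def pre_op_callback(binary, program_context, op_context):
    # runs before every op; the op's output tensor is not available yet
    print("about to run:", ttrt.runtime.get_op_loc_info(op_context))

def post_op_callback(binary, program_context, op_context):
    loc = ttrt.runtime.get_op_loc_info(op_context)
    debug_str = ttrt.runtime.get_op_debug_str(op_context)
    golden_tensor = ttrt.runtime.get_debug_info_golden(binary, loc)
    output_tensor = ttrt.runtime.get_op_output_tensor(op_context, program_context)
    print(f"finished op at {loc}")
    print(debug_str)
    # compare output_tensor against golden_tensor here (e.g. PCC / allclose),
    # similar to what the default golden callback does

callback_env = ttrt.runtime.DebugHooks.get(pre_op_callback, post_op_callback)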
Note: ttrt is not needed to implement this callback feature. It aims to provide an example of how this callback feature can be implemented for a golden application.
FAQ
Flatbuffer version does not match ttrt version!
ttrt and the flatbuffer have strict versioning that is checked during ttrt execution. You will have to generate a flatbuffer using the same version of ttrt (or vice versa). This means you might have to build on the same branch on which the flatbuffer was generated, or regenerate the flatbuffer using your current build.
System desc does not match flatbuffer!
Flatbuffers are compiled using a specific system desc (or default values if no system desc is provided). During runtime, the flatbuffer system desc is checked against the current system to ensure the system being run on supports the flatbuffer that was compiled. If you get this error, you will have to regenerate the flatbuffer using the system you want to run on. See the Generate a flatbuffer file from compiler section for how to do this.
I just want to test and push my commit! What do I do!
Follow these steps (on n150, n300, and llmbox)
- Build ttmlir (sample instructions - subject to change)
source env/activate
cmake -G Ninja -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang-17 -DCMAKE_CXX_COMPILER=clang++-17 -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DTTMLIR_ENABLE_RUNTIME=ON -DTT_RUNTIME_ENABLE_PERF_TRACE=ON
cmake --build build
- Build ttrt (sample instructions - subject to change)
cmake --build build -- ttrt
- Query system
ttrt query --save-artifacts
- Export system desc file
export SYSTEM_DESC_PATH=/path/to/system_desc.ttsys (path dumped in previous command)
- Generate test cases
cmake --build build -- check-ttmlir
- Run test cases
ttrt run build/test/ttmlir/Silicon
- (Optional) Run perf test cases
ttrt perf build/test/ttmlir/Silicon
TTRT yields an ambiguous segmentation fault!
The ttrt toolchain has specific behaviors and requirements that can lead to build and runtime issues, particularly when dealing with version mismatches or out-of-sync dependencies.
Version Mismatch Due to Local Commits
The ttrt toolchain verifies whether the current system configuration matches the model's compilation environment. This verification involves tracking the number of commits since the last synchronization. When local commits are made in your branch, they may trigger a version mismatch between the compiled model and the current environment. This mismatch may not be handled properly by the runtime (rt), leading to potential issues.
To resolve issues stemming from these synchronization problems, follow this workflow:
- Incremental build
# make some changes
# commit
cmake --build build
cmake --build build -- ttrt
# note you need to generate system_desc and flatbuffer again once you do this
This incremental build should be sufficient. If it does not resolve the error, please file an issue and proceed with the following steps for now.
- Clear the existing build and dependencies:
rm -rf build third_party/tt-metal
This ensures that all previous build artifacts and dependencies are removed, preventing conflicts or stale files from affecting the new build.
- Rebuild from scratch: After clearing the build directories, rebuild the project from the ground up. This ensures that the build process incorporates all the necessary components without any remnants of previous builds. See Build Instructions.
- Switch build configurations: If switching from a Debug to a Release build (or vice versa), ensure that you clean the build environment before transitioning. This avoids inconsistencies between build configurations and potential issues with optimization levels or debugging symbols.
- Re-acquire the IRD: By relinquishing and re-acquiring the IRD, you ensure that the correct toolchain is used for the new build. This step ensures synchronization between the model and the toolchain.
- Enable Debug Logging for tt-metal: To gain more insight into potential issues, enable debugging by setting TT_METAL_LOGGER_LEVEL to DEBUG. This will provide detailed logs, which can help in troubleshooting build or runtime issues.
export TT_METAL_LOGGER_LEVEL=DEBUG