LLK Asserts

Overview

LLK Asserts provide runtime validation checks within the tt-llk codebase that implements core tensor operations. These asserts validate critical assumptions about tensor dimensions, data formats, and hardware configuration parameters within the device-side compute stack, executing on the accelerator hardware itself.

LLK asserts complement Lightweight Kernel Asserts and the Watcher tool by providing specialized validation for the low-level kernel library that handles:

Tensor unpacking (moving data from L1 memory into source registers)
Math operations (matrix multiplication, element-wise operations)
Tensor packing (moving results from destination register back to L1 memory)
Tilization and untilization (data layout transformations)

Enabling

LLK Asserts are controlled independently from Lightweight Kernel Asserts and the Watcher. To enable them, set the following environment variable:

export TT_METAL_LLK_ASSERTS=1  # Enable LLK Asserts. Default is `0` (disabled).

At present, Lightweight Kernel Asserts provide more detailed information about an assertion failure.

For comprehensive debugging, it is recommended to combine LLK Asserts with at least one of the other mechanisms (Lightweight Kernel Asserts or Watcher). The possible configurations are:

# Option 1: Enable LLK Asserts with Lightweight Kernel Asserts
export TT_METAL_LLK_ASSERTS=1
export TT_METAL_LIGHTWEIGHT_KERNEL_ASSERTS=1

# Option 2: Enable LLK Asserts with Watcher (more comprehensive debugging)
export TT_METAL_LLK_ASSERTS=1
export TT_METAL_WATCHER=1

# Option 3: Enable LLK Asserts only
export TT_METAL_LLK_ASSERTS=1

When an LLK assert fails, it triggers one of the following:

An ebreak instruction (with Lightweight Kernel Asserts, or with LLK Asserts only)
A Watcher assertion (with Watcher)

In both cases the kernel hangs. With Watcher, the assertion message is printed to stderr and written to the watcher log file. With Lightweight Kernel Asserts, or with LLK Asserts only, use tt-triage to analyze the failure state.

Quick-start: Running a Test with LLK Asserts

The easiest way to run a test with LLK asserts enabled, together with the device-print and on-timeout triage hooks needed to actually capture the failure, is to source the helper script tools/setup_llk_assert_env.sh. A single invocation exports every environment variable needed for a fully instrumented debug run.

Usage

The script must be sourced (not executed) so the exported variables persist in your shell. It takes two required positional parameters (no defaults):

source tools/setup_llk_assert_env.sh <assert_output_path> <dprint_output_path>

<assert_output_path> — file where the on-timeout triage hook writes the lightweight-assert dump (callstacks, template parameters, runtime args, locals).
<dprint_output_path> — file where DEVICE_PRINT output from all cores is written (TT_METAL_DPRINT_FILE).

The script also requires TT_METAL_HOME to be set — activating the tt-metal Python environment is enough.

Example

cd $TT_METAL_HOME
source python_env/bin/activate
source tools/setup_llk_assert_env.sh assert.txt /tmp/tt_dprint.log
TT_METAL_LLK_ASSERTS=1 pytest [CONCRETE_TEST] > test_output.txt 2>&1

After the run completes, three artifacts are available for analysis:

test_output.txt — pytest stdout/stderr, including the host-side RuntimeError and stack trace.
assert.txt — the lightweight-assert dump from the device when the hang was detected.
/tmp/tt_dprint.log — DEVICE_PRINT output (e.g. expected: …, actual: … values emitted near a failing LLK_ASSERT).

How the script is designed to work

A failing LLK_ASSERT executes ebreak on a TRISC, which silently hangs that core. The host keeps waiting on a command-queue completion that will never arrive. The script wires up six environment variables that turn this otherwise-mysterious hang into a precise, self-triaged failure:

| Variable | Purpose |
| --- | --- |
| `TT_METAL_DISPATCH_TIMEOUT_COMMAND_TO_EXECUTE` | Shell command run by the host when a dispatch timeout is detected. The script sets it to `$TT_METAL_HOME/tools/tt-triage.py --run=dump_lightweight_asserts > <assert_output_path> && tt-smi -r`: first dump the lightweight asserts (callstack, template params, runtime args, locals) to the file you specified, then reset the device with `tt-smi -r` so the next run starts from a clean state. |
| `TT_METAL_OPERATION_TIMEOUT_SECONDS=5.0` | Short host-side completion timeout. Without it, the host would wait the full default timeout (minutes) before declaring the hang and firing the triage command. |
| `TT_RUN_DISABLED_TRIAGE_SCRIPTS_IN_CI=1` | Allows the triage scripts (including `dump_lightweight_asserts`) to execute even when running under CI-like conditions where they are otherwise gated off. |
| `TT_METAL_DPRINT_CORES=all` | Subscribes the host dprint server to every core. Without this, `DEVICE_PRINT` statements on TRISCs would not be captured. |
| `TT_METAL_DEVICE_PRINT=1` | Enables the `DEVICE_PRINT` macro at compile-time in JIT-built kernels. |
| `TT_METAL_DPRINT_FILE=<dprint_output_path>` | Redirects all `DEVICE_PRINT` output to the file you specified instead of stdout, making it easy to grep/diff after the run. |
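
For readers who drive tests from Python rather than from a shell, the six variables in the table can be assembled as a dictionary. This is only a sketch of what the table describes, not the contents of setup_llk_assert_env.sh; the function name `llk_assert_debug_env` and its parameters are hypothetical.

```python
import os


def llk_assert_debug_env(assert_output_path, dprint_output_path, tt_metal_home=None):
    """Build the six variables from the table above (illustrative helper, not part of tt-metal)."""
    home = tt_metal_home or os.environ.get("TT_METAL_HOME")
    if not home:
        # The helper script has the same requirement.
        raise RuntimeError("TT_METAL_HOME must be set")

    # Dump the lightweight asserts first, then reset the device.
    triage_cmd = (
        f"{home}/tools/tt-triage.py --run=dump_lightweight_asserts "
        f"> {assert_output_path} && tt-smi -r"
    )
    return {
        "TT_METAL_DISPATCH_TIMEOUT_COMMAND_TO_EXECUTE": triage_cmd,
        "TT_METAL_OPERATION_TIMEOUT_SECONDS": "5.0",
        "TT_RUN_DISABLED_TRIAGE_SCRIPTS_IN_CI": "1",
        "TT_METAL_DPRINT_CORES": "all",
        "TT_METAL_DEVICE_PRINT": "1",
        "TT_METAL_DPRINT_FILE": dprint_output_path,
    }


if __name__ == "__main__":
    env = llk_assert_debug_env("assert.txt", "/tmp/tt_dprint.log", tt_metal_home="/opt/tt-metal")
    for name, value in env.items():
        print(f"{name}={value}")
```

Such a dictionary could be merged into `os.environ` before launching pytest in a subprocess. Note that TT_METAL_LLK_ASSERTS is not one of the six and still has to be set separately.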

The end-to-end flow is:

LLK_ASSERT condition fails on a TRISC → ebreak → core hangs.
After the TT_METAL_OPERATION_TIMEOUT_SECONDS interval (5.0 seconds) elapses, the host detects no completion and raises a dispatch timeout.
The host runs TT_METAL_DISPATCH_TIMEOUT_COMMAND_TO_EXECUTE → tt-triage.py walks the device, finds the asserting TRISC, recovers callstack, template params, runtime args, and locals, and writes them to <assert_output_path>.
tt-smi -r resets the device so the next test run is clean.
Pytest reports a RuntimeError to test_output.txt; DEVICE_PRINT lines emitted before the ebreak are in <dprint_output_path>.

Together, these three files normally pinpoint the failing kernel, the failing line, and the exact mismatched values within a single test run — no rebuild required.

LLK_ASSERT Macro

There is a single macro used throughout the tt-llk library:

LLK_ASSERT(condition, message)

Its runtime behavior depends on the compile context, which is determined by flags injected during JIT build (tt_metal/jit_build/build.cpp):

Disabled (ENABLE_LLK_ASSERT not defined): the condition appears only inside an unevaluated sizeof expression, so it is fully type-checked and name-resolved at compile time but adds zero runtime overhead.
Assert-only context (ENV_LLK_INFRA or ENABLE_LLK_ASSERT_ONLY defined): if the condition is false, an ebreak instruction is executed directly and the TRISC hangs. In JIT builds, ENABLE_LLK_ASSERT_ONLY is set when TT_METAL_LLK_ASSERTS=1 and neither TT_METAL_WATCHER nor TT_METAL_LIGHTWEIGHT_KERNEL_ASSERTS is enabled.
tt-metal context (ENABLE_LLK_ASSERT defined but neither of the above): delegates to the standard ASSERT(condition) macro from api/debug/assert.h. This covers two sub-cases:
- Lightweight Kernel Asserts enabled (TT_METAL_LLK_ASSERTS=1 + TT_METAL_LIGHTWEIGHT_KERNEL_ASSERTS=1, no Watcher): ASSERT triggers ebreak via the lightweight-assert path, allowing tools/triage/dump_lightweight_asserts.py to retrieve the call stack and local variables.
- Watcher enabled (TT_METAL_LLK_ASSERTS=1 + TT_METAL_WATCHER=1): ASSERT reports the failure through the Watcher mechanism, printing a message to stderr and the watcher log file.

The macro is defined in tt_metal/tt-llk/common/llk_assert.h.
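
The mapping from environment variables to the three contexts above can be summarized as a small decision function. This is a host-side Python sketch of the rules as stated on this page, not the logic in build.cpp; the function and label names are hypothetical, and treating any value other than empty or "0" as "enabled" is an assumption.

```python
def _enabled(env, name):
    """Assumption of this sketch: a variable is enabled unless unset, empty, or "0"."""
    return env.get(name, "0") not in ("", "0")


def llk_assert_context(env):
    """Return which LLK_ASSERT context a JIT build would use, per the rules above."""
    if not _enabled(env, "TT_METAL_LLK_ASSERTS"):
        return "disabled"
    if _enabled(env, "TT_METAL_WATCHER"):
        return "watcher"
    if _enabled(env, "TT_METAL_LIGHTWEIGHT_KERNEL_ASSERTS"):
        return "lightweight"
    return "assert_only"


# What a failing assert does in each context, paraphrased from this page.
FAILURE_BEHAVIOR = {
    "disabled": "condition is only type-checked at compile time; nothing runs",
    "assert_only": "ebreak is executed directly and the TRISC hangs",
    "lightweight": "ASSERT triggers ebreak via the lightweight-assert path",
    "watcher": "ASSERT reports through Watcher (stderr and watcher log file)",
}

if __name__ == "__main__":
    env = {"TT_METAL_LLK_ASSERTS": "1", "TT_METAL_LIGHTWEIGHT_KERNEL_ASSERTS": "1"}
    context = llk_assert_context(env)
    print(context, "->", FAILURE_BEHAVIOR[context])
```

Watcher is checked before Lightweight Kernel Asserts because the lightweight sub-case is described as applying only when Watcher is not enabled.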

What LLK Asserts Validate

LLK asserts perform runtime validation of low-level kernel operations. Common checks include:

Hardware Configuration (most common assert trigger)

Unpacker format conversion is supported for the source/destination data format pair and FP32 accumulation setting (is_unpacker_format_conversion_supported_fp32_acc, is_unpacker_format_conversion_supported_dest)
Packer-to-L1 data format conversion is supported for the given source/destination pair (is_packer_to_L1_conversion_supported)
Multi-tile pack respects per-format tile count limits (e.g. ≤ 4 tiles for Float32, ≤ 8 for Float16/Float16_b)
face_r_dim and narrow_tile parameters are not set in contexts where they are unused (defensive correctness)
Face row dimension is one of the valid values: 1, 2, 4, 8, or FACE_R_DIM

Tensor Dimension Validation

Number of faces is 1, 2, or 4 (num_faces, unpA_num_faces, unpB_num_faces)
Tile dimensions match expected values (TILE_R_DIM, TILE_C_DIM)
Tile shape is valid for tile-dependent operations (validate_tensor_shape_tile_dependent_ops_)
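
Several of the geometry checks listed so far are simple membership or bound tests, so they can be mirrored in a host-side pre-check before a kernel is ever launched. The sketch below is a Python model of those documented rules, not the device code; the helper names are hypothetical, FACE_R_DIM = 16 is an assumption, and only the per-format limits quoted above are included.

```python
FACE_R_DIM = 16  # assumption: full face height

VALID_FACE_R_DIMS = (1, 2, 4, 8, FACE_R_DIM)
VALID_NUM_FACES = (1, 2, 4)
# Only the limits quoted on this page; other formats are deliberately left out.
MAX_MULTI_TILE_PACK = {"Float32": 4, "Float16": 8, "Float16_b": 8}


def tile_geometry_violations(face_r_dim, num_faces):
    """Return a list of human-readable violations (empty when the geometry is valid)."""
    violations = []
    if face_r_dim not in VALID_FACE_R_DIMS:
        violations.append(f"face_r_dim={face_r_dim} is not one of {VALID_FACE_R_DIMS}")
    if num_faces not in VALID_NUM_FACES:
        violations.append(f"num_faces={num_faces} is not one of {VALID_NUM_FACES}")
    return violations


def multi_tile_pack_ok(data_format, num_tiles):
    """Check a multi-tile pack against the per-format tile count limit."""
    if data_format not in MAX_MULTI_TILE_PACK:
        raise ValueError(f"no limit listed on this page for {data_format}")
    return num_tiles <= MAX_MULTI_TILE_PACK[data_format]


if __name__ == "__main__":
    print(tile_geometry_violations(face_r_dim=3, num_faces=4))
    print(multi_tile_pack_ok("Float32", 5))
```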

Matrix Multiplication Constraints

MM throttling requires full 32×32 tiles with partial faces disabled
16×16 × 16×16 matmul (1-face × 1-face) is not supported
Transpose with 32×16 input tiles is not supported

Broadcast and Transpose Constraints

Column broadcast requires full 32×32 tiles (num_faces == 4)
Broadcast with 32×16 (narrow_tile) tiles is not supported for column or row modes
Scalar broadcast is not compatible with transpose of faces
Transpose requires face_r_dim == 16 and num_faces of 4 or 1
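
The four broadcast and transpose rules interact, so it can help to see them evaluated together. The following Python sketch models them as stated here; it is not the LLK implementation. The mode names ("none", "col", "row", "scalar") are chosen for illustration, and using a single `transpose` flag for both transpose rules is a simplifying assumption.

```python
FACE_R_DIM = 16  # assumption, matching "face_r_dim == 16" in the last rule


def broadcast_transpose_violations(bcast, num_faces, face_r_dim, narrow_tile, transpose):
    """Return the documented constraints that a configuration would violate."""
    violations = []
    if bcast == "col" and num_faces != 4:
        violations.append("column broadcast requires full 32x32 tiles (num_faces == 4)")
    if bcast in ("col", "row") and narrow_tile:
        violations.append("32x16 (narrow_tile) tiles are not supported for column or row broadcast")
    if bcast == "scalar" and transpose:
        violations.append("scalar broadcast is not compatible with transpose of faces")
    if transpose and not (face_r_dim == FACE_R_DIM and num_faces in (1, 4)):
        violations.append("transpose requires face_r_dim == 16 and num_faces of 4 or 1")
    return violations


if __name__ == "__main__":
    for message in broadcast_transpose_violations("col", 2, 16, narrow_tile=True, transpose=False):
        print(message)
```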

Math Pipeline Constraints

Math fidelity higher than LoFi is only valid with element-wise multiply (ELWMUL)
Element-wise binary type must be ELWADD, ELWSUB, or ELWMUL
Reduce narrow tile: num_faces of 4 implies full-width 32×32, not a narrow_tile

Destination Register Addressing

Destination tile index does not exceed get_dest_max_tiles() for SFPU unary, binary, and ternary operations
Destination address does not overflow half-dest in TopK operations (addr < DEST_REGISTER_HALF_SIZE)
Face index is in range (face_index < 4)

L1 Memory

Packer destination address is within the valid L1 memory region (is_valid_L1_address)
Unpacker base address is within the valid L1 memory region

Synchronization

Semaphore index is within bounds (index < semaphore::NUM_SEMAPHORES)
Semaphore value is not already at max before increment
Semaphore value is not already at zero before decrement
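
The three synchronization checks amount to guarding a bounded counter. A minimal Python model of that behavior is shown below. NUM_SEMAPHORES and SEMAPHORE_MAX are placeholder values picked for illustration (the real constants live in the tt-llk headers), and `llk_assert` is a stand-in for the device-side macro that raises instead of executing ebreak.

```python
NUM_SEMAPHORES = 8   # placeholder value for illustration only
SEMAPHORE_MAX = 15   # placeholder value for illustration only


def llk_assert(condition, message):
    """Host-side stand-in for LLK_ASSERT(condition, message)."""
    if not condition:
        raise AssertionError(message)


class SemaphoreModel:
    """Bounded counters guarded by the three checks listed above."""

    def __init__(self):
        self.values = [0] * NUM_SEMAPHORES

    def increment(self, index):
        llk_assert(0 <= index < NUM_SEMAPHORES, "semaphore index out of bounds")
        llk_assert(self.values[index] < SEMAPHORE_MAX, "semaphore already at max")
        self.values[index] += 1

    def decrement(self, index):
        llk_assert(0 <= index < NUM_SEMAPHORES, "semaphore index out of bounds")
        llk_assert(self.values[index] > 0, "semaphore already at zero")
        self.values[index] -= 1


if __name__ == "__main__":
    sem = SemaphoreModel()
    sem.increment(0)
    sem.decrement(0)
    try:
        sem.decrement(0)  # value is already zero, so the guard fires
    except AssertionError as error:
        print("assert fired:", error)
```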

Examples from tt-llk Code

Here are some representative LLK asserts from the codebase:

From llk_unpack_tilize.h:

LLK_ASSERT(num_faces == 1 || num_faces == 2 || num_faces == 4,
           "num_faces must be 1, 2, or 4");

From llk_math_matmul.h:

LLK_ASSERT(
    (in0_tile_r_dim == TILE_R_DIM) &&
    (in0_tile_c_dim == TILE_C_DIM) &&
    (in1_tile_r_dim == TILE_R_DIM) &&
    (in1_tile_c_dim == TILE_C_DIM) &&
    !partial_face,
    "Matmul requires standard tile dimensions and no partial faces");

Location in Codebase

LLK asserts are defined and used throughout the tt-llk library under:

tt_metal/tt-llk/
├── common/
│   └── llk_assert.h              # Assert macro definition (single source of truth)
├── tt_llk_blackhole/
│   ├── llk_lib/
│   │   ├── llk_unpack_*.h        # Unpack operation asserts
│   │   ├── llk_math_*.h          # Math operation asserts
│   │   ├── llk_pack*.h           # Pack operation asserts
│   │   └── ...
│   └── common/inc/
│       ├── cunpack_common.h      # Common unpack asserts
│       ├── cpack_common.h        # Common pack asserts
│       └── ...
├── tt_llk_wormhole_b0/
│   └── (similar structure)
└── tt_llk_quasar/
    └── (similar structure)

CI/CD Integration

LLK asserts are fully integrated into the tt-metal CI/CD system through the enable-llk-asserts input parameter. When set to true, the setup-job composite action exports TT_METAL_LLK_ASSERTS=1 into the test environment, causing JIT-compiled kernels to include LLK_ASSERT checks at runtime.

Workflows Accepting enable-llk-asserts

The following key workflow files accept the enable-llk-asserts boolean input and can be triggered manually via workflow_dispatch with the checkbox enabled:

sanity-tests.yaml — broad sanity suite (fast dispatch, models, ops, TTNN, profiler)
sanity-tests-debug.yaml — nightly debug run (also the scheduled nightly LLK assert run)
blackhole-post-commit.yaml — Blackhole-specific test matrix (models, ops, TTNN, UMD, multi-card)

Other workflows also accept this parameter (e.g. ttnn-post-commit.yaml, ops-post-commit.yaml, models-post-commit.yaml, tt-metal-l2-nightly.yaml, and others), but the three above are the most useful entry points for validating new asserts.

Nightly Runs with LLK Asserts

The sanity-tests-debug.yaml workflow runs automatically every night at 2:00 AM UTC with LLK asserts enabled. It spawns four test jobs — Wormhole and Blackhole on both Ubuntu 22.04 and Ubuntu 24.04:

wormhole-nightly-debug-ubuntu-22 — runs sanity-tests.yaml with enable-llk-asserts: true
blackhole-nightly-debug-ubuntu-22 — runs blackhole-post-commit.yaml with enable-llk-asserts: true
wormhole-nightly-debug-ubuntu-24 — runs sanity-tests.yaml with enable-llk-asserts: true
blackhole-nightly-debug-ubuntu-24 — runs blackhole-post-commit.yaml with enable-llk-asserts: true

The same workflow can also be triggered on demand via workflow_dispatch with the enable-llk-asserts checkbox.

Adding New LLK_ASSERT — Required Procedure

After adding a new LLK_ASSERT to the codebase, you must validate it does not trigger false positives on supported configurations. Follow this procedure:

Step 1. Run CI with LLK asserts enabled

Trigger either (or both) of these workflows manually via workflow_dispatch with the enable-llk-asserts checkbox enabled:
- Sanity tests: exercises a broad set of configurations including fast dispatch, models, ops, TTNN, and profiler regression.
- Blackhole post-commit: validates Blackhole-specific code paths (models, ops, TTNN, UMD, multi-card).

Step 2. For each test failure, choose one of two resolutions:
1. Fix the failure (preferred): When an LLK_ASSERT fires, the root cause is almost always in kernel code, not in the test. The assert is catching an invalid parameter or configuration being passed to the LLK API. Fix the kernel code to pass valid values.
2. Skip with tracking: If the failure cannot be fixed immediately, mark the test to be skipped when LLK asserts are enabled, with a reference to a tracking issue:
```
from models.common.utility_functions import skip_with_llk_assert

@skip_with_llk_assert("Hit LLK_ASSERT for unpacker data format conversion. Issue: #XXXXX")
def test_something(...):
    ...
```
  The skip_with_llk_assert decorator (from models/common/utility_functions.py) is a pytest.mark.skipif that activates whenever TT_METAL_LLK_ASSERTS=1. Always include the GitHub issue number in the reason string for tracking.

Step 3. Ensure nightly coverage: Once all failures are resolved or skipped, the new assert will be continuously validated by the nightly sanity-tests-debug.yaml run at 2:00 AM UTC.

When to Use LLK Asserts

As a user, you typically don’t need to add LLK asserts—they are already present in the system library code. However, you should enable them during development and testing to catch:

Incorrect tensor operation parameters - Catches dimension mismatches early
Unsupported operation configurations - Validates that hardware capabilities aren’t exceeded
Library integration issues - Ensures operations are called with valid parameters

Performance Impact

LLK asserts have zero overhead when disabled (compiled out as no-ops). When enabled, they add conditional checks at critical points in low-level operations. The performance impact is generally acceptable for development and testing builds, but LLK asserts should be disabled in production for maximum performance.

Debugging Failed LLK Asserts

When an LLK assert fails:

The kernel hangs at the assertion point (ebreak instruction)
Run tt-triage to analyze the device state
If Lightweight Kernel Asserts are enabled, or only LLK Asserts are enabled, use tools/triage/dump_lightweight_asserts.py to see call stacks and local variables
Check the assertion message to understand what constraint was violated

Tip

Use the tools/setup_llk_assert_env.sh helper (see Quick-start: Running a Test with LLK Asserts) to automate the timeout / triage / device-print wiring so a single test run produces assert.txt, the dprint log, and test_output.txt with no manual export setup.

Note

When an LLK assert fires, the root cause is almost always in kernel code — the assert is validating that the LLK API was called with legal parameters. Look at the kernel code that is calling the LLK function, not at the test itself.

Common failure scenarios:

Attempting operations on tiles with unsupported dimensions
Passing incompatible data formats between operations
Using narrow tiles or partial faces where not supported
Dimension mismatches in matrix multiplications

Real-World Examples

The following issues and PRs illustrate the most common patterns of LLK assert hits observed in practice and how they were resolved.

Issue #38346 — Create green runs of Sanity/BPC workflows when LLK_ASSERTS are enabled

The umbrella tracking issue that motivated systematic LLK assert work. When LLK asserts were first enabled across the CI suite, a large number of pre-existing tests failed — revealing that many kernels had been silently misusing the LLK API. The resolution strategy defined was: fix kernel code where the LLK API was misused, skip tests (with skip_with_llk_assert) where further investigation was needed, and make nightly runs with LLK asserts enabled a reliable green baseline going forward.

Issue #39184 — Fix tests which hit LLK_ASSERT during unpack configuration verification

A detailed description of the most common HW configure assert pattern: the unpacker is configured during HW configure (or a reconfigure phase) with specific face_r_dim / num_faces values, but a different configuration is then passed to init or the execution block — causing the assert to fire on the mismatch. The issue includes a full reproduction recipe and an example tools/triage/dump_lightweight_asserts.py callstack:

#0 llk_unpack_tilizeA_B_init<false, 1, false, true>()
   at llk_unpack_tilize_api.h:97
#1 ckernel::tilizeA_B_reduce_init<false, true>()
   at tilize.h:79
#2 chlkc_unpack::unpack_main()
   at compute_pool_2d.cpp:87

Arguments: operandA=1, operandB=2, ct_dim=1, num_faces=2,
           unpA_face_r_dim=4, unpB_face_r_dim=1

The callstack clearly points to the compute kernel (compute_pool_2d.cpp) as the source of the problem, not the test.
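
When many tests fail, it can be convenient to pull the kernel frame out of each dump automatically. The sketch below parses a callstack laid out like the excerpt above and returns the outermost frame. It assumes the dump follows exactly that `#N function` / `at file:line` layout, which may not hold for every tt-triage version; the names `Frame`, `parse_callstack`, and `outermost_frame` are hypothetical.

```python
import re
from dataclasses import dataclass

# Callstack text copied from the excerpt above.
SAMPLE = """\
#0 llk_unpack_tilizeA_B_init<false, 1, false, true>()
   at llk_unpack_tilize_api.h:97
#1 ckernel::tilizeA_B_reduce_init<false, true>()
   at tilize.h:79
#2 chlkc_unpack::unpack_main()
   at compute_pool_2d.cpp:87
"""

# "#<index> <function>" followed by "at <file>:<line>" (possibly on the next line).
FRAME_RE = re.compile(r"#(\d+)\s+(.+?)\s+at\s+(\S+):(\d+)")


@dataclass
class Frame:
    index: int
    function: str
    file: str
    line: int


def parse_callstack(text):
    """Extract frames from a dump that uses the layout shown above."""
    return [
        Frame(int(index), function, file, int(line))
        for index, function, file, line in FRAME_RE.findall(text)
    ]


def outermost_frame(frames):
    """The highest-numbered frame; in the excerpt this is the compute kernel."""
    return max(frames, key=lambda frame: frame.index)


if __name__ == "__main__":
    kernel = outermost_frame(parse_callstack(SAMPLE))
    print(f"{kernel.file}:{kernel.line} in {kernel.function}")
```

Picking the outermost frame is only a heuristic that happens to match this example; in other dumps the kernel source file may appear at a different depth.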

PR #39945 — Fix LLK_ASSERT hits for sdpa tests (kernel code fix)

Addressed LLK assert failures in SDPA/MLA kernels caused by unpacker configuration mismatches. The fixes were entirely in kernel code:

Fixed operand ordering — reconfig_data_format(in0, in1) in sdpa_flash_decode.cpp was reordered to reconfig_data_format(in1, in0) to match the state_configure(in1, in0) call used by the matmul path.
Added missing reconfigure calls — reconfig_data_format and pack_reconfig_data_format calls were added in compute_common.hpp to keep unpacker/packer configuration consistent before matmul init and execution in the SDPA inner loop.

After the kernel fixes, all skip_with_llk_assert decorators were removed from the SDPA/MLA test files.

PR #40282 — Enrich LLK API with more customized hw (re)configure options (LLK API extension)

Addressed LLK assert failures in pooling-related kernels (compute_pool_2d.cpp) where the kernel legitimately needs to change face_r_dim or num_faces dynamically — for example, using different geometry for the last block. Because the existing LLK HW configure/reconfig API always derived these values from the circular buffer metadata, there was no way to express the kernel’s intent without triggering an assert.

The fix added new LLK pack/unpack reconfig API overloads that accept explicit face_r_dim and num_faces parameters, allowing kernels to configure hardware with geometry that differs from the CB metadata when there is a documented reason to do so. After the API was extended, skip_with_llk_assert was removed from multiple pool and reduction test files (test_avgpool2d.py, test_upsample.py, test_rotate.py, test_grid_sample.py, test_adaptive_pool2d.py, test_reduction.py).