ttkernel.add (tt::ttkernel::AddOp)

Add operation

Operands:

| Operand | Description |
| --- | --- |
| dst_index | 32-bit signless integer |

ttkernel.add_tiles_init (tt::ttkernel::AddTilesInitOp)

Short init function

Must be run before add_tiles.

Operands:

| Operand | Description |
| --- | --- |
| in0_cb | TTKernel cb |
| in1_cb | TTKernel cb |

ttkernel.add_tiles (tt::ttkernel::AddTilesOp)

Add operation

Performs element-wise addition C=A+B of tiles in two CBs at given indices and writes the result to the DST register at index dst_tile_index. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.

Operands:

| Operand | Description |
| --- | --- |
| in0_cb | TTKernel cb |
| in1_cb | TTKernel cb |
| in0_tile_index | 32-bit signless integer |
| in1_tile_index | 32-bit signless integer |
| dst_index | 32-bit signless integer |
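
As a rough illustration of the semantics described for add_tiles, the following Python sketch models the operation on plain nested lists. The function signature, the `dst_acquired` flag, and the 2 x 2 tile size are assumptions made for brevity; this is a toy model, not the TTKernel or tt-metal API.

```python
# Toy model of add_tiles: C = A + B, written into one DST slot.
# All names are illustrative; this is not the TTKernel API.

def add_tiles(in0_cb, in1_cb, in0_tile_index, in1_tile_index,
              dst, dst_index, dst_acquired):
    """Element-wise add of one tile from each CB into dst[dst_index]."""
    if not dst_acquired:
        # The real op requires a prior tile_regs_acquire call.
        raise RuntimeError("DST must be acquired before add_tiles")
    a = in0_cb[in0_tile_index]
    b = in1_cb[in1_tile_index]
    dst[dst_index] = [[x + y for x, y in zip(row_a, row_b)]
                      for row_a, row_b in zip(a, b)]

# Each CB is modeled as a list of tiles; each tile is a tiny 2 x 2 matrix.
cb0 = [[[1.0, 2.0], [3.0, 4.0]]]
cb1 = [[[10.0, 20.0], [30.0, 40.0]]]
dst = [None] * 16  # DST is described elsewhere in this reference as 16 tiles

add_tiles(cb0, cb1, 0, 0, dst, 0, dst_acquired=True)
print(dst[0])
```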

ttkernel.binary_op_init_common (tt::ttkernel::BinaryOpInitCommonOp)

Init function for all binary ops

Followed by the specific init required with an opcode (binary_op_specific_init).

Operands:

| Operand | Description |
| --- | --- |
| in0_cb | TTKernel cb |
| in1_cb | TTKernel cb |
| out_cb | TTKernel cb |

ttkernel.builtin (tt::ttkernel::BuiltinOp)

Builtin call.

Kernel operation

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| op | ::mlir::FlatSymbolRefAttr | flat symbol reference attribute |
| kind | ::mlir::FlatSymbolRefAttr | flat symbol reference attribute |

Operands:

| Operand | Description |
| --- | --- |
| args | variadic of non-0-ranked.memref of any type values or TTKernel cb |

ttkernel.cb_pop_front (tt::ttkernel::CBPopFrontOp)

CBPopFront call.

CBPopFront operation

Operands:

| Operand | Description |
| --- | --- |
| cb | TTKernel cb |
| numPages | 32-bit signless integer |

ttkernel.cb_push_back (tt::ttkernel::CBPushBackOp)

CBPushBack call.

CBPushBack operation

Operands:

| Operand | Description |
| --- | --- |
| cb | TTKernel cb |
| numPages | 32-bit signless integer |

ttkernel.cb_reserve_back (tt::ttkernel::CBReserveBackOp)

CBReserveBack call.

CBReserveBack operation

Operands:

| Operand | Description |
| --- | --- |
| cb | TTKernel cb |
| numPages | 32-bit signless integer |

ttkernel.cb_wait_front (tt::ttkernel::CBWaitFrontOp)

CBWaitFront call.

CBWaitFront operation

Operands:

| Operand | Description |
| --- | --- |
| cb | TTKernel cb |
| numPages | 32-bit signless integer |

ttkernel.reinterpret_cast<volatile tt_l1_ptr uint32_t*> (tt::ttkernel::CastToL1PtrOp)

CastToL1Ptr

Cast specified addr to L1 pointer.

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| addr | 32-bit signless integer or TTKernel l1 address |

Results:

| Result | Description |
| --- | --- |
| l1_ptr | TTKernel l1 address pointer |

ttkernel.copy_tile_init (tt::ttkernel::CopyTileInitOp)

Perform the init for copy tile. This does not reconfigure the unpacker data types.

Must be called before copy_tile.

ttkernel.copy_tile (tt::ttkernel::CopyTileOp)

Copy tile from specified CB to DST.

Copies a single tile from the specified input CB and writes the result to DST at a specified index. The unpacker first unpacks the tile into the SRC registers, and the tile is then moved into the DST registers at the specified index. For in_tile_index to be valid for this call, cb_wait_front(n) must have been called previously to ensure that at least n > 0 tiles are available in the input CB. Index 0 then references the first tile in the received section of the CB, and valid indices run up to n-1 in FIFO order. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.

Operands:

| Operand | Description |
| --- | --- |
| cb0 | TTKernel cb |
| tile_index_cb | 32-bit signless integer |
| tile_index_dst | 32-bit signless integer |
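
The indexing rule for copy_tile (only tiles confirmed by a prior cb_wait_front(n) are addressable, as indices 0 to n-1 in FIFO order) can be sketched with a small Python model. The `ToyCB` class and its `push` and `tile_at` helpers are hypothetical names introduced for this illustration, and the real cb_wait_front blocks rather than raising.

```python
from collections import deque

class ToyCB:
    """Toy FIFO of tiles; only tiles confirmed by wait_front may be indexed."""

    def __init__(self):
        self._tiles = deque()
        self._visible = 0  # number of tiles confirmed by the last wait_front

    def push(self, tile):
        # Stand-in for a producer making one tile available.
        self._tiles.append(tile)

    def wait_front(self, n):
        # The real call blocks; this toy version fails instead of waiting.
        if len(self._tiles) < n:
            raise RuntimeError("fewer than n tiles available")
        self._visible = n

    def tile_at(self, index):
        if not 0 <= index < self._visible:
            raise IndexError("index outside the range made valid by wait_front")
        return self._tiles[index]

def copy_tile(cb, tile_index_cb, dst, tile_index_dst):
    """Copy one tile from the CB into the given DST slot."""
    dst[tile_index_dst] = cb.tile_at(tile_index_cb)

cb = ToyCB()
for name in ("t0", "t1", "t2"):
    cb.push(name)

cb.wait_front(2)          # indices 0 and 1 are now valid
dst = [None] * 16
copy_tile(cb, 1, dst, 5)  # second tile in FIFO order goes to DST slot 5

# Index 2 exists in the buffer but was not covered by wait_front(2).
try:
    copy_tile(cb, 2, dst, 0)
    out_of_range_rejected = False
except IndexError:
    out_of_range_rejected = True
```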

ttkernel.exp_tile_init (tt::ttkernel::ExpTileInitOp)

Short init function which configures compute unit for execution of exp_tile.

Must be run before exp_tile.

ttkernel.exp_tile (tt::ttkernel::ExpTileOp)

Exp operation

Performs element-wise computation of exponential on each element of a tile in DST register at index tile_index. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.

Operands:

| Operand | Description |
| --- | --- |
| tile_index | 32-bit signless integer |
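
A minimal sketch of the in-place behaviour described for exp_tile: the tile already held in DST at tile_index is overwritten with its element-wise exponential. The list-based DST and the 2 x 2 tile are assumptions for illustration only, and the sketch ignores the numeric formats and precision of the actual hardware.

```python
import math

def exp_tile(dst, tile_index):
    """Replace dst[tile_index] with its element-wise exponential."""
    dst[tile_index] = [[math.exp(v) for v in row] for row in dst[tile_index]]

dst = [None] * 16
dst[3] = [[0.0, 1.0], [2.0, -1.0]]
exp_tile(dst, 3)
print(dst[3])
```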

ttkernel.get_noc_addr (tt::ttkernel::GetNocAddrOp)

GetNocAddr

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| l1Address | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| nocAddr | TTKernel noc address |

ttkernel.get_noc_addr_xy (tt::ttkernel::GetNocAddrXYOp)

GetNocAddrXY

GetNocAddr api including core coordinates

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| x | 32-bit signless integer |
| y | 32-bit signless integer |
| l1Address | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| nocAddr | TTKernel noc address |

ttkernel.get_noc_multicast_addr (tt::ttkernel::GetNocMulticastAddrOp)

GetNocMulticastAddr

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| noc_x_start | 32-bit signless integer |
| noc_y_start | 32-bit signless integer |
| noc_x_end | 32-bit signless integer |
| noc_y_end | 32-bit signless integer |
| addr | 32-bit signless integer |
| noc | 8-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| mcastNocAddr | TTKernel noc address |

ttkernel.get_write_ptr (tt::ttkernel::GetWritePtrOp)

GetWritePtr

GetWritePtr operation

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| cb | TTKernel cb |

Results:

| Result | Description |
| --- | --- |
| writePtr | 32-bit signless integer |

ttkernel.matmul (tt::ttkernel::MatmulOp)

Matmul operation

Operands:

| Operand | Description |
| --- | --- |
| dst_index | 32-bit signless integer |

ttkernel.max (tt::ttkernel::MaxOp)

Max operation

Operands:

| Operand | Description |
| --- | --- |
| dst_index | 32-bit signless integer |

ttkernel.max_tile_init (tt::ttkernel::MaxTilesInitOp)

Short init function

Must be run before max_tile.

ttkernel.max_tile (tt::ttkernel::MaxTilesOp)

Max operation

Performs element-wise computation of maximum operation DST[dst0_index] <- max(DST[dst0_index], DST[dst1_index]) on DST register operands. The DST register buffer must be in acquired state via tile_regs_acquire call.

Operands:

| Operand | Description |
| --- | --- |
| dst0_index | 32-bit signless integer |
| dst1_index | 32-bit signless integer |
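
The update rule DST[dst0_index] <- max(DST[dst0_index], DST[dst1_index]) can be illustrated with a short Python sketch. The list-based DST and the tiny tiles are assumptions for illustration; the point is only that the first operand slot is overwritten while the second is left unchanged.

```python
def max_tile(dst, dst0_index, dst1_index):
    """Element-wise maximum of two DST tiles, stored back into dst0_index."""
    a, b = dst[dst0_index], dst[dst1_index]
    dst[dst0_index] = [[max(x, y) for x, y in zip(row_a, row_b)]
                       for row_a, row_b in zip(a, b)]

dst = [None] * 16
dst[0] = [[1, 5], [3, 0]]
dst[1] = [[2, 4], [3, -1]]
max_tile(dst, 0, 1)
print(dst[0])
```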

ttkernel.mem_zeros_base (tt::ttkernel::MemZerosBaseOp)

Op corresponding to MEM_ZEROS_BASE macro in kernels.

Interfaces: InferTypeOpInterface

Results:

| Result | Description |
| --- | --- |
| result | 32-bit signless integer |

ttkernel.mem_zeros_size (tt::ttkernel::MemZerosSizeOp)

Op corresponding to MEM_ZEROS_SIZE macro in kernels.

Interfaces: InferTypeOpInterface

Results:

| Result | Description |
| --- | --- |
| result | 32-bit signless integer |

ttkernel.mul (tt::ttkernel::MulOp)

Mul operation

Operands:

| Operand | Description |
| --- | --- |
| dst_index | 32-bit signless integer |

ttkernel.mul_tiles_init_f (tt::ttkernel::MulTilesInitFOp)

Short init function. Init for math only.

Must be run before mul_tiles.

ttkernel.mul_tiles_init (tt::ttkernel::MulTilesInitOp)

Short init function

Must be run before mul_tiles.

Operands:

| Operand | Description |
| --- | --- |
| in0_cb | TTKernel cb |
| in1_cb | TTKernel cb |

ttkernel.mul_tiles (tt::ttkernel::MulTilesOp)

Mul operation

Performs element-wise multiplication C=A*B of tiles in two CBs at given indices and writes the result to the DST register at index dst_tile_index. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.

Operands:

| Operand | Description |
| --- | --- |
| in0_cb | TTKernel cb |
| in1_cb | TTKernel cb |
| in0_tile_index | 32-bit signless integer |
| in1_tile_index | 32-bit signless integer |
| dst_index | 32-bit signless integer |

ttkernel.noc_async_read_barrier (tt::ttkernel::NocAsyncReadBarrierOp)

NocAsyncReadBarrier

ttkernel.noc_async_read_one_packet_set_state (tt::ttkernel::NocAsyncReadOnePacketSetStateOp)

NocAsyncReadOnePacketSetState

Operands:

| Operand | Description |
| --- | --- |
| srcNocAddr | TTKernel noc address |
| size | 32-bit signless integer |

ttkernel.noc_async_read_one_packet_with_state (tt::ttkernel::NocAsyncReadOnePacketWithStateOp)

NocAsyncReadOnePacketWithState

Operands:

| Operand | Description |
| --- | --- |
| srcNocAddr | TTKernel noc address |
| dstLocalL1Addr | 32-bit signless integer or TTKernel l1 address |

ttkernel.noc_async_read (tt::ttkernel::NocAsyncReadOp)

NocAsyncRead

Operands:

| Operand | Description |
| --- | --- |
| srcNocAddr | TTKernel noc address |
| dstLocalL1Addr | 32-bit signless integer |
| size | 32-bit signless integer |

ttkernel.noc_async_write_barrier (tt::ttkernel::NocAsyncWriteBarrierOp)

NocAsyncWriteBarrier

ttkernel.noc_async_write_multicast_loopback_src (tt::ttkernel::NocAsyncWriteMulticastLoopbackSrcOp)

NocAsyncWriteMulticastLoopbackSrc

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| linked | ::mlir::BoolAttr | bool attribute |
| multicast_path_reserve | ::mlir::BoolAttr | bool attribute |

Operands:

| Operand | Description |
| --- | --- |
| srcLocalL1Addr | 32-bit signless integer |
| dstNocAddrMulticast | TTKernel noc address |
| size | 32-bit signless integer |
| num_dests | 32-bit signless integer |
| noc | 8-bit signless integer |

ttkernel.noc_async_write_multicast_one_packet (tt::ttkernel::NocAsyncWriteMulticastOnePacketOp)

NocAsyncWriteMulticastOnePacket

NocAsyncWriteMulticastOnePacket. This issues only a single packet with size <= NOC_MAX_BURST_SIZE (i.e. the maximum packet size).

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| linked | ::mlir::BoolAttr | bool attribute |
| multicast_path_reserve | ::mlir::BoolAttr | bool attribute |

Operands:

| Operand | Description |
| --- | --- |
| srcLocalL1Addr | 32-bit signless integer |
| dstNocAddrMulticast | TTKernel noc address |
| size | 32-bit signless integer |
| num_dests | 32-bit signless integer |
| noc | 8-bit signless integer |

ttkernel.noc_async_write_multicast (tt::ttkernel::NocAsyncWriteMulticastOp)

NocAsyncWriteMulticast

Initiates an asynchronous write from a source address in L1 memory on the Tensix core executing this function call to a rectangular destination grid. The destinations are specified using a uint64_t encoding referencing an on-chip grid of nodes located at NOC coordinate range (x_start,y_start,x_end,y_end) and a local address created using get_noc_multicast_addr function. Also, see noc_async_write_barrier.

The destination nodes can only be a set of Tensix cores + L1 memory address. The destination nodes must form a rectangular grid. The destination L1 memory address must be the same on all destination nodes.

With this API, the multicast sender cannot be part of the multicast destinations. If the multicast sender has to be in the multicast destinations (i.e. must perform a local L1 write), the other API variant noc_async_write_multicast_loopback_src can be used.

Note: The number of destinations needs to be non-zero. Besides that, there is no restriction on the number of destinations, i.e. the multicast destinations can span the full chip. However, as mentioned previously, the multicast source cannot be part of the destinations. So, the maximum number of destinations is 119.

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| linked | ::mlir::BoolAttr | bool attribute |
| multicast_path_reserve | ::mlir::BoolAttr | bool attribute |

Operands:

| Operand | Description |
| --- | --- |
| srcLocalL1Addr | 32-bit signless integer |
| dstNocAddrMulticast | TTKernel noc address |
| size | 32-bit signless integer |
| num_dests | 32-bit signless integer |
| noc | 8-bit signless integer |
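
The constraints on the destination grid (rectangular, non-empty, and not containing the sender) suggest how a value for num_dests might be derived. The helper below is a hypothetical sketch, not part of any TTKernel or tt-metal API, and it assumes that the start and end coordinates are both inclusive.

```python
def multicast_num_dests(x_start, y_start, x_end, y_end, sender):
    """Count cores in an inclusive rectangle that must not contain the sender.

    Hypothetical helper; inclusive coordinates are an assumption.
    """
    if x_end < x_start or y_end < y_start:
        raise ValueError("destination grid must contain at least one core")
    sx, sy = sender
    if x_start <= sx <= x_end and y_start <= sy <= y_end:
        # The text points to noc_async_write_multicast_loopback_src here.
        raise ValueError("sender lies inside the destination grid")
    return (x_end - x_start + 1) * (y_end - y_start + 1)

# A sender at (0, 0) multicasting to the 3 x 2 rectangle (1,0)..(3,1).
num_dests = multicast_num_dests(1, 0, 3, 1, sender=(0, 0))
print(num_dests)

# A rectangle that includes the sender is rejected by this sketch.
try:
    multicast_num_dests(0, 0, 3, 1, sender=(0, 0))
    sender_rejected = False
except ValueError:
    sender_rejected = True
```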

ttkernel.noc_async_write (tt::ttkernel::NocAsyncWriteOp)

NocAsyncWrite

Operands:

| Operand | Description |
| --- | --- |
| srcLocalL1Addr | 32-bit signless integer |
| dstNocAddr | TTKernel noc address |
| size | 32-bit signless integer |

ttkernel.pack (tt::ttkernel::PackOp)

Pack op.

Pack operation

Operands:

| Operand | Description |
| --- | --- |
| dst_index | 32-bit signless integer |
| out_cb | TTKernel cb |
| out_index | 32-bit signless integer |

ttkernel.pack_set_data_type (tt::ttkernel::PackSetDataTypeOp)

Pack set DataType op.

Pack set DataType operation

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| data_type | ::mlir::IntegerAttr | TT DataTypes. Enum cases: f32 (`Float32`), f16 (`Float16`), bf16 (`BFloat16`), bfp_f8 (`BFP_Float8`), bfp_bf8 (`BFP_BFloat8`), bfp_f4 (`BFP_Float4`), bfp_bf4 (`BFP_BFloat4`), bfp_f2 (`BFP_Float2`), bfp_bf2 (`BFP_BFloat2`), u32 (`UInt32`), u16 (`UInt16`), u8 (`UInt8`) |

ttkernel.pack_tile (tt::ttkernel::PackTileOp)

PackTile op.

Copies a single tile from the DST register buffer at a specified index to a specified CB at a given index. For the out_tile_index to be valid for this call, cb_reserve_back(n) has to be called first to reserve at least some number n > 0 of tiles in the output CB. out_tile_index = 0 then references the first tile in the reserved section of the CB, up to index n - 1, which will then be visible to the consumer in the same order after a cb_push_back call. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.

Each subsequent pack call will increment the write pointer in the cb by single tile size. The pointer is then again set to a valid position with space for n reserved tiles by another cb_reserve_back call.

Operates in tandem with functions cb_reserve_back and cb_push_back.

In a typical use case, the producer first ensures that space for a number of tiles is available in the buffer via cb_reserve_back, then uses pack_tile to copy a tile from one of the DST slots into a slot in the reserved space, and finally calls cb_push_back to make the reserved section of the circular buffer visible to the consumer.

Operands:

| Operand | Description |
| --- | --- |
| dst_index | 32-bit signless integer |
| out_cb | TTKernel cb |
| out_index | 32-bit signless integer |
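
The producer flow described above (cb_reserve_back, then pack_tile, then cb_push_back) can be sketched with a toy Python model. The `ToyOutputCB` class, its fields, and the use of strings as tiles are assumptions for illustration; the real cb_reserve_back blocks until space is free, and the write-pointer behaviour mentioned above is not modeled.

```python
class ToyOutputCB:
    """Toy output CB: reserved slots stay hidden until push_back."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.visible = []   # tiles the consumer can see, in FIFO order
        self.reserved = []  # slots reserved by the producer, not yet visible

    def reserve_back(self, n):
        # The real call blocks; this toy version fails instead of waiting.
        if len(self.visible) + n > self.capacity:
            raise RuntimeError("not enough free space in the CB")
        self.reserved = [None] * n

    def push_back(self, n):
        if n > len(self.reserved):
            raise RuntimeError("cannot push more tiles than were reserved")
        self.visible.extend(self.reserved[:n])
        self.reserved = self.reserved[n:]

def pack_tile(dst, dst_index, out_cb, out_index):
    """Copy one DST tile into a reserved slot of the output CB."""
    out_cb.reserved[out_index] = dst[dst_index]

dst = [None] * 16
dst[0], dst[1] = "tile_a", "tile_b"

out_cb = ToyOutputCB(capacity=4)
out_cb.reserve_back(2)            # 1. reserve space for two tiles
pack_tile(dst, 0, out_cb, 0)      # 2. pack into reserved slots 0 and 1
pack_tile(dst, 1, out_cb, 1)
visible_before_push = list(out_cb.visible)
out_cb.push_back(2)               # 3. publish both tiles to the consumer
```

In this model the consumer sees nothing until push_back runs, and then sees the tiles in the order of their slots.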

ttkernel.recip_tile_init (tt::ttkernel::RecipTileInitOp)

Init function for recip_tile operation. Refer to documentation for any init function.

Must be called before recip_tile function.

ttkernel.recip_tile (tt::ttkernel::RecipTileOp)

Recip tile in the DST at specified index.

Performs element-wise computation of the reciprocal on each element of a tile in DST register at index tile_index. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine. Full accuracy is available only for the Float32, Float16_b, and Bfp8_b data formats.

Operands:

| Operand | Description |
| --- | --- |
| tile_index | 32-bit signless integer |

ttkernel.reduce_init (tt::ttkernel::ReduceInitOp)

Init function

Must be run before reduce_tile.

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| reduce_type | ::mlir::tt::ttkernel::ReduceTypeAttr | TTKernel Reduce Types. Enum cases: reduce_sum (`Sum`), reduce_max (`Max`) |
| reduce_dim | ::mlir::tt::ttkernel::ReduceDimAttr | TTKernel Reduce Dimensions. Enum cases: reduce_dim_row (`Row`), reduce_dim_col (`Col`), reduce_dim_scalar (`Scalar`), reduce_dim_none (`None`) |

Operands:

| Operand | Description |
| --- | --- |
| in_cb | TTKernel cb |
| scaling_cb | TTKernel cb |
| out_cb | TTKernel cb |

ttkernel.reduce_tile (tt::ttkernel::ReduceTileOp)

Reduce operation

Performs a reduction operation B = reduce(A) using reduce_func for dimension reduction on a tile in the CB at a given index and writes the result to the DST register at index dst_tile_index. Reduction can be of type Reduce::R, Reduce::C, or Reduce::RC, identifying the dimension(s) to be reduced in size to 1. The DST register buffer must be in acquired state via tile_regs_acquire call. The template takes reduce_type, which can be ReduceFunc::Sum or ReduceFunc::Max, and reduce_dim, which can be Reduce::R, Reduce::C, or Reduce::RC. Both can also be specified through the defines REDUCE_OP and REDUCE_DIM. This call is blocking and is only available on the compute engine.

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| reduce_type | ::mlir::tt::ttkernel::ReduceTypeAttr | TTKernel Reduce Types. Enum cases: reduce_sum (`Sum`), reduce_max (`Max`) |
| reduce_dim | ::mlir::tt::ttkernel::ReduceDimAttr | TTKernel Reduce Dimensions. Enum cases: reduce_dim_row (`Row`), reduce_dim_col (`Col`), reduce_dim_scalar (`Scalar`), reduce_dim_none (`None`) |

Operands:

| Operand | Description |
| --- | --- |
| in_cb | TTKernel cb |
| scaling_cb | TTKernel cb |
| in_tile_index | 32-bit signless integer |
| scaling_tile_index | 32-bit signless integer |
| dst_index | 32-bit signless integer |
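
To make the combinations of reduce_type and reduce_dim concrete, here is a toy Python model of a single-tile reduction. It rests on two loud assumptions: "row" is taken to mean that each row collapses to one value (the hardware convention may be the opposite), and the scaling tile is ignored entirely. The string keys are illustrative stand-ins for the enum cases.

```python
def reduce_tile(tile, reduce_type, reduce_dim):
    """Reduce a 2-D tile with sum or max along the chosen dimension(s).

    Assumption: "row" collapses each row to a single value; the real
    Reduce::R convention may differ. The scaling operand is not modeled.
    """
    op = {"sum": sum, "max": max}[reduce_type]
    if reduce_dim == "row":
        return [[op(row)] for row in tile]
    if reduce_dim == "col":
        return [[op(col) for col in zip(*tile)]]
    if reduce_dim == "scalar":
        return [[op(op(row) for row in tile)]]
    raise ValueError(f"unsupported reduce_dim: {reduce_dim}")

tile = [[1, 2], [3, 4]]
row_sum = reduce_tile(tile, "sum", "row")
col_sum = reduce_tile(tile, "sum", "col")
all_max = reduce_tile(tile, "max", "scalar")
print(row_sum, col_sum, all_max)
```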

ttkernel.return (tt::ttkernel::ReturnOp)

Return op.

Return operation

Traits: AlwaysSpeculatableImplTrait, ReturnLike, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface), RegionBranchTerminatorOpInterface

Effects: MemoryEffects::Effect{}

ttkernel.store_to_l1 (tt::ttkernel::StoreToL1Op)

StoreToL1

Store value to L1.

Operands:

| Operand | Description |
| --- | --- |
| value | 32-bit signless integer |
| l1_ptr | TTKernel l1 address pointer |
| offset | 32-bit signless integer |

ttkernel.sub (tt::ttkernel::SubOp)

Sub operation

Operands:

| Operand | Description |
| --- | --- |
| dst_index | 32-bit signless integer |

ttkernel.tile_regs_acquire (tt::ttkernel::TileRegsAcquireOp)

Tile_regs_acquire

Acquire an exclusive lock on the DST register for the MATH thread. This register is an array of 16 tiles of 32x32 elements each. This is a blocking function, i.e. this function will wait until the lock is acquired.

ttkernel.tile_regs_commit (tt::ttkernel::TileRegsCommitOp)

Tile_regs_commit

Release the lock on the DST register held by the MATH thread. The lock must have been previously acquired with tile_regs_acquire.

ttkernel.tile_regs_release (tt::ttkernel::TileRegsReleaseOp)

Tile_regs_release

Release the lock on the DST register held by the PACK thread. The lock must have been previously acquired with tile_regs_wait.

ttkernel.tile_regs_wait (tt::ttkernel::TileRegsWaitOp)

Tile_regs_wait

Acquire an exclusive lock on the DST register for the PACK thread. It waits for the MATH thread to commit the DST register. This is a blocking function, i.e. this function will wait until the lock is acquired.
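
The four tile_regs_* operations form a handshake between the MATH and PACK threads: acquire, commit, wait, release. The sketch below models that ordering as a small state machine. The `ToyDstLock` class and its state names are hypothetical, the model is single-threaded, and it assumes a single DST buffer, so it raises where the real calls would block.

```python
class ToyDstLock:
    """Toy state machine for DST ownership: FREE, MATH, COMMITTED, PACK."""

    def __init__(self):
        self.state = "FREE"
        self.log = []

    def _step(self, name, expected, new_state):
        if self.state != expected:
            # The real acquire and wait calls block instead of failing.
            raise RuntimeError(f"{name} called in state {self.state}")
        self.state = new_state
        self.log.append(name)

    def tile_regs_acquire(self):   # MATH thread takes DST
        self._step("tile_regs_acquire", "FREE", "MATH")

    def tile_regs_commit(self):    # MATH thread hands DST over
        self._step("tile_regs_commit", "MATH", "COMMITTED")

    def tile_regs_wait(self):      # PACK thread takes the committed DST
        self._step("tile_regs_wait", "COMMITTED", "PACK")

    def tile_regs_release(self):   # PACK thread frees DST
        self._step("tile_regs_release", "PACK", "FREE")

lock = ToyDstLock()
lock.tile_regs_acquire()
lock.tile_regs_commit()
lock.tile_regs_wait()
lock.tile_regs_release()

# Calling wait before any commit is out of order in this model.
try:
    ToyDstLock().tile_regs_wait()
    out_of_order_rejected = False
except RuntimeError:
    out_of_order_rejected = True
```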

ttkernel.tilize_block (tt::ttkernel::TilizeBlockOp)

TilizeBlockOp call.

TilizeBlockOp operation

Operands:

| Operand | Description |
| --- | --- |
| cbIn | TTKernel cb |
| numTiles | 32-bit signless integer |
| cbOut | TTKernel cb |

ttkernel.tilize_init (tt::ttkernel::TilizeInitOp)

TilizeInitOp call.

TilizeInitOp operation

Operands:

| Operand | Description |
| --- | --- |
| cbIn | TTKernel cb |
| numTiles | 32-bit signless integer |
| cbOut | TTKernel cb |

ttkernel.unary_op_init_common (tt::ttkernel::UnaryOpInitCommonOp)

Initialization function for unary operations.

This operation initializes all necessary components for unary operations, including unpacking, packing, and math configurations.

Operands:

| Operand | Description |
| --- | --- |
| icb | TTKernel cb |
| ocb | TTKernel cb |

ttkernel.unpack_ab (tt::ttkernel::UnpackABOp)

UnpackAB op.

UnpackAB operation

Operands:

| Operand | Description |
| --- | --- |
| cb_a | TTKernel cb |
| src_a_index | 32-bit signless integer |
| cb_b | TTKernel cb |
| src_b_index | 32-bit signless integer |

ttkernel.unpack_a (tt::ttkernel::UnpackAOp)

UnpackA op.

UnpackA operation

Operands:

| Operand | Description |
| --- | --- |
| cb | TTKernel cb |
| src_index | 32-bit signless integer |

ttkernel.unpack_set_data_type (tt::ttkernel::UnpackSetDataTypeOp)

Unpack set DataType op.

Unpack set DataType operation

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| data_type_a | ::mlir::IntegerAttr | TT DataTypes. Enum cases: f32 (`Float32`), f16 (`Float16`), bf16 (`BFloat16`), bfp_f8 (`BFP_Float8`), bfp_bf8 (`BFP_BFloat8`), bfp_f4 (`BFP_Float4`), bfp_bf4 (`BFP_BFloat4`), bfp_f2 (`BFP_Float2`), bfp_bf2 (`BFP_BFloat2`), u32 (`UInt32`), u16 (`UInt16`), u8 (`UInt8`) |
| data_type_b | ::mlir::IntegerAttr | TT DataTypes. Enum cases: f32 (`Float32`), f16 (`Float16`), bf16 (`BFloat16`), bfp_f8 (`BFP_Float8`), bfp_bf8 (`BFP_BFloat8`), bfp_f4 (`BFP_Float4`), bfp_bf4 (`BFP_BFloat4`), bfp_f2 (`BFP_Float2`), bfp_bf2 (`BFP_BFloat2`), u32 (`UInt32`), u16 (`UInt16`), u8 (`UInt8`) |

ttkernel.unreachable (tt::ttkernel::UnreachableOp)

Unreachable op.

Unreachable operation

Traits: AlwaysSpeculatableImplTrait, ReturnLike, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface), RegionBranchTerminatorOpInterface

Effects: MemoryEffects::Effect{}

ttkernel.untilize_block (tt::ttkernel::UntilizeBlockOp)

UntilizeBlockOp call.

UntilizeBlockOp operation

Operands:

| Operand | Description |
| --- | --- |
| cbIn | TTKernel cb |
| numTiles | 32-bit signless integer |
| cbOut | TTKernel cb |

ttkernel.untilize_init (tt::ttkernel::UntilizeInitOp)

UntilizeInitOp call.

UntilizeInitOp operation

Operands:

| Operand | Description |
| --- | --- |
| cbIn | TTKernel cb |
| cbOut | TTKernel cb |