ttkernel.add_tiles_init (tt::ttkernel::AddTilesInitOp)

Short init function

Must be run before add_tiles.

Traits: TTKernel_InitOpTrait

Operands:

| Operand | Description |
| --- | --- |
| `in0_cb` | TTKernel cb |
| `in1_cb` | TTKernel cb |

ttkernel.add_tiles (tt::ttkernel::AddTilesOp)

Add operation

Performs element-wise addition C=A+B of tiles in two CBs at given indices and writes the result to the DST register at index dst_tile_index. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.

Traits: TTKernel_BinaryOpTrait, TTKernel_FPUOpTrait

Operands:

| Operand | Description |
| --- | --- |
| `in0_cb` | TTKernel cb |
| `in1_cb` | TTKernel cb |
| `in0_tile_index` | index or 32-bit signless integer |
| `in1_tile_index` | index or 32-bit signless integer |
| `dst_index` | index or 32-bit signless integer |
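
The element-wise semantics of add_tiles can be sketched on the host as follows. This is only an illustrative model, not TTKernel or tt-metal code: `Tile`, `CbModel`, `DstModel`, and `add_tiles_model` are hypothetical names, a CB is simplified to an indexable run of tiles, and the DST size (16 tiles of 32x32 elements) is taken from the tile_regs_acquire description.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only; these types are illustrative, not TTKernel or tt-metal APIs.
constexpr std::size_t kTileElems = 32 * 32;   // one tile holds 32x32 elements
using Tile = std::array<float, kTileElems>;
using CbModel = std::vector<Tile>;            // a CB viewed as an indexable run of tiles

struct DstModel {
    bool acquired = false;                            // true after a tile_regs_acquire-like step
    std::vector<Tile> tiles = std::vector<Tile>(16);  // DST modeled as 16 tile slots
};

// Models add_tiles: DST[dst_index] = in0_cb[in0_tile_index] + in1_cb[in1_tile_index].
inline void add_tiles_model(const CbModel& in0_cb, const CbModel& in1_cb,
                            std::size_t in0_tile_index, std::size_t in1_tile_index,
                            std::size_t dst_index, DstModel& dst) {
    assert(dst.acquired && "DST must be acquired before writing to it");
    assert(in0_tile_index < in0_cb.size() && in1_tile_index < in1_cb.size());
    assert(dst_index < dst.tiles.size());
    const Tile& a = in0_cb[in0_tile_index];
    const Tile& b = in1_cb[in1_tile_index];
    for (std::size_t i = 0; i < kTileElems; ++i) {
        dst.tiles[dst_index][i] = a[i] + b[i];  // C = A + B, element by element
    }
}
```

The asserts stand in for the preconditions stated above; the real call has no such checks and simply requires the caller to have acquired DST.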

ttkernel.binary_op_init_common (tt::ttkernel::BinaryOpInitCommonOp)

Init function for all binary ops

Followed by the specific init required with an opcode (binary_op_specific_init).

Traits: TTKernel_InitOpTrait

Operands:

| Operand | Description |
| --- | --- |
| `in0_cb` | TTKernel cb |
| `in1_cb` | TTKernel cb |
| `out_cb` | TTKernel cb |

ttkernel.cb_pop_front (tt::ttkernel::CBPopFrontOp)

CBPopFront call.

CBPopFront operation

Operands:

| Operand | Description |
| --- | --- |
| `cb` | TTKernel cb |
| `numPages` | 32-bit signless integer |

ttkernel.cb_push_back (tt::ttkernel::CBPushBackOp)

CBPushBack call.

CBPushBack operation

Operands:

| Operand | Description |
| --- | --- |
| `cb` | TTKernel cb |
| `numPages` | 32-bit signless integer |

ttkernel.cb_reinterpret_shape (tt::ttkernel::CBReinterpretShapeOp)

Reinterpret the shape of a given CB

cb_reinterpret_shape operation

Operands:

| Operand | Description |
| --- | --- |
| `input` | TTKernel cb |

Results:

| Result | Description |
| --- | --- |
| `output` | TTKernel cb |

ttkernel.cb_reserve_back (tt::ttkernel::CBReserveBackOp)

CBReserveBack call.

CBReserveBack operation

Operands:

| Operand | Description |
| --- | --- |
| `cb` | TTKernel cb |
| `numPages` | 32-bit signless integer |

ttkernel.cb_wait_front (tt::ttkernel::CBWaitFrontOp)

CBWaitFront call.

CBWaitFront operation

Operands:

| Operand | Description |
| --- | --- |
| `cb` | TTKernel cb |
| `numPages` | 32-bit signless integer |

ttkernel.reinterpret_cast<volatile tt_l1_ptr uint32_t*> (tt::ttkernel::CastToL1PtrOp)

CastToL1Ptr

Cast specified addr to L1 pointer.

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| `addr` | 32-bit signless integer or TTKernel l1 address or TTKernel semaphore |

Results:

| Result | Description |
| --- | --- |
| `l1_ptr` | TTKernel l1 address pointer |

ttkernel.copy_tile_init (tt::ttkernel::CopyTileInitOp)

Perform the init for copy tile. This does not reconfigure the unpacker data types.

Must be called before copy_tile.

Operands:

| Operand | Description |
| --- | --- |
| `cb0` | TTKernel cb |

ttkernel.copy_tile (tt::ttkernel::CopyTileOp)

Copy tile from specified CB to DST.

Copies a single tile from the specified input CB and writes the result to DST at a specified index. The function uses the unpacker to first unpack the tile into the SRC registers and then moves it into the DST registers at the specified index. For the in_tile_index to be valid for this call, cb_wait_front(n) must have been called beforehand to ensure that at least n > 0 tiles are available in the input CB. CB index 0 then references the first tile in the received section of the CB, and valid indices run up to n-1 (in FIFO order). The DST register buffer must be in the acquired state via a tile_regs_acquire call. This call is blocking and is only available on the compute engine.

Operands:

| Operand | Description |
| --- | --- |
| `cb0` | TTKernel cb |
| `tile_index_cb` | index or 32-bit signless integer |
| `tile_index_dst` | index or 32-bit signless integer |

ttkernel.dprint (tt::ttkernel::DPrintOp)

Print to output stream from kernel.

Syntax:

operation ::= `ttkernel.dprint` `(` $fmt `,` $argv `)` attr-dict `:` `(` type($argv) `)`

std::format style format string:

rewriter.create<ttkernel::DPrintOp>(loc, "nocY={} nocX={} addr={}\\n",
                                  nocY, nocX, addr);
ttkernel.dprint("virtY {} virtX {} addr {}\\n", %14, %15, %13) : (index, index, i32)

Notes:

  • Only the trivial format specifier, i.e. {}, is currently supported.
  • Newline and other special characters must be double-escaped.

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource, MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `fmt` | ::mlir::StringAttr | string attribute |

Operands:

| Operand | Description |
| --- | --- |
| `argv` | variadic of any type |
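
To illustrate what "only the trivial `{}` specifier" means for the format string, here is a small host-side sketch of positional substitution. `format_trivial` is a hypothetical helper written for this example; it is not part of TTKernel or tt-metal, and it assumes integer arguments only.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: replaces each "{}" in fmt with the next argument, left to right.
// Placeholders left over once the arguments run out are copied through unchanged.
inline std::string format_trivial(const std::string& fmt, const std::vector<long>& args) {
    std::ostringstream out;
    std::size_t next = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        const bool is_placeholder =
            fmt[i] == '{' && i + 1 < fmt.size() && fmt[i + 1] == '}';
        if (is_placeholder && next < args.size()) {
            out << args[next++];
            ++i;  // skip the closing brace
        } else {
            out << fmt[i];
        }
    }
    return out.str();
}
```

For example, `format_trivial("virtY {} virtX {} addr {}", {1, 2, 4096})` yields `virtY 1 virtX 2 addr 4096`. Width, precision, and type specifiers such as `{:x}` are deliberately not handled, mirroring the restriction noted above.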

ttkernel.exp_tile_init (tt::ttkernel::ExpTileInitOp)

Short init function which configures compute unit for execution of exp_tile.

Must be run before exp_tile.

Traits: TTKernel_InitOpTrait

ttkernel.exp_tile (tt::ttkernel::ExpTileOp)

Exp operation

Performs element-wise computation of exponential on each element of a tile in DST register at index tile_index. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.

Traits: TTKernel_FPUOpTrait, TTKernel_UnaryOpTrait

Operands:

| Operand | Description |
| --- | --- |
| `tile_index` | index or 32-bit signless integer |

ttkernel.get_arg_val (tt::ttkernel::GetArgValOp)

Get runtime arg value.

Get runtime argument value at specified index.

Operands:

| Operand | Description |
| --- | --- |
| `arg_index` | index or 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `arg_val` | 32-bit signless integer or TTKernel cb or TTKernel l1 address |

ttkernel.get_compile_time_arg_val (tt::ttkernel::GetCompileArgValOp)

Get compile-time arg value.

Get compile-time argument value at specified index.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `arg_index` | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Results:

| Result | Description |
| --- | --- |
| `arg_val` | 32-bit signless integer or TTKernel cb or TTKernel l1 address |

ttkernel.get_dataformat (tt::ttkernel::GetDataFormatOp)

Get the data format of a given CB

get_dataformat operation

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| `cb` | TTKernel cb |

Results:

| Result | Description |
| --- | --- |
| `dataFormat` | TTKernel compute data format type |

ttkernel.get_interleaved_addr_gen_fast (tt::ttkernel::GetInterleavedAddrGenFastOp)

GetInterleavedAddrGenFastOp

Returns an InterleavedAddrGenFast type.

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| `DRAM` | 1-bit signless integer |
| `bank_base_address` | 32-bit signless integer |
| `page_size` | 32-bit signless integer |
| `data_format` | TTKernel compute data format type |

Results:

| Result | Description |
| --- | --- |
| `result` | TTKernel InterleavedAddrGenFast type |

ttkernel.get_noc_addr_from_bank_id (tt::ttkernel::GetNocAddrFromBankIDOp)

GetNocAddrFromBankID

GetNocAddrFromBankID api

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| `bank_id` | 32-bit signless integer |
| `bankAddressOffset` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `nocAddr` | TTKernel noc address |

ttkernel.get_noc_addr (tt::ttkernel::GetNocAddrOp)

GetNocAddr

GetNocAddr api including core coordinates

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| `x` | index or 32-bit signless integer |
| `y` | index or 32-bit signless integer |
| `l1Address` | 32-bit signless integer or TTKernel l1 address or TTKernel semaphore |

Results:

| Result | Description |
| --- | --- |
| `nocAddr` | TTKernel noc address |

ttkernel.get_noc_multicast_addr (tt::ttkernel::GetNocMulticastAddrOp)

GetNocMulticastAddr

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| `noc_x_start` | index or 32-bit signless integer |
| `noc_y_start` | index or 32-bit signless integer |
| `noc_x_end` | index or 32-bit signless integer |
| `noc_y_end` | index or 32-bit signless integer |
| `addr` | 32-bit signless integer or TTKernel l1 address or TTKernel semaphore |
| `noc` | 8-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `mcastNocAddr` | TTKernel noc address |

ttkernel.get_read_ptr (tt::ttkernel::GetReadPtrOp)

GetReadPtr

GetReadPtr operation

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| `cb` | TTKernel cb |

Results:

| Result | Description |
| --- | --- |
| `readPtr` | 32-bit signless integer |

ttkernel.get_semaphore (tt::ttkernel::GetSemaphoreOp)

GetSemaphoreOp

Get L1 addr of the semaphore with specified semaphore id

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| `semaphore` | index or 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `sem_addr` | TTKernel semaphore |

ttkernel.get_tile_size (tt::ttkernel::GetTileSizeOp)

Get the tile size in bytes of a given CB

get_tile_size operation

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| `cb` | TTKernel cb |

Results:

| Result | Description |
| --- | --- |
| `tileSizeBytes` | 32-bit signless integer |

ttkernel.get_write_ptr (tt::ttkernel::GetWritePtrOp)

GetWritePtr

GetWritePtr operation

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| `cb` | TTKernel cb |

Results:

| Result | Description |
| --- | --- |
| `writePtr` | 32-bit signless integer |

ttkernel.mm_init (tt::ttkernel::MatmulInitOp)

Matmul init function

Must be run before matmul.

Traits: TTKernel_InitOpTrait

Operands:

| Operand | Description |
| --- | --- |
| `in0_cb` | TTKernel cb |
| `in1_cb` | TTKernel cb |
| `out_cb` | TTKernel cb |
| `transpose` | 32-bit signless integer |

ttkernel.matmul_tiles (tt::ttkernel::MatmulTilesOp)

Matmul tiles operation

Performs tile-sized matrix multiplication C=A*B between the tiles in two specified input CBs and writes the result to DST. The DST register buffer must be in acquired state via ttkernel.tile_regs_acquire call. This call is blocking and is only available on the compute engine.

Traits: TTKernel_FPUOpTrait, TTKernel_TernaryOpTrait

Operands:

| Operand | Description |
| --- | --- |
| `in0_cb_id` | TTKernel cb |
| `in1_cb_id` | TTKernel cb |
| `in0_tile_idx` | index or 32-bit signless integer |
| `in1_tile_idx` | index or 32-bit signless integer |
| `dst_tile_idx` | index or 32-bit signless integer |
| `transpose` | 32-bit signless integer |
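
As a reference for the arithmetic, the tile-sized product C=A*B can be sketched on the host like this. The names `Mat` and `matmul_tile_model` are hypothetical, the tile edge `N` is a template parameter so the example can be checked on small inputs (the hardware tiles described in this document are 32x32), and the `transpose` operand is not modeled.

```cpp
#include <array>
#include <cstddef>

// Illustrative square tile type; not a TTKernel or tt-metal type.
template <std::size_t N>
using Mat = std::array<std::array<float, N>, N>;

// Plain C = A * B on a single N x N tile.
template <std::size_t N>
Mat<N> matmul_tile_model(const Mat<N>& a, const Mat<N>& b) {
    Mat<N> c{};  // value-initialized, so every element starts at zero
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            for (std::size_t k = 0; k < N; ++k) {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    return c;
}
```

This sketch returns the product as a value; in the op itself the result lands in the DST slot selected by `dst_tile_idx`.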

ttkernel.max_tile_init (tt::ttkernel::MaxTilesInitOp)

Short init function

Must be run before max_tile.

Traits: TTKernel_InitOpTrait

ttkernel.max_tile (tt::ttkernel::MaxTilesOp)

Max operation

Performs element-wise computation of maximum operation DST[dst0_index] <- max(DST[dst0_index], DST[dst1_index]) on DST register operands. The DST register buffer must be in acquired state via tile_regs_acquire call.

Traits: TTKernel_BinaryOpTrait, TTKernel_SFPUOpTrait

Operands:

| Operand | Description |
| --- | --- |
| `dst0_index` | index or 32-bit signless integer |
| `dst1_index` | index or 32-bit signless integer |

ttkernel.mem_zeros_base (tt::ttkernel::MemZerosBaseOp)

Op corresponding to MEM_ZEROS_BASE macro in kernels.

Interfaces: InferTypeOpInterface

Results:

| Result | Description |
| --- | --- |
| `result` | 32-bit signless integer |

ttkernel.mem_zeros_size (tt::ttkernel::MemZerosSizeOp)

Op corresponding to MEM_ZEROS_SIZE macro in kernels.

Interfaces: InferTypeOpInterface

Results:

| Result | Description |
| --- | --- |
| `result` | 32-bit signless integer |

ttkernel.mul_tiles_init_f (tt::ttkernel::MulTilesInitFOp)

Short init function. Init for math only.

Must be run before mul_tiles.

Traits: TTKernel_InitOpTrait

ttkernel.mul_tiles_init (tt::ttkernel::MulTilesInitOp)

Short init function

Must be run before mul_tiles.

Traits: TTKernel_InitOpTrait

Operands:

| Operand | Description |
| --- | --- |
| `in0_cb` | TTKernel cb |
| `in1_cb` | TTKernel cb |

ttkernel.mul_tiles (tt::ttkernel::MulTilesOp)

Mul operation

Performs element-wise multiplication C=A*B of tiles in two CBs at given indices and writes the result to the DST register at index dst_tile_index. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.

Traits: TTKernel_BinaryOpTrait, TTKernel_FPUOpTrait

Operands:

| Operand | Description |
| --- | --- |
| `in0_cb` | TTKernel cb |
| `in1_cb` | TTKernel cb |
| `in0_tile_index` | index or 32-bit signless integer |
| `in1_tile_index` | index or 32-bit signless integer |
| `dst_index` | index or 32-bit signless integer |

ttkernel.my_x (tt::ttkernel::MyXOp)

MyX

Lowers to the tt-metal supported MY_X macro. This represents the virtual X coordinate of the current core.

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| `noc` | 8-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `x` | index |

ttkernel.my_y (tt::ttkernel::MyYOp)

MyY

Lowers to the tt-metal supported MY_Y macro. This represents the virtual Y coordinate of the current core.

Interfaces: InferTypeOpInterface

Operands:

| Operand | Description |
| --- | --- |
| `noc` | 8-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `y` | index |

ttkernel.noc_async_read_barrier (tt::ttkernel::NocAsyncReadBarrierOp)

NocAsyncReadBarrier

ttkernel.noc_async_read_one_packet_set_state (tt::ttkernel::NocAsyncReadOnePacketSetStateOp)

NocAsyncReadOnePacketSetState

Operands:

| Operand | Description |
| --- | --- |
| `srcNocAddr` | TTKernel noc address |
| `size` | 32-bit signless integer |

ttkernel.noc_async_read_one_packet_with_state (tt::ttkernel::NocAsyncReadOnePacketWithStateOp)

NocAsyncReadOnePacketWithState

Operands:

| Operand | Description |
| --- | --- |
| `srcNocAddr` | TTKernel noc address |
| `dstLocalL1Addr` | 32-bit signless integer or TTKernel l1 address |

ttkernel.noc_async_read (tt::ttkernel::NocAsyncReadOp)

NocAsyncRead

Operands:

| Operand | Description |
| --- | --- |
| `srcNocAddr` | TTKernel noc address |
| `dstLocalL1Addr` | 32-bit signless integer |
| `size` | 32-bit signless integer |

ttkernel.noc_async_read_tile (tt::ttkernel::NocAsyncReadTileOp)

NocAsyncReadTile

Operands:

| Operand | Description |
| --- | --- |
| `id` | 32-bit signless integer |
| `addrGenStruct` | TTKernel InterleavedAddrGenFast type |
| `dstLocalL1Addr` | 32-bit signless integer |

ttkernel.noc_async_write_barrier (tt::ttkernel::NocAsyncWriteBarrierOp)

NocAsyncWriteBarrier

ttkernel.noc_async_write_multicast_loopback_src (tt::ttkernel::NocAsyncWriteMulticastLoopbackSrcOp)

NocAsyncWriteMulticastLoopbackSrc

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `linked` | ::mlir::BoolAttr | bool attribute |
| `multicast_path_reserve` | ::mlir::BoolAttr | bool attribute |

Operands:

| Operand | Description |
| --- | --- |
| `srcLocalL1Addr` | 32-bit signless integer |
| `dstNocAddrMulticast` | TTKernel noc address |
| `size` | 32-bit signless integer |
| `num_dests` | 32-bit signless integer |
| `noc` | 8-bit signless integer |

ttkernel.noc_async_write_multicast_one_packet (tt::ttkernel::NocAsyncWriteMulticastOnePacketOp)

NocAsyncWriteMulticastOnePacket

NocAsyncWriteMulticastOnePacket issues only a single packet with size <= NOC_MAX_BURST_SIZE (i.e. the maximum packet size).

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `linked` | ::mlir::BoolAttr | bool attribute |
| `multicast_path_reserve` | ::mlir::BoolAttr | bool attribute |

Operands:

| Operand | Description |
| --- | --- |
| `srcLocalL1Addr` | 32-bit signless integer |
| `dstNocAddrMulticast` | TTKernel noc address |
| `size` | 32-bit signless integer |
| `num_dests` | 32-bit signless integer |
| `noc` | 8-bit signless integer |

ttkernel.noc_async_write_multicast (tt::ttkernel::NocAsyncWriteMulticastOp)

NocAsyncWriteMulticast

Initiates an asynchronous write from a source address in L1 memory on the Tensix core executing this function call to a rectangular destination grid. The destinations are specified using a uint64_t encoding referencing an on-chip grid of nodes located at NOC coordinate range (x_start,y_start,x_end,y_end) and a local address created using get_noc_multicast_addr function. Also, see noc_async_write_barrier.

The destination nodes can only be a set of Tensix cores + L1 memory address. The destination nodes must form a rectangular grid. The destination L1 memory address must be the same on all destination nodes.

With this API, the multicast sender cannot be part of the multicast destinations. If the multicast sender has to be in the multicast destinations (i.e. must perform a local L1 write), the other API variant noc_async_write_multicast_loopback_src can be used.

Note: The number of destinations needs to be non-zero. Besides that, there is no restriction on the number of destinations, i.e. the multicast destinations can span the full chip. However, as mentioned previously, the multicast source cannot be part of the destinations. So, the maximum number of destinations is 119.

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `linked` | ::mlir::BoolAttr | bool attribute |
| `multicast_path_reserve` | ::mlir::BoolAttr | bool attribute |

Operands:

| Operand | Description |
| --- | --- |
| `srcLocalL1Addr` | 32-bit signless integer |
| `dstNocAddrMulticast` | TTKernel noc address |
| `size` | 32-bit signless integer |
| `num_dests` | 32-bit signless integer |
| `noc` | 8-bit signless integer |
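
The relationship between the rectangular destination grid and `num_dests` can be sketched as below. This is a host-side illustration under stated assumptions: the coordinate range is treated as inclusive with start <= end on both axes, and `NocRect`, `nodes_in_rect`, `rect_contains`, and `num_dests_for_sender` are hypothetical names, not tt-metal functions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed inclusive rectangle of NOC coordinates (x_start, y_start, x_end, y_end).
struct NocRect {
    std::uint32_t x_start, y_start, x_end, y_end;
};

// Number of nodes covered by the rectangle.
inline std::uint32_t nodes_in_rect(const NocRect& r) {
    assert(r.x_start <= r.x_end && r.y_start <= r.y_end);
    return (r.x_end - r.x_start + 1) * (r.y_end - r.y_start + 1);
}

// True if node (x, y) lies inside the rectangle.
inline bool rect_contains(const NocRect& r, std::uint32_t x, std::uint32_t y) {
    return x >= r.x_start && x <= r.x_end && y >= r.y_start && y <= r.y_end;
}

// num_dests for the non-loopback variant: the sender must lie outside the rectangle,
// and the destination count must be non-zero.
inline std::uint32_t num_dests_for_sender(const NocRect& r,
                                          std::uint32_t sender_x, std::uint32_t sender_y) {
    assert(!rect_contains(r, sender_x, sender_y) && "use the loopback_src variant instead");
    const std::uint32_t n = nodes_in_rect(r);
    assert(n > 0);
    return n;
}
```

For a rectangle spanning x in [1, 4] and y in [1, 3] with the sender at (0, 0), this gives 12 destinations.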

ttkernel.noc_async_write (tt::ttkernel::NocAsyncWriteOp)

NocAsyncWrite

Operands:

| Operand | Description |
| --- | --- |
| `srcLocalL1Addr` | 32-bit signless integer |
| `dstNocAddr` | TTKernel noc address |
| `size` | 32-bit signless integer |

ttkernel.noc_async_write_tile (tt::ttkernel::NocAsyncWriteTileOp)

NocAsyncWriteTile

NocAsyncWriteTile

Operands:

| Operand | Description |
| --- | --- |
| `id` | index or 32-bit signless integer |
| `addrGenStruct` | TTKernel InterleavedAddrGenFast type |
| `srcLocalL1Addr` | 32-bit signless integer |

ttkernel.noc_semaphore_inc (tt::ttkernel::NocSemaphoreIncOp)

NocSemaphoreInc

The Tensix core executing this function call initiates an atomic increment (with 32-bit wrap) of a remote Tensix core L1 memory address. This L1 memory address is used as a semaphore of size 4 Bytes, as a synchronization mechanism.

Operands:

| Operand | Description |
| --- | --- |
| `addr` | TTKernel noc address |
| `incr` | index or 32-bit signless integer |
| `noc_id` | 8-bit signless integer |
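
The "atomic increment with 32-bit wrap" behaviour can be illustrated with a host-side stand-in for the 4-byte semaphore word. `SemaphoreModel` and `semaphore_inc_model` are hypothetical names; the sketch uses `std::atomic` on the host and does not involve the NOC at all.

```cpp
#include <atomic>
#include <cstdint>

// Stand-in for a 4-byte semaphore word in L1 (illustrative only).
using SemaphoreModel = std::atomic<std::uint32_t>;

// Atomically adds incr and returns the new value.
// Unsigned 32-bit arithmetic wraps modulo 2^32, which models the "32-bit wrap".
inline std::uint32_t semaphore_inc_model(SemaphoreModel& sem, std::uint32_t incr) {
    return sem.fetch_add(incr) + incr;
}
```

Incrementing a semaphore that holds `0xFFFFFFFF` by 1 therefore leaves it at 0 rather than overflowing into a wider value.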

ttkernel.noc_semaphore_set_multicast_loopback_src (tt::ttkernel::NocSemaphoreSetMulticastLoopbackOp)

NocSemaphoreSetMulticastLoopback

Initiates an asynchronous write from a source address in L1 memory on the Tensix core executing this function call to a rectangular destination grid. The destinations are specified using a uint64_t encoding referencing an on-chip grid of nodes located at NOC coordinate range (x_start,y_start,x_end,y_end) and a local address created using get_noc_multicast_addr function. The size of data that is sent is 4 Bytes. This is usually used to set a semaphore value at the destination nodes as a synchronization mechanism. It is the same as noc_async_write_multicast with a preset size of 4 Bytes. Note: With this API, sending data only to the source node (when num_dests is 1) may result in unexpected behaviour. For some parameters, hangs have been observed. For some other parameters, nothing may happen. Consider using regular non-multicast operations such as noc_async_write in this case.

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `linked` | ::mlir::BoolAttr | bool attribute |
| `multicast_path_reserve` | ::mlir::BoolAttr | bool attribute |

Operands:

| Operand | Description |
| --- | --- |
| `src_local_l1_addr` | TTKernel semaphore |
| `dst_noc_addr_multicast` | TTKernel noc address |
| `num_dests` | 32-bit signless integer |

ttkernel.noc_semaphore_set_multicast (tt::ttkernel::NocSemaphoreSetMulticastOp)

NocSemaphoreSetMulticast

Initiates an asynchronous write from a source address in L1 memory on the Tensix core executing this function call to a rectangular destination grid. The destinations are specified using a uint64_t encoding referencing an on-chip grid of nodes located at NOC coordinate range (x_start,y_start,x_end,y_end) and a local address created using get_noc_multicast_addr function. The size of data that is sent is 4 Bytes. This is usually used to set a semaphore value at the destination nodes as a synchronization mechanism. It is the same as noc_async_write_multicast with a preset size of 4 Bytes. With this API, the multicast sender cannot be part of the multicast destinations. If the multicast sender has to be in the multicast destinations (i.e. must perform a local L1 write), the other API variant noc_semaphore_set_multicast_loopback_src can be used.

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `linked` | ::mlir::BoolAttr | bool attribute |
| `multicast_path_reserve` | ::mlir::BoolAttr | bool attribute |

Operands:

| Operand | Description |
| --- | --- |
| `src_local_l1_addr` | TTKernel semaphore |
| `dst_noc_addr_multicast` | TTKernel noc address |
| `num_dests` | 32-bit signless integer |

ttkernel.noc_semaphore_set (tt::ttkernel::NocSemaphoreSetOp)

NocSemaphoreSet

Sets the value of a local L1 memory address on the Tensix core executing this function to a specific value. This L1 memory address is used as a semaphore of size 4 Bytes, as a synchronization mechanism. Also, see noc_semaphore_wait.

Operands:

| Operand | Description |
| --- | --- |
| `sem_addr` | TTKernel l1 address pointer |
| `val` | index or 32-bit signless integer |

ttkernel.noc_semaphore_wait_min (tt::ttkernel::NocSemaphoreWaitMinOp)

NocSemaphoreWaitMin

A blocking call that waits until the value of a local L1 memory address on the Tensix core executing this function becomes greater than or equal to a target value. This L1 memory address is used as a semaphore of size 4 Bytes, as a synchronization mechanism. Also, see noc_semaphore_set.

Operands:

| Operand | Description |
| --- | --- |
| `sem_addr` | TTKernel l1 address pointer |
| `val` | 32-bit signless integer |

ttkernel.noc_semaphore_wait (tt::ttkernel::NocSemaphoreWaitOp)

NocSemaphoreWait

A blocking call that waits until the value of a local L1 memory address on the Tensix core executing this function becomes equal to a target value. This L1 memory address is used as a semaphore of size 4 Bytes, as a synchronization mechanism. Also, see noc_semaphore_set.

Operands:

| Operand | Description |
| --- | --- |
| `sem_addr` | TTKernel l1 address pointer |
| `val` | index or 32-bit signless integer |
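
The difference between the two wait variants comes down to the completion condition: noc_semaphore_wait ends on exact equality, noc_semaphore_wait_min on reaching at least the target. The sketch below replays a sequence of sampled semaphore values against each condition. All names are hypothetical, and the real calls block on live memory rather than scanning a list.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Completion condition modeled for noc_semaphore_wait: exact equality.
inline bool wait_done(std::uint32_t current, std::uint32_t target) {
    return current == target;
}

// Completion condition modeled for noc_semaphore_wait_min: at least the target.
inline bool wait_min_done(std::uint32_t current, std::uint32_t target) {
    return current >= target;
}

// Hypothetical helper: index of the first sampled value that ends the wait, or -1 if none does.
inline int first_done(const std::vector<std::uint32_t>& samples,
                      std::uint32_t target, bool use_min) {
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const bool done = use_min ? wait_min_done(samples[i], target)
                                  : wait_done(samples[i], target);
        if (done) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With samples `{0, 2, 4}` and target 3, the equality condition is never met, while the minimum condition is met at the third sample. This is why the `_min` variant is the more forgiving choice when the semaphore may be incremented past the target between observations.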

ttkernel.pack_tile (tt::ttkernel::PackTileOp)

PackTile op.

Copies a single tile from the DST register buffer at a specified index to a specified CB at a given index. For the out_tile_index to be valid for this call, cb_reserve_back(n) has to be called first to reserve at least some number n > 0 of tiles in the output CB. out_tile_index = 0 then references the first tile in the reserved section of the CB, up to index n - 1, which will then be visible to the consumer in the same order after a cb_push_back call. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.

Each subsequent pack call will increment the write pointer in the cb by single tile size. The pointer is then again set to a valid position with space for n reserved tiles by another cb_reserve_back call.

Operates in tandem with functions cb_reserve_back and cb_push_back.

A typical use case is first the producer ensures that there is a number of tiles available in the buffer via cb_reserve_back, then the producer uses the pack_tile call to copy a tile from one of DST slots to a slot in reserved space and finally cb_push_back is called to announce visibility of the reserved section of the circular buffer to the consumer.

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `out_of_order` | ::mlir::BoolAttr | bool attribute |

Operands:

| Operand | Description |
| --- | --- |
| `dst_index` | index or 32-bit signless integer |
| `out_cb` | TTKernel cb |
| `out_index` | index or 32-bit signless integer |
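
The producer/consumer protocol described for pack_tile (reserve, pack, push on one side; wait, read, pop on the other) can be sketched with a single-threaded host model. `CbFifoModel` is a hypothetical class, tiles are reduced to single `int` values, and the blocking calls are modeled as functions returning `false` where the real call would wait.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical single-threaded model of a circular buffer (CB); one int stands for one tile.
class CbFifoModel {
public:
    explicit CbFifoModel(std::size_t capacity) : capacity_(capacity) {}

    // cb_reserve_back(n): true if n free slots exist (the real call blocks instead).
    bool reserve_back(std::size_t n) {
        if (visible_.size() + n > capacity_) return false;
        reserved_.assign(n, 0);
        return true;
    }
    // pack_tile: write a tile into slot out_index of the reserved section.
    void pack_tile(int tile, std::size_t out_index) {
        assert(out_index < reserved_.size());
        reserved_[out_index] = tile;
    }
    // cb_push_back(n): make the first n reserved tiles visible to the consumer.
    void push_back(std::size_t n) {
        assert(n <= reserved_.size());
        const auto count = static_cast<std::ptrdiff_t>(n);
        visible_.insert(visible_.end(), reserved_.begin(), reserved_.begin() + count);
        reserved_.erase(reserved_.begin(), reserved_.begin() + count);
    }
    // cb_wait_front(n): true if n tiles are visible (the real call blocks instead).
    bool wait_front(std::size_t n) const { return visible_.size() >= n; }
    // Read visible tile i; index 0 is the oldest tile (FIFO order).
    int front_tile(std::size_t i) const {
        assert(i < visible_.size());
        return visible_[i];
    }
    // cb_pop_front(n): free the n oldest visible tiles.
    void pop_front(std::size_t n) {
        assert(n <= visible_.size());
        visible_.erase(visible_.begin(), visible_.begin() + static_cast<std::ptrdiff_t>(n));
    }

private:
    std::size_t capacity_;
    std::vector<int> reserved_;
    std::deque<int> visible_;
};
```

The key property the model shows is that packed tiles stay invisible to the consumer until `push_back` publishes them, and that they are then read in the same order they were packed.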

ttkernel.recip_tile_init (tt::ttkernel::RecipTileInitOp)

Init function for recip_tile operation. Refer to documentation for any init function.

Must be called before recip_tile function.

Traits: TTKernel_InitOpTrait

ttkernel.recip_tile (tt::ttkernel::RecipTileOp)

Recip tile in the DST at specified index.

Performs element-wise computation of the reciprocal on each element of a tile in DST register at index tile_index. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine. Only works for Float32, Float16_b, Bfp8_b data formats for full accuracy.

Traits: TTKernel_FPUOpTrait, TTKernel_UnaryOpTrait

Operands:

| Operand | Description |
| --- | --- |
| `tile_index` | index or 32-bit signless integer |

ttkernel.reduce_init (tt::ttkernel::ReduceInitOp)

Init function

Must be run before reduce_tile.

Traits: TTKernel_InitOpTrait

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `reduce_type` | ::mlir::tt::ttkernel::ReduceTypeAttr | TTKernel Reduce Types. Enum cases: reduce_sum (`Sum`), reduce_max (`Max`) |
| `reduce_dim` | ::mlir::tt::ttkernel::ReduceDimAttr | TTKernel Reduce Dimensions. Enum cases: reduce_dim_row (`Row`), reduce_dim_col (`Col`), reduce_dim_scalar (`Scalar`), reduce_dim_none (`None`) |

Operands:

| Operand | Description |
| --- | --- |
| `in_cb` | TTKernel cb |
| `scaling_cb` | TTKernel cb |
| `out_cb` | TTKernel cb |

ttkernel.reduce_tile (tt::ttkernel::ReduceTileOp)

Reduce operation

Performs a reduction operation B = reduce(A) using reduce_func for dimension reduction on a tile in the CB at a given index and writes the result to the DST register at index dst_tile_index. Reduction can be of type Reduce::R, Reduce::C, or Reduce::RC, identifying the dimension(s) to be reduced in size to 1. The DST register buffer must be in the acquired state via a tile_regs_acquire call. The template takes reduce_type, which can be ReduceFunc::Sum or ReduceFunc::Max, and reduce_dim, which can be Reduce::R, Reduce::C, or Reduce::RC. They can also be specified by the defines REDUCE_OP and REDUCE_DIM. This call is blocking and is only available on the compute engine.

Traits: TTKernel_BinaryOpTrait, TTKernel_FPUOpTrait

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `reduce_type` | ::mlir::tt::ttkernel::ReduceTypeAttr | TTKernel Reduce Types. Enum cases: reduce_sum (`Sum`), reduce_max (`Max`) |
| `reduce_dim` | ::mlir::tt::ttkernel::ReduceDimAttr | TTKernel Reduce Dimensions. Enum cases: reduce_dim_row (`Row`), reduce_dim_col (`Col`), reduce_dim_scalar (`Scalar`), reduce_dim_none (`None`) |

Operands:

| Operand | Description |
| --- | --- |
| `in_cb` | TTKernel cb |
| `scaling_cb` | TTKernel cb |
| `in_tile_index` | index or 32-bit signless integer |
| `scaling_tile_index` | index or 32-bit signless integer |
| `dst_index` | index or 32-bit signless integer |
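
A rough host-side sketch of the reduction types and dimensions follows. It rests on several assumptions that the text above does not settle: `Row` is taken to mean "collapse each row to one value", `Col` "collapse each column to one value", and `Scalar` "collapse the whole tile"; the scaling tile is not modeled at all. `ReduceTypeModel`, `ReduceDimModel`, and `reduce_tile_model` are hypothetical names, and the tile edge `N` is a template parameter so the example can be checked on small inputs.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

enum class ReduceTypeModel { Sum, Max };
enum class ReduceDimModel { Row, Col, Scalar };

template <std::size_t N>
using TileN = std::array<std::array<float, N>, N>;

// Folds one more element into the running result.
inline float combine(ReduceTypeModel type, float acc, float x) {
    return type == ReduceTypeModel::Sum ? acc + x : std::max(acc, x);
}

// Returns N values for Row/Col reductions and a single value for Scalar.
template <std::size_t N>
std::vector<float> reduce_tile_model(const TileN<N>& a, ReduceTypeModel type,
                                     ReduceDimModel dim) {
    if (dim == ReduceDimModel::Scalar) {
        float acc = a[0][0];
        for (std::size_t r = 0; r < N; ++r) {
            for (std::size_t c = 0; c < N; ++c) {
                if (r != 0 || c != 0) acc = combine(type, acc, a[r][c]);
            }
        }
        return {acc};
    }
    std::vector<float> out(N);
    for (std::size_t i = 0; i < N; ++i) {
        // Assumption: Row walks along row i, Col walks along column i.
        float acc = (dim == ReduceDimModel::Row) ? a[i][0] : a[0][i];
        for (std::size_t k = 1; k < N; ++k) {
            acc = combine(type, acc, (dim == ReduceDimModel::Row) ? a[i][k] : a[k][i]);
        }
        out[i] = acc;
    }
    return out;
}
```

On the 2x2 input `{{1, 2}, {3, 4}}`, a `Sum` over `Row` gives `{3, 7}` under these assumptions and a `Max` over `Scalar` gives `{4}`. If the hardware convention for `Row` and `Col` is the opposite one, the two branches simply swap.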

ttkernel.sin_tile_init (tt::ttkernel::SinTileInitOp)

Init function for sin_tile operation. Refer to documentation for any init function.

Must be run before sin_tile.

Traits: TTKernel_InitOpTrait

ttkernel.sin_tile (tt::ttkernel::SinTileOp)

Sine tile in the DST at specified index.

Performs element-wise computation of sine operation DST[dst0_index] <- sin(DST[dst0_index]) on DST register operands. The DST register buffer must be in acquired state via tile_regs_acquire call.

Traits: TTKernel_SFPUOpTrait, TTKernel_UnaryOpTrait

Operands:

| Operand | Description |
| --- | --- |
| `dst0_index` | index or 32-bit signless integer |

ttkernel.store_to_l1 (tt::ttkernel::StoreToL1Op)

StoreToL1

Store value to L1.

Operands:

| Operand | Description |
| --- | --- |
| `value` | 32-bit signless integer |
| `l1_ptr` | TTKernel l1 address pointer |
| `offset` | 32-bit signless integer |

ttkernel.tile_regs_acquire (tt::ttkernel::TileRegsAcquireOp)

Tile_regs_acquire

Acquire an exclusive lock on the DST register for the MATH thread. This register is an array of 16 tiles of 32x32 elements each. This is a blocking function, i.e. this function will wait until the lock is acquired.

ttkernel.tile_regs_commit (tt::ttkernel::TileRegsCommitOp)

Tile_regs_commit

Release lock on DST register by MATH thread. The lock had to be previously acquired with tile_regs_acquire.

ttkernel.tile_regs_release (tt::ttkernel::TileRegsReleaseOp)

Tile_regs_release

Release lock on DST register by PACK thread. The lock had to be previously acquired with tile_regs_wait.

ttkernel.tile_regs_wait (tt::ttkernel::TileRegsWaitOp)

Tile_regs_wait

Acquire an exclusive lock on the DST register for the PACK thread. It waits for the MATH thread to commit the DST register. This is a blocking function, i.e. this function will wait until the lock is acquired.
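
Taken together, the four tile_regs_* ops describe a hand-off of the DST register from the MATH thread to the PACK thread and back. The sketch below models that order as a small state machine. It is a simplification under stated assumptions: a single DST buffer, one caller, and asserts in place of blocking; `DstState` and `DstLockModel` are hypothetical names.

```cpp
#include <cassert>

// Who currently holds DST in this simplified, single-buffer model.
enum class DstState { Free, HeldByMath, Committed, HeldByPack };

struct DstLockModel {
    DstState state = DstState::Free;

    // MATH thread takes the lock (the real call blocks until it can).
    void tile_regs_acquire() {
        assert(state == DstState::Free);
        state = DstState::HeldByMath;
    }
    // MATH thread releases the lock and hands its results over.
    void tile_regs_commit() {
        assert(state == DstState::HeldByMath);
        state = DstState::Committed;
    }
    // PACK thread takes the lock once MATH has committed (the real call blocks).
    void tile_regs_wait() {
        assert(state == DstState::Committed);
        state = DstState::HeldByPack;
    }
    // PACK thread releases the lock, so MATH may acquire again.
    void tile_regs_release() {
        assert(state == DstState::HeldByPack);
        state = DstState::Free;
    }
};
```

The usage order is acquire, commit, wait, release, which differs from the alphabetical order in which the ops are listed here.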

ttkernel.tilize_block (tt::ttkernel::TilizeBlockOp)

TilizeBlockOp call.

TilizeBlockOp operation

Operands:

| Operand | Description |
| --- | --- |
| `cbIn` | TTKernel cb |
| `numTiles` | 32-bit signless integer |
| `cbOut` | TTKernel cb |

ttkernel.tilize_init (tt::ttkernel::TilizeInitOp)

TilizeInitOp call.

Initialize the tilize operation. To be called once at beginning of a kernel.

Operands:

| Operand | Description |
| --- | --- |
| `cbIn` | TTKernel cb |
| `numTiles` | 32-bit signless integer |
| `cbOut` | TTKernel cb |

ttkernel.tilize_init_short (tt::ttkernel::TilizeInitShortOp)

TilizeInitShortOp call.

Re-initialize for the tilize operation. This can be called after a full init.

Operands:

| Operand | Description |
| --- | --- |
| `cbIn` | TTKernel cb |
| `numiles` | 32-bit signless integer |
| `cbOut` | TTKernel cb |

ttkernel.tilize_uninit (tt::ttkernel::TilizeUninitOp)

TilizeUninitOp call.

Uninitialize tilize operation before re-initializing for another operation.

Operands:

| Operand | Description |
| --- | --- |
| `cbI` | TTKernel cb |
| `cbOut` | TTKernel cb |

ttkernel.unary_op_init_common (tt::ttkernel::UnaryOpInitCommonOp)

Initialization function for unary operations.

This operation initializes all necessary components for unary operations, including unpacking, packing, and math configurations.

Traits: TTKernel_InitOpTrait

Operands:

| Operand | Description |
| --- | --- |
| `icb` | TTKernel cb |
| `ocb` | TTKernel cb |

ttkernel.unreachable (tt::ttkernel::UnreachableOp)

Unreachable op.

Unreachable operation

Traits: AlwaysSpeculatableImplTrait, ReturnLike, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface), RegionBranchTerminatorOpInterface

Effects: MemoryEffects::Effect{}

ttkernel.untilize_block (tt::ttkernel::UntilizeBlockOp)

UntilizeBlockOp call.

UntilizeBlockOp operation

Operands:

| Operand | Description |
| --- | --- |
| `cbIn` | TTKernel cb |
| `numTiles` | 32-bit signless integer |
| `cbOut` | TTKernel cb |

ttkernel.untilize_init (tt::ttkernel::UntilizeInitOp)

UntilizeInitOp call.

Init function for untilize operations, to be used at the beginning of the kernel.

Operands:

| Operand | Description |
| --- | --- |
| `cbIn` | TTKernel cb |
| `cbOut` | TTKernel cb |

ttkernel.untilize_init_short (tt::ttkernel::UntilizeInitShortOp)

UntilizeInitShortOp call.

Re-initialize for the untilize operation. This can be called after a full init.

Operands:

| Operand | Description |
| --- | --- |
| `cbIn` | TTKernel cb |

ttkernel.untilize_uninit (tt::ttkernel::UntilizeUninitOp)

UntilizeUninitOp call.

Uninitialize untilize operation, to allow initializing another operation.

Operands:

| Operand | Description |
| --- | --- |
| `cbIn` | TTKernel cb |