ttkernel.add
(tt::ttkernel::AddOp)
Add operation
Operands:
Operand | Description |
---|---|
dst_index | 32-bit signless integer |
ttkernel.add_tiles_init
(tt::ttkernel::AddTilesInitOp)
Short init function
Must be run before add_tiles.
Operands:
Operand | Description |
---|---|
in0_cb | TTKernel cb |
in1_cb | TTKernel cb |
ttkernel.add_tiles
(tt::ttkernel::AddTilesOp)
Add operation
Performs element-wise addition C=A+B of tiles in two CBs at given indices and writes the result to the DST register at index dst_tile_index. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.
Operands:
Operand | Description |
---|---|
in0_cb | TTKernel cb |
in1_cb | TTKernel cb |
in0_tile_index | 32-bit signless integer |
in1_tile_index | 32-bit signless integer |
dst_index | 32-bit signless integer |
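The data flow described for add_tiles can be sketched in plain Python. This is an illustrative model only, not the TTKernel or kernel API: the helper `make_tile`, the list-based CBs, and the `dst` list are assumptions made for the example, and the real op additionally requires DST to be acquired first.

```python
TILE_DIM = 32    # tile edge length; this reference describes DST tiles as 32x32
DST_TILES = 16   # DST capacity in tiles, per the tile_regs_acquire description


def make_tile(value):
    """Build a TILE_DIM x TILE_DIM tile filled with one value (illustrative helper)."""
    return [[value] * TILE_DIM for _ in range(TILE_DIM)]


def add_tiles(in0_cb, in1_cb, in0_tile_index, in1_tile_index, dst_index, dst):
    """Model C = A + B: read one tile from each CB and write the sum to dst[dst_index]."""
    a = in0_cb[in0_tile_index]
    b = in1_cb[in1_tile_index]
    dst[dst_index] = [[x + y for x, y in zip(row_a, row_b)]
                      for row_a, row_b in zip(a, b)]


# Each CB is modeled as a plain list of tiles that are already available to read.
cb0 = [make_tile(1.0), make_tile(2.0)]
cb1 = [make_tile(10.0), make_tile(20.0)]
dst = [None] * DST_TILES

add_tiles(cb0, cb1, in0_tile_index=1, in1_tile_index=0, dst_index=3, dst=dst)
```

Only the addressed DST slot changes; the input tiles and the other DST slots are left untouched.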
ttkernel.binary_op_init_common
(tt::ttkernel::BinaryOpInitCommonOp)
Init function for all binary ops
Must be followed by the specific init required for the opcode (binary_op_specific_init).
Operands:
Operand | Description |
---|---|
in0_cb | TTKernel cb |
in1_cb | TTKernel cb |
out_cb | TTKernel cb |
ttkernel.builtin
(tt::ttkernel::BuiltinOp)
Builtin call.
Kernel operation
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
op | ::mlir::FlatSymbolRefAttr | flat symbol reference attribute |
kind | ::mlir::FlatSymbolRefAttr | flat symbol reference attribute |
Operands:
Operand | Description |
---|---|
args | variadic of non-0-ranked.memref of any type values or TTKernel cb |
ttkernel.cb_pop_front
(tt::ttkernel::CBPopFrontOp)
CBPopFront call.
CBPopFront operation
Operands:
Operand | Description |
---|---|
cb | TTKernel cb |
numPages | 32-bit signless integer |
ttkernel.cb_push_back
(tt::ttkernel::CBPushBackOp)
CBPushBack call.
CBPushBack operation
Operands:
Operand | Description |
---|---|
cb | TTKernel cb |
numPages | 32-bit signless integer |
ttkernel.cb_reserve_back
(tt::ttkernel::CBReserveBackOp)
CBReserveBack call.
CBReserveBack operation
Operands:
Operand | Description |
---|---|
cb | TTKernel cb |
numPages | 32-bit signless integer |
ttkernel.cb_wait_front
(tt::ttkernel::CBWaitFrontOp)
CBWaitFront call.
CBWaitFront operation
Operands:
Operand | Description |
---|---|
cb | TTKernel cb |
numPages | 32-bit signless integer |
ttkernel.reinterpret_cast<volatile tt_l1_ptr uint32_t*>
(tt::ttkernel::CastToL1PtrOp)
CastToL1Ptr
Cast specified addr to L1 pointer.
Interfaces: InferTypeOpInterface
Operands:
Operand | Description |
---|---|
addr | 32-bit signless integer or TTKernel l1 address |
Results:
Result | Description |
---|---|
l1_ptr | TTKernel l1 address pointer |
ttkernel.copy_tile_init
(tt::ttkernel::CopyTileInitOp)
Perform the init for copy tile. This does not reconfigure the unpacker data types.
Must be called before copy_tile.
ttkernel.copy_tile
(tt::ttkernel::CopyTileOp)
Copy tile from specified CB to DST.
Copies a single tile from the specified input CB and writes the result to DST at a specified index. The function uses the unpacker to first unpack the tile into the SRC registers and then moves it into the DST registers at the specified index. For in_tile_index to be valid for this call, cb_wait_front(n) must have been called beforehand to ensure that at least n > 0 tiles are available in the input CB. Index 0 then references the first tile in the received section of the CB, and valid indices run up to n-1 in FIFO order. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.
Operands:
Operand | Description |
---|---|
cb0 | TTKernel cb |
tile_index_cb | 32-bit signless integer |
tile_index_dst | 32-bit signless integer |
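The index rule for copy_tile (index 0 is the oldest received tile, and only indices below the n passed to cb_wait_front are valid) can be modeled as follows. The class `InputCB` and its fields are hypothetical names for this sketch; the real cb_wait_front blocks, whereas this model raises an error instead.

```python
from collections import deque


class InputCB:
    """Toy input circular buffer (hypothetical model, not the TTKernel type)."""

    def __init__(self, tiles):
        self.pages = deque(tiles)  # tiles pushed by the producer, oldest first
        self.front_n = 0           # tiles made addressable by the last cb_wait_front

    def cb_wait_front(self, n):
        # The real call blocks until n tiles are available; the model raises instead.
        if len(self.pages) < n:
            raise RuntimeError("would block: fewer than n tiles available")
        self.front_n = n


def copy_tile(cb, tile_index_cb, tile_index_dst, dst):
    """Copy the tile at tile_index_cb (FIFO order) into dst[tile_index_dst]."""
    if not 0 <= tile_index_cb < cb.front_n:
        raise IndexError("tile index is outside the section covered by cb_wait_front")
    dst[tile_index_dst] = cb.pages[tile_index_cb]


cb = InputCB(["tile_a", "tile_b", "tile_c"])  # strings stand in for tile data
dst = [None] * 16
cb.cb_wait_front(2)
copy_tile(cb, tile_index_cb=1, tile_index_dst=0, dst=dst)
```

Although three tiles are present in the buffer, only indices 0 and 1 are valid here because cb_wait_front was called with n = 2.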
ttkernel.exp_tile_init
(tt::ttkernel::ExpTileInitOp)
Short init function which configures compute unit for execution of exp_tile.
Must be run before exp_tile.
ttkernel.exp_tile
(tt::ttkernel::ExpTileOp)
Exp operation
Performs element-wise computation of exponential on each element of a tile in DST register at index tile_index. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.
Operands:
Operand | Description |
---|---|
tile_index | 32-bit signless integer |
ttkernel.get_noc_addr
(tt::ttkernel::GetNocAddrOp)
GetNocAddr
Interfaces: InferTypeOpInterface
Operands:
Operand | Description |
---|---|
l1Address | 32-bit signless integer |
Results:
Result | Description |
---|---|
nocAddr | TTKernel noc address |
ttkernel.get_noc_addr_xy
(tt::ttkernel::GetNocAddrXYOp)
GetNocAddrXY
GetNocAddr api including core coordinates
Interfaces: InferTypeOpInterface
Operands:
Operand | Description |
---|---|
x | 32-bit signless integer |
y | 32-bit signless integer |
l1Address | 32-bit signless integer |
Results:
Result | Description |
---|---|
nocAddr | TTKernel noc address |
ttkernel.get_noc_multicast_addr
(tt::ttkernel::GetNocMulticastAddrOp)
GetNocMulticastAddr
Interfaces: InferTypeOpInterface
Operands:
Operand | Description |
---|---|
noc_x_start | 32-bit signless integer |
noc_y_start | 32-bit signless integer |
noc_x_end | 32-bit signless integer |
noc_y_end | 32-bit signless integer |
addr | 32-bit signless integer |
noc | 8-bit signless integer |
Results:
Result | Description |
---|---|
mcastNocAddr | TTKernel noc address |
ttkernel.get_write_ptr
(tt::ttkernel::GetWritePtrOp)
GetWritePtr
GetWritePtr operation
Interfaces: InferTypeOpInterface
Operands:
Operand | Description |
---|---|
cb | TTKernel cb |
Results:
Result | Description |
---|---|
writePtr | 32-bit signless integer |
ttkernel.matmul
(tt::ttkernel::MatmulOp)
Matmul operation
Operands:
Operand | Description |
---|---|
dst_index | 32-bit signless integer |
ttkernel.max
(tt::ttkernel::MaxOp)
Max operation
Operands:
Operand | Description |
---|---|
dst_index | 32-bit signless integer |
ttkernel.max_tile_init
(tt::ttkernel::MaxTilesInitOp)
Short init function
Must be run before max_tile.
ttkernel.max_tile
(tt::ttkernel::MaxTilesOp)
Max operation
Performs element-wise computation of maximum operation DST[dst0_index] <- max(DST[dst0_index], DST[dst1_index]) on DST register operands. The DST register buffer must be in acquired state via tile_regs_acquire call.
Operands:
Operand | Description |
---|---|
dst0_index | 32-bit signless integer |
dst1_index | 32-bit signless integer |
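The in-place update DST[dst0_index] <- max(DST[dst0_index], DST[dst1_index]) can be illustrated with a small Python model. The 2x2 tiles are an assumption made to keep the example readable; they are not the hardware tile size.

```python
def max_tile(dst, dst0_index, dst1_index):
    """Element-wise max of two DST tiles, written back into dst[dst0_index]."""
    a = dst[dst0_index]
    b = dst[dst1_index]
    dst[dst0_index] = [[max(x, y) for x, y in zip(row_a, row_b)]
                       for row_a, row_b in zip(a, b)]


# Two small tiles placed in DST slots 0 and 1 (2x2 only for readability).
dst = [
    [[1, 8], [5, 2]],
    [[4, 3], [5, 9]],
]
max_tile(dst, dst0_index=0, dst1_index=1)
```

The second operand tile is only read; the result overwrites the first operand's slot.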
ttkernel.mem_zeros_base
(tt::ttkernel::MemZerosBaseOp)
Op corresponding to MEM_ZEROS_BASE macro in kernels.
Interfaces: InferTypeOpInterface
Results:
Result | Description |
---|---|
result | 32-bit signless integer |
ttkernel.mem_zeros_size
(tt::ttkernel::MemZerosSizeOp)
Op corresponding to MEM_ZEROS_SIZE macro in kernels.
Interfaces: InferTypeOpInterface
Results:
Result | Description |
---|---|
result | 32-bit signless integer |
ttkernel.mul
(tt::ttkernel::MulOp)
Mul operation
Operands:
Operand | Description |
---|---|
dst_index | 32-bit signless integer |
ttkernel.mul_tiles_init_f
(tt::ttkernel::MulTilesInitFOp)
Short init function. Init for math only.
Must be run before mul_tiles.
ttkernel.mul_tiles_init
(tt::ttkernel::MulTilesInitOp)
Short init function
Must be run before mul_tiles.
Operands:
Operand | Description |
---|---|
in0_cb | TTKernel cb |
in1_cb | TTKernel cb |
ttkernel.mul_tiles
(tt::ttkernel::MulTilesOp)
Mul operation
Performs element-wise multiplication C=A*B of tiles in two CBs at given indices and writes the result to the DST register at index dst_tile_index. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.
Operands:
Operand | Description |
---|---|
in0_cb | TTKernel cb |
in1_cb | TTKernel cb |
in0_tile_index | 32-bit signless integer |
in1_tile_index | 32-bit signless integer |
dst_index | 32-bit signless integer |
ttkernel.noc_async_read_barrier
(tt::ttkernel::NocAsyncReadBarrierOp)
NocAsyncReadBarrier
ttkernel.noc_async_read_one_packet_set_state
(tt::ttkernel::NocAsyncReadOnePacketSetStateOp)
NocAsyncReadOnePacketSetState
Operands:
Operand | Description |
---|---|
srcNocAddr | TTKernel noc address |
size | 32-bit signless integer |
ttkernel.noc_async_read_one_packet_with_state
(tt::ttkernel::NocAsyncReadOnePacketWithStateOp)
NocAsyncReadOnePacketWithState
Operands:
Operand | Description |
---|---|
srcNocAddr | TTKernel noc address |
dstLocalL1Addr | 32-bit signless integer or TTKernel l1 address |
ttkernel.noc_async_read
(tt::ttkernel::NocAsyncReadOp)
NocAsyncRead
Operands:
Operand | Description |
---|---|
srcNocAddr | TTKernel noc address |
dstLocalL1Addr | 32-bit signless integer |
size | 32-bit signless integer |
ttkernel.noc_async_write_barrier
(tt::ttkernel::NocAsyncWriteBarrierOp)
NocAsyncWriteBarrier
ttkernel.noc_async_write_multicast_loopback_src
(tt::ttkernel::NocAsyncWriteMulticastLoopbackSrcOp)
NocAsyncWriteMulticastLoopbackSrc
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
linked | ::mlir::BoolAttr | bool attribute |
multicast_path_reserve | ::mlir::BoolAttr | bool attribute |
Operands:
Operand | Description |
---|---|
srcLocalL1Addr | 32-bit signless integer |
dstNocAddrMulticast | TTKernel noc address |
size | 32-bit signless integer |
num_dests | 32-bit signless integer |
noc | 8-bit signless integer |
ttkernel.noc_async_write_multicast_one_packet
(tt::ttkernel::NocAsyncWriteMulticastOnePacketOp)
NocAsyncWriteMulticastOnePacket
Variant of NocAsyncWriteMulticast that issues only a single packet, with size <= NOC_MAX_BURST_SIZE (i.e. the maximum packet size).
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
linked | ::mlir::BoolAttr | bool attribute |
multicast_path_reserve | ::mlir::BoolAttr | bool attribute |
Operands:
Operand | Description |
---|---|
srcLocalL1Addr | 32-bit signless integer |
dstNocAddrMulticast | TTKernel noc address |
size | 32-bit signless integer |
num_dests | 32-bit signless integer |
noc | 8-bit signless integer |
ttkernel.noc_async_write_multicast
(tt::ttkernel::NocAsyncWriteMulticastOp)
NocAsyncWriteMulticast
Initiates an asynchronous write from a source address in L1 memory on the Tensix core executing this function call to a rectangular destination grid. The destinations are specified using a uint64_t encoding referencing an on-chip grid of nodes located at NOC coordinate range (x_start,y_start,x_end,y_end) and a local address created using get_noc_multicast_addr function. Also, see noc_async_write_barrier.
The destination nodes can only be a set of Tensix cores + L1 memory address. The destination nodes must form a rectangular grid. The destination L1 memory address must be the same on all destination nodes.
With this API, the multicast sender cannot be part of the multicast destinations. If the multicast sender has to be in the multicast destinations (i.e. must perform a local L1 write), the other API variant noc_async_write_multicast_loopback_src can be used.
Note: The number of destinations needs to be non-zero. Besides that, there is no restriction on the number of destinations, i.e. the multicast destinations can span the full chip. However, as mentioned previously, the multicast source cannot be part of the destinations. So, the maximum number of destinations is 119.
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
linked | ::mlir::BoolAttr | bool attribute |
multicast_path_reserve | ::mlir::BoolAttr | bool attribute |
Operands:
Operand | Description |
---|---|
srcLocalL1Addr | 32-bit signless integer |
dstNocAddrMulticast | TTKernel noc address |
size | 32-bit signless integer |
num_dests | 32-bit signless integer |
noc | 8-bit signless integer |
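One way to read the destination-count rule is sketched below. The function name, the inclusive treatment of the coordinate range, the `loopback_src` flag, and the 12 x 10 grid used in the usage lines are all assumptions for illustration; the section itself only states the 119 limit and that the sender is not a destination.

```python
def count_multicast_dests(x_start, y_start, x_end, y_end, sender_xy, loopback_src=False):
    """Count destinations in a rectangle of cores (range assumed inclusive).

    Without loopback_src the sender is never counted as a destination, even
    when its coordinates fall inside the rectangle.
    """
    width = x_end - x_start + 1
    height = y_end - y_start + 1
    if width <= 0 or height <= 0:
        raise ValueError("destination range must form a non-empty rectangle")
    total = width * height
    sender_x, sender_y = sender_xy
    sender_inside = x_start <= sender_x <= x_end and y_start <= sender_y <= y_end
    if sender_inside and not loopback_src:
        total -= 1
    if total == 0:
        raise ValueError("the number of destinations must be non-zero")
    return total


# Assumed 12 x 10 grid of 120 cores with the sender at (0, 0).
full_chip = count_multicast_dests(0, 0, 11, 9, sender_xy=(0, 0))
with_loopback = count_multicast_dests(0, 0, 11, 9, sender_xy=(0, 0), loopback_src=True)
```

Under that assumed grid, excluding the sender from a full-chip rectangle is consistent with the stated maximum of 119 destinations.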
ttkernel.noc_async_write
(tt::ttkernel::NocAsyncWriteOp)
NocAsyncWrite
Operands:
Operand | Description |
---|---|
srcLocalL1Addr | 32-bit signless integer |
dstNocAddr | TTKernel noc address |
size | 32-bit signless integer |
ttkernel.pack
(tt::ttkernel::PackOp)
Pack op.
Pack operation
Operands:
Operand | Description |
---|---|
dst_index | 32-bit signless integer |
out_cb | TTKernel cb |
out_index | 32-bit signless integer |
ttkernel.pack_set_data_type
(tt::ttkernel::PackSetDataTypeOp)
Pack set DataType op.
Pack set DataType operation
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
data_type | ::mlir::IntegerAttr | TT DataTypes. Enum cases: f32 (`Float32`), f16 (`Float16`), bf16 (`BFloat16`), bfp_f8 (`BFP_Float8`), bfp_bf8 (`BFP_BFloat8`), bfp_f4 (`BFP_Float4`), bfp_bf4 (`BFP_BFloat4`), bfp_f2 (`BFP_Float2`), bfp_bf2 (`BFP_BFloat2`), u32 (`UInt32`), u16 (`UInt16`), u8 (`UInt8`) |
ttkernel.pack_tile
(tt::ttkernel::PackTileOp)
PackTile op.
Copies a single tile from the DST register buffer at a specified index to a specified CB at a given index. For the out_tile_index to be valid for this call, cb_reserve_back(n) has to be called first to reserve at least some number n > 0 of tiles in the output CB. out_tile_index = 0 then references the first tile in the reserved section of the CB, up to index n - 1, which will then be visible to the consumer in the same order after a cb_push_back call. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine.
Each subsequent pack call will increment the write pointer in the cb by single tile size. The pointer is then again set to a valid position with space for n reserved tiles by another cb_reserve_back call.
Operates in tandem with functions cb_reserve_back and cb_push_back.
In a typical use case, the producer first ensures that enough tiles are free in the buffer via cb_reserve_back, then uses pack_tile to copy a tile from one of the DST slots into a slot in the reserved space, and finally calls cb_push_back to make the reserved section of the circular buffer visible to the consumer.
Operands:
Operand | Description |
---|---|
dst_index | 32-bit signless integer |
out_cb | TTKernel cb |
out_index | 32-bit signless integer |
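The reserve, pack, push sequence and the matching consumer side can be modeled with a small FIFO. This is a sketch under stated assumptions: the class `CircularBuffer`, its capacity argument, and the returned list from cb_pop_front exist only for the example (the real calls block rather than raise, and cb_pop_front returns nothing). The model addresses reserved slots by out_index and does not model the write-pointer increment.

```python
class CircularBuffer:
    """Toy producer/consumer CB (hypothetical model, not the TTKernel type)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.visible = []    # tiles the consumer can see, oldest first
        self.reserved = []   # slots reserved by the producer, not yet visible

    def cb_reserve_back(self, n):
        if len(self.visible) + n > self.capacity:
            raise RuntimeError("would block: not enough free space")
        self.reserved = [None] * n

    def pack_tile(self, dst, dst_index, out_index):
        if not 0 <= out_index < len(self.reserved):
            raise IndexError("out_index is outside the reserved section")
        self.reserved[out_index] = dst[dst_index]

    def cb_push_back(self, n):
        if n > len(self.reserved):
            raise RuntimeError("cannot push more tiles than were reserved")
        self.visible.extend(self.reserved[:n])
        self.reserved = self.reserved[n:]

    def cb_wait_front(self, n):
        if len(self.visible) < n:
            raise RuntimeError("would block: fewer than n tiles visible")

    def cb_pop_front(self, n):
        popped, self.visible = self.visible[:n], self.visible[n:]
        return popped  # returned only so the example can inspect the tiles


dst = ["result_0", "result_1"]  # strings stand in for DST tiles
cb = CircularBuffer(capacity=4)

# Producer: reserve two slots, fill them from DST, then publish them.
cb.cb_reserve_back(2)
cb.pack_tile(dst, dst_index=1, out_index=0)
cb.pack_tile(dst, dst_index=0, out_index=1)
cb.cb_push_back(2)

# Consumer: wait for the two tiles, then consume them in FIFO order.
cb.cb_wait_front(2)
consumed = cb.cb_pop_front(2)
```

Tiles become visible to the consumer only after cb_push_back, in the order of their slots in the reserved section.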
ttkernel.recip_tile_init
(tt::ttkernel::RecipTileInitOp)
Init function for recip_tile operation. Refer to documentation for any init function.
Must be called before recip_tile function.
ttkernel.recip_tile
(tt::ttkernel::RecipTileOp)
Recip tile in the DST at specified index.
Performs element-wise computation of the reciprocal on each element of a tile in DST register at index tile_index. The DST register buffer must be in acquired state via tile_regs_acquire call. This call is blocking and is only available on the compute engine. Only works for Float32, Float16_b, Bfp8_b data formats for full accuracy.
Operands:
Operand | Description |
---|---|
tile_index | 32-bit signless integer |
ttkernel.reduce_init
(tt::ttkernel::ReduceInitOp)
Init function
Must be run before reduce_tile.
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
reduce_type | ::mlir::tt::ttkernel::ReduceTypeAttr | TTKernel Reduce Types. Enum cases: reduce_sum (`Sum`), reduce_max (`Max`) |
reduce_dim | ::mlir::tt::ttkernel::ReduceDimAttr | TTKernel Reduce Dimensions. Enum cases: reduce_dim_row (`Row`), reduce_dim_col (`Col`), reduce_dim_scalar (`Scalar`), reduce_dim_none (`None`) |
Operands:
Operand | Description |
---|---|
in_cb | TTKernel cb |
scaling_cb | TTKernel cb |
out_cb | TTKernel cb |
ttkernel.reduce_tile
(tt::ttkernel::ReduceTileOp)
Reduce operation
Performs a reduction operation B = reduce(A) using reduce_func for dimension reduction on a tile in the CB at a given index and writes the result to the DST register at index dst_tile_index. The reduction can be of type Reduce::R, Reduce::C, or Reduce::RC, identifying the dimension(s) to be reduced to size 1. The DST register buffer must be in acquired state via tile_regs_acquire call. The template takes reduce_type, which can be ReduceFunc::Sum or ReduceFunc::Max, and reduce_dim, which can be Reduce::R, Reduce::C, or Reduce::RC. Both can also be specified through the defines REDUCE_OP and REDUCE_DIM. This call is blocking and is only available on the compute engine.
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
reduce_type | ::mlir::tt::ttkernel::ReduceTypeAttr | TTKernel Reduce Types. Enum cases: reduce_sum (`Sum`), reduce_max (`Max`) |
reduce_dim | ::mlir::tt::ttkernel::ReduceDimAttr | TTKernel Reduce Dimensions. Enum cases: reduce_dim_row (`Row`), reduce_dim_col (`Col`), reduce_dim_scalar (`Scalar`), reduce_dim_none (`None`) |
Operands:
Operand | Description |
---|---|
in_cb | TTKernel cb |
scaling_cb | TTKernel cb |
in_tile_index | 32-bit signless integer |
scaling_tile_index | 32-bit signless integer |
dst_index | 32-bit signless integer |
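A rough model of the reduce_type and reduce_dim combinations is given below. The string arguments and the axis interpretation are assumptions: here "row" is taken to mean that each row collapses to one value, "col" that each column collapses to one value, and "scalar" that the whole tile collapses to one value. The scaling tile is left out because this section does not describe how it is applied, and the real op writes its result into a DST tile rather than returning a smaller array.

```python
def reduce_tile(tile, reduce_type, reduce_dim):
    """Reduce a 2D tile with sum or max along an assumed axis interpretation."""
    if reduce_type not in ("sum", "max"):
        raise ValueError("unsupported reduce_type")
    func = sum if reduce_type == "sum" else max

    if reduce_dim == "row":     # assumption: every row collapses to one value
        return [[func(row)] for row in tile]
    if reduce_dim == "col":     # assumption: every column collapses to one value
        return [[func(col) for col in zip(*tile)]]
    if reduce_dim == "scalar":  # the whole tile collapses to one value
        return [[func(v for row in tile for v in row)]]
    raise ValueError("unsupported reduce_dim")


tile = [[1, 2],
        [3, 4]]  # 2x2 only for readability
row_sum = reduce_tile(tile, "sum", "row")
col_sum = reduce_tile(tile, "sum", "col")
all_max = reduce_tile(tile, "max", "scalar")
```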
ttkernel.return
(tt::ttkernel::ReturnOp)
Return op.
Return operation
Traits: AlwaysSpeculatableImplTrait, ReturnLike, Terminator
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface), RegionBranchTerminatorOpInterface
Effects: MemoryEffects::Effect{}
ttkernel.store_to_l1
(tt::ttkernel::StoreToL1Op)
StoreToL1
Store value to L1.
Operands:
Operand | Description |
---|---|
value | 32-bit signless integer |
l1_ptr | TTKernel l1 address pointer |
offset | 32-bit signless integer |
ttkernel.sub
(tt::ttkernel::SubOp)
Sub operation
Operands:
Operand | Description |
---|---|
dst_index | 32-bit signless integer |
ttkernel.tile_regs_acquire
(tt::ttkernel::TileRegsAcquireOp)
Tile_regs_acquire
Acquire an exclusive lock on the DST register for the MATH thread. This register is an array of 16 tiles of 32x32 elements each. This is a blocking function, i.e. this function will wait until the lock is acquired.
ttkernel.tile_regs_commit
(tt::ttkernel::TileRegsCommitOp)
Tile_regs_commit
Release the lock on the DST register held by the MATH thread. The lock must have been acquired beforehand with tile_regs_acquire.
ttkernel.tile_regs_release
(tt::ttkernel::TileRegsReleaseOp)
Tile_regs_release
Release the lock on the DST register held by the PACK thread. The lock must have been acquired beforehand with tile_regs_wait.
ttkernel.tile_regs_wait
(tt::ttkernel::TileRegsWaitOp)
Tile_regs_wait
Acquire an exclusive lock on the DST register for the PACK thread. It waits for the MATH thread to commit the DST register. This is a blocking function, i.e. this function will wait until the lock is acquired.
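The four tile_regs calls form a handoff between the MATH and PACK threads, which can be sketched as a small state machine. The class `DstRegisterLock` and its state names are hypothetical; the model treats DST as a single buffer and raises where the real acquire and wait calls would block.

```python
class DstRegisterLock:
    """Toy model of DST ownership passing from MATH to PACK and back."""

    def __init__(self):
        self.state = "free"

    def _step(self, expected, new_state, op_name):
        if self.state != expected:
            raise RuntimeError(
                f"{op_name} would block or is invalid in state {self.state!r}")
        self.state = new_state

    def tile_regs_acquire(self):   # MATH takes the lock
        self._step("free", "math", "tile_regs_acquire")

    def tile_regs_commit(self):    # MATH hands the results over
        self._step("math", "committed", "tile_regs_commit")

    def tile_regs_wait(self):      # PACK takes the lock after the commit
        self._step("committed", "pack", "tile_regs_wait")

    def tile_regs_release(self):   # PACK frees DST for the next acquire
        self._step("pack", "free", "tile_regs_release")


lock = DstRegisterLock()
trace = []
for call in (lock.tile_regs_acquire, lock.tile_regs_commit,
             lock.tile_regs_wait, lock.tile_regs_release):
    call()
    trace.append(lock.state)
```

Compute ops that write DST belong between acquire and commit, and pack_tile calls belong between wait and release.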
ttkernel.tilize_block
(tt::ttkernel::TilizeBlockOp)
TilizeBlockOp call.
TilizeBlockOp operation
Operands:
Operand | Description |
---|---|
cbIn | TTKernel cb |
numTiles | 32-bit signless integer |
cbOut | TTKernel cb |
ttkernel.tilize_init
(tt::ttkernel::TilizeInitOp)
TilizeInitOp call.
TilizeInitOp operation
Operands:
Operand | Description |
---|---|
cbIn | TTKernel cb |
numTiles | 32-bit signless integer |
cbOut | TTKernel cb |
ttkernel.unary_op_init_common
(tt::ttkernel::UnaryOpInitCommonOp)
Initialization function for unary operations.
This operation initializes all necessary components for unary operations, including unpacking, packing, and math configurations.
Operands:
Operand | Description |
---|---|
icb | TTKernel cb |
ocb | TTKernel cb |
ttkernel.unpack_ab
(tt::ttkernel::UnpackABOp)
UnpackAB op.
UnpackAB operation
Operands:
Operand | Description |
---|---|
cb_a | TTKernel cb |
src_a_index | 32-bit signless integer |
cb_b | TTKernel cb |
src_b_index | 32-bit signless integer |
ttkernel.unpack_a
(tt::ttkernel::UnpackAOp)
UnpackA op.
UnpackA operation
Operands:
Operand | Description |
---|---|
cb | TTKernel cb |
src_index | 32-bit signless integer |
ttkernel.unpack_set_data_type
(tt::ttkernel::UnpackSetDataTypeOp)
Unpack set DataType op.
Unpack set DataType operation
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
data_type_a | ::mlir::IntegerAttr | TT DataTypes. Enum cases: f32 (`Float32`), f16 (`Float16`), bf16 (`BFloat16`), bfp_f8 (`BFP_Float8`), bfp_bf8 (`BFP_BFloat8`), bfp_f4 (`BFP_Float4`), bfp_bf4 (`BFP_BFloat4`), bfp_f2 (`BFP_Float2`), bfp_bf2 (`BFP_BFloat2`), u32 (`UInt32`), u16 (`UInt16`), u8 (`UInt8`) |
data_type_b | ::mlir::IntegerAttr | TT DataTypes. Enum cases: f32 (`Float32`), f16 (`Float16`), bf16 (`BFloat16`), bfp_f8 (`BFP_Float8`), bfp_bf8 (`BFP_BFloat8`), bfp_f4 (`BFP_Float4`), bfp_bf4 (`BFP_BFloat4`), bfp_f2 (`BFP_Float2`), bfp_bf2 (`BFP_BFloat2`), u32 (`UInt32`), u16 (`UInt16`), u8 (`UInt8`) |
ttkernel.unreachable
(tt::ttkernel::UnreachableOp)
Unreachable op.
Unreachable operation
Traits: AlwaysSpeculatableImplTrait, ReturnLike, Terminator
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface), RegionBranchTerminatorOpInterface
Effects: MemoryEffects::Effect{}
ttkernel.untilize_block
(tt::ttkernel::UntilizeBlockOp)
UntilizeBlockOp call.
UntilizeBlockOp operation
Operands:
Operand | Description |
---|---|
cbIn | TTKernel cb |
numTiles | 32-bit signless integer |
cbOut | TTKernel cb |
ttkernel.untilize_init
(tt::ttkernel::UntilizeInitOp)
UntilizeInitOp call.
UntilizeInitOp operation
Operands:
Operand | Description |
---|---|
cbIn | TTKernel cb |
cbOut | TTKernel cb |