ttnn.MatmulMultiCoreReuseMultiCastProgramConfig
- class ttnn.MatmulMultiCoreReuseMultiCastProgramConfig
Bases: pybind11_object
The “2D” matmul program config is used for block-sharded tensors and for general interleaved tensors.
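For orientation, a minimal sketch of constructing this config and passing it to ttnn.matmul. The grid size, tensor shapes, and blocking values below are illustrative assumptions, not required values:

```python
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# Illustrative 1024 x 1024 x 1024 matmul on an 8x8 grid; with 32x32
# tiles this gives Mt = Nt = Kt = 32 tiles.
a = ttnn.from_torch(torch.randn(1, 1, 1024, 1024, dtype=torch.bfloat16),
                    layout=ttnn.TILE_LAYOUT, device=device)
b = ttnn.from_torch(torch.randn(1, 1, 1024, 1024, dtype=torch.bfloat16),
                    layout=ttnn.TILE_LAYOUT, device=device)

program_config = ttnn.MatmulMultiCoreReuseMultiCastProgramConfig(
    compute_with_storage_grid_size=(8, 8),  # (x, y) compute grid
    in0_block_w=4,       # tiles along K per block; must divide Kt = 32
    out_subblock_h=1,    # must divide out_block_h
    out_subblock_w=4,    # must divide out_block_w
    out_block_h=4,       # defaults to per_core_M if omitted
    out_block_w=4,       # defaults to per_core_N if omitted
    per_core_M=4,        # Mt / grid_y = 32 / 8
    per_core_N=4,        # Nt / grid_x = 32 / 8
    transpose_mcast=False,
    fused_activation=None,
    fuse_batch=True,
)

out = ttnn.matmul(a, b, program_config=program_config)
ttnn.close_device(device)
```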
- property compute_with_storage_grid_size
Grid size for compute cores with storage capability.
Specifies the 2D grid of cores (x, y) that will be used for computation and have access to storage. This determines how the computation is distributed across cores and affects multicast communication patterns.
- from_json(self: str) → ttnn._ttnn.operations.matmul.MatmulMultiCoreReuseMultiCastProgramConfig
- property fuse_batch
Whether to fuse batch dimensions into the matrix dimensions.
When true, batch dimensions are fused with the M dimension, allowing for more efficient processing of batched matrix multiplications. This can improve performance for operations with large batch sizes. Defaults to true.
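As a sketch of the arithmetic this implies: with fuse_batch enabled, the batch folds into M, so per-core work is derived from B * Mt rather than Mt. The exact distribution below (M mapped to grid rows) is an assumption for illustration:

```python
# Hypothetical shapes: batch B = 8, M = 256, tile size 32.
B, M, TILE = 8, 256, 32
Mt = M // TILE            # 8 tile-rows of output per batch
fused_Mt = B * Mt         # 64 tile-rows once batch is fused into M

grid_y = 8                # grid rows carrying the M dimension (assumed)
per_core_M = fused_Mt // grid_y
print(per_core_M)         # 8 tile-rows of output per core
```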
- property fused_activation
Optional fused activation function to apply to the output.
If provided, the specified activation function (e.g., ReLU, GELU) is applied directly during the matmul computation, avoiding the need for a separate activation operation and improving performance.
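Assuming your build exposes ttnn.UnaryOpType and ttnn.UnaryWithParam (an assumption; check your ttnn version), a fused ReLU could be attached like so:

```python
import ttnn

program_config = ttnn.MatmulMultiCoreReuseMultiCastProgramConfig(
    compute_with_storage_grid_size=(8, 8),
    in0_block_w=4,
    out_subblock_h=1,
    out_subblock_w=4,
    per_core_M=4,
    per_core_N=4,
    transpose_mcast=False,
    # ReLU is applied inside the matmul kernel, so no separate
    # ttnn.relu call (and no extra round trip to memory) is needed.
    fused_activation=ttnn.UnaryWithParam(ttnn.UnaryOpType.RELU),
    fuse_batch=True,
)
```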
- property in0_block_w
Block width for both input tensors along the K dimension (shared inner dimension).
Determines the data granularity by specifying how many tiles wide each block is along the K dimension for both input_tensor_a and input_tensor_b in multicast operations. Must evenly divide the K dimension measured in tiles. Smaller blocks can improve load balancing but may increase communication overhead in multicast scenarios.
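For example, enumerating the legal values for K = 1024 elements (32 tiles at a 32x32 tile size); the trade-off noted in the comments is a general tendency, not a measured result:

```python
TILE = 32
K = 1024
Kt = K // TILE   # 32 tiles along the shared inner dimension

# in0_block_w must evenly divide Kt:
valid_block_widths = [w for w in range(1, Kt + 1) if Kt % w == 0]
print(valid_block_widths)   # [1, 2, 4, 8, 16, 32]
# Larger values mean fewer, bigger multicast transfers (less overhead,
# more L1 used per block); smaller values trade the reverse.
```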
- property out_block_h
Height of output blocks in tiles.
Specifies the block size for output tensor along the M dimension. If not provided, defaults to per_core_M. Must be divisible by out_subblock_h and should be chosen to optimize multicast efficiency and memory usage.
- property out_block_w
Width of output blocks in tiles.
Specifies the block size for output tensor along the N dimension. If not provided, defaults to per_core_N. Must be divisible by out_subblock_w and should be chosen to optimize multicast efficiency and memory usage.
- property out_subblock_h
Height of output subblocks in tiles.
Controls the granularity of computation within each output block along the M dimension. Must divide evenly into out_block_h. Affects memory usage and compute scheduling in the multicast implementation.
- property out_subblock_w
Width of output subblocks in tiles.
Controls the granularity of computation within each output block along the N dimension. Must divide evenly into out_block_w. Affects memory usage and compute scheduling in the multicast implementation.
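Pulling the block/subblock rules from the four properties above into one place, a sketch of the divisibility checks. The per-core divisibility and the subblock size cap are assumptions about the implementation, flagged in the comments:

```python
def check_blocking(per_core_M, per_core_N,
                   out_block_h, out_block_w,
                   out_subblock_h, out_subblock_w):
    # Stated above: subblocks must divide their enclosing blocks.
    assert out_block_h % out_subblock_h == 0
    assert out_block_w % out_subblock_w == 0
    # Assumed: output blocks tile the per-core output range.
    assert per_core_M % out_block_h == 0
    assert per_core_N % out_block_w == 0
    # Assumed hardware limit: a subblock must fit the destination
    # registers, commonly out_subblock_h * out_subblock_w <= 8.
    assert out_subblock_h * out_subblock_w <= 8

check_blocking(per_core_M=4, per_core_N=4,
               out_block_h=4, out_block_w=4,
               out_subblock_h=1, out_subblock_w=4)
```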
- property per_core_M
Number of output tiles each core processes along the M dimension.
Determines how the M dimension is distributed across cores in the multicast setup. Used as the default value for out_block_h if not explicitly specified.
- property per_core_N
Number of output tiles each core processes along the N dimension.
Determines how the N dimension is distributed across cores in the multicast setup. Used as the default value for out_block_w if not explicitly specified.
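A common way to derive both values from the output shape and grid, assuming M maps to grid rows and N to grid columns when transpose_mcast is false (an assumption consistent with the multicast description, not a documented guarantee):

```python
import math

TILE = 32
M, N = 1024, 2048
grid_x, grid_y = 8, 8

Mt, Nt = M // TILE, N // TILE        # output size in tiles
per_core_M = math.ceil(Mt / grid_y)  # 32 / 8 = 4 tile-rows per core
per_core_N = math.ceil(Nt / grid_x)  # 64 / 8 = 8 tile-cols per core
```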
- to_json(self: ttnn._ttnn.operations.matmul.MatmulMultiCoreReuseMultiCastProgramConfig) → str
- property transpose_mcast
Whether to transpose the multicast communication pattern.
When true, the multicast direction is transposed, which can be beneficial for certain tensor shapes and grid configurations. This affects how data is broadcast across cores and can impact performance depending on the access patterns.
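Extending the per_core sketch above, a hedged illustration of the effect: with transpose_mcast the grid axes that carry M and N swap. The mapping below is an assumption for illustration, not a documented guarantee:

```python
import math

Mt, Nt = 32, 64            # output size in tiles
grid_x, grid_y = 8, 4      # non-square grid makes the swap visible

def per_core_tiles(transpose_mcast):
    if transpose_mcast:
        # Assumed: M distributed across grid columns, N across rows.
        return math.ceil(Mt / grid_x), math.ceil(Nt / grid_y)
    # Assumed default: M across grid rows, N across grid columns.
    return math.ceil(Mt / grid_y), math.ceil(Nt / grid_x)

print(per_core_tiles(False))  # (8, 8)
print(per_core_tiles(True))   # (4, 16)
```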