ttnn.MatmulMultiCoreReuseProgramConfig

class ttnn.MatmulMultiCoreReuseProgramConfig

Bases: object

Configuration class for multi-core reusable matmul operations.

This program config is used for basic multi-core matmul operations that can reuse intermediate results across cores for better performance.

property compute_with_storage_grid_size

Grid size for compute cores with storage capability.

Specifies the 2D grid of cores (x, y) that will be used for computation and have access to storage. This determines how the computation is distributed across cores.

from_json

property in0_block_w

Block width for both input tensors along the K dimension (shared inner dimension).

This parameter sets how many tiles wide each block is along the K dimension. It determines the size of the data chunks processed together, which affects memory usage and compute efficiency for both tensors. It must evenly divide the K dimension. A multiple of 32 is suggested for tile alignment.
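The divisibility requirement can be checked before a config is built. The following is a minimal sketch in plain Python, not part of ttnn: the helper name valid_in0_block_widths is hypothetical, and it assumes that K is given in elements, that a tile is 32 elements wide, and that the block width is counted in tiles.

```python
TILE_WIDTH = 32  # assumption: one tile spans 32 elements along K


def valid_in0_block_widths(k_elements):
    """Return every block width (in tiles) that evenly divides K.

    Hypothetical helper for illustration; it is not a ttnn function.
    """
    if k_elements % TILE_WIDTH != 0:
        raise ValueError("K must be a multiple of the tile width")
    k_tiles = k_elements // TILE_WIDTH
    # Keep only widths that split K into whole blocks.
    return [w for w in range(1, k_tiles + 1) if k_tiles % w == 0]


# K = 256 elements is 8 tiles under the assumed tile width.
print(valid_in0_block_widths(256))
```

Any value in the returned list satisfies the divisor rule; which one performs best depends on the available memory and the workload.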

property out_subblock_h

Height of output subblocks in tiles.

Controls the granularity of computation within each output block along the M dimension. Smaller values can reduce memory usage but may decrease efficiency. Must divide evenly into the output block height.

property out_subblock_w

Width of output subblocks in tiles.

Controls the granularity of computation within each output block along the N dimension. Smaller values can reduce memory usage but may decrease efficiency. Must divide evenly into the output block width.
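The two subblock constraints can be validated together. The sketch below is plain Python rather than ttnn code. The function name check_subblocks is hypothetical, and it assumes that the output block height and width on each core equal per_core_M and per_core_N respectively, which this page does not state explicitly.

```python
def check_subblocks(per_core_M, per_core_N, out_subblock_h, out_subblock_w):
    """Return a list of human-readable problems; empty means the sizes fit.

    Assumption: the per-core output block is per_core_M x per_core_N tiles.
    Hypothetical helper for illustration; it is not a ttnn function.
    """
    problems = []
    if per_core_M % out_subblock_h != 0:
        problems.append(
            f"out_subblock_h={out_subblock_h} does not divide per_core_M={per_core_M}"
        )
    if per_core_N % out_subblock_w != 0:
        problems.append(
            f"out_subblock_w={out_subblock_w} does not divide per_core_N={per_core_N}"
        )
    return problems


print(check_subblocks(4, 6, 2, 3))  # both dimensions divide evenly
print(check_subblocks(4, 6, 3, 4))  # neither dimension divides evenly
```

Returning a list of messages instead of raising on the first failure makes it easier to report every mismatch at once when tuning a config.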

property per_core_M

Number of output tiles each core processes along the M dimension.

Determines how the M dimension of the output is distributed across cores. Larger values mean fewer cores are used but each core does more work. Choose it so that (total_M / per_core_M) cores are available, where total_M is the M dimension of the output in tiles.

property per_core_N

Number of output tiles each core processes along the N dimension.

Determines how the N dimension of the output is distributed across cores. Larger values mean fewer cores are used but each core does more work. Choose it so that (total_N / per_core_N) cores are available, where total_N is the N dimension of the output in tiles.
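The relationship between per_core_M, per_core_N, and the compute grid can be estimated with a few lines of arithmetic. This is a sketch in plain Python, not ttnn code. The names cores_needed and fits_grid are hypothetical, total_M and total_N are assumed to be output dimensions in tiles, and the total core count is assumed to be the product of the per-dimension counts.

```python
def cores_needed(total_M, total_N, per_core_M, per_core_N):
    """Estimate how many cores a given per-core split requires.

    Assumption: cores along M times cores along N gives the total.
    Hypothetical helper for illustration; it is not a ttnn function.
    """
    # Ceiling division so a partial block still occupies a core.
    cores_m = -(-total_M // per_core_M)
    cores_n = -(-total_N // per_core_N)
    return cores_m * cores_n


def fits_grid(grid_size, total_M, total_N, per_core_M, per_core_N):
    """Check the estimate against an (x, y) compute grid."""
    x, y = grid_size
    return cores_needed(total_M, total_N, per_core_M, per_core_N) <= x * y


# A 16 x 16 tile output split into 2 x 4 tile blocks needs 8 * 4 cores.
print(cores_needed(16, 16, 2, 4))
print(fits_grid((8, 8), 16, 16, 2, 4))
print(fits_grid((4, 4), 16, 16, 2, 4))
```

If the estimate exceeds the grid, increasing per_core_M or per_core_N reduces the number of cores required at the cost of more work per core.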

to_json