ttnn.SoftmaxShardedMultiCoreProgramConfig
- class ttnn.SoftmaxShardedMultiCoreProgramConfig
-
Bases: object

Multi-core sharded program configuration for Softmax operations.

This configuration is designed for sharded tensors and enables multi-core execution with customizable block sizes and compute-grid configuration. It provides fine-grained control over the computation parameters for optimal performance on sharded data.
- Parameters:
-
compute_with_storage_grid_size (CoreCoord) – The grid size for compute cores with storage capability.
subblock_w (int) – Width of sub-blocks for computation. Must evenly divide the block width (block_w).
block_h (int) – Height of blocks for processing. Controls the vertical granularity of computation.
block_w (int) – Width of blocks for processing. Controls the horizontal granularity of computation. Can be modified after creation.
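The divisibility relationship between the parameters can be checked with plain arithmetic. Below is a minimal sketch; the helper name is hypothetical and not part of the ttnn API, and it assumes the constraint stated above (subblock_w must evenly divide block_w):

```python
# Hypothetical validation helper (not a ttnn API): checks the block/sub-block
# relationship described in the parameter list above.
def validate_softmax_sharded_params(subblock_w: int, block_h: int, block_w: int) -> bool:
    """Return True if the block parameters are mutually consistent."""
    if min(subblock_w, block_h, block_w) < 1:
        return False
    # Each row of blocks must decompose into whole sub-blocks,
    # so the sub-block width has to divide the block width exactly.
    return block_w % subblock_w == 0

# The values used in the example below (subblock_w=8, block_w=24) pass:
# 24 % 8 == 0
```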
Note
- This configuration is specifically designed for sharded tensors.
- Block dimensions must be compatible with the tensor's shard specification.
- Proper block sizing can significantly impact performance.
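The compatibility constraint above can be made concrete: for a height-sharded tensor, block_h and block_w typically correspond to the shard height and width expressed in tiles. A minimal arithmetic sketch, assuming 32x32 tiles (the helper name is hypothetical, not a ttnn API):

```python
TILE = 32  # assumption: ttnn tensors use 32x32 tiles

def blocks_from_shard_shape(shard_height: int, shard_width: int) -> tuple[int, int]:
    """Derive (block_h, block_w) in tiles from a shard shape given in elements."""
    if shard_height % TILE or shard_width % TILE:
        raise ValueError("shard dimensions must be tile-aligned")
    return shard_height // TILE, shard_width // TILE

# e.g. a [768, 768] shard yields block_h=24, block_w=24
```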
Example

# Setup input tensor and mask
input_shape = (1, 1, 32, 32)
attention_mask_t = ttnn.rand(input_shape, dtype=ttnn.bfloat8_b, layout=ttnn.TILE_LAYOUT, device=device)
input_tensor = ttnn.rand(input_shape, dtype=ttnn.bfloat8_b, layout=ttnn.TILE_LAYOUT, device=device)

# Apply in-place scale mask softmax
tt_output = ttnn.scale_mask_softmax_in_place(
    input_tensor=input_tensor,
    scale=1.0,
    mask=attention_mask_t,
)
logger.info(f"Scale Mask Softmax In Place result: {tt_output}")

compute_grid_size = device.compute_with_storage_grid_size()
fuse_head = 2
batch = compute_grid_size.x
num_cores_r = compute_grid_size.y
input_shape = (batch, num_cores_r, fuse_head * 384, 768)
attention_mask_t = ttnn.rand((batch, 1, 384, 768), dtype=ttnn.bfloat8_b, layout=ttnn.TILE_LAYOUT, device=device)
input_tensor = ttnn.rand(input_shape, dtype=ttnn.bfloat8_b, layout=ttnn.TILE_LAYOUT, device=device)

# Shard the input tensor
grid_coord = ttnn.CoreCoord(compute_grid_size.x - 1, compute_grid_size.y - 1)
shard_grid = ttnn.CoreRangeSet({ttnn.CoreRange(ttnn.CoreCoord(0, 0), grid_coord)})
shard_shape = [fuse_head * 384, 768]
shard_spec = ttnn.ShardSpec(shard_grid, shard_shape, ttnn.ShardOrientation.ROW_MAJOR)
sharded_mem_config = ttnn.MemoryConfig(ttnn.TensorMemoryLayout.HEIGHT_SHARDED, ttnn.BufferType.L1, shard_spec)
input_sharded = ttnn.to_memory_config(input_tensor, sharded_mem_config)

# Create sharded program config
program_config = ttnn.SoftmaxShardedMultiCoreProgramConfig(
    compute_with_storage_grid_size=compute_grid_size,
    subblock_w=8,
    block_h=12 * fuse_head,
    block_w=24,
)
tt_output = ttnn.scale_mask_softmax_in_place(
    input_tensor=input_sharded,
    scale=1.0,
    mask=attention_mask_t,
    program_config=program_config,
)
logger.info(f"Scale Mask Softmax In Place result: {tt_output}")
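As a sanity check on the numbers used in the example: with 32x32 tiles (an assumption about the tile size), the shard shape [fuse_head * 384, 768] yields block_h = 12 * fuse_head and block_w = 24, and subblock_w = 8 divides block_w. This can be confirmed with plain arithmetic, no device required:

```python
TILE = 32                              # assumption: 32x32 tiles
fuse_head = 2
shard_shape = [fuse_head * 384, 768]   # shard shape from the example above

block_h = shard_shape[0] // TILE       # 768 // 32 = 24 = 12 * fuse_head
block_w = shard_shape[1] // TILE       # 768 // 32 = 24
subblock_w = 8

print(block_h, block_w, block_w % subblock_w)  # -> 24 24 0
```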
- property block_w
-
(self) -> int