ttnn.Conv2dConfig

class ttnn.Conv2dConfig

Bases: pybind11_object

Conv2dConfig is a structure that contains all the Tenstorrent device-specific and implementation-specific flags for the ttnn.conv1d(), ttnn.conv2d() and ttnn.conv_transpose2d() ops.

property act_block_h_override

Controls the size of the activation block height.

The activation matrix is created from the input tensor, and is matrix multiplied with the weights tensor to generate the output tensor. The activation block is the chunk of the activation matrix that is available in L1 Memory; the activation matrix is divided among cores and can be further subdivided within a core. If set to 0, the maximum possible size for the activation block is used, which is equal to output_matrix_height_per_core. When the output matrix height is large, this produces large temporary Circular Buffers and can lead to out-of-memory (OOM) errors.

This flag sets the height of the activation block to act_block_h_override. The value must be a multiple of 32 and must evenly divide the maximum possible size of the activation block.
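The two constraints above can be checked with plain arithmetic before building a config. The following is a minimal sketch, not part of ttnn: the helper name valid_act_block_h_overrides is hypothetical, and it assumes you already know output_matrix_height_per_core for your shape and core grid.

```python
def valid_act_block_h_overrides(output_matrix_height_per_core: int) -> list:
    """Return the nonzero values that satisfy both documented constraints.

    Hypothetical helper (not a ttnn API): a candidate must be a multiple
    of 32 and must evenly divide the maximum activation block height,
    which the text above equates with output_matrix_height_per_core.
    """
    tile_height = 32
    return [
        h
        for h in range(tile_height, output_matrix_height_per_core + 1, tile_height)
        if output_matrix_height_per_core % h == 0
    ]


# With 256 rows per core, only some multiples of 32 divide evenly.
print(valid_act_block_h_overrides(256))  # [32, 64, 128, 256]
```

Smaller values from this list shrink the temporary Circular Buffers, so a reasonable approach under this sketch is to start from the largest candidate and step down only if the op still runs out of memory.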

property act_block_w_div: Reduces the width of the activation block to reduce Circular Buffer sizes and prevent OOM. Valid only for Width Sharded Conv2d. This is only useful when the input channels is greater than 32 * num_cores. For n150, thats 32 * 64 = 2048. This is a divisor of the activation block width. A value of 1 means no reduction, and a value of 2 means the activation block width is halved.

property activation: A string that selects the fused activation function to be applied on the output. Empty string means no activation function. Supported activation function strings are: relu, silu, mish, sigmoid, sigmoid_approx, tanh, log, softplus, gelu, sqrt

property core_grid: Core Grid to be used for sharding the input tensor. This flag is only used when override_sharding_config is set to true.

property deallocate_activation: Boolean that indicates whether the activation tensor should be deallocated after the conv op is done. If true, the activation tensor will be deallocated after the halo micro-op is done. Should not be used if the input to the conv op is used by another op.

property dtype: Specifies the data type of the output tensor. Supports ttnn.float32, ttnn.bfloat16 and ttnn.bfloat8_b.

property enable_act_double_buffer: Doubles the size of the Activation Circular Buffer to allow for double buffering, preventing stalls of the activation reader kernel. This improves performance, but increases memory usage.

property enable_kernel_stride_folding

===================== EXPERIMENTAL FEATURE ======================

Enables tensor folding optimization when strides match kernel dimensions.

This feature is under development and may change without notice. Use with caution in production environments (Issue: #22378).

When enabled, this optimization reshapes tensors as follows:

Input tensor (NHWC format):
- From: (N, H, W, IC)
- To: (N, H/stride[0], W/stride[1], IC * kernel[0] * kernel[1])

Weight tensor:
- From: (OC, IC, kernel[0], kernel[1])
- To: (1, 1, IC * kernel[0] * kernel[1], OC)

Note: This optimization is currently only applied when all of the following conditions are met:
1. The stride dimensions exactly match the kernel dimensions (stride[0] == kernel[0] and stride[1] == kernel[1])
2. The input tensor is stored in DRAM memory
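To make the reshaping concrete, the shape arithmetic above can be sketched in plain Python. The function folded_shapes is hypothetical and only computes shapes; it does not call ttnn. It assumes a non-grouped convolution (the weight IC equals the input IC) and that H and W are exact multiples of the stride, which the text above does not state.

```python
def folded_shapes(input_shape, weight_shape, stride):
    """Compute the folded input and weight shapes (hypothetical helper).

    input_shape:  (N, H, W, IC) in NHWC order
    weight_shape: (OC, IC, kernel[0], kernel[1])
    stride:       (stride[0], stride[1])
    """
    n, h, w, ic = input_shape
    oc, weight_ic, kernel_h, kernel_w = weight_shape
    if weight_ic != ic:
        raise ValueError("sketch assumes weight IC equals input IC")
    # Documented condition 1: stride must match the kernel dimensions.
    if (stride[0], stride[1]) != (kernel_h, kernel_w):
        raise ValueError("folding requires stride == kernel dimensions")
    # Assumption of this sketch: spatial dims divide evenly by the stride.
    if h % stride[0] != 0 or w % stride[1] != 0:
        raise ValueError("sketch assumes H and W are multiples of the stride")
    folded_channels = ic * kernel_h * kernel_w
    folded_input = (n, h // stride[0], w // stride[1], folded_channels)
    folded_weight = (1, 1, folded_channels, oc)
    return folded_input, folded_weight


# A 16x16 kernel with stride 16 over a 224x224 RGB image, 64 output channels.
print(folded_shapes((1, 224, 224, 3), (64, 3, 16, 16), (16, 16)))
# ((1, 14, 14, 768), (1, 1, 768, 64))
```

After folding, each output position reads one row of IC * kernel[0] * kernel[1] values, so the convolution reduces to a matrix multiply against the reshaped weights.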

property enable_split_reader: This uses both the reader & writer cores to carry out the activation reader operation. This is useful when the input tensor is large, and the activation reader is a bottleneck. This is only supported for Height Sharded Conv2D.

property enable_subblock_padding

property enable_weights_double_buffer: Doubles the size of the Weights Circular Buffer to allow for double buffering, preventing stalls of the weights reader kernel. This improves performance, but increases the memory usage of the weights tensor.

property in_place: Enables support for in_place halo. This re-uses the input tensor as the output for halo, overwriting the input tensor. This can be used if the input tensor is not used by any other op after the conv op.

property output_layout: The layout of the output tensor. Can be either ttnn.Layout.TILE or ttnn.Layout.ROW_MAJOR. Conv2D expects its input to be in ttnn.Layout.ROW_MAJOR format. If the input is in ttnn.Layout.TILE format, the halo micro-op will convert it to ttnn.Layout.ROW_MAJOR format. So if the next op is a conv op, it is recommended to set this to ttnn.Layout.ROW_MAJOR.

property override_sharding_config: Boolean flag that allows the core grid for the conv op to be specified. If true, then core_grid must also be specified.
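The dependency between override_sharding_config and core_grid can be checked before a config is built. The sketch below is an assumption-laden illustration: it uses a plain dict keyed by the property names on this page rather than a real ttnn.Conv2dConfig, and check_sharding_override is a hypothetical helper.

```python
def check_sharding_override(settings: dict) -> None:
    """Raise if override_sharding_config is set without a core_grid.

    Hypothetical pre-flight check over a plain dict of settings;
    it does not construct or inspect a ttnn.Conv2dConfig.
    """
    if settings.get("override_sharding_config", False) and settings.get("core_grid") is None:
        raise ValueError("override_sharding_config=True requires core_grid to be set")


# Passes: no override requested, so core_grid is not needed (and is unused).
check_sharding_override({"override_sharding_config": False})

# Passes: override requested together with a core grid placeholder value.
check_sharding_override({"override_sharding_config": True, "core_grid": "8x8 placeholder"})
```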

property reallocate_halo_output: Boolean that indicates whether the halo output tensor should be moved to reduce memory fragmentation before the conv micro-op is called. This is ideally used with deallocate_activation = true when facing OOM issues in the conv micro-op.

property reshard_if_not_optimal

This flag determines whether the input tensor should be resharded when its current shard config is not optimal. It is used only if the input tensor is already sharded; an unsharded input tensor is always sharded to the optimal config.

If this flag is false, the conv op will try to execute the op with the current shard config. It is recommended to set this flag to true if the input dimensions of the previous conv op and the current op are significantly different, either due to differences in the input vs output channels, or large stride / kernel size / dilation.

property shard_layout: Optional argument that determines the TensorMemoryLayout to be used for the input and output tensor. If this is not specified, the op will try to determine the optimal layout based on its own heuristics. Can be either ttnn.TensorMemoryLayout.HEIGHT_SHARDED, ttnn.TensorMemoryLayout.BLOCK_SHARDED or ttnn.TensorMemoryLayout.WIDTH_SHARDED.

property transpose_shards: Determines if the Shard Orientation should be Row Major or Column Major. If true, the shard orientation is Row Major. If false, the shard orientation is Column Major. This is useful for Block Sharded Conv2D when the device core grid is not a square.

property weights_dtype: Optional argument which specifies the data type of the preprocessed weights & bias tensor if the Conv2D op is responsible for preparing the weights. Supports ttnn.bfloat16 and ttnn.bfloat8_b. If unspecified, the preprocessed weights will be in the same format as the input weights. If ttnn.bfloat8_b is selected, then the weights should be passed in as ttnn.bfloat16 or ttnn.float32 in row major format.