ttnn.Conv2dConfig
- class ttnn.Conv2dConfig
-
Bases:
pybind11_object
Conv2dConfig is a structure that contains all the Tenstorrent device-specific & implementation-specific flags for the ttnn.conv1d(), ttnn.conv2d() and ttnn.conv_transpose2d() ops.
- property act_block_h_override
-
Controls the size of the activation block height.
The activation matrix is created from the input tensor, and is matrix multiplied with the weights tensor to generate the output tensor. The activation block is the chunk of the activation matrix that is available in L1 Memory, as the activation matrix gets divided among cores, and can also be further subdivided within a core. If set to 0, the maximum possible size for the activation block is used, which is equal to output_matrix_height_per_core. This leads to large temporary Circular Buffers when the output matrix height is large, which can cause OOM.
A non-zero value sets the height of the activation block to act_block_h_override. It must be a multiple of 32, and must evenly divide the maximum possible size of the activation block.
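The two constraints above (multiple of 32, evenly divides the maximum block height) can be enumerated with a small helper. This is a minimal sketch in plain Python; the function name is hypothetical and not part of ttnn, and the caller is assumed to already know output_matrix_height_per_core.

```python
def valid_act_block_h_overrides(output_matrix_height_per_core: int) -> list:
    """List the non-zero act_block_h_override values permitted by the documented rules.

    Hypothetical helper, not a ttnn function. A candidate is kept when it is a
    multiple of 32 and evenly divides the maximum activation block height.
    """
    step = 32
    return [
        h
        for h in range(step, output_matrix_height_per_core + 1, step)
        if output_matrix_height_per_core % h == 0
    ]


print(valid_act_block_h_overrides(256))  # [32, 64, 128, 256]
```

Smaller values shrink the temporary Circular Buffers, so the smallest entry is the natural first choice when a convolution runs out of L1.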
- property act_block_w_div
-
Reduces the width of the activation block to reduce Circular Buffer sizes and prevent OOM. Valid only for Width Sharded Conv2d. This is only useful when the number of input channels is greater than 32 * num_cores. For n150, that's 32 * 64 = 2048. The value is a divisor of the activation block width: a value of 1 means no reduction, and a value of 2 means the activation block width is halved.
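The usefulness threshold can be checked before touching the flag. The sketch below only encodes the 32 * num_cores rule quoted above; the function and constant names are hypothetical, and the core count must be supplied by the caller.

```python
CHANNELS_PER_CORE = 32  # the factor 32 from the rule above


def act_block_w_div_is_useful(input_channels: int, num_cores: int) -> bool:
    """Return True when input channels exceed 32 * num_cores (hypothetical helper)."""
    return input_channels > CHANNELS_PER_CORE * num_cores


# With 64 cores the threshold is 32 * 64 = 2048 input channels.
print(act_block_w_div_is_useful(2048, 64))  # False
print(act_block_w_div_is_useful(4096, 64))  # True
```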
- property activation
-
Fused activation function to be applied on the output. None means no activation function. Use ttnn.UnaryWithParam(ttnn.UnaryOpType.RELU) for ReLU activation. Supported activation functions include: RELU, SILU, GELU, SIGMOID, TANH, etc.
- property config_tensors_in_dram
-
Boolean that determines where config tensors should be stored. Setting it to true stores them in DRAM. False stores them in L1_SMALL. Config tensors are used by Conv2D, Pooling and other 2D ops to store how data should be loaded, instead of computing on device RISC-cores.
- property core_grid
-
Core Grid to be used for sharding the input tensor. This flag is only used when override_sharding_config is set to true.
- property deallocate_activation
-
Boolean that indicates whether the activation tensor should be deallocated after the conv op is done. If true, the activation tensor will be deallocated after the halo micro-op is done. Should not be used if the input to the conv op is used by another op. Has no effect if input tensor is in DRAM.
- property enable_act_double_buffer
-
Doubles the size of the Activation Circular Buffer to allow for double buffering, preventing stalls of the activation reader kernel. This improves performance, but increases memory usage.
- property enable_activation_reuse
-
===================== EXPERIMENTAL FEATURE ======================
Enables reusing data between consecutive image rows. It can be enabled for height sharding only and boosts image2column performance, so it's meant to be used for reader-bound convolutions.
- property enable_kernel_stride_folding
-
===================== EXPERIMENTAL FEATURE ======================
Enables tensor folding optimization that transforms convolution operations by reshaping tensors and adjusting stride patterns for improved computational efficiency.
- Parameters:
-
enable_kernel_stride_folding (Optional[bool]) –
None (default): Automatic enablement based on optimal conditions
True: Force enable the optimization
False: Disable the optimization
Behavior: When enabled, this optimization reshapes tensors as follows:
Input tensor (NHWC format):
- From: (N, H, W, IC)
- To: (N, H / stride[0], W / stride[1], IC * stride[0] * stride[1])
Weight tensor:
- From: (OC, IC, kernel[0], kernel[1])
- To: (1, 1, IC * (kernel[0] + pad_h) * (kernel[1] + pad_w), OC), where pad_h = kernel[0] % stride[0] and pad_w = kernel[1] % stride[1]
Stride: Becomes (1, 1) after folding
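The shape arithmetic above can be traced with a short plain-Python sketch. It only applies the formulas as written; the function name is hypothetical, the input shape is assumed to be the already padded NHWC shape, and groups are assumed to be 1.

```python
def folded_shapes(input_shape, weight_shape, stride):
    """Apply the documented folding formulas (hypothetical helper, not ttnn code).

    input_shape:  (N, H, W, IC), assumed to be the padded NHWC shape
    weight_shape: (OC, IC, kernel_h, kernel_w)
    stride:       (stride_h, stride_w)
    """
    n, h, w, ic = input_shape
    oc, weight_ic, kernel_h, kernel_w = weight_shape
    stride_h, stride_w = stride
    if weight_ic != ic:
        raise ValueError("weight IC must match input IC (groups == 1 assumed)")
    if h % stride_h or w % stride_w:
        raise ValueError("padded input height/width must be divisible by the stride")

    # Implicit weight padding, as defined in the text above.
    pad_h = kernel_h % stride_h
    pad_w = kernel_w % stride_w

    folded_input = (n, h // stride_h, w // stride_w, ic * stride_h * stride_w)
    folded_weight = (1, 1, ic * (kernel_h + pad_h) * (kernel_w + pad_w), oc)
    return folded_input, folded_weight, (1, 1)


# 3 input channels, 7x7 kernel, stride 2x2, 64 output channels, 230x230 padded input.
print(folded_shapes((1, 230, 230, 3), (64, 3, 7, 7), (2, 2)))
# ((1, 115, 115, 12), (1, 1, 192, 64), (1, 1))
```

In this case the channel count grows from 3 to 12 and the stride becomes (1, 1).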
Automatic Enablement: When set to None, automatically enabled when ALL conditions are met (transforms conv2d into Fold + MatMul):
1. Stride equals kernel size in both dimensions (stride == kernel_size)
2. Stride is greater than 1 in at least one dimension
3. No dilation applied (dilation == [1, 1])
4. Input height and width (after padding) are divisible by respective stride values
5. Input tensor memory: DRAM (all types except bfloat8_b) OR L1 Height-sharded (all types)
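The five automatic-enablement conditions can be written as a single predicate. This is a sketch of the documented rules only, not the library's actual check; the function name and the string labels used for memory placement and dtype are hypothetical stand-ins for the corresponding ttnn values.

```python
def auto_folding_enabled(kernel, stride, dilation, padded_hw, memory, dtype):
    """Evaluate the five documented conditions (hypothetical helper).

    memory: "DRAM", "L1_HEIGHT_SHARDED", or any other label (illustrative strings)
    dtype:  e.g. "bfloat16" or "bfloat8_b" (illustrative strings)
    """
    stride_equals_kernel = tuple(stride) == tuple(kernel)           # condition 1
    stride_above_one = any(s > 1 for s in stride)                   # condition 2
    no_dilation = tuple(dilation) == (1, 1)                         # condition 3
    divisible = all(d % s == 0 for d, s in zip(padded_hw, stride))  # condition 4
    memory_ok = (                                                   # condition 5
        (memory == "DRAM" and dtype != "bfloat8_b")
        or memory == "L1_HEIGHT_SHARDED"
    )
    return all(
        [stride_equals_kernel, stride_above_one, no_dilation, divisible, memory_ok]
    )


# A 2x2 kernel with stride 2x2 (patch-embedding style) satisfies every condition.
print(auto_folding_enabled((2, 2), (2, 2), (1, 1), (64, 64), "DRAM", "bfloat16"))  # True
# A 7x7 kernel with stride 2x2 fails condition 1, so folding is not automatic.
print(auto_folding_enabled((7, 7), (2, 2), (1, 1), (230, 230), "DRAM", "bfloat16"))  # False
```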
Manual Enablement: Particularly beneficial for unaligned input channels (e.g., small channel counts like 3 RGB channels).
Requirements when forcing enable_kernel_stride_folding=True:
- Stride ≤ kernel size in both dimensions
- Input tensor supports folding (DRAM except bfloat8_b, or L1 Height-sharded)
- Input dimensions after padding are divisible by stride values
Example: For small channel counts (like 3 RGB channels) with stride=2x2, kernel=7x7:
- Transforms 3 channels → 12 channels, stride 2x2 → 1x1
- Reduces required padding for alignment (3→12 uses alignment more efficiently)
- Kernel size reduces to kernel/stride (e.g., 7x7 kernel → 4x4 kernel with padding)
Note: The weight tensor padding is applied implicitly and not passed via the padding argument.
- property enable_weights_double_buffer
-
Doubles the size of the Weights Circular Buffer to allow for double buffering, preventing stalls of the weights reader kernel. This improves performance, but increases the memory usage of the weights tensor.
- property force_split_reader
-
===================== EXPERIMENTAL FEATURE ======================
This uses both the reader & writer cores to carry out the activation reader operation. This is useful when the input tensor is large, and the activation reader is a bottleneck. This is only supported for Height Sharded Conv2D. Setting this overrides the split reader heuristic.
- property full_inner_dim
-
Applies only to the block sharded layout. By default, the inner dim of the activation matrix is sliced by kernel_h. If L1 constraints allow it, the full inner dim can be used instead. This increases performance, but takes more L1 space.
- property in_place
-
Enables support for in_place halo. This re-uses the input tensor as the output for halo, overwriting the input tensor. This can be used if the input tensor is not used by any other op after the conv op.
- property output_layout
-
The layout of the output tensor. Can be either ttnn.Layout.TILE or ttnn.Layout.ROW_MAJOR. Conv2D expects its input to be in ttnn.Layout.ROW_MAJOR format. If the input is in ttnn.Layout.TILE format, the halo micro-op will convert it to ttnn.Layout.ROW_MAJOR format. So if the next op is a conv op, it is recommended to set this to ttnn.Layout.ROW_MAJOR.
- property override_sharding_config
-
Boolean flag that allows the core grid for the conv op to be specified. If true, then core_grid must also be specified.
- property reallocate_halo_output
-
reallocate_halo_output is a boolean that indicates whether the halo output tensor should be moved to reduce memory fragmentation, before the conv micro-op is called. This is ideally used with deallocate_activation = true, when facing OOM issues in the conv micro-op.
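When a convolution hits OOM, the memory-related flags described so far are typically adjusted together. The sketch below collects them in a plain dictionary so it runs without a device. The helper name is hypothetical, and passing the result as ttnn.Conv2dConfig(**kwargs) assumes the constructor accepts these property names as keyword arguments.

```python
def oom_mitigation_kwargs(act_block_h_override: int = 32,
                          input_reused_later: bool = False) -> dict:
    """Collect memory-saving flag values (hypothetical helper, not part of ttnn)."""
    if act_block_h_override % 32 != 0:
        raise ValueError("act_block_h_override must be a multiple of 32")
    # Not checked here: the value must also evenly divide the maximum block height.
    return {
        "act_block_h_override": act_block_h_override,
        # Must stay False if another op still reads the conv input.
        "deallocate_activation": not input_reused_later,
        # Moves the halo output to reduce fragmentation before the conv micro-op.
        "reallocate_halo_output": True,
        # Double buffering improves performance but increases memory usage.
        "enable_act_double_buffer": False,
        "enable_weights_double_buffer": False,
    }


kwargs = oom_mitigation_kwargs(act_block_h_override=64)
print(kwargs)
# Assumed usage with ttnn installed: conv_config = ttnn.Conv2dConfig(**kwargs)
```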
- property reshard_if_not_optimal
-
This flag determines whether the input tensor should be resharded if its current shard config is not optimal. It is used only if the input tensor is already sharded. If it is not sharded, the input tensor will be sharded to the optimal config anyway.
If this flag is false, the conv op will try to execute the op with the current shard config. It is recommended to set this flag to true if the input dimensions of the previous conv op and the current op are significantly different, either due to differences in the input vs output channels, or large stride / kernel size / dilation.
- property shard_layout
-
Optional argument that determines the TensorMemoryLayout to be used for the input and output tensor. If this is not specified, the op will try to determine the optimal layout based on its own heuristics. Can be either ttnn.TensorMemoryLayout.HEIGHT_SHARDED, ttnn.TensorMemoryLayout.BLOCK_SHARDED or ttnn.TensorMemoryLayout.WIDTH_SHARDED.
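Several flags on this page are documented as applying to one shard layout only, and override_sharding_config requires core_grid. A small consistency check can catch mismatches before a config is built. This is a sketch over a plain dictionary; the function name and the string layout labels are hypothetical stand-ins for the ttnn.TensorMemoryLayout values, and layout-dependent checks are skipped when shard_layout is left unset.

```python
def check_sharding_flags(cfg: dict) -> list:
    """Return a list of inconsistencies between documented flag constraints.

    Hypothetical helper, not part of ttnn. Keys mirror the property names above.
    """
    problems = []
    layout = cfg.get("shard_layout")  # None means the op picks a layout itself

    if cfg.get("override_sharding_config") and cfg.get("core_grid") is None:
        problems.append("override_sharding_config=True requires core_grid")

    if layout is not None:
        if cfg.get("act_block_w_div", 1) != 1 and layout != "WIDTH_SHARDED":
            problems.append("act_block_w_div is valid only for WIDTH_SHARDED")
        for flag in ("enable_activation_reuse", "force_split_reader"):
            if cfg.get(flag) and layout != "HEIGHT_SHARDED":
                problems.append(f"{flag} is supported only for HEIGHT_SHARDED")
        if cfg.get("full_inner_dim") and layout != "BLOCK_SHARDED":
            problems.append("full_inner_dim applies only to BLOCK_SHARDED")

    return problems


print(check_sharding_flags({"shard_layout": "HEIGHT_SHARDED", "act_block_w_div": 2}))
# ['act_block_w_div is valid only for WIDTH_SHARDED']
```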
- property transpose_shards
-
Determines if the Shard Orientation should be Row Major or Column Major. If true, the shard orientation is Row Major. If false, the shard orientation is Column Major. This is useful for Block Sharded Conv2D when the device core grid is not a square.
- property weights_dtype
-
Optional argument which specifies the data type of the preprocessed weights & bias tensor if the Conv2D op is responsible for preparing the weights. Supports ttnn.bfloat16 and ttnn.bfloat8_b. If unspecified, the preprocessed weights will be in the same format as the input weights. If ttnn.bfloat8_b is selected, then the weights should be passed in as ttnn.bfloat16 or ttnn.float32 in row major format.
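The weights_dtype rules can be summarized as a small resolver. This is a sketch of the documented behaviour using string labels in place of the ttnn dtype and layout objects; the function name and labels are hypothetical.

```python
def resolve_weights_dtype(weights_dtype, input_weights_dtype, input_weights_layout):
    """Return the dtype of the preprocessed weights per the documented rules.

    Hypothetical helper, not part of ttnn. Dtypes and layouts are plain strings
    such as "bfloat16", "bfloat8_b", "float32", "ROW_MAJOR", "TILE".
    """
    if weights_dtype is None:
        # Unspecified: preprocessed weights keep the input weights' format.
        return input_weights_dtype
    if weights_dtype not in ("bfloat16", "bfloat8_b"):
        raise ValueError("weights_dtype supports bfloat16 and bfloat8_b")
    if weights_dtype == "bfloat8_b":
        if input_weights_dtype not in ("bfloat16", "float32"):
            raise ValueError("bfloat8_b requires bfloat16 or float32 input weights")
        if input_weights_layout != "ROW_MAJOR":
            raise ValueError("bfloat8_b requires row major input weights")
    return weights_dtype


print(resolve_weights_dtype("bfloat8_b", "float32", "ROW_MAJOR"))  # bfloat8_b
print(resolve_weights_dtype(None, "bfloat16", "ROW_MAJOR"))        # bfloat16
```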