ttnn.softmax_in_place
- ttnn.softmax_in_place(input_tensor: ttnn.Tensor, *, program_config: SoftmaxProgramConfig = SoftmaxDefaultProgramConfig(), compute_kernel_config: DeviceComputeKernelConfig | None = None, numeric_stable: bool = True) → ttnn.Tensor
Computes the softmax function along the last dimension of the input tensor in-place.
This operation modifies the input tensor directly, making it memory-efficient by avoiding additional tensor allocation. The softmax is computed as:
\[\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}\]
- Parameters:
input_tensor (ttnn.Tensor) – The input tensor to apply softmax to. This tensor is modified in-place.
- Keyword Arguments:
program_config (SoftmaxProgramConfig, optional) – Program configuration for the operation. Defaults to SoftmaxDefaultProgramConfig().
compute_kernel_config (DeviceComputeKernelConfig, optional) – Compute kernel configuration for the operation. Defaults to None.
numeric_stable (bool, optional) – Whether to use numerically stable softmax computation. Defaults to True.
- Returns:
ttnn.Tensor – The same tensor as input with softmax applied in-place.
Note
The tensors support the following data types and layouts:
Dtypes: BFLOAT16, FLOAT32, BFLOAT8_B
Layouts: TILE
The output tensor will be in TILE layout and have the same dtype as the input_tensor.
- Limitations:
The input tensor is modified in-place to save memory and must already be on the device.
For very wide tensors, the operation may fall back to standard softmax if circular buffers would consume more than 90% of L1 memory.
Supports both default and sharded multi-core program configurations.
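What numeric_stable=True means can be illustrated with a plain-Python reference of the softmax formula above. This is a sketch of the standard max-subtraction technique, not ttnn's device kernel: subtracting the row maximum before exponentiation leaves the result mathematically unchanged (the factor e^{-max} cancels in the ratio) while preventing overflow in the exponentials.

```python
import math

def softmax_reference(row, stable=True):
    """Softmax over one row, mirroring the formula above.

    With stable=True the row max is subtracted before exponentiation;
    the result is mathematically identical to the unshifted form but
    avoids overflow for large inputs.
    """
    shift = max(row) if stable else 0.0
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]
```

For moderate inputs both variants agree, but for a row like [1000.0, 1000.0] the naive form overflows in math.exp while the stable form returns [0.5, 0.5].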
Example
# Create input tensor
shape = [1, 1, 32, 32]
input_tensor = ttnn.rand(shape, dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)

# Apply in-place softmax
logger.info(f"Input tensor before softmax in place: {input_tensor}")
ttnn.softmax_in_place(input_tensor)
logger.info(f"Input tensor after softmax in place: {input_tensor}")