ttnn.softmax_in_place

ttnn.softmax_in_place(input_tensor: ttnn.Tensor, *, program_config: SoftmaxProgramConfig = SoftmaxDefaultProgramConfig(), compute_kernel_config: DeviceComputeKernelConfig | None = None, numeric_stable: bool = True) → ttnn.Tensor

Computes the softmax function along the last dimension of the input tensor in-place.

This operation modifies the input tensor directly, making it memory-efficient by avoiding additional tensor allocation. The softmax is computed as:

\[\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}\]
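
The numeric_stable keyword argument (see below) corresponds to the standard max-subtraction trick: subtracting the row maximum before exponentiating leaves the result unchanged but prevents overflow. A minimal pure-Python sketch of the math (not the ttnn kernel, which runs on device):

```python
import math

def softmax_reference(xs, numeric_stable=True):
    """Reference softmax over a 1-D list, mirroring the formula above."""
    # Subtracting the row maximum shifts every exponent but cancels in
    # the ratio, so the result is identical while exp() stays bounded.
    shift = max(xs) if numeric_stable else 0.0
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax_reference([1.0, 2.0, 3.0])
# The outputs are positive and sum to 1; with numeric_stable=False,
# inputs like [1000.0, 1001.0] would overflow math.exp.
```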
Parameters:

input_tensor (ttnn.Tensor) – The input tensor to apply softmax to. This tensor is modified in-place.

Keyword Arguments:
  • program_config (SoftmaxProgramConfig, optional) – Program configuration for the operation. Defaults to SoftmaxDefaultProgramConfig().

  • compute_kernel_config (DeviceComputeKernelConfig, optional) – Compute kernel configuration for the operation.

  • numeric_stable (bool, optional) – Whether to use numerically stable softmax computation. Defaults to True.

Returns:

ttnn.Tensor – The same tensor as input with softmax applied in-place.

Note

The tensors support the following data types and layouts:

  Dtypes                         Layouts
  BFLOAT16, FLOAT32, BFLOAT8_B   TILE

The output tensor will be in TILE layout and have the same dtype as the input_tensor.

Limitations:
  • The input tensor is modified in-place to save memory and must already be on the device.

  • For very wide tensors, the operation may fall back to standard softmax if circular buffers would consume more than 90% of L1 memory.

  • Supports both default and sharded multi-core program configurations.

Example

import ttnn
from loguru import logger

# Open a device for the tensors to live on
device = ttnn.open_device(device_id=0)

# Create input tensor
shape = [1, 1, 32, 32]
input_tensor = ttnn.rand(shape, dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)

# Apply in-place softmax
logger.info(f"Input tensor before softmax in place: {input_tensor}")
ttnn.softmax_in_place(input_tensor)
logger.info(f"Input tensor after softmax in place: {input_tensor}")

ttnn.close_device(device)