ttnn.softmax

ttnn.softmax(input_tensor: ttnn.Tensor, dim: int = -1, *, memory_config: ttnn.MemoryConfig | None = None, compute_kernel_config: DeviceComputeKernelConfig | None = None, numeric_stable: bool = True) → ttnn.Tensor

Computes the softmax function over the specified dimension of the input tensor.

The softmax function is defined as:

\[\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}\]
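
As an illustration only, the definition can be transcribed directly with NumPy (NumPy is used here for clarity; it is not part of the ttnn API):

```python
import numpy as np

def softmax(x):
    # Direct transcription of the definition: exp(x_i) / sum_j exp(x_j)
    e = np.exp(x)
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
probs = softmax(x)  # probabilities sum to 1, larger inputs get larger mass
```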
Parameters:
  • input_tensor (ttnn.Tensor) – The input tensor to apply softmax to. Must be on the device.

  • dim (int, optional) – The dimension along which to compute softmax. Defaults to -1 (last dimension).
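
The effect of the dim argument can be sketched with NumPy (a host-side illustration, not the ttnn implementation): the normalization runs along the chosen axis, so slices along that axis each sum to 1.

```python
import numpy as np

def softmax(x, dim=-1):
    # Normalize along `dim`, mirroring the dim parameter described above
    e = np.exp(x)
    return e / e.sum(axis=dim, keepdims=True)

x = np.arange(6, dtype=np.float64).reshape(2, 3)
rows = softmax(x, dim=-1)  # each row sums to 1
cols = softmax(x, dim=0)   # each column sums to 1
```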

Keyword Arguments:
  • memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the output tensor. If not provided, inherits from input tensor.

  • compute_kernel_config (DeviceComputeKernelConfig, optional) – Compute kernel configuration for the operation.

  • numeric_stable (bool, optional) – Whether to use numerically stable softmax computation. Defaults to True.
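
The motivation for numeric_stable can be sketched in NumPy (a host-side illustration of the standard max-subtraction trick; the on-device kernel's exact algorithm may differ): without it, exp() overflows for large logits.

```python
import numpy as np

def softmax_naive(x):
    # Literal definition: overflows to inf for large inputs
    e = np.exp(x)
    return e / e.sum()

def softmax_stable(x):
    # Subtracting the max shifts the largest exponent to 0,
    # so exp() never overflows; the result is mathematically identical
    e = np.exp(x - x.max())
    return e / e.sum()

small = np.array([1.0, 2.0, 3.0])
large = np.array([1000.0, 1001.0, 1002.0])
# Both variants agree on moderate inputs; only the stable one
# survives large inputs (the naive one produces inf/inf = nan).
```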

Returns:

ttnn.Tensor – Output tensor with softmax applied along the specified dimension.

Note

The tensors support the following data types and layouts:

  • Dtypes: BFLOAT16, FLOAT32, BFLOAT8_B

  • Layouts: TILE

The output tensor will be in TILE layout and have the same dtype as the input tensor.

Memory Support:
  • Interleaved: DRAM and L1

  • Sharded (L1): Height sharded

Limitations:
  • All tensors must be on-device, interleaved, and in TILE layout.

  • Using the attention-optimized kernels requires a 4D input tensor and reduction along the last dimension.

Example

# Complete example, assuming a Tenstorrent device is available
import ttnn
from loguru import logger

# Open the device
device = ttnn.open_device(device_id=0)

# Create input tensor on the device in TILE layout
tensor = ttnn.rand((1, 1, 32, 64), dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device)

# Apply softmax on dim=-1
output_tensor = ttnn.softmax(tensor, dim=-1)
logger.info(f"Softmax result: {output_tensor}")

ttnn.close_device(device)