ttnn.softmax
- ttnn.softmax(input_tensor: ttnn.Tensor, dim: int = -1, *, memory_config: ttnn.MemoryConfig | None = None, compute_kernel_config: DeviceComputeKernelConfig | None = None, numeric_stable: bool = True) → ttnn.Tensor
Computes the softmax function over the specified dimension of the input tensor.
The softmax function is defined as:
\[\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}\]
- Parameters:
input_tensor (ttnn.Tensor) – The input tensor to apply softmax to. Must be on the device.
dim (int, optional) – The dimension along which to compute softmax. Defaults to -1 (last dimension).
- Keyword Arguments:
memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the output tensor. If not provided, inherits from input tensor.
compute_kernel_config (DeviceComputeKernelConfig, optional) – Compute kernel configuration for the operation.
numeric_stable (bool, optional) – Whether to use numerically stable softmax computation. Defaults to True.
- Returns:
ttnn.Tensor – Output tensor with softmax applied along the specified dimension.
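With numeric_stable=True, the operation corresponds to the standard max-subtraction form of softmax, which avoids overflow in the exponentials. A minimal host-side sketch in NumPy (the function name and arrays here are illustrative, not part of the ttnn API):

```python
import numpy as np

def softmax_reference(x: np.ndarray, dim: int = -1) -> np.ndarray:
    """Numerically stable softmax along `dim` (host-side reference sketch)."""
    # Subtract the per-slice maximum so exp() never overflows; the shift
    # cancels in the ratio, so the result is mathematically unchanged.
    shifted = x - x.max(axis=dim, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=dim, keepdims=True)

# Large inputs would overflow a naive exp(); the stable form handles them.
x = np.array([[1.0, 2.0, 3.0],
              [1000.0, 1001.0, 1002.0]])
probs = softmax_reference(x, dim=-1)
```

Because softmax is shift-invariant, both rows above produce identical probability vectors, and each row sums to 1.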
Note
The tensors support the following data types and layouts:
Dtypes: BFLOAT16, FLOAT32, BFLOAT8_B
Layouts: TILE
The output tensor will be in TILE layout and have the same dtype as the input_tensor.
- Memory Support:
Interleaved: DRAM and L1
Sharded (L1): Height sharded
- Limitations:
All tensors must be on-device, interleaved, and tile layout.
The attention-optimized kernels require a 4D input tensor and reduction along the last dimension.
Example
# Create input tensor
tensor = ttnn.rand((1, 1, 32, 64), dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device)

# Apply softmax on dim=-1
output_tensor = ttnn.softmax(tensor, dim=-1)
logger.info(f"Softmax result: {output_tensor}")