ttnn.rms_norm
- ttnn.rms_norm(input_tensor: ttnn.Tensor, *, memory_config: ttnn.MemoryConfig = None, epsilon: float = 1e-12, weight: ttnn.Tensor = None, bias: ttnn.Tensor = None, residual_input_tensor: ttnn.Tensor = None, program_config: ttnn.ProgramConfig = None, compute_kernel_config: ttnn.DeviceComputeKernelConfig = None) → ttnn.Tensor
-
Computes the RMS norm over input_tensor. See Root Mean Square Layer Normalization for more details.
\[\text{RMS\_norm}(x, \gamma, \beta, \epsilon) = \frac{x}{\sqrt{\epsilon+\frac{1}{N}\sum_{i=1}^{N}x_{i}^{2}}} \cdot \gamma + \beta\]
- Where:
-
\(\gamma\) and \(\beta\) are optional scale and shift parameters
\(\epsilon\) is a small constant added for numerical stability
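As a quick illustration of the formula only, here is a minimal host-side reference sketch in PyTorch (an assumption for illustration; not part of the ttnn API) that mirrors the computation over the last dimension:
import torch

def rms_norm_reference(x: torch.Tensor, gamma=None, beta=None, eps: float = 1e-12) -> torch.Tensor:
    # Root mean square over the last dimension, with epsilon inside the square root
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    out = x / rms
    if gamma is not None:  # optional scale
        out = out * gamma
    if beta is not None:   # optional shift
        out = out + beta
    return out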
- Parameters:
-
input_tensor (ttnn.Tensor) – the input tensor.
- Keyword Arguments:
-
memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.
epsilon (float) – Small constant added for numerical stability. Defaults to 1e-12.
weight (ttnn.Tensor, optional) – Scale tensor (\(\gamma\)). Defaults to None.
bias (ttnn.Tensor, optional) – Shift tensor (\(\beta\)). Defaults to None.
residual_input_tensor (ttnn.Tensor, optional) – Residual tensor added to input_tensor. Defaults to None.
program_config (ttnn.ProgramConfig, optional) – Program configuration for the operation. Defaults to None.
compute_kernel_config (ttnn.DeviceComputeKernelConfig, optional) – Compute kernel configuration for the operation. Defaults to None.
- Returns:
-
ttnn.Tensor – the output tensor.
Note
Supported data types and layouts by tensor:
Tensor                        Data types                                      Layout
input_tensor                  BFLOAT16, FLOAT32, BFLOAT8_B                    TILE
residual_input_tensor         BFLOAT16, FLOAT32, BFLOAT8_B                    TILE
weight (gamma), bias (beta)   BFLOAT16, FLOAT32                               TILE, ROW_MAJOR
output_tensor                 BFLOAT16, FLOAT32, BFLOAT8_B (matching input)   TILE
- Memory Support:
-
Interleaved: DRAM and L1
- Limitations:
-
All input tensors must be on-device and have a rank >= 1.
Unsharded tensors must be interleaved; sharded inputs cannot be height-sharded.
If residual_input_tensor is provided, it must match the input_tensor’s padded shape.
If the weight/bias tensors are TILE layout, their last padded dim must match the input_tensor’s last padded dim.
If the weight/bias tensors are ROW_MAJOR layout, their last padded dim must be TILE_WIDTH.
If the input_tensor is sharded, the output must also be sharded. In that case, the output memory layout and buffer type must match the input_tensor’s memory configuration.
Example
h, w = 32, 64
batch_size = 1
input_tensor = ttnn.rand([batch_size, h, w], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)
weight = ttnn.rand([w], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)
output_tensor = ttnn.rms_norm(input_tensor, weight=weight)
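As a further sketch based on the signature above (assuming the same device handle and the ttnn.DRAM_MEMORY_CONFIG constant used by other ttnn ops), a call that also passes a residual tensor and an explicit interleaved memory configuration:
h, w = 32, 64
batch_size = 1
input_tensor = ttnn.rand([batch_size, h, w], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)
# residual_input_tensor must match input_tensor's padded shape (see Limitations above)
residual = ttnn.rand([batch_size, h, w], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)
weight = ttnn.rand([w], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)
output_tensor = ttnn.rms_norm(
    input_tensor,
    weight=weight,
    residual_input_tensor=residual,
    epsilon=1e-6,
    memory_config=ttnn.DRAM_MEMORY_CONFIG,  # interleaved DRAM output (assumed constant, as in other ttnn ops)
)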