ttnn.rms_norm

ttnn.rms_norm(input_tensor: ttnn.Tensor, *, memory_config: ttnn.MemoryConfig = None, epsilon: float = 1e-12, weight: ttnn.Tensor = None, bias: ttnn.Tensor = None, residual_input_tensor: ttnn.Tensor = None, program_config: ttnn.ProgramConfig = None, compute_kernel_config: ttnn.DeviceComputeKernelConfig = None) → ttnn.Tensor

Computes the RMS norm over the last dimension of input_tensor. See Root Mean Square Layer Normalization (Zhang & Sennrich, 2019) for more details.

\[\text{RMS\_norm}(x, \gamma, \beta, \epsilon) = \frac{x}{\sqrt{\epsilon+\frac{1}{N}\sum_{i=1}^{N}x_{i}^{2}}} \cdot \gamma + \beta\]
Where:
  • \(\gamma\) and \(\beta\) are optional scale and shift parameters

  • \(\epsilon\) is a small constant
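The formula above can be mirrored host-side as a NumPy reference sketch (this is not the ttnn implementation; the function name and the default epsilon here are illustrative):

```python
import numpy as np

def rms_norm_ref(x, gamma=None, beta=None, eps=1e-12):
    # Normalize over the last dimension: x / sqrt(eps + mean(x^2))
    rms = np.sqrt(eps + np.mean(np.square(x), axis=-1, keepdims=True))
    y = x / rms
    if gamma is not None:  # optional scale parameter (gamma)
        y = y * gamma
    if beta is not None:   # optional shift parameter (beta)
        y = y + beta
    return y

x = np.array([[3.0, 4.0]])
y = rms_norm_ref(x)
# After normalization, the mean of the squared elements is ~1.
```

Such a reference is useful for validating device outputs element-wise against a known-good CPU computation.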

Parameters:

input_tensor (ttnn.Tensor) – the input tensor.

Keyword Arguments:
  • memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.

  • epsilon (float) – Small constant added inside the square root for numerical stability. Defaults to 1e-12.

  • weight (ttnn.Tensor, optional) – Scale tensor \(\gamma\). Defaults to None.

  • bias (ttnn.Tensor, optional) – Shift tensor \(\beta\). Defaults to None.

  • residual_input_tensor (ttnn.Tensor, optional) – Residual tensor added element-wise to input_tensor before normalization. Defaults to None.

  • program_config (ttnn.ProgramConfig, optional) – Program configuration for the operation. Defaults to None.

  • compute_kernel_config (ttnn.DeviceComputeKernelConfig, optional) – Compute kernel configuration for the operation. Defaults to None.

Returns:

ttnn.Tensor – the output tensor.

Note

Supported data types and layouts by tensor:

  • input_tensor – dtype: BFLOAT16, FLOAT32, BFLOAT8_B; layout: TILE

  • residual_input_tensor – dtype: BFLOAT16, FLOAT32, BFLOAT8_B; layout: TILE

  • weight (gamma) and bias (beta) – dtype: BFLOAT16, FLOAT32; layout: TILE, ROW_MAJOR

  • output_tensor – dtype: BFLOAT16, FLOAT32, BFLOAT8_B (matching input); layout: TILE

Memory Support:
  • Interleaved: DRAM and L1

Limitations:
  • All input tensors must be on-device and have a rank >= 1.

  • Unsharded tensors must be interleaved; sharded inputs cannot be height-sharded.

  • If residual_input_tensor is provided, it must match the input_tensor’s padded shape.

  • If the weight/bias tensors are TILE layout: last padded dim must match input_tensor’s last padded dim.

  • If the weight/bias tensors are ROW_MAJOR layout: last padded dim must be TILE_WIDTH.

  • If the input_tensor is sharded, the output must also be sharded. In that case, the output memory layout and buffer type must match the input_tensor’s memory configuration.
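The ROW_MAJOR weight/bias shape constraint can be checked host-side before dispatching to the device. A NumPy sketch, assuming the standard 32-element tile width (TILE_WIDTH is an assumption here, not read from ttnn):

```python
import numpy as np

TILE_WIDTH = 32  # assumed: the standard ttnn tile width

w = 64
gamma = np.ones(w, dtype=np.float32)

# A ROW_MAJOR gamma/beta must have a last padded dim equal to TILE_WIDTH,
# i.e. the flat [w] vector is viewed as [w // TILE_WIDTH, TILE_WIDTH].
assert w % TILE_WIDTH == 0, "weight length must be a multiple of TILE_WIDTH"
gamma_row_major = gamma.reshape(w // TILE_WIDTH, TILE_WIDTH)  # shape (2, 32)
```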

Example

h, w = 32, 64
batch_size = 1

# device is an already-open ttnn device, e.g. device = ttnn.open_device(device_id=0)
input_tensor = ttnn.rand([batch_size, h, w], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)
weight = ttnn.rand([w], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)
output_tensor = ttnn.rms_norm(input_tensor, weight=weight)