ttnn.rms_norm
- ttnn.rms_norm(input_tensor: ttnn.Tensor, *, memory_config: ttnn.MemoryConfig = None, epsilon: float = 1e-12, weight: ttnn.Tensor = None, bias: ttnn.Tensor = None, residual_input_tensor: ttnn.Tensor = None, program_config: ttnn.ProgramConfig = None, compute_kernel_config: ttnn.DeviceComputeKernelConfig = None) → ttnn.Tensor
-
Computes the RMS norm over input_tensor. See Root Mean Square Layer Normalization for more details.
\[\text{RMS\_norm}(x, \gamma, \beta, \epsilon) = \frac{x}{\sqrt{\epsilon+\frac{1}{N}\sum_{i=1}^{N}x_{i}^{2}}} \cdot \gamma + \beta\]
- Where:
-
\(\gamma\) and \(\beta\) are optional scale and shift parameters
\(\epsilon\) is a small constant added for numerical stability
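As a quick illustration of the formula only, here is a minimal host-side reference sketch in PyTorch (an assumption for illustration; not part of the ttnn API) that mirrors the computation over the last dimension:
import torch

def rms_norm_reference(x: torch.Tensor, gamma=None, beta=None, eps: float = 1e-12) -> torch.Tensor:
    # Root mean square over the last dimension, with epsilon inside the square root
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    out = x / rms
    if gamma is not None:  # optional scale
        out = out * gamma
    if beta is not None:   # optional shift
        out = out + beta
    return out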
- Parameters:
-
input_tensor (ttnn.Tensor) – the input tensor.
- Keyword Arguments:
-
memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.
epsilon (float) – Small constant added for numerical stability. Defaults to 1e-12.
weight (ttnn.Tensor, optional) – Scale tensor (\(\gamma\)). Defaults to None.
bias (ttnn.Tensor, optional) – Shift tensor (\(\beta\)). Defaults to None.
residual_input_tensor (ttnn.Tensor, optional) – Residual tensor added to input_tensor. Defaults to None.
program_config (ttnn.ProgramConfig, optional) – Program configuration for the operation. Defaults to None.
compute_kernel_config (ttnn.DeviceComputeKernelConfig, optional) – Compute kernel configuration for the operation. Defaults to None.
- Returns:
-
ttnn.Tensor – the output tensor.
Note
Supported data types and layouts by tensor:
Tensor                        Data types                                      Layout
input_tensor                  BFLOAT16, FLOAT32, BFLOAT8_B                    TILE
residual_input_tensor         BFLOAT16, FLOAT32, BFLOAT8_B                    TILE
weight (gamma), bias (beta)   BFLOAT16, FLOAT32                               TILE, ROW_MAJOR
output_tensor                 BFLOAT16, FLOAT32, BFLOAT8_B (matching input)   TILE
- Memory Support:
-
Interleaved: DRAM and L1
- Limitations:
-
All input tensors must be on-device and have a rank >= 1.
Unsharded tensors must be interleaved; sharded inputs cannot be height-sharded.
If residual_input_tensor is provided, it must match the input_tensor’s padded shape.
If the weight/bias tensors are TILE layout, their last padded dim must match the input_tensor’s last padded dim.
If the weight/bias tensors are ROW_MAJOR layout, their last padded dim must be TILE_WIDTH.
If the input_tensor is sharded, the output must also be sharded. In that case, the output memory layout and buffer type must match the input_tensor’s memory configuration.
Example
h, w = 32, 64
batch_size = 1
input_tensor = ttnn.rand([batch_size, h, w], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)
weight = ttnn.rand([w], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)
output_tensor = ttnn.rms_norm(input_tensor, weight=weight)
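As a further sketch based on the signature above (assuming the same device handle and the ttnn.DRAM_MEMORY_CONFIG constant used by other ttnn ops), a call that also passes a residual tensor and an explicit interleaved memory configuration:
h, w = 32, 64
batch_size = 1
input_tensor = ttnn.rand([batch_size, h, w], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)
# residual_input_tensor must match input_tensor's padded shape (see Limitations above)
residual = ttnn.rand([batch_size, h, w], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)
weight = ttnn.rand([w], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)
output_tensor = ttnn.rms_norm(
    input_tensor,
    weight=weight,
    residual_input_tensor=residual,
    epsilon=1e-6,
    memory_config=ttnn.DRAM_MEMORY_CONFIG,  # interleaved DRAM output (assumed constant, as in other ttnn ops)
)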