ttnn.layer_norm

ttnn.layer_norm(input_tensor: ttnn.Tensor, *, memory_config: ttnn.MemoryConfig = None, epsilon: float = 1e-12, weight: ttnn.Tensor = None, bias: ttnn.Tensor = None, residual_input_tensor: ttnn.Tensor = None, program_config: ttnn.ProgramConfig = None, compute_kernel_config: ttnn.DeviceComputeKernelConfig = None) → ttnn.Tensor

Computes layer normalization over the last dimension of input_tensor. See Layer Normalization for more details.

\[\text{layer_norm}(x, \gamma, \beta, \epsilon) = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} \cdot \gamma + \beta\]
Where:
  • \(\mu\) is the mean of the input tensor. This is computed over the last dimension of the input tensor (W).

  • \(\sigma^2\) is the variance of the input tensor. This is computed over the last dimension of the input tensor (W) and is biased.

  • \(\gamma\) and \(\beta\) are the learnable scale and shift parameters, respectively

  • \(\epsilon\) is a small constant added for numerical stability
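The formula above can be sketched in plain NumPy (a reference for the math only, not how the device kernels compute it):

```python
import numpy as np

def layer_norm_ref(x, gamma, beta, epsilon=1e-12):
    # Mean and biased variance over the last dimension (W), matching the formula
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)  # np.var is biased (divides by N) by default
    return (x - mu) / np.sqrt(var + epsilon) * gamma + beta

x = np.random.rand(32, 64).astype(np.float32)
gamma = np.ones(64, dtype=np.float32)   # identity scale
beta = np.zeros(64, dtype=np.float32)   # zero shift
y = layer_norm_ref(x, gamma, beta)
# With identity gamma/beta, each row of y has (approximately) zero mean and unit variance
```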

Parameters:

input_tensor (ttnn.Tensor) – the input tensor.

Keyword Arguments:
  • memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.

  • epsilon (float) – Small constant added to the variance for numerical stability. Defaults to 1e-12.

  • weight (ttnn.Tensor, optional) – The learnable scale tensor \(\gamma\). Defaults to None.

  • bias (ttnn.Tensor, optional) – The learnable shift tensor \(\beta\). Defaults to None.

  • residual_input_tensor (ttnn.Tensor, optional) – Residual tensor added element-wise to input_tensor before normalization. Defaults to None.

  • program_config (ttnn.ProgramConfig, optional) – Program configuration for the operation. Defaults to None.

  • compute_kernel_config (ttnn.DeviceComputeKernelConfig, optional) – Compute kernel configuration for the operation. Defaults to None.

Returns:

ttnn.Tensor – the output tensor.

Note

Supported data types and layouts by tensor:

  input_tensor:
    dtype:  BFLOAT16, FLOAT32, BFLOAT8_B
    layout: TILE

  residual_input_tensor:
    dtype:  BFLOAT16, FLOAT32, BFLOAT8_B
    layout: TILE

  weight (gamma) and bias (beta):
    dtype:  BFLOAT16, FLOAT32
    layout: TILE, ROW_MAJOR

  output_tensor:
    dtype:  BFLOAT16, FLOAT32, BFLOAT8_B
    layout: TILE

The output tensor will be in TILE layout and have the same dtype as input_tensor.

Memory Support:
  • Interleaved: DRAM and L1

  • Sharded (L1): Width and Block sharded

Limitations:
  • All input tensors must be on-device and have a rank >= 1.

  • Unsharded tensors must be interleaved; sharded tensors cannot be height-sharded.

  • If the input is sharded, the output and residual_input_tensor must have identical shard spec and memory config.

  • If residual_input_tensor is provided, it must match the input’s padded shape.

  • If weight and bias are in TILE layout: their last padded dimension must match the input's last padded dimension, and their padded height must equal TILE_HEIGHT (i.e. 32).

  • If weight and bias are in ROW_MAJOR layout: their last padded dimension must equal TILE_WIDTH, and the stick count must align with the input width.
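The residual path described above can be sketched in NumPy, assuming residual_input_tensor is added element-wise to the input before the normalization statistics are computed (a reference for the semantics only, not the fused device kernel):

```python
import numpy as np

def layer_norm_with_residual_ref(x, residual, epsilon=1e-12):
    # Assumed semantics: the residual is added to the input first, then
    # layer norm is applied to the sum over the last dimension (W).
    h = x + residual
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)  # biased variance, per the formula
    return (h - mu) / np.sqrt(var + epsilon)

x = np.random.rand(32, 64).astype(np.float32)
res = np.random.rand(32, 64).astype(np.float32)  # must match the input's shape
y = layer_norm_with_residual_ref(x, res)
```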

Example

# Assumes a device has already been opened, e.g. device = ttnn.open_device(device_id=0)

# Create input tensor
input_tensor = ttnn.rand([32, 64], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)

# Apply layer normalization over the last dimension
output_tensor = ttnn.layer_norm(input_tensor)
print(f"Layer Norm result: {output_tensor}")