ttnn.layer_norm

ttnn.layer_norm(input_tensor: ttnn.Tensor, *, memory_config: ttnn.MemoryConfig = None, epsilon: float = 1e-12, weight: ttnn.Tensor = None, bias: ttnn.Tensor = None, residual_input_tensor: ttnn.Tensor = None, program_config: ttnn.ProgramConfig = None, compute_kernel_config: ttnn.DeviceComputeKernelConfig = None) → ttnn.Tensor

Computes layer normalization over the last dimension of input_tensor. See Layer Normalization for more details.

\[\text{layer_norm}(x, \gamma, \beta, \epsilon) = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} \cdot \gamma + \beta\]
Where:
  • \(\mu\) is the mean of the input tensor. This is computed over the last dimension of the input tensor (W).

  • \(\sigma^2\) is the variance of the input tensor. This is computed over the last dimension of the input tensor (W) and is biased.

  • \(\gamma\) and \(\beta\) are the learnable scale and shift parameters, respectively

  • \(\epsilon\) is a small constant added for numerical stability
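The formula above can be sketched in plain NumPy (a reference for the math only, not how the device kernels compute it):

```python
import numpy as np

def layer_norm_ref(x, gamma, beta, epsilon=1e-12):
    # Mean and biased variance over the last dimension (W), matching the formula
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)  # np.var is biased (divides by N) by default
    return (x - mu) / np.sqrt(var + epsilon) * gamma + beta

x = np.random.rand(32, 64).astype(np.float32)
gamma = np.ones(64, dtype=np.float32)   # identity scale
beta = np.zeros(64, dtype=np.float32)   # zero shift
y = layer_norm_ref(x, gamma, beta)
# With identity gamma/beta, each row of y has (approximately) zero mean and unit variance
```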

Parameters:

input_tensor (ttnn.Tensor) – the input tensor.

Keyword Arguments:
  • memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.

  • epsilon (float) – Small constant added to the variance for numerical stability. Defaults to 1e-12.

  • weight (ttnn.Tensor, optional) – The learnable scale tensor \(\gamma\). Defaults to None.

  • bias (ttnn.Tensor, optional) – The learnable shift tensor \(\beta\). Defaults to None.

  • residual_input_tensor (ttnn.Tensor, optional) – Residual tensor added element-wise to input_tensor before normalization. Defaults to None.

  • program_config (ttnn.ProgramConfig, optional) – Program configuration for the operation. Defaults to None.

  • compute_kernel_config (ttnn.DeviceComputeKernelConfig, optional) – Compute kernel configuration for the operation. Defaults to None.

Returns:

ttnn.Tensor – the output tensor.

Note

Supported data types and layouts by tensor:

  input_tensor:
    dtype:  BFLOAT16, FLOAT32, BFLOAT8_B
    layout: TILE

  residual_input_tensor:
    dtype:  BFLOAT16, FLOAT32, BFLOAT8_B
    layout: TILE

  weight (gamma) and bias (beta):
    dtype:  BFLOAT16, FLOAT32
    layout: TILE, ROW_MAJOR

  output_tensor:
    dtype:  BFLOAT16, FLOAT32, BFLOAT8_B
    layout: TILE

The output tensor will be in TILE layout and have the same dtype as input_tensor.

Memory Support:
  • Interleaved: DRAM and L1

  • Sharded (L1): Width and Block sharded

Limitations:
  • All input tensors must be on-device and have a rank >= 1.

  • Unsharded tensors must be interleaved; sharded tensors cannot be height-sharded.

  • If the input is sharded, the output and residual_input_tensor must have identical shard spec and memory config.

  • If residual_input_tensor is provided, it must match the input’s padded shape.

  • If weight and bias are in TILE layout: their last padded dimension must match the input's last padded dimension, and their padded height must equal TILE_HEIGHT (i.e. 32).

  • If weight and bias are in ROW_MAJOR layout: their last padded dimension must equal TILE_WIDTH, and the stick count must align with the input width.
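The residual path described above can be sketched in NumPy, assuming residual_input_tensor is added element-wise to the input before the normalization statistics are computed (a reference for the semantics only, not the fused device kernel):

```python
import numpy as np

def layer_norm_with_residual_ref(x, residual, epsilon=1e-12):
    # Assumed semantics: the residual is added to the input first, then
    # layer norm is applied to the sum over the last dimension (W).
    h = x + residual
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)  # biased variance, per the formula
    return (h - mu) / np.sqrt(var + epsilon)

x = np.random.rand(32, 64).astype(np.float32)
res = np.random.rand(32, 64).astype(np.float32)  # must match the input's shape
y = layer_norm_with_residual_ref(x, res)
```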

Example

# Assumes a device has already been opened, e.g. device = ttnn.open_device(device_id=0)

# Create input tensor
input_tensor = ttnn.rand([32, 64], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)

# Apply layer normalization over the last dimension
output_tensor = ttnn.layer_norm(input_tensor)
print(f"Layer Norm result: {output_tensor}")