ttnn.layer_norm
- ttnn.layer_norm(input_tensor: ttnn.Tensor, *, memory_config: ttnn.MemoryConfig = None, epsilon: float = 1e-12, weight: ttnn.Tensor = None, bias: ttnn.Tensor = None, residual_input_tensor: ttnn.Tensor = None, program_config: ttnn.ProgramConfig = None, compute_kernel_config: ttnn.DeviceComputeKernelConfig = None) → ttnn.Tensor
-
Computes layer norm over input_tensor. See Layer Normalization for more details.

\[\text{layer_norm}(x, \gamma, \beta, \epsilon) = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} \cdot \gamma + \beta\]

Where:
\(\mu\) is the mean of the input tensor. This is computed over the last dimension of the input tensor (W).
\(\sigma^2\) is the variance of the input tensor. This is computed over the last dimension of the input tensor (W) and is biased.
\(\gamma\) and \(\beta\) are the learnable scale and shift parameters, respectively.
\(\epsilon\) is a small constant added for numerical stability.
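The formula above can be checked against a plain NumPy reference implementation. This is a minimal sketch, not part of the ttnn API; the names layer_norm_ref, gamma, and beta are illustrative. Note that NumPy's var is biased (divides by N), matching the biased variance described above.

```python
import numpy as np

def layer_norm_ref(x, gamma, beta, eps=1e-12):
    # Mean and biased variance over the last dimension (W), as in the formula above.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)  # biased: divides by N, not N - 1
    return (x - mu) / np.sqrt(var + eps) * gamma + beta

x = np.random.rand(32, 64).astype(np.float32)
gamma = np.ones(64, dtype=np.float32)   # scale
beta = np.zeros(64, dtype=np.float32)   # shift
y = layer_norm_ref(x, gamma, beta)
# With unit gamma and zero beta, each row of y has (approximately)
# zero mean and unit variance.
```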
- Parameters:
input_tensor (ttnn.Tensor) – the input tensor.
- Keyword Arguments:
memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.
epsilon (float) – small value added to the variance for numerical stability. Defaults to 1e-12.
weight (ttnn.Tensor, optional) – the learnable scale tensor (gamma). Defaults to None.
bias (ttnn.Tensor, optional) – the learnable shift tensor (beta). Defaults to None.
residual_input_tensor (ttnn.Tensor, optional) – an optional tensor added to input_tensor before normalization. Defaults to None.
program_config (ttnn.ProgramConfig, optional) – program configuration for the operation. Defaults to None.
compute_kernel_config (ttnn.DeviceComputeKernelConfig, optional) – compute kernel configuration for the operation. Defaults to None.
- Returns:
ttnn.Tensor – the output tensor.
Note
Supported data types and layouts by tensor:
input_tensor: dtype BFLOAT16, FLOAT32, BFLOAT8_B; layout TILE
residual_input_tensor: dtype BFLOAT16, FLOAT32, BFLOAT8_B; layout TILE
weight (gamma) and bias (beta): dtype BFLOAT16, FLOAT32; layout TILE, ROW_MAJOR
output_tensor: dtype BFLOAT16, FLOAT32, BFLOAT8_B; layout TILE
Output tensor will be in TILE layout and have the same dtype as the input_tensor.
- Memory Support:
Interleaved: DRAM and L1
Sharded (L1): Width and Block sharded
- Limitations:
All input tensors must be on-device and have a rank >= 1.
Unsharded tensors must be interleaved; sharded tensors cannot be height sharded.
If the input is sharded, the output and residual_input_tensor must have identical shard specs and memory configs.
If residual_input_tensor is provided, it must match the input's padded shape.
If TILE: the weight and bias padded dim must match the input's last padded dim; the padded height must equal TILE_HEIGHT (i.e. 32).
If ROW_MAJOR: the weight and bias last padded dim must be TILE_WIDTH and the stick count must align with the input width.
Example
# Create input tensor
input_tensor = ttnn.rand([32, 64], dtype=ttnn.DataType.BFLOAT16, layout=ttnn.TILE_LAYOUT, device=device)

# Apply layer normalization
output_tensor = ttnn.layer_norm(input_tensor)
logger.info(f"Layer Norm result: {output_tensor}")