ttnn.transformer.attention_softmax

ttnn.transformer.attention_softmax(input_tensor: ttnn.Tensor, *, memory_config: ttnn.MemoryConfig | None = None, head_size: int | None = None, attention_mask: ttnn.Tensor | None = None, program_config: SoftmaxProgramConfig = SoftmaxDefaultProgramConfig(), causal_mask: bool | None = False) → ttnn.Tensor

Divides input_tensor by the square root of head_size, optionally adds attention_mask, and computes the softmax of the result.
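The computation described above can be sketched in plain NumPy. This is only an illustrative reference for the math, not the ttnn kernel: the helper name `attention_softmax_reference` is hypothetical, and the sketch assumes that the softmax is taken over the last dimension, that the mask is additive, and that scaling is skipped when head_size is None.

```python
import numpy as np

def attention_softmax_reference(x, head_size=None, attention_mask=None):
    """Hypothetical NumPy reference for the math described above."""
    x = np.asarray(x, dtype=np.float64)
    # Scale by 1/sqrt(head_size); assumption: no scaling when head_size is None.
    if head_size is not None:
        x = x / np.sqrt(head_size)
    # Assumption: the mask is additive and broadcasts against the input.
    if attention_mask is not None:
        x = x + attention_mask
    # Numerically stable softmax; assumption: reduction over the last dimension.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

# Two query rows, four key positions; the mask hides the last two positions.
scores = np.zeros((1, 1, 2, 4))
mask = np.array([0.0, 0.0, -1e9, -1e9])
probs = attention_softmax_reference(scores, head_size=64, attention_mask=mask)
```

With all-zero scores, each row of `probs` splits its weight evenly across the two unmasked positions, while the masked positions receive effectively zero weight.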

Parameters:

input_tensor (ttnn.Tensor) – the input tensor.

Keyword Arguments:
  • memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.

  • head_size (int, optional) – Size of each attention head. The input is divided by the square root of this value. Defaults to None.

  • attention_mask (ttnn.Tensor, optional) – Attention mask added to the scaled input before the softmax. Defaults to None.

  • program_config (SoftmaxProgramConfig) – Program configuration for the operation. Defaults to SoftmaxDefaultProgramConfig().

  • causal_mask (bool, optional) – Whether the attention mask is causal. Defaults to False.

Returns:

ttnn.Tensor – the output tensor.