ttnn.transformer.attention_softmax

ttnn.transformer.attention_softmax = Operation(python_fully_qualified_name='ttnn.transformer.attention_softmax', function=<ttnn._ttnn.operations.transformer.attention_softmax_t object>, preprocess_golden_function_inputs=<function default_preprocess_golden_function_inputs>, golden_function=<function _golden_function>, postprocess_golden_function_outputs=<function default_postprocess_golden_function_outputs>, is_cpp_operation=True, is_experimental=False)

Divides input_tensor by the square root of head_size, optionally adds attention_mask, and computes the softmax of the result.
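The three steps above can be sketched in plain Python as a host-side reference. This is an illustrative sketch, not the device implementation: the name attention_softmax_reference is hypothetical, the softmax is assumed to run over the last dimension (each row here), and head_size=None is assumed to mean that no scaling is applied.

```python
import math

def attention_softmax_reference(scores, head_size=None, attention_mask=None):
    """Scale each row by 1/sqrt(head_size), optionally add a mask, then softmax.

    `scores` and `attention_mask` are lists of rows (2D). Assumptions: softmax
    is taken over each row, and head_size=None skips the scaling step.
    """
    scale = 1.0 / math.sqrt(head_size) if head_size is not None else 1.0
    result = []
    for i, row in enumerate(scores):
        scaled = [x * scale for x in row]
        if attention_mask is not None:
            # The mask is additive: large negative entries suppress positions.
            scaled = [x + m for x, m in zip(scaled, attention_mask[i])]
        peak = max(scaled)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - peak) for x in scaled]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result

scores = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]
mask = [[0.0, 0.0, -1e9], [0.0, 0.0, 0.0]]
probs = attention_softmax_reference(scores, head_size=4, attention_mask=mask)
```

Each row of probs sums to 1, and the position masked with a large negative value receives a probability of effectively zero.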

Parameters:

input_tensor (ttnn.Tensor) – the input tensor.

Keyword Arguments:

memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.
head_size (int, optional) – Size of each attention head (not the number of heads); the input is divided by its square root. Defaults to None.
attention_mask (ttnn.Tensor, optional) – Attention mask added to the scaled input before the softmax. Defaults to None.
program_config (SoftmaxProgramConfig, optional) – Program configuration for the softmax operation. Defaults to SoftmaxDefaultProgramConfig().
causal_mask (bool, optional) – Whether the attention mask is causal. Defaults to False.
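For readers unfamiliar with the term, a causal attention mask lets each query position attend only to itself and earlier positions. The sketch below builds a generic additive causal mask in plain Python. The helper additive_causal_mask is hypothetical, and the exact mask shape, dtype, and masked value that this operation expects are not stated on this page, so treat the values as assumptions.

```python
def additive_causal_mask(seq_len, masked_value=float("-inf")):
    """Return a seq_len x seq_len additive mask.

    Entry [query][key] is 0.0 when key <= query (visible) and masked_value
    otherwise (a future position, suppressed once the softmax is applied).
    """
    return [
        [0.0 if key <= query else masked_value for key in range(seq_len)]
        for query in range(seq_len)
    ]

mask = additive_causal_mask(3)
```

The first row of mask allows only position 0, while the last row allows every position.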

Returns:

ttnn.Tensor – the output tensor.