ttnn.sum

ttnn.sum(input_tensor: ttnn.Tensor, dim: number or tuple, keepdim: bool = False, *, memory_config: ttnn.MemoryConfig = None, compute_kernel_config: ttnn.ComputeKernelConfig = None, scalar: float = 1.0, correction: bool | None, sub_core_grids: ttnn.CoreRangeSet = None) → ttnn.Tensor

Computes the sum of input_tensor along the specified dimension(s) dim. If dim is not provided, the sum is computed over all dimensions, yielding a single value.

Parameters:

input_tensor (ttnn.Tensor) – the input tensor. Must be on the device.
dim (number or tuple) – dimension(s) to reduce over. If not provided, all dimensions are reduced.
keepdim (bool, optional) – whether to retain the reduced dimension(s) in the output, each with size 1. Defaults to False.
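
The interaction between dim and keepdim is easiest to see in the output shape. The helper below is a hypothetical, pure-Python sketch (reduced_shape is not part of ttnn). It assumes the PyTorch-style convention: reduced dimensions are dropped by default, retained with size 1 when keepdim is True, negative values of dim count from the end, and dim=None reduces over every dimension.

```python
def reduced_shape(shape, dim=None, keepdim=False):
    """Hypothetical helper: shape expected after reducing `shape` over `dim`.

    Assumes PyTorch-style semantics; not part of the ttnn API.
    """
    rank = len(shape)
    if dim is None:
        dims = set(range(rank))          # reduce over every dimension
    elif isinstance(dim, int):
        dims = {dim % rank}              # normalize a negative index
    else:
        dims = {d % rank for d in dim}   # tuple of dimensions

    if keepdim:
        # Reduced dimensions stay in place with size 1.
        return tuple(1 if i in dims else s for i, s in enumerate(shape))
    # Reduced dimensions are removed entirely.
    return tuple(s for i, s in enumerate(shape) if i not in dims)


print(reduced_shape((2, 3, 4), dim=2))                # (2, 3)
print(reduced_shape((2, 3, 4), dim=2, keepdim=True))  # (2, 3, 1)
print(reduced_shape((2, 3, 4), dim=(0, 1)))           # (4,)
```

The shape that ttnn reports for a full reduction (dim omitted) may differ from the empty tuple this sketch would return, so it is worth checking on a real device.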

Keyword Arguments:

memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.
compute_kernel_config (ttnn.ComputeKernelConfig, optional) – Compute kernel configuration for the operation. Defaults to None.
scalar (float, optional) – A scaling factor to be applied to the input tensor. Defaults to 1.0.
correction (bool, optional) – Deprecated; will be removed in a future release. It has no effect on the result.
sub_core_grids (ttnn.CoreRangeSet, optional) – Sub-core grids to use for the operation. Defaults to None, which uses all cores.
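
As a rough illustration of scalar, the pure-Python sketch below treats it as a factor multiplied into each element before the reduction, which is one reading of "applied to the input tensor". The name reference_sum_last_dim is hypothetical, and the function operates on nested lists rather than ttnn tensors; device numerics such as BFLOAT16 rounding are not modeled.

```python
def reference_sum_last_dim(rows, scalar=1.0):
    """Hypothetical reference: scale each element, then sum along the last dim.

    Illustrative only; not part of the ttnn API.
    """
    return [sum(scalar * x for x in row) for row in rows]


data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(reference_sum_last_dim(data))              # [6.0, 15.0]
print(reference_sum_last_dim(data, scalar=0.5))  # [3.0, 7.5]
```

In exact arithmetic this equals scaling the summed result by scalar; on hardware the two orderings can round differently.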

Returns:

ttnn.Tensor – the output tensor.

Note

The input tensor supports the following data types and layouts:

Input Tensor
dtype	layout
FLOAT32	ROW_MAJOR, TILE
BFLOAT16	ROW_MAJOR, TILE
BFLOAT8_B	TILE

The output tensor will be in TILE layout and have the same dtype as the input_tensor.

Memory Support:

Interleaved: DRAM and L1
Sharded (L1): Width, Height, and ND sharding
Output sharding will mirror the input

Example

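import ttnn
from loguru import logger

# Open a device (device_id=0 is an assumption; use the ID for your system)
device = ttnn.open_device(device_id=0)

# Create tensor
tensor_input = ttnn.rand((2, 3, 4), device=device)

# Apply ttnn.sum() on dim=2
tensor_output = ttnn.sum(tensor_input, dim=2)
logger.info(f"Sum result: {tensor_output}")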