ttnn.cumsum

ttnn.cumsum(input: ttnn.Tensor, dim: int, *, dtype: ttnn.DataType | None = None, reverse_order: bool = False, out: ttnn.Tensor | None = None) → ttnn.Tensor

Returns the cumulative sum of input along dimension dim. For a given input of size N, the output also contains N elements and satisfies:

\[\mathrm{output}_i = \mathrm{input}_1 + \mathrm{input}_2 + \cdots + \mathrm{input}_i\]
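The running-sum definition can be illustrated with a plain NumPy sketch (NumPy stands in for ttnn here; `np.cumsum` follows the same definition):

```python
import numpy as np

x = np.array([1, 2, 3, 4])
# Running sums: out[i] = x[0] + x[1] + ... + x[i]
out = np.cumsum(x)
# out == [1, 3, 6, 10]
```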
Parameters:
  • input (ttnn.Tensor) – input tensor. Must be on the device.

  • dim (int) – dimension along which to compute cumulative sum

Keyword Arguments:
  • dtype (ttnn.DataType, optional) – desired output type. If specified, the input tensor is cast to dtype before processing.

  • reverse_order (bool, optional, default False) – whether to accumulate from the end to the beginning of the accumulation axis.

  • out (ttnn.Tensor, optional) – preallocated output. If specified, out must have the same shape as input and must be on the same device.
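With reverse_order=True, each output element sums from its position to the end of the axis. A NumPy sketch of these semantics (equivalence with ttnn's behavior is an assumption here; NumPy has no reverse flag, so the axis is flipped around the cumulative sum):

```python
import numpy as np

x = np.array([1, 2, 3, 4])
# Reverse accumulation: out[i] = x[i] + x[i+1] + ... + x[N-1]
reversed_cumsum = np.flip(np.cumsum(np.flip(x)))
# reversed_cumsum == [10, 9, 7, 4]
```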

Returns:

ttnn.Tensor – the output tensor.

Note

If both dtype and out are specified, then out.dtype must match dtype.

Supported dtypes, layouts, ranks, and dim values:

| Dtypes                           | Layouts | Ranks         | dim                 |
|----------------------------------|---------|---------------|---------------------|
| BFLOAT16, FLOAT32, INT32, UINT32 | TILE    | 1, 2, 3, 4, 5 | -rank <= dim < rank |

Memory Support:
  • Interleaved: DRAM and L1

Limitations:
  • Preallocated output must have the same shape as the input

Example

# Create tensor
tensor_input = ttnn.rand((2, 3, 4), device=device)

# Apply ttnn.cumsum() on dim=0
tensor_output = ttnn.cumsum(tensor_input, dim=0)

# With preallocated output and dtype
preallocated_output = ttnn.rand((2, 3, 4), dtype=ttnn.bfloat16, device=device)

tensor_output = ttnn.cumsum(tensor_input, dim=0, dtype=ttnn.bfloat16, out=preallocated_output)