ttnn.topk

ttnn.topk(input_tensor: ttnn.Tensor, k: number, dim: number, largest: bool = True, sorted: bool = True, *, memory_config: ttnn.MemoryConfig = None, output_tensor: ttnn.Tensor = None, sub_core_grids: ttnn.CoreRangeSet = None, indices_tensor: ttnn.Tensor = None) → List of ttnn.Tensor

Returns the k largest or k smallest elements of the input_tensor along a given dimension dim.

If dim is not provided, the last dimension of the input_tensor is used.

If largest is True, the k largest elements are returned. Otherwise, the k smallest elements are returned.

If sorted is True, the returned k elements are guaranteed to be in sorted order.

Equivalent PyTorch code:

return torch.topk(input_tensor, k, dim=dim, largest=largest, sorted=sorted)
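For illustration, the selection semantics (the largest and sorted flags, and the returned value/index pair) can be sketched in plain Python for the 1-D case. This is a simplified sketch of the behavior, not the ttnn implementation, and topk_1d is a hypothetical helper name:

```python
def topk_1d(values, k, largest=True, sorted=True):
    # Pair each value with its original index, since topk returns both
    # the selected values and their indices in the input.
    pairs = list(enumerate(values))
    # Order by value: descending for largest=True, ascending otherwise.
    pairs.sort(key=lambda p: p[1], reverse=largest)
    top = pairs[:k]
    if not sorted:
        # With sorted=False the order among the k returned elements is
        # unspecified; here we simply restore the original index order.
        top.sort(key=lambda p: p[0])
    topk_values = [v for _, v in top]
    topk_indices = [i for i, _ in top]
    return topk_values, topk_indices

# Top 3 largest of [3, 1, 4, 1, 5] -> values [5, 4, 3], indices [4, 2, 0]
values, indices = topk_1d([3, 1, 4, 1, 5], k=3)
```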
Parameters:
  • input_tensor (ttnn.Tensor) – the input tensor.

  • k (number) – the number of top elements to look for.

  • dim (number) – the dimension to reduce.

  • largest (bool) – whether to return the largest or the smallest elements. Defaults to True.

  • sorted (bool) – whether to return the elements in sorted order. Defaults to True.

Keyword Arguments:
  • memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.

  • output_tensor (ttnn.Tensor, optional) – Preallocated output tensor. Defaults to None.

  • sub_core_grids (ttnn.CoreRangeSet, optional) – Core range set to run the operation on. Defaults to None.

  • indices_tensor (ttnn.Tensor, optional) – Preallocated indices tensor. Defaults to None.

Returns:

List of ttnn.Tensor – the output tensor.

Note

The input and index tensors support the following data types and layouts:

  tensor        | dtype             | layout
  --------------|-------------------|-------
  input_tensor  | BFLOAT8, BFLOAT16 | TILE
  index_tensor  | UINT16, UINT32    | TILE

Both outputs are in TILE layout: the output value tensor has the same data type as input_tensor, and the output index tensor is UINT16.

Memory Support:
  • Interleaved: DRAM and L1

Limitations:
  • Inputs must be located on-device.

  • The op fundamentally operates on 4D tensors with shape [N, C, H, W], and with dim of -1. The tensor will be manipulated as needed when this is not the case, and restored afterwards.

  • For input_tensor, N*C*H must be a multiple of 32.

  • W is ideally ≥64. If this is not the case the op will pad the tensor to satisfy this constraint.

  • The width of input_tensor along dim should be a multiple of tile width, and will be padded to the nearest multiple of tile width if needed.

  • The padding is currently only supported for bfloat16, float32, int32, and uint32.

  • To enable multicore execution, the width of input_tensor along dim must be ≥8192 and <65536, and k must be ≤64.

  • All shape validations are performed on padded shapes.

  • Sharded output memory configs are not supported for this operation.
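As a rough sketch of the width-related limitations above, the padded width of the reduced dimension and the multicore-eligibility check can be expressed as follows. A tile width of 32 is assumed from the TILE layout, and the helper names are illustrative, not part of the ttnn API:

```python
TILE_WIDTH = 32  # assumed tile width for TILE layout

def padded_topk_width(w):
    # Pad the reduced dimension up to at least 64, then up to the
    # nearest multiple of the tile width, per the limitations above.
    w = max(w, 64)
    return ((w + TILE_WIDTH - 1) // TILE_WIDTH) * TILE_WIDTH

def uses_multicore(w, k):
    # Multicore execution requires 8192 <= width < 65536 and k <= 64.
    return 8192 <= w < 65536 and k <= 64
```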

Example

# Create tensor
tensor_input = ttnn.rand([1, 1, 32, 64], device=device)

# Apply ttnn.topk() to get the top 32 values along the last dimension
values, indices = ttnn.topk(tensor_input, k=32, dim=-1, largest=True, sorted=True)
logger.info(f"Topk values: {values}")
logger.info(f"Topk indices: {indices}")