ttnn.topk

ttnn.topk(input_tensor: ttnn.Tensor, k: int, dim: int, largest: bool, sorted: bool, out: Optional[Tuple[ttnn.Tensor, ttnn.Tensor]] = None, memory_config: Optional[ttnn.MemoryConfig] = None, sub_core_grids: Optional[ttnn.CoreRangeSet] = None, indices_tensor: Optional[ttnn.Tensor] = None) -> Tuple[ttnn.Tensor, ttnn.Tensor]

Returns the k largest or k smallest elements of the input_tensor along a given dimension dim.

If dim is not provided, the last dimension of the input_tensor is used.

If largest is True, the k largest elements are returned. Otherwise, the k smallest elements are returned.

If sorted is True, the returned k elements are guaranteed to be in sorted order.

Equivalent PyTorch code:

return torch.topk(input_tensor, k, dim=dim, largest=largest, sorted=sorted)
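
The same semantics can be sanity-checked on the host with plain PyTorch (a minimal, self-contained sketch; the shapes are illustrative):

import torch

x = torch.randn(1, 1, 32, 64)
values, indices = torch.topk(x, k=32, dim=-1, largest=True, sorted=True)
# With largest=True and sorted=True, the first returned element per row is the row maximum.
assert torch.equal(values[..., 0], x.max(dim=-1).values)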
Parameters:
  • input_tensor (ttnn.Tensor) – the input tensor.

  • k (int) – the number of top elements to return.

  • dim (int) – the dimension along which to find the top k elements.

  • largest (bool) – if True, return the k largest elements; otherwise, return the k smallest.

  • sorted (bool) – if True, return the elements in sorted order.

Keyword Arguments:
  • memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.

  • out (Tuple[ttnn.Tensor, ttnn.Tensor], optional) – Preallocated output tensors (values, indices). Defaults to None.

  • sub_core_grids (ttnn.CoreRangeSet, optional) – Core range set to run the operation on. Defaults to None.

  • indices_tensor (ttnn.Tensor, optional) – Preallocated indices tensor. Defaults to None.
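
As an illustration of the memory_config keyword, the outputs can be placed in interleaved L1 instead of the default location (a minimal sketch; ttnn.L1_MEMORY_CONFIG is ttnn's standard interleaved L1 configuration, and input_tensor is assumed to be a TILE-layout device tensor):

# Place the topk outputs in interleaved L1 rather than the default memory.
topk_values, topk_indices = ttnn.topk(
    input_tensor, k=32, dim=-1, largest=True, sorted=True,
    memory_config=ttnn.L1_MEMORY_CONFIG,
)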

Returns:

Tuple of ttnn.Tensor – the values tensor and the corresponding indices tensor.

Note

The input and index tensors support the following data types and layout:

  tensor          dtype               layout
  input_tensor    BFLOAT8, BFLOAT16   TILE
  index_tensor    UINT16, UINT32      TILE

The output values tensor has the same data type as input_tensor; the output indices tensor has data type UINT16.
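
These dtype guarantees can be checked directly on the returned tensors (a sketch; assumes a bfloat16 input_tensor already on-device):

topk_values, topk_indices = ttnn.topk(input_tensor, k=32, dim=-1, largest=True, sorted=True)
assert topk_values.dtype == input_tensor.dtype   # values inherit the input dtype
assert topk_indices.dtype == ttnn.uint16         # indices are returned as UINT16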

Memory Support:
  • Interleaved: DRAM and L1

Limitations:
  • Inputs must be located on-device.

  • The op fundamentally operates on 4D tensors of shape [N, C, H, W] with dim of -1. Inputs that do not match are transposed and reshaped as needed, then restored afterwards.

  • For input_tensor, N*C*H must be a multiple of 32.

  • W should ideally be ≥64; if it is not, the op pads the tensor to satisfy this constraint.

  • The size of input_tensor along dim should be a multiple of the tile width, and will be padded to the nearest multiple of the tile width if needed (see the host-side padding sketch after this list).

  • The padding is currently supported only for bfloat16, float32, int32, and uint32.

  • To enable multicore execution, the size of input_tensor along dim must be ≥8192 and <65536, and k must be ≤64.

  • All shape validations are performed on padded shapes.

  • Sharded output memory configs are not supported for this operation.
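
If you would rather control the padding fill value yourself than rely on the op's implicit padding, the input can be padded on the host before transfer (a hedged sketch; the shapes are hypothetical, device is an open ttnn device handle, and -inf is used so padded lanes never rank among the largest):

import torch

x = torch.randn(1, 1, 32, 100)               # W=100 is not a multiple of the tile width (32)
pad = (-x.shape[-1]) % 32                     # pad W up to the next tile-width multiple
x = torch.nn.functional.pad(x, (0, pad), value=float("-inf"))
input_tensor = ttnn.from_torch(x, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device)
topk_values, topk_indices = ttnn.topk(input_tensor, k=32, dim=-1, largest=True, sorted=True)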

Example

import ttnn

device = ttnn.open_device(device_id=0)
input_tensor = ttnn.rand([1, 1, 32, 64], device=device, layout=ttnn.TILE_LAYOUT)
topk_values, topk_indices = ttnn.topk(input_tensor, k=32, dim=-1, largest=True, sorted=True)
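
The device results can be brought back to the host and compared against the PyTorch golden (a sketch; reuses input_tensor and the outputs from the example above, and compares values only since index order may differ on ties):

import torch

expected_values, _ = torch.topk(ttnn.to_torch(input_tensor).float(), k=32, dim=-1, largest=True, sorted=True)
assert torch.allclose(ttnn.to_torch(topk_values).float(), expected_values, atol=1e-2)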