ttnn.topk

ttnn.topk = Operation(python_fully_qualified_name='ttnn.topk', function=<ttnn._ttnn.operations.reduction.topk_t object>, preprocess_golden_function_inputs=<function default_preprocess_golden_function_inputs>, golden_function=<function _create_golden_function_topk.<locals>.golden_function>, postprocess_golden_function_outputs=<function default_postprocess_golden_function_outputs>, is_cpp_operation=True, is_experimental=False)

ttnn.topk(input_tensor: ttnn.Tensor, k: int, dim: int, largest: bool, sorted: bool, out: Optional[Tuple[ttnn.Tensor, ttnn.Tensor]] = None, memory_config: Optional[ttnn.MemoryConfig] = None, sub_core_grids: Optional[ttnn.CoreRangeSet] = None, indices_tensor: Optional[ttnn.Tensor] = None) -> Tuple[ttnn.Tensor, ttnn.Tensor]

Returns the k largest or k smallest elements of the input_tensor along a given dimension dim.

If dim is not provided, the last dimension of the input_tensor is used.

If largest is True, the k largest elements are returned. Otherwise, the k smallest elements are returned.

If sorted is True, the k returned elements are guaranteed to be in sorted order.

Equivalent PyTorch code:

return torch.topk(input_tensor, k, dim=dim, largest=largest, sorted=sorted)
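The selection semantics described above can be sketched in plain Python for a single 1-D row. This is only an illustration, not the device implementation: topk_reference is a hypothetical helper name, and it always returns its results in sorted order, which is one valid result whether sorted is True or False.

```python
from typing import List, Sequence, Tuple


def topk_reference(
    values: Sequence[float], k: int, largest: bool
) -> Tuple[List[float], List[int]]:
    """Illustrative 1-D top-k (hypothetical helper): returns (values, indices)."""
    if not 0 <= k <= len(values):
        raise ValueError("k must be between 0 and len(values)")
    # Rank positions by value: descending when largest=True, ascending otherwise.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=largest)
    top_indices = order[:k]
    return [values[i] for i in top_indices], top_indices


row = [0.5, 2.0, -1.0, 3.5, 1.25]
print(topk_reference(row, k=2, largest=True))   # ([3.5, 2.0], [3, 1])
print(topk_reference(row, k=2, largest=False))  # ([-1.0, 0.5], [2, 0])
```

The second element of the result plays the role of the indices tensor: each index points back to the position of the corresponding value in the input row.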

Parameters:

input_tensor (ttnn.Tensor) – the input tensor.
k (int) – the number of top elements to look for.
dim (int) – the dimension along which to find the top k elements.
largest (bool) – whether to return the largest or the smallest elements. Defaults to False.
sorted (bool) – whether to return the elements in sorted order. Defaults to False.

Keyword Arguments:

memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.
out (Tuple[ttnn.Tensor, ttnn.Tensor], optional) – Preallocated output tensors (values and indices). Defaults to None.
sub_core_grids (ttnn.CoreRangeSet, optional) – Core range set to run the operation on. Defaults to None.
indices_tensor (ttnn.Tensor, optional) – Preallocated indices tensor. Defaults to None.

Returns:

Tuple of ttnn.Tensor – the tensor of top k values and the tensor of their corresponding indices.

Note

The input_tensor and index_tensor support the following data types and layouts:

input_tensor
dtype	layout
BFLOAT8, BFLOAT16	TILE

index_tensor
dtype	layout
UINT16, UINT32	TILE

The output_value_tensor will have the same data type as input_tensor and output_index_tensor will have UINT16 data type.

Memory Support:

Interleaved: DRAM and L1

Limitations:

Inputs must be located on-device.
The op fundamentally operates on 4D tensors with shape [N, C, H, W], and with dim of -1. The tensor will be manipulated as needed when this is not the case, and restored afterwards.
For input_tensor, N*C*H must be a multiple of 32.
W is ideally ≥64. If this is not the case, the op will pad the tensor to satisfy this constraint.
The width of input_tensor along dim should be a multiple of the tile width, and will be padded to the nearest multiple of the tile width if needed.
The padding is currently only supported for bfloat16, float32, int32, and uint32.
To enable multicore execution, the width of input_tensor along dim must be ≥8192 and <65536, and k must be ≤64.
All shape validations are performed on padded shapes.
Sharded output memory configs are not supported for this operation.
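The shape constraints listed above can be checked ahead of time with a small pure-Python sketch. The helper check_topk_shape is hypothetical and not part of ttnn; it assumes a tile width of 32, assumes the shape has already been arranged so that dim is the last axis, and applies the multicore test to the padded width.

```python
from math import prod
from typing import Dict, Sequence

TILE_WIDTH = 32  # assumption: tiles are 32 elements wide
MIN_WIDTH = 64   # from the limitation "W is ideally >= 64"


def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple."""
    return ((value + multiple - 1) // multiple) * multiple


def check_topk_shape(shape: Sequence[int], k: int) -> Dict[str, object]:
    """Hypothetical pre-flight check mirroring the documented limitations."""
    *outer, width = shape
    # Pad W to a multiple of the tile width, and to at least MIN_WIDTH.
    padded_width = max(round_up(width, TILE_WIDTH), MIN_WIDTH)
    return {
        "padded_width": padded_width,
        # N*C*H must be a multiple of 32.
        "rows_ok": prod(outer) % 32 == 0,
        # Multicore needs 8192 <= width < 65536 and k <= 64.
        "multicore": 8192 <= padded_width < 65536 and k <= 64,
    }


print(check_topk_shape([1, 1, 32, 64], k=32))
print(check_topk_shape([1, 1, 32, 8190], k=64))
```

For the [1, 1, 32, 64] shape used in the example below, this sketch reports no padding and single-core execution; a width of 8190 would be padded to 8192 and would then satisfy the multicore width bound.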

Example

import ttnn
device = ttnn.open_device(device_id=0)  # assumes a device is available at id 0
input_tensor = ttnn.rand([1, 1, 32, 64], device=device, layout=ttnn.TILE_LAYOUT)
topk_values, topk_indices = ttnn.topk(input_tensor, k=32, dim=-1, largest=True, sorted=True)