ttnn.topk
- ttnn.topk = Operation(python_fully_qualified_name='ttnn.topk', function=<ttnn._ttnn.operations.reduction.topk_t object>, preprocess_golden_function_inputs=<function default_preprocess_golden_function_inputs>, golden_function=<function _create_golden_function_topk.<locals>.golden_function>, postprocess_golden_function_outputs=<function default_postprocess_golden_function_outputs>, is_cpp_operation=True, is_experimental=False)
-
ttnn.topk(input_tensor: ttnn.Tensor, k: int, dim: int, largest: bool, sorted: bool, out: Optional[Tuple[ttnn.Tensor, ttnn.Tensor]] = None, memory_config: Optional[ttnn.MemoryConfig] = None, sub_core_grids: Optional[ttnn.CoreRangeSet] = None, indices_tensor: Optional[ttnn.Tensor] = None) -> Tuple[ttnn.Tensor, ttnn.Tensor]
Returns the k largest or k smallest elements of the input_tensor along a given dimension dim. If dim is not provided, the last dimension of the input_tensor is used. If largest is True, the k largest elements are returned; otherwise, the k smallest elements are returned. If sorted is True, the returned k elements are guaranteed to be in sorted order.
Equivalent PyTorch code:
return torch.topk(input_tensor, k, dim=dim, largest=largest, sorted=sorted)
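The top-k semantics can also be sketched in pure Python without torch, for a single 1-D sequence. Here topk_1d is a hypothetical illustrative helper, not part of ttnn or torch:

```python
# Pure-Python sketch of top-k semantics along a 1-D sequence.
# topk_1d is a hypothetical illustration, not part of ttnn or torch.
def topk_1d(values, k, largest=True, sorted_result=True):
    # Pair each value with its index so both values and indices can be
    # returned, mirroring the (values, indices) pair that topk produces.
    indexed = list(enumerate(values))
    # Sort by value: descending for largest, ascending for smallest.
    indexed.sort(key=lambda pair: pair[1], reverse=largest)
    top = indexed[:k]
    if not sorted_result:
        # With sorted=False the order of the returned elements is not
        # guaranteed; this sketch simply restores original index order.
        top.sort(key=lambda pair: pair[0])
    topk_values = [v for _, v in top]
    topk_indices = [i for i, _ in top]
    return topk_values, topk_indices

values, indices = topk_1d([3.0, 1.0, 4.0, 1.5], k=2, largest=True)
# values == [4.0, 3.0], indices == [2, 0]
```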
- Parameters:
-
input_tensor (ttnn.Tensor) – the input tensor.
k (number) – the number of top elements to look for.
dim (number) – the dimension to reduce.
largest (bool) – whether to return the largest or the smallest elements. Defaults to False.
sorted (bool) – whether to return the elements in sorted order. Defaults to False.
- Keyword Arguments:
-
memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.
out (Tuple[ttnn.Tensor, ttnn.Tensor], optional) – Preallocated output tensors (values, indices). Defaults to None.
sub_core_grids (ttnn.CoreRangeSet, optional) – Core range set to run the operation on. Defaults to None.
indices_tensor (ttnn.Tensor, optional) – Preallocated indices tensor. Defaults to None.
- Returns:
-
Tuple[ttnn.Tensor, ttnn.Tensor] – the values tensor and the indices tensor.
Note
The input_tensor supports the following data types and layouts:

input_tensor dtype    layout
BFLOAT8, BFLOAT16     TILE

index_tensor dtype    layout
UINT16, UINT32        TILE

The output_value_tensor will have the same data type as input_tensor, and the output_index_tensor will have the UINT16 data type.
- Memory Support:
-
Interleaved: DRAM and L1
- Limitations:
-
Inputs must be located on-device.
The op fundamentally operates on 4D tensors with shape [N, C, H, W] and with dim of -1. The tensor will be manipulated as needed when this is not the case, and restored afterwards.
For input_tensor, N*C*H must be a multiple of 32, and W is ideally ≥64. If this is not the case, the op will pad the tensor to satisfy these constraints.
The width of input_tensor along dim should be a multiple of tile width, and will be padded to the nearest multiple of tile width if needed. This padding is currently only supported for bfloat16, float32, int32, and uint32.
To enable multicore execution, the width of input_tensor along dim must be ≥8192 and <65536, and k must be ≤64. All shape validations are performed on padded shapes.
Sharded output memory configs are not supported for this operation.
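The padding and multicore constraints above can be summarized in a small checker. Both check_topk_constraints and the tile width of 32 are illustrative assumptions for this sketch, not part of the ttnn API:

```python
# Illustrative sketch of the shape constraints listed above.
# check_topk_constraints is a hypothetical helper, not part of ttnn.
TILE_WIDTH = 32  # assumed tile width, for illustration only

def round_up(x, multiple):
    # Round x up to the nearest multiple of `multiple`.
    return ((x + multiple - 1) // multiple) * multiple

def check_topk_constraints(shape, k):
    """shape is [N, C, H, W]; top-k is taken along the last dimension."""
    n, c, h, w = shape
    assert (n * c * h) % 32 == 0, "N*C*H must be a multiple of 32"
    # W is ideally >= 64 and is padded to the nearest tile-width multiple.
    padded_w = round_up(max(w, 64), TILE_WIDTH)
    # Multicore execution needs a wide enough padded input and a small k.
    multicore = 8192 <= padded_w < 65536 and k <= 64
    return padded_w, multicore

# A [1, 1, 32, 64] input needs no padding but is too narrow for multicore.
padded_w, multicore = check_topk_constraints([1, 1, 32, 64], k=32)
# padded_w == 64, multicore == False
```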
Example
input_tensor = ttnn.rand([1, 1, 32, 64], device=device, layout=ttnn.TILE_LAYOUT)
topk_values, topk_indices = ttnn.topk(input_tensor, k=32, dim=-1, largest=True, sorted=True)