ttnn.topk
ttnn.topk(input_tensor: ttnn.Tensor, k: number, dim: number, largest: bool = True, sorted: bool = True, *, memory_config: ttnn.MemoryConfig = None, output_tensor: tuple[ttnn.Tensor, ttnn.Tensor] = (None, None), sub_core_grids: ttnn.CoreRangeSet = None, indices_tensor: ttnn.Tensor = None) -> tuple[ttnn.Tensor, ttnn.Tensor]
Returns the k largest or k smallest elements of the input_tensor along a given dimension dim.
If dim is not provided, the last dimension of the input_tensor is used.
If largest is True, the k largest elements are returned. Otherwise, the k smallest elements are returned.
If the boolean option sorted is True, the returned k elements are guaranteed to be sorted.
Equivalent PyTorch code:
return torch.topk(input_tensor, k, dim=dim, largest=largest, sorted=sorted)
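The selection semantics can be pinned down with a small pure-Python reference for a single row taken along the last dimension. This is an illustrative sketch only: `topk_row` is a hypothetical helper, not part of ttnn, and the ordering it uses for sorted=False (input order) is an arbitrary choice, since that order is not guaranteed by the op.

```python
import builtins
from typing import List, Sequence, Tuple


def topk_row(
    row: Sequence[float], k: int, largest: bool = True, sorted: bool = True
) -> Tuple[List[float], List[int]]:
    """Hypothetical reference: top-k of one row (the last dimension).

    Returns (values, indices), mirroring the shape of torch.topk's result.
    """
    if not 0 <= k <= len(row):
        raise ValueError("k must satisfy 0 <= k <= len(row)")
    # Rank positions by value: descending for largest=True, ascending otherwise.
    # Python's sort is stable (also with reverse=True), so ties keep the
    # lower index first. The `sorted` parameter shadows the builtin, hence builtins.
    order = builtins.sorted(range(len(row)), key=lambda i: row[i], reverse=largest)
    chosen = order[:k]
    if not sorted:
        # Order is unspecified when sorted=False; this sketch keeps input order.
        chosen = builtins.sorted(chosen)
    return [row[i] for i in chosen], chosen


values, indices = topk_row([4.0, 1.0, 5.0, 3.0], k=2)
print(values, indices)  # [5.0, 4.0] [2, 0]
```

With largest=False the same row yields ([1.0, 3.0], [1, 3]), and with sorted=False it yields ([4.0, 5.0], [0, 2]) in this sketch.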
Parameters:
input_tensor (ttnn.Tensor) – the input tensor. Must be on the device.
k (number) – the number of top elements to look for.
dim (number) – the dimension along which to select the top k elements.
largest (bool) – whether to return the largest or the smallest elements. Defaults to True.
sorted (bool) – whether to return the elements in sorted order. Defaults to True.
Keyword Arguments:
memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.
output_tensor (tuple[ttnn.Tensor, ttnn.Tensor], optional) – A tuple of preallocated output tensors for the values and indices. If specified, they must be on the same device as input_tensor. Defaults to (None, None).
sub_core_grids (ttnn.CoreRangeSet, optional) – Core range set to run the operation on. Defaults to None.
indices_tensor (ttnn.Tensor, optional) – Input tensor containing pre-computed index values. When provided, the operation reads indices from this tensor instead of generating them. Defaults to None.
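One way to read the indices_tensor description: instead of reporting positions 0..W-1, the op reports the index value stored at each selected position of indices_tensor. The sketch below encodes that reading for a single row. It is an assumption about the semantics, not taken from the ttnn implementation; `topk_row_with_indices` is a hypothetical helper, and the "chunk of a longer row" use case is likewise only an illustration.

```python
from typing import List, Optional, Sequence, Tuple


def topk_row_with_indices(
    row: Sequence[float],
    k: int,
    indices: Optional[Sequence[int]] = None,
    largest: bool = True,
) -> Tuple[List[float], List[int]]:
    """Hypothetical reference: sorted top-k of one row, with optional indices.

    Assumption: when `indices` is given, indices[j] is reported for row[j];
    otherwise positions 0..len(row)-1 are generated.
    """
    if indices is None:
        indices = range(len(row))  # generated indices
    if len(indices) != len(row):
        raise ValueError("indices must have the same length as row")
    # Positions of the k largest (or smallest) values, in sorted order.
    order = sorted(range(len(row)), key=lambda j: row[j], reverse=largest)[:k]
    return [row[j] for j in order], [indices[j] for j in order]


# A chunk of a longer row that starts at (hypothetical) offset 128.
chunk = [0.2, 0.9, 0.5]
print(topk_row_with_indices(chunk, k=2))                           # ([0.9, 0.5], [1, 2])
print(topk_row_with_indices(chunk, k=2, indices=[128, 129, 130]))  # ([0.9, 0.5], [129, 130])
```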
Returns:
tuple[ttnn.Tensor, ttnn.Tensor] – a tuple of (values_tensor, indices_tensor).
Note
The input_tensor supports the following data types and layouts:

input_tensor dtype | layout
BFLOAT8, BFLOAT16 | TILE

index_tensor dtype | layout
UINT16, UINT32 | TILE
The output_value_tensor will have the same data type as input_tensor and will be in TILE layout. The output_index_tensor will be UINT16 if the size of input_tensor along dim is less than or equal to 65535, and UINT32 otherwise; it will also be in TILE layout.
Memory Support:
Interleaved: DRAM and L1
Limitations:
Inputs must be located on-device.
The op fundamentally operates on 4D tensors with shape [N, C, H, W] and with dim of -1. When this is not the case, the tensor is manipulated as needed and restored afterwards.
For input_tensor, N*C*H must be a multiple of 32, and W is ideally ≥64. If this is not the case, the op will pad the tensor to satisfy this constraint.
The width of input_tensor along dim should be a multiple of the tile width, and will be padded up to the next multiple of the tile width if needed.
Padding is currently only supported for bfloat16, float32, int32, and uint32.
To enable multicore execution, the width of input_tensor along dim must be ≥8192 and <65536, and k must be ≤64.
All shape validations are performed on padded shapes.
Sharded output memory configs are not supported for this operation.
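The padding and multicore rules in this list reduce to simple shape arithmetic. The following is a minimal sketch assuming a 4D [N, C, H, W] input with dim=-1, a tile width of 32, and padding that rounds up to the next multiple of 32; `round_up` and `topk_shape_plan` are hypothetical helpers, not ttnn APIs, and they only mirror the rules as stated here.

```python
from typing import Dict, Sequence

TILE = 32  # assumed tile height/width in elements


def round_up(x: int, multiple: int) -> int:
    """Round x up to the next multiple of `multiple`."""
    return -(-x // multiple) * multiple


def topk_shape_plan(shape: Sequence[int], k: int) -> Dict[str, object]:
    """Hypothetical helper applying the documented rules to [N, C, H, W], dim=-1."""
    n, c, h, w = shape
    padded_rows = round_up(n * c * h, TILE)  # N*C*H -> multiple of 32
    padded_width = round_up(w, TILE)  # W -> multiple of the tile width
    # Validations use padded shapes, so the multicore check uses padded_width.
    multicore = 8192 <= padded_width < 65536 and k <= 64
    return {
        "padded_rows": padded_rows,
        "padded_width": padded_width,
        "multicore": multicore,
    }


print(topk_shape_plan([1, 1, 32, 64], k=32))
# {'padded_rows': 32, 'padded_width': 64, 'multicore': False}
print(topk_shape_plan([2, 3, 5, 8190], k=8))
# {'padded_rows': 32, 'padded_width': 8192, 'multicore': True}
```

In the second call, a width of 8190 is padded to 8192, so under these rules the padded shape would meet the multicore width threshold even though the unpadded width does not.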
Example
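import ttnn
from loguru import logger

device = ttnn.open_device(device_id=0)

# Create a random [1, 1, 32, 64] tensor on the device
tensor_input = ttnn.rand([1, 1, 32, 64], device=device)
# Get the 32 largest values and their indices along the last dimension
values, indices = ttnn.topk(tensor_input, k=32, dim=-1, largest=True, sorted=True)
logger.info(f"Topk values: {values}")
logger.info(f"Topk indices: {indices}")

ttnn.close_device(device)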