ttnn.topk
- ttnn.topk(input_tensor: ttnn.Tensor, k: number, dim: number, largest: bool = True, sorted: bool = True, *, memory_config: ttnn.MemoryConfig = None, output_tensor: ttnn.Tensor = None, sub_core_grids: ttnn.CoreRangeSet = None, indices_tensor: ttnn.Tensor = None) → List of ttnn.Tensor
-
Returns the k largest or k smallest elements of the input_tensor along a given dimension dim. If dim is not provided, the last dimension of the input_tensor is used. If largest is True, the k largest elements are returned; otherwise, the k smallest elements are returned. If sorted is True, the returned k elements are guaranteed to be sorted.
Equivalent PyTorch code:
return torch.topk(input_tensor, k, dim=dim, largest=largest, sorted=sorted, out=None)
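The selection semantics can also be illustrated with a small pure-Python sketch. This is an illustration of the behavior only, not the device implementation, and the helper name topk_reference is hypothetical:

```python
def topk_reference(values, k, largest=True, sorted_result=True):
    """Reference top-k over a flat Python list (illustrative only;
    the actual op runs tiled on-device)."""
    # Pair each value with its original index so indices can be returned too.
    indexed = list(enumerate(values))
    # Sort by value: descending for largest=True, ascending otherwise.
    indexed.sort(key=lambda pair: pair[1], reverse=largest)
    top = indexed[:k]
    if not sorted_result:
        # Without the sorted guarantee, the order of the k results is
        # unspecified; here we simply restore original input order.
        top.sort(key=lambda pair: pair[0])
    top_values = [v for _, v in top]
    top_indices = [i for i, _ in top]
    return top_values, top_indices
```

For example, topk_reference([3, 1, 4, 1, 5], 2) returns values [5, 4] with indices [4, 2], mirroring the (values, indices) pair returned by the op.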
- Parameters:
-
input_tensor (ttnn.Tensor) – the input tensor.
k (number) – the number of top elements to look for.
dim (number) – the dimension to reduce.
largest (bool) – whether to return the largest or the smallest elements. Defaults to True.
sorted (bool) – whether to return the elements in sorted order. Defaults to True.
- Keyword Arguments:
-
memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.
output_tensor (ttnn.Tensor, optional) – Preallocated output tensor. Defaults to None.
sub_core_grids (ttnn.CoreRangeSet, optional) – Core range set to run the operation on. Defaults to None.
indices_tensor (ttnn.Tensor, optional) – Preallocated indices tensor. Defaults to None.
- Returns:
-
List of ttnn.Tensor – the output tensor.
Note
The input_tensor and index_tensor support the following data types and layouts:

tensor          dtype               layout
input_tensor    BFLOAT8, BFLOAT16   TILE
index_tensor    UINT16, UINT32      TILE

The output_value_tensor will have the same data type as input_tensor and will be in TILE layout. The output_index_tensor will be UINT16 and will be in TILE layout.
- Memory Support:
-
Interleaved: DRAM and L1
- Limitations:
-
Inputs must be located on-device.
The op fundamentally operates on 4D tensors with shape [N, C, H, W] and with dim of -1. The tensor will be manipulated as needed when this is not the case, and restored afterwards.
For input_tensor, N*C*H must be a multiple of 32, and W is ideally ≥ 64. If this is not the case, the op will pad the tensor to satisfy these constraints.
The size of input_tensor along dim should be a multiple of the tile width, and will be padded to the nearest multiple of the tile width if needed. Padding is currently only supported for bfloat16, float32, int32, and uint32.
To enable multicore execution, the size of input_tensor along dim must be ≥ 8192 and < 65536, and k must be ≤ 64.
All shape validations are performed on padded shapes.
Sharded output memory configs are not supported for this operation.
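The tile-width padding described above must use a fill value that can never displace real elements from the result. A minimal sketch of that idea, assuming a tile width of 32 and using a hypothetical helper (not the op's actual padding code):

```python
TILE_WIDTH = 32  # assumed tile width for the TILE layout described above

def pad_for_topk(row, largest=True, tile_width=TILE_WIDTH):
    """Pad a row up to the next multiple of tile_width.

    The fill is -inf when selecting the largest elements (+inf for the
    smallest), so padded slots can never appear in the top-k result.
    """
    pad_len = (-len(row)) % tile_width
    fill = float("-inf") if largest else float("inf")
    return row + [fill] * pad_len

# A row of width 40 is padded to 64, the next multiple of 32.
padded = pad_for_topk([1.0] * 40)
```

A row whose width is already a multiple of the tile width is left unchanged.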
Example
# Create input tensor
tensor_input = ttnn.rand([1, 1, 32, 64], device=device)

# Apply ttnn.topk() to get the top 32 values and their indices along the last dimension
values, indices = ttnn.topk(tensor_input, k=32, dim=-1, largest=True, sorted=True)

logger.info(f"Topk values: {values}")
logger.info(f"Topk indices: {indices}")