ttnn.moe_expert_token_remap
- ttnn.moe_expert_token_remap(topk_tensor: ttnn.Tensor, expert_mapping_tensor: ttnn.Tensor, expert_metadata_tensor: ttnn.Tensor, *, memory_config: ttnn.MemoryConfig = None, output_mapping_tensor: ttnn.Tensor = None, output_reduced_tensor: ttnn.Tensor = None, reduction_size: int | None) -> Tuple
-
Remap MoE CCL metadata from global experts to local device experts.
- Parameters:
-
topk_tensor (ttnn.Tensor) – tensor of MoE topk scores, [devices/devices, batch, seq, experts]
expert_mapping_tensor (ttnn.Tensor) – tensor that maps MoE experts to devices, [1, 1, experts, devices]
expert_metadata_tensor (ttnn.Tensor) – tensor that maps tokens to global experts, [devices/devices, batch, seq, select_experts_k]
- Keyword Arguments:
-
memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.
output_mapping_tensor (ttnn.Tensor, optional) – Preallocated output mapping tensor. Defaults to None.
output_reduced_tensor (ttnn.Tensor, optional) – Preallocated output reduced tensor. Defaults to None.
reduction_size (int, optional) – Chunk size, in tokens, used to reduce the output mapping tensor.
- Returns:
-
Tuple – two tensors:
ttnn.Tensor: Tensor that maps batch tokens to local experts, [devices/devices, batch, seq, experts_per_device]
ttnn.Tensor: Bool tensor that reduces the mapping tensor by chunks of reduction_size, [devices/devices, batch*seq/reduction_size, experts_per_device]
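The shapes above can be illustrated with a small NumPy sketch. This is one plausible reading of the documented semantics, not the ttnn implementation: the helper `remap_reference` and its `device` argument are hypothetical, and the sketch assumes that `expert_mapping_tensor` is one-hot over devices, that local experts are ordered by ascending global expert index, that the mapping output carries the topk score of each selected local expert (zero elsewhere), and that the reduced output is true wherever any token in a chunk has a nonzero entry.

```python
import numpy as np

def remap_reference(topk, expert_mapping, expert_metadata, device, reduction_size):
    """Hypothetical NumPy reference; NOT the ttnn kernel."""
    d, batch, seq, _ = topk.shape
    assert (batch * seq) % reduction_size == 0, "batch*seq must divide evenly"

    # Assumption: expert_mapping[0, 0, e, dev] is nonzero iff expert e lives on dev.
    local_experts = np.flatnonzero(expert_mapping[0, 0, :, device])
    # Assumption: local expert order follows ascending global expert index.
    local_index = {int(e): i for i, e in enumerate(local_experts)}

    mapping = np.zeros((d, batch, seq, len(local_experts)), dtype=topk.dtype)
    for idx in np.ndindex(d, batch, seq):
        for e in expert_metadata[idx]:  # the k experts selected for this token
            e = int(e)
            if e in local_index:
                mapping[idx + (local_index[e],)] = topk[idx + (e,)]

    # Assumption: tokens are flattened over batch*seq, then grouped into
    # chunks of reduction_size and reduced with "any nonzero".
    chunks = mapping.reshape(d, (batch * seq) // reduction_size, reduction_size, -1)
    reduced = (chunks != 0).any(axis=2)
    return mapping, reduced

# Toy setup: 2 devices, 4 experts (0 and 1 on device 0; 2 and 3 on device 1),
# batch=1, seq=4, select_experts_k=2.
topk = np.arange(1, 17, dtype=np.float32).reshape(1, 1, 4, 4)
expert_mapping = np.array([[1, 0], [1, 0], [0, 1], [0, 1]]).reshape(1, 1, 4, 2)
expert_metadata = np.array([[0, 2], [1, 3], [2, 0], [0, 1]]).reshape(1, 1, 4, 2)

mapping, reduced = remap_reference(
    topk, expert_mapping, expert_metadata, device=1, reduction_size=2
)
print(mapping.shape)  # (1, 1, 4, 2) -> [.., batch, seq, experts_per_device]
print(reduced.shape)  # (1, 2, 2)    -> [.., batch*seq/reduction_size, experts_per_device]
```

In this toy case the last token selects no expert on device 1, so its mapping row is all zeros, and the second chunk of the reduced tensor is true only for the first local expert.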