ttnn.moe_expert_token_remap

ttnn.moe_expert_token_remap(topk_tensor: ttnn.Tensor, expert_mapping_tensor: ttnn.Tensor, expert_metadata_tensor: ttnn.Tensor, *, memory_config: ttnn.MemoryConfig = None, output_mapping_tensor: ttnn.Tensor = None, output_reduced_tensor: ttnn.Tensor = None, reduction_size: int | None) → Tuple

Remap MoE CCL metadata from global experts to local device experts.

Parameters:
  • topk_tensor (ttnn.Tensor) – tensor of MoE topk scores, [devices/devices, batch, seq, experts]

  • expert_mapping_tensor (ttnn.Tensor) – tensor that maps MoE experts to devices, [1, 1, experts, devices]

  • expert_metadata_tensor (ttnn.Tensor) – tensor that maps tokens to global experts, [devices/devices, batch, seq, select_experts_k]

Keyword Arguments:
  • memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.

  • output_mapping_tensor (ttnn.Tensor, optional) – Preallocated output mapping tensor. Defaults to None.

  • output_reduced_tensor (ttnn.Tensor, optional) – Preallocated output reduced tensor. Defaults to None.

  • reduction_size (int, optional) – Reduction chunk size: the number of batch*seq token entries grouped into each row of the reduced output tensor.

Returns:

Tuple – two tensors:
  • ttnn.Tensor: Tensor that maps batch tokens to local experts, [devices/devices, batch, seq, experts_per_device]

  • ttnn.Tensor: Bool tensor that reduces the mapping tensor by chunks of reduction_size, [devices/devices, batch*seq/reduction_size, experts_per_device]
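
The output shapes and the chunked reduction described above can be sketched in plain Python, without ttnn. This is an illustration only, not the library's implementation. The helper names `expected_output_shapes` and `reduce_by_chunks` are hypothetical. The sketch assumes that experts are split evenly across devices, that the leading `devices/devices` dimension is 1, and that a chunk entry is true when any token in the chunk has a nonzero mapping value for that local expert. None of these assumptions is confirmed by this page.

```python
def expected_output_shapes(batch, seq, experts, devices, reduction_size):
    """Return (mapping_shape, reduced_shape) following the documented layouts.

    Assumptions (hypothetical, not confirmed by the docs): experts divide
    evenly across devices, and the leading `devices/devices` dimension is 1.
    """
    if experts % devices != 0:
        raise ValueError("experts must divide evenly across devices")
    if (batch * seq) % reduction_size != 0:
        raise ValueError("batch * seq must be a multiple of reduction_size")
    experts_per_device = experts // devices
    mapping_shape = (1, batch, seq, experts_per_device)
    reduced_shape = (1, batch * seq // reduction_size, experts_per_device)
    return mapping_shape, reduced_shape


def reduce_by_chunks(mapping_rows, reduction_size):
    """Collapse each group of `reduction_size` token rows into one bool row.

    Assumption: an entry is True when any token in the chunk has a nonzero
    value for that local expert.
    """
    reduced = []
    for start in range(0, len(mapping_rows), reduction_size):
        chunk = mapping_rows[start:start + reduction_size]
        n_experts = len(chunk[0])
        reduced.append([any(row[e] != 0 for row in chunk) for e in range(n_experts)])
    return reduced


# Shapes for 4 x 8 tokens, 16 experts on 8 devices, chunks of 16 tokens.
mapping_shape, reduced_shape = expected_output_shapes(
    batch=4, seq=8, experts=16, devices=8, reduction_size=16
)

# Four flattened tokens, two local experts, reduced in chunks of two tokens.
rows = [[0.5, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.25]]
reduced_rows = reduce_by_chunks(rows, reduction_size=2)
```

Under these assumptions, a false entry in the reduced tensor would indicate that none of the tokens in that chunk is routed to the corresponding local expert.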