ttnn.reduce_scatter

ttnn.reduce_scatter(input_tensor: ttnn.Tensor, dim: int, cluster_axis: int, mesh_device: MeshDevice, *, num_links: int | None = 1, memory_config: ttnn.MemoryConfig | None = input tensor memory config, num_workers: int | None = None, num_buffers_per_channel: int | None = None, topology: ttnn.Topology | None = ttnn.Topology.Ring) → ttnn.Tensor

Performs a reduce_scatter operation on a multi-device input_tensor across all devices.

Parameters:
  • input_tensor (ttnn.Tensor) – multi-device tensor

  • dim (int) – Dimension along which to perform the operation.

  • cluster_axis (int) – When a mesh tensor is provided, the axis of the MeshDevice along which to perform the line-reduce-scatter operation.

  • mesh_device (MeshDevice) – Device mesh to perform the line-reduce-scatter operation on.

  Note: the cluster_axis and mesh_device parameters apply only to the Linear topology.

Mesh Tensor Programming Guide: https://github.com/tenstorrent/tt-metal/blob/main/tech_reports/Programming%20Mesh%20of%20Devices/Programming%20Mesh%20of%20Devices%20with%20TT-NN.md
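The role of cluster_axis can be pictured with a small, hardware-free sketch. It assumes that devices whose mesh coordinates differ only along cluster_axis take part in the same line-reduce-scatter; that grouping rule is an interpretation of the parameter description, and groups_along_axis is a hypothetical helper, not part of ttnn.

```python
from itertools import product


def groups_along_axis(mesh_shape, cluster_axis):
    """Group 2D mesh coordinates by the axis a collective would run along.

    Assumption (not taken from ttnn): devices that differ only in their
    index along `cluster_axis` belong to the same group.
    """
    rows, cols = mesh_shape
    groups = {}
    for coord in product(range(rows), range(cols)):
        # Key on the coordinate of the *other* axis, so each group
        # varies only along `cluster_axis`.
        key = coord[1 - cluster_axis]
        groups.setdefault(key, []).append(coord)
    return list(groups.values())


# On a 2x4 mesh, axis 0 yields four groups of two devices,
# and axis 1 yields two groups of four devices.
print(groups_along_axis((2, 4), cluster_axis=0))
print(groups_along_axis((2, 4), cluster_axis=1))
```

Under this reading, each group would run its own independent reduce-scatter.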

Keyword Arguments:
  • num_links (int, optional) – Number of links to use for the reduce_scatter operation. Defaults to 1.

  • memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to input tensor memory config.

  • num_workers (int, optional) – Number of workers to use for the operation. Defaults to None.

  • num_buffers_per_channel (int, optional) – Number of buffers per channel to use for the operation. Defaults to None.

  • topology (ttnn.Topology, optional) – The topology configuration to run the operation in. Valid options are Ring and Linear. Defaults to ttnn.Topology.Ring.
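As a rough illustration of how the two topology options differ, the sketch below lists each device's neighbors under the usual meaning of a ring (endpoints wrap around) and a line (endpoints do not). This reflects the general definitions of those topologies rather than anything stated on this page, and neighbors is a hypothetical helper, not a ttnn function.

```python
def neighbors(device_index, num_devices, topology):
    """Return the sorted neighbor indices of a device.

    Assumption: "ring" wraps around at the ends, "linear" does not.
    """
    left, right = device_index - 1, device_index + 1
    if topology == "ring":
        # A set removes the duplicate neighbor in a two-device ring.
        return sorted({left % num_devices, right % num_devices})
    if topology == "linear":
        return [j for j in (left, right) if 0 <= j < num_devices]
    raise ValueError(f"unknown topology: {topology!r}")


print(neighbors(0, 8, "ring"))    # [1, 7]
print(neighbors(0, 8, "linear"))  # [1]
```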

Returns:

ttnn.Tensor – the output tensor.
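For readers new to the collective, here is a minimal host-side reference of what a reduce-scatter conventionally computes: the per-device inputs are reduced element-wise, and the result is split so that device i keeps only chunk i. The sketch uses plain Python lists, assumes the reduction is a sum (this page does not name the reduction), and reference_reduce_scatter is a hypothetical name, not a ttnn call.

```python
def reference_reduce_scatter(per_device_inputs):
    """Reduce equal-length 1D inputs across devices, then scatter the result.

    Assumption: the reduction is an element-wise sum.
    Returns one output chunk per device.
    """
    num_devices = len(per_device_inputs)
    length = len(per_device_inputs[0])
    if length % num_devices != 0:
        raise ValueError("input length must be divisible by the device count")

    # Reduce: sum matching elements from every device.
    reduced = [sum(values) for values in zip(*per_device_inputs)]

    # Scatter: device i keeps the i-th contiguous chunk of the reduced data.
    chunk = length // num_devices
    return [reduced[i * chunk:(i + 1) * chunk] for i in range(num_devices)]


inputs = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two devices, four elements each
print(reference_reduce_scatter(inputs))    # [[11, 22], [33, 44]]
```

Each device ends up with 1/num_devices of the reduced data, which is why the scattered dimension must divide evenly across the participating devices in this sketch.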

Example

>>> import torch
>>> import ttnn
>>> full_tensor = torch.randn([1, 1, 256, 256], dtype=torch.bfloat16)
>>> num_devices = 8
>>> dim = 3
>>> input_tensors = torch.chunk(full_tensor, num_devices, dim)
>>> physical_device_ids = ttnn.get_t3k_physical_device_ids_ring()
>>> mesh_device = ttnn.open_mesh_device(ttnn.MeshShape(1, 8), physical_device_ids=physical_device_ids[:8])
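>>> # Assumed dtype, layout, and memory config, chosen for illustration.
>>> input_dtype, layout, mem_config = ttnn.bfloat16, ttnn.TILE_LAYOUT, ttnn.DRAM_MEMORY_CONFIG
>>> tt_input_tensors = []
>>> # Place one host shard on each device of the mesh.
>>> for i, t in enumerate(input_tensors):
...     tt_input_tensors.append(ttnn.Tensor(t, input_dtype).to(layout).to(mesh_device.get_devices()[i], mem_config))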
>>> input_tensor_mesh = ttnn.aggregate_as_tensor(tt_input_tensors)
>>> output = ttnn.reduce_scatter(input_tensor_mesh, dim=0, topology=ttnn.Topology.Linear)