ttnn.reduce_scatter
ttnn.reduce_scatter = FastOperation(python_fully_qualified_name='ttnn.reduce_scatter', function=<ttnn._ttnn.operations.ccl.reduce_scatter_t object>, preprocess_golden_function_inputs=None, golden_function=None, postprocess_golden_function_outputs=None, is_cpp_operation=True, is_experimental=False)
Performs a reduce-scatter operation on the multi-device input_tensor across all devices: the per-device tensors are reduced element-wise and the result is scattered along dim, so each device receives one slice.

Args:
    input_tensor (ttnn.Tensor): Multi-device tensor.
    dim (int): Dimension along which to perform the operation.
    cluster_axis (int): When a MeshTensor is provided, the axis of the MeshDevice along which to perform the line reduce-scatter operation.
    mesh_device (MeshDevice): Device mesh on which to perform the line reduce-scatter operation.

The cluster_axis and mesh_device parameters apply only to the Linear topology.

Mesh Tensor Programming Guide: https://github.com/tenstorrent/tt-metal/blob/main/tech_reports/Programming%20Mesh%20of%20Devices/Programming%20Mesh%20of%20Devices%20with%20TT-NN.md
Keyword Args:
    num_links (int, optional): Number of links to use for the reduce-scatter operation. Defaults to 1.
    memory_config (ttnn.MemoryConfig, optional): Memory configuration for the operation. Defaults to the input tensor's memory config.
    num_workers (int, optional): Number of workers to use for the operation. Defaults to None.
    num_buffers_per_channel (int, optional): Number of buffers per channel to use for the operation. Defaults to None.
    topology (ttnn.Topology, optional): Topology configuration to run the operation in. Valid options are Ring and Linear. Defaults to ttnn.Topology.Ring.
Returns:
    ttnn.Tensor: The output tensor.
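Since the operation is registered with golden_function=None, a host-side model of what the output holds can be useful when checking results. The sketch below is a hypothetical pure-Python reference (reference_reduce_scatter is not part of ttnn): it treats each device's tensor as a flat list along dim, assumes the reduction is an element-wise sum, and assumes device k receives the k-th contiguous slice. Both assumptions should be checked against the ttnn implementation before relying on them.

```python
from typing import List


def reference_reduce_scatter(per_device: List[List[float]]) -> List[List[float]]:
    """Hypothetical host-side model of reduce-scatter on 1-D data.

    per_device[i] is the data held by device i along the scatter dimension.
    Assumes a sum reduction and that device k keeps the k-th slice.
    """
    n = len(per_device)
    length = len(per_device[0])
    if any(len(d) != length for d in per_device):
        raise ValueError("all devices must hold inputs of the same shape")
    if length % n != 0:
        raise ValueError("scatter dimension must be divisible by the number of devices")
    # Reduce: element-wise sum across all devices.
    total = [sum(vals) for vals in zip(*per_device)]
    # Scatter: device k keeps the k-th contiguous slice of the reduced result.
    size = length // n
    return [total[k * size:(k + 1) * size] for k in range(n)]


print(reference_reduce_scatter([[1, 2, 3, 4], [10, 20, 30, 40]]))  # [[11, 22], [33, 44]]
```

The same shape arithmetic applies to real tensors: with 8 devices each holding a [1, 1, 256, 256] tensor, scattering along dim 3 leaves each device a [1, 1, 256, 32] slice (256 / 8 = 32).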
Example:
    >>> import torch
    >>> import ttnn
    >>> num_devices = 8
    >>> dim = 3
    >>> input_dtype = ttnn.bfloat16
    >>> layout = ttnn.TILE_LAYOUT
    >>> mem_config = ttnn.DRAM_MEMORY_CONFIG
    >>> # Each device holds its own full-size [1, 1, 256, 256] tensor to be reduced.
    >>> input_tensors = [torch.randn([1, 1, 256, 256], dtype=torch.bfloat16) for _ in range(num_devices)]
    >>> physical_device_ids = ttnn.get_t3k_physical_device_ids_ring()
    >>> mesh_device = ttnn.open_mesh_device(ttnn.MeshShape(1, 8), physical_device_ids=physical_device_ids[:8])
    >>> tt_input_tensors = []
    >>> for i, t in enumerate(input_tensors):
    ...     tt_input_tensors.append(ttnn.Tensor(t, input_dtype).to(layout).to(mesh_device.get_devices()[i], mem_config))
    >>> input_tensor_mesh = ttnn.aggregate_as_tensor(tt_input_tensors)
    >>> # Reduce across devices, then scatter along dim 3: each device receives a [1, 1, 256, 32] slice.
    >>> output = ttnn.reduce_scatter(input_tensor_mesh, dim=dim, topology=ttnn.Topology.Linear)
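For intuition about topology=ttnn.Topology.Ring, the classic ring reduce-scatter algorithm can be simulated on the host. This is a generic textbook sketch, not the ttnn kernel: it assumes N devices on a unidirectional ring, splits each device's data into N chunks, and in each of N - 1 steps has device i forward its partially reduced chunk (i - step - 1) mod N to device (i + 1) mod N, which adds it to its own copy. Afterward device k holds the fully reduced chunk k. The function names are hypothetical, and the reduction is assumed to be a sum.

```python
from typing import List, Tuple


def split_chunks(data: List[float], n: int) -> List[List[float]]:
    """Split a flat list into n equal contiguous chunks."""
    if len(data) % n != 0:
        raise ValueError("length must be divisible by the number of devices")
    size = len(data) // n
    return [data[k * size:(k + 1) * size] for k in range(n)]


def ring_reduce_scatter(per_device: List[List[float]]) -> List[List[float]]:
    """Simulate a generic unidirectional ring reduce-scatter (sum reduction)."""
    n = len(per_device)
    # acc[i][c] is device i's current (partially reduced) copy of chunk c.
    acc = [split_chunks(list(d), n) for d in per_device]
    for step in range(n - 1):
        # Snapshot every outgoing chunk first so all sends in a step are simultaneous.
        sends: List[Tuple[int, int, List[float]]] = []
        for i in range(n):
            c = (i - step - 1) % n
            sends.append(((i + 1) % n, c, list(acc[i][c])))
        # Each receiver accumulates the incoming partial sum into its own copy.
        for dst, c, payload in sends:
            acc[dst][c] = [a + b for a, b in zip(acc[dst][c], payload)]
    # After n - 1 steps, device k owns the fully reduced chunk k.
    return [acc[k][k] for k in range(n)]


print(ring_reduce_scatter([[1, 2, 3, 4], [10, 20, 30, 40]]))  # [[11, 22], [33, 44]]
```

In this scheme each device sends only one chunk (1/N of its data) per step, so over the N - 1 steps it transmits (N - 1)/N of a tensor in total, which is what makes the ring layout bandwidth-efficient.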