ttnn.copy_device_to_host_tensor

ttnn.copy_device_to_host_tensor(device_tensor: ttnn.Tensor, host_tensor: ttnn.Tensor, blocking: bool = True, cq_id: ttnn.QueueId = None) → None

copy_device_to_host_tensor(device_tensor: ttnn._ttnn.tensor.Tensor, host_tensor: ttnn._ttnn.tensor.Tensor, blocking: bool = True, cq_id: ttnn._ttnn.types.QueueId | None = None) -> None

Copies a tensor from device to host.

Parameters:

device_tensor (ttnn.Tensor) – the tensor to be copied from device to host.
host_tensor (ttnn.Tensor) – the host tensor that receives the copied data.
blocking (bool, optional) – whether the call should block until the copy is complete. Defaults to True.
cq_id (ttnn.QueueId, optional) – the command queue ID to use. Defaults to None.
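
The blocking parameter can be illustrated with a minimal sketch of a non-blocking copy. The helper name copy_to_host_nonblocking is hypothetical, cq_id is left at its default, and the sketch assumes that calling ttnn.synchronize_device is enough to guarantee the host tensor is fully populated before it is read; verify that against your TT-NN version.

```python
def copy_to_host_nonblocking(device, device_tensor):
    """Start a non-blocking device-to-host copy, then wait for it to finish.

    Hypothetical helper for illustration; not part of the TT-NN API.
    """
    # Imported inside the function so the helper can be defined on a machine
    # without TT-NN installed; a top-level import works equally well.
    import ttnn

    # Pre-allocate a host tensor with the same spec as the device tensor.
    host_tensor = ttnn.allocate_tensor_on_host(device_tensor.spec, device)

    # With blocking=False the call is expected to return before the copy
    # has completed, so host_tensor must not be read yet.
    ttnn.copy_device_to_host_tensor(device_tensor, host_tensor, blocking=False)

    # ... other host-side work could overlap with the copy here ...

    # Assumption: synchronizing the device waits for the outstanding copy.
    ttnn.synchronize_device(device)
    return host_tensor
```

With the default blocking=True, the explicit synchronization step should not be needed, because the call itself waits for the copy to complete.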

Note

This operation supports tensors with the following data types and layouts:

device/host tensor
dtype - layout
BFLOAT16, BFLOAT8_B, BFLOAT4_B, FLOAT32, UINT32, INT32, UINT16, UINT8 - TILE
BFLOAT16, FLOAT32, UINT32, INT32, UINT16, UINT8 - ROW_MAJOR

Memory Support:

Interleaved: DRAM and L1
Height, Width, Block, and ND Sharded: DRAM and L1

Limitations:

The host and device tensors must have the same shape, the same data type, and the same data layout (ROW_MAJOR or TILE).
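
The limitation above, together with the dtype/layout table, can be checked before a copy is attempted. The following is a sketch only: the helper names copy_mismatches and is_supported are hypothetical, dtypes and layouts are represented as plain strings rather than TT-NN enums, and the helper assumes both tensors expose shape, dtype, and layout attributes. Stand-in objects are used so the snippet runs without a device.

```python
from types import SimpleNamespace

# Supported dtypes per layout, transcribed from the table above.
# Plain strings are used here for illustration instead of TT-NN enum values.
SUPPORTED_DTYPES = {
    "TILE": {"BFLOAT16", "BFLOAT8_B", "BFLOAT4_B", "FLOAT32",
             "UINT32", "INT32", "UINT16", "UINT8"},
    "ROW_MAJOR": {"BFLOAT16", "FLOAT32", "UINT32", "INT32", "UINT16", "UINT8"},
}


def is_supported(dtype_name, layout_name):
    """Return True if the dtype/layout pair appears in the support table."""
    return dtype_name in SUPPORTED_DTYPES.get(layout_name, set())


def copy_mismatches(device_tensor, host_tensor):
    """List the ways a tensor pair violates the same-shape/dtype/layout rule."""
    problems = []
    for attr in ("shape", "dtype", "layout"):
        dev_val = getattr(device_tensor, attr)
        host_val = getattr(host_tensor, attr)
        if dev_val != host_val:
            problems.append(f"{attr} mismatch: device={dev_val}, host={host_val}")
    return problems


# Stand-in objects (not real ttnn.Tensor instances) to exercise the checks.
dev = SimpleNamespace(shape=(2, 3), dtype="BFLOAT16", layout="TILE")
host = SimpleNamespace(shape=(2, 3), dtype="FLOAT32", layout="TILE")

print(copy_mismatches(dev, host))  # one entry, reporting the dtype mismatch
print(is_supported("BFLOAT8_B", "ROW_MAJOR"))  # False: block formats are TILE-only
```

An empty list from copy_mismatches means the pair satisfies the stated limitation; it does not guarantee that the copy will succeed for other reasons.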

Example

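import ttnn
from loguru import logger  # assumed logger; any logger with .info() works

# Open a device (device 0 is assumed here)
device = ttnn.open_device(device_id=0)

# Create a TT-NN tensor and copy it to the host
ttnn_tensor = ttnn.rand((2, 3), dtype=ttnn.bfloat16, device=device)
host_tensor = ttnn.allocate_tensor_on_host(ttnn_tensor.spec, device)
ttnn.copy_device_to_host_tensor(ttnn_tensor, host_tensor)

logger.info(f"Host tensor shape after copying from device: {host_tensor.shape}")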