Examples of Tensor and TT-LIB Use
Run one OP from TT-LIB on TT Accelerator device
In this code example we use the TT Accelerator device to execute the relu op from the TT-LIB library.
These are the steps:
- create and initialize TT Accelerator device and get handle for host machine
- create random PyTorch tensor, convert it to TT Tensor and send to TT Accelerator device
- execute relu on TT Accelerator device
- move output TT Tensor to host machine and print it
import torch
import tt_lib as tt_lib

if __name__ == "__main__":
    # Initialize TT Accelerator device on PCI slot 0
    tt_device = tt_lib.device.CreateDevice(0)

    # Create random PyTorch tensor
    py_tensor = torch.randn((1, 1, 32, 32))

    # Create TT tensor from PyTorch Tensor and send it to TT accelerator device
    tt_tensor = tt_lib.tensor.Tensor(
        py_tensor.reshape(-1).tolist(),
        py_tensor.size(),
        tt_lib.tensor.DataType.BFLOAT16,
        tt_lib.tensor.Layout.ROW_MAJOR,
        tt_device
    )

    # Run relu on TT accelerator device
    tt_relu_out = tt_lib.tensor.relu(tt_tensor)

    # Move TT Tensor tt_relu_out from TT accelerator device to host
    tt_output = tt_relu_out.cpu()

    # Print TT Tensor
    print(tt_output)

    # Close TT accelerator device
    tt_lib.device.CloseDevice(tt_device)
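If you would rather inspect the result as a PyTorch tensor than as a TT Tensor, the host-side output can be converted back with the same data()/shape() pattern that the next example uses. A minimal sketch, placed just before closing the device:

# Convert the host-side TT Tensor back to a PyTorch tensor (same pattern as the next example)
py_output = torch.Tensor(tt_output.data()).reshape(tt_output.shape())
print(py_output)
# Optionally compare against the host reference; the TT Tensor is BFLOAT16, so allow a loose tolerance, e.g.:
# print(torch.allclose(py_output, torch.relu(py_tensor), atol=1e-1, rtol=1e-1))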
Run TT-LIB and PyTorch OPs
In this code example we build on the previous example and after relu also execute pow, silu, and exp.
Since pow with the exponent given as a tensor is not supported in the TT-LIB library, we need to move the TT Tensor produced by relu to the host machine, convert it to a PyTorch tensor, execute pow from PyTorch, and then convert the output of pow back to a TT Tensor so the remaining ops can consume it. TT-LIB does support power for an integral scalar exponent and power_fp for a positive floating point exponent.
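When the exponent is a plain scalar rather than a tensor, the host round-trip can be avoided by using these on-device ops instead. A minimal sketch, assuming power takes an integer exponent and power_fp a positive float exponent as their second argument (check the op reference for the exact signatures):

# Hypothetical on-device alternative for a scalar exponent; tt_relu_out stays on device here.
# Argument order (input tensor, scalar exponent) is assumed.
tt_pow_out = tt_lib.tensor.power(tt_relu_out, 3)         # integral scalar exponent
# tt_pow_out = tt_lib.tensor.power_fp(tt_relu_out, 2.5)  # positive floating point exponent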
After pow, silu is executed as a fallback op; this means that the operation will actually execute as a PyTorch operation on the host machine. But since fallback ops take TT Tensors as inputs and produce TT Tensors as outputs, just like regular on-device TT-LIB ops, we can supply silu with a TT Tensor directly.
Lastly, we run exp on the TT Accelerator device (supplying it with the output from silu without any conversion).
import torch
import tt_lib as tt_lib
from tt_lib.fallback_ops import fallback_ops

if __name__ == "__main__":
    # Initialize TT Accelerator device on PCI slot 0
    tt_device = tt_lib.device.CreateDevice(0)

    # Create random PyTorch tensor
    py_tensor = torch.randn((1, 1, 32, 32))
    py_tensor_exp = torch.randint(0, 10, (1, 1, 32, 32))

    # Create TT tensor from PyTorch Tensor and send it to TT accelerator device
    tt_tensor = tt_lib.tensor.Tensor(
        py_tensor.reshape(-1).tolist(),
        py_tensor.size(),
        tt_lib.tensor.DataType.BFLOAT16,
        tt_lib.tensor.Layout.ROW_MAJOR,
        tt_device
    )

    # Run relu on TT accelerator device
    tt_relu_out = tt_lib.tensor.relu(tt_tensor)

    # Move TT Tensor tt_relu_out to host and convert it to PyTorch tensor py_relu_out
    tt_relu_out = tt_relu_out.cpu()
    py_relu_out = torch.Tensor(tt_relu_out.data()).reshape(tt_relu_out.shape())

    # Execute pow using PyTorch (since pow with a tensor exponent is not available from tt_lib)
    py_pow_out = torch.pow(py_relu_out, py_tensor_exp)

    # Create TT Tensor from py_pow_out and move it to TT accelerator device
    tt_pow_out = tt_lib.tensor.Tensor(
        py_pow_out.reshape(-1).tolist(),
        py_pow_out.size(),
        tt_lib.tensor.DataType.BFLOAT16,
        tt_lib.tensor.Layout.ROW_MAJOR,
        tt_device
    )

    # Run silu on TT Tensor tt_pow_out
    # This is a fallback op and it will behave like regular ops on TT accelerator device,
    # even though under the hood this op is executed on host.
    tt_silu_out = fallback_ops.silu(tt_pow_out)

    # Run exp on TT accelerator device
    tt_exp_out = tt_lib.tensor.exp(tt_silu_out)

    # Move TT Tensor output from TT accelerator device to host
    tt_output = tt_exp_out.cpu()

    # Print TT Tensor
    print(tt_output)

    # Close TT accelerator device
    tt_lib.device.CloseDevice(tt_device)
Tensors with odd size of last dim
A TT Tensor that is in ROW_MAJOR layout and has an odd size of last dimension cannot be created on, or moved to, the TT Accelerator device. Such a TT Tensor can, however, be created on the host machine and passed to TT-LIB operations.
A TT-LIB operation will automatically pad the tensor so that the size of the last dimension is even, move it to the TT Accelerator device, execute the operation, move the output tensor back to the host, and finally unpad the output tensor.
To use this functionality, you must call tt_lib.device.SetDefaultDevice(tt_device) to set your TT Accelerator device as the default device used to execute operations on tensors that are on the host machine.
So if you want to use a TT Tensor with an odd size of last dimension, the first example (running one operation on the TT Accelerator device) can be modified as follows:
import torch
import tt_lib as tt_lib

if __name__ == "__main__":
    # Initialize TT Accelerator device on PCI slot 0
    tt_device = tt_lib.device.CreateDevice(0)

    # Set default TT Accelerator device
    # This device will be used to execute operations on TT Tensors that are not assigned to a device
    tt_lib.device.SetDefaultDevice(tt_device)

    # Create random PyTorch tensor
    py_tensor = torch.randn((1, 1, 32, 31))

    # Create TT tensor from PyTorch Tensor and leave it on host
    tt_tensor = tt_lib.tensor.Tensor(
        py_tensor.reshape(-1).tolist(),
        py_tensor.size(),
        tt_lib.tensor.DataType.BFLOAT16,
        tt_lib.tensor.Layout.ROW_MAJOR,
    )

    # Run relu on TT accelerator device
    # The op will pad the tensor as needed and send it to the TT Accelerator device for execution,
    # then return the result to host and unpad it
    tt_relu_out = tt_lib.tensor.relu(tt_tensor)

    # Move TT Tensor output from TT accelerator device to host
    # Note that in this example this call will not do anything since tt_relu_out is already on host
    tt_output = tt_relu_out.cpu()

    # Print TT Tensor
    print(tt_output)

    # Close TT accelerator device
    tt_lib.device.CloseDevice(tt_device)
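Because the operation unpads the result before returning it, the output keeps the original odd last dimension. A quick sanity check, reusing the data()/shape() conversion from the earlier examples (run inside the example's main block, before closing the device):

# The padding added for device execution is removed again, so the host output
# still has the original shape with an odd last dimension
py_output = torch.Tensor(tt_output.data()).reshape(tt_output.shape())
print(py_output.shape)  # expected: torch.Size([1, 1, 32, 31])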