Basic Operations with TT-NN
We will review a simple example that demonstrates how to create various tensors and perform basic arithmetic operations on them using TT-NN, a high-level Python API. These operations include addition, multiplication, and matrix multiplication, as well as simulating broadcasting of a row vector across a tile.
Let's create the example file:
ttnn_basic_operations.py
Import the necessary libraries
import torch
import numpy as np
import ttnn
from loguru import logger
Open Tenstorrent device
Create and open the device on which we will run our program.
# Open Tenstorrent device
device = ttnn.open_device(device_id=0)
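A device opened this way should be released with ttnn.close_device when you are done. The full example at the end of this tutorial wraps the work in a try/finally block so the device is closed even if an operation fails; a minimal sketch of that pattern:

try:
    pass  # ... create tensors and run operations here ...
finally:
    # Always release the device, even on error
    ttnn.close_device(device)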
Helper Function for Tensor Preparation
Let's create a helper function for converting PyTorch tensors to TT-NN tiled tensors.
# Helper to create a TT-NN tensor from torch with TILE_LAYOUT and bfloat16
def to_tt_tile(torch_tensor):
    return ttnn.from_torch(torch_tensor, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device)
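As a quick sanity check, a tensor converted with this helper can be read back to the host with ttnn.to_torch. The snippet below is our own illustration (the names check and back are not part of the tutorial script); note that the round trip returns a bfloat16 tensor:

# Illustrative round trip: host float32 -> device tile (bfloat16) -> host
check = to_tt_tile(torch.rand((32, 32), dtype=torch.float32))
back = ttnn.to_torch(check)
logger.info(f"{back.dtype} {back.shape}")  # torch.bfloat16 torch.Size([32, 32])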
Host Tensor Creation
Create a host tensor for our tests and fill it with random values. We will use this and other tensors to demonstrate various operations.
logger.info("\n--- TT-NN Tensor Creation with Tiles (32x32) ---")
host_rand = torch.rand((32, 32), dtype=torch.float32)
Convert Host Tensors to TT-NN Tiled Tensors or Create Natively on Device
Tensix cores operate most efficiently on tiled data, which allows them to perform a large amount of compute in parallel. Where necessary, let's convert host tensors to TT-NN tiled tensors using the helper function we created earlier and transfer them to the TT device. Alternatively, we can create tensors natively with TT-NN's tensor creation functions and initialize them directly on the TT device. Creating tensors natively on the device is more efficient, as it avoids the overhead of transferring data from the host to the device.
# Created natively on the device: a tile filled with 1.0 (float32)
tt_t1 = ttnn.full(
    shape=(32, 32),
    fill_value=1.0,
    dtype=ttnn.float32,
    layout=ttnn.TILE_LAYOUT,
    device=device,
)

# Created natively on the device: all zeros (bfloat16)
tt_t2 = ttnn.zeros(
    shape=(32, 32),
    dtype=ttnn.bfloat16,
    layout=ttnn.TILE_LAYOUT,
    device=device,
)

# Created natively on the device: all ones (bfloat16)
tt_t3 = ttnn.ones(
    shape=(32, 32),
    dtype=ttnn.bfloat16,
    layout=ttnn.TILE_LAYOUT,
    device=device,
)

# Converted from the host tensor using the helper above
tt_t4 = to_tt_tile(host_rand)

# Built from NumPy: expand a 2x2 array into a 32x32 tile of 16x16 blocks
t5 = np.array([[5, 6], [7, 8]], dtype=np.float32).repeat(16, axis=0).repeat(16, axis=1)
tt_t5 = ttnn.Tensor(t5, device=device, layout=ttnn.TILE_LAYOUT)
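To inspect any of these device tensors on the host, convert them back with ttnn.to_torch, as the full example below does for each one:

logger.info(f"Tensor from fill value 1:\n{ttnn.to_torch(tt_t1)}")
logger.info(f"From expanded NumPy (TT-NN):\n{ttnn.to_torch(tt_t5)}")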
Tile-Based Arithmetic Operations
Let's use some of the tensors we created and perform different operations on them: element-wise addition and multiplication, plus a matrix multiplication whose output is placed in DRAM via memory_config=ttnn.DRAM_MEMORY_CONFIG.
logger.info("\n--- TT-NN Tensor Operations on (32x32) Tiles ---")
add_result = ttnn.add(tt_t3, tt_t4)  # element-wise addition
mul_result = ttnn.mul(tt_t4, tt_t5)  # element-wise multiplication
matmul_result = ttnn.matmul(tt_t3, tt_t4, memory_config=ttnn.DRAM_MEMORY_CONFIG)  # output placed in DRAM
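If you want to verify the device results against PyTorch on the host, compare with a loose tolerance, since the device math runs in bfloat16. This check is our own addition for illustration and is not part of the tutorial script:

# Hypothetical host-side check: ones + host_rand should match the device add
# up to bfloat16 rounding, hence the loose tolerance
expected_add = torch.ones((32, 32)) + host_rand
actual_add = ttnn.to_torch(add_result).to(torch.float32)
assert torch.allclose(actual_add, expected_add, atol=3e-2)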
Simulated Broadcasting (Row Vector Expansion)
Let's simulate broadcasting a row vector across a tile. Because the tensors here are full 32x32 tiles, we expand the row vector to tile dimensions on the host before adding. This is useful for operations that require expanding a smaller tensor to match the dimensions of a larger one.
logger.info("\n--- Simulated Broadcasting (32x32 + Broadcasted Row Vector) ---")
# Expand the 1x32 row vector to a full 32x32 tile on the host
broadcast_vector = torch.tensor([[1.0] * 32], dtype=torch.float32).repeat(32, 1)
broadcast_tt = to_tt_tile(broadcast_vector)
broadcast_add_result = ttnn.add(tt_t4, broadcast_tt)
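Because the row vector here is all ones, this result should be identical to the earlier ttnn.add(tt_t3, tt_t4), and the output below confirms the two tensors match. A quick host-side check of that equivalence (our own illustration):

# Both paths compute bfloat16(host_rand) + 1.0, so the results should match exactly
assert torch.equal(ttnn.to_torch(broadcast_add_result), ttnn.to_torch(add_result))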
Full example and output
Let's put everything together in a complete example that can be run directly. This example opens a Tenstorrent device, creates input tensors and performs operations on them, logs the output tensors, and closes the device.
# SPDX-FileCopyrightText: © 2025 Tenstorrent AI ULC
# SPDX-License-Identifier: Apache-2.0

import torch
import numpy as np
import ttnn
from loguru import logger


def main():
    # Open Tenstorrent device
    device = ttnn.open_device(device_id=0)

    try:
        # Helper to create a TT-NN tensor from torch with TILE_LAYOUT and bfloat16
        def to_tt_tile(torch_tensor):
            return ttnn.from_torch(torch_tensor, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device)

        logger.info("\n--- TT-NN Tensor Creation with Tiles (32x32) ---")
        host_rand = torch.rand((32, 32), dtype=torch.float32)

        tt_t1 = ttnn.full(
            shape=(32, 32),
            fill_value=1.0,
            dtype=ttnn.float32,
            layout=ttnn.TILE_LAYOUT,
            device=device,
        )
        tt_t2 = ttnn.zeros(
            shape=(32, 32),
            dtype=ttnn.bfloat16,
            layout=ttnn.TILE_LAYOUT,
            device=device,
        )
        tt_t3 = ttnn.ones(
            shape=(32, 32),
            dtype=ttnn.bfloat16,
            layout=ttnn.TILE_LAYOUT,
            device=device,
        )
        tt_t4 = to_tt_tile(host_rand)

        t5 = np.array([[5, 6], [7, 8]], dtype=np.float32).repeat(16, axis=0).repeat(16, axis=1)
        tt_t5 = ttnn.Tensor(t5, device=device, layout=ttnn.TILE_LAYOUT)

        logger.info(f"Tensor from fill value 1:\n{ttnn.to_torch(tt_t1)}")
        logger.info(f"Zeros:\n{ttnn.to_torch(tt_t2)}")
        logger.info(f"Ones:\n{ttnn.to_torch(tt_t3)}")
        logger.info(f"Random:\n{ttnn.to_torch(tt_t4)}")
        logger.info(f"From expanded NumPy (TT-NN):\n{ttnn.to_torch(tt_t5)}")

        logger.info("\n--- TT-NN Tensor Operations on (32x32) Tiles ---")
        add_result = ttnn.add(tt_t3, tt_t4)
        mul_result = ttnn.mul(tt_t4, tt_t5)
        matmul_result = ttnn.matmul(tt_t3, tt_t4, memory_config=ttnn.DRAM_MEMORY_CONFIG)

        ttnn_add = ttnn.to_torch(add_result)
        logger.info(f"Addition:\n{ttnn_add}")
        ttnn_mul = ttnn.to_torch(mul_result)
        logger.info(f"Element-wise Multiplication:\n{ttnn_mul}")
        ttnn_matmul = ttnn.to_torch(matmul_result)
        logger.info(f"Matrix Multiplication:\n{ttnn_matmul}")

        logger.info("\n--- Simulated Broadcasting (32x32 + Broadcasted Row Vector) ---")
        broadcast_vector = torch.tensor([[1.0] * 32], dtype=torch.float32).repeat(32, 1)
        broadcast_tt = to_tt_tile(broadcast_vector)
        broadcast_add_result = ttnn.add(tt_t4, broadcast_tt)
        logger.info(f"Broadcast Add Result (TT-NN):\n{ttnn.to_torch(broadcast_add_result)}")

    finally:
        ttnn.close_device(device)


if __name__ == "__main__":
    main()
Running this script will output the operation results as shown below (the random values will differ between runs):
$ python3 $TT_METAL_HOME/ttnn/tutorials/basic_python/ttnn_basic_operations.py
2025-06-23 09:47:12.093 | INFO | __main__:main:19 -
--- TT-NN Tensor Creation with Tiles (32x32) ---
2025-06-23 09:47:12.117 | INFO | __main__:main:47 - Tensor from fill value 1:
tensor([[1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]])
2025-06-23 09:47:12.117 | INFO | __main__:main:48 - Zeros:
tensor([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], dtype=torch.bfloat16)
2025-06-23 09:47:12.118 | INFO | __main__:main:49 - Ones:
tensor([[1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]], dtype=torch.bfloat16)
2025-06-23 09:47:12.119 | INFO | __main__:main:50 - Random:
tensor([[0.1367, 0.3320, 0.8125, ..., 0.7969, 0.6250, 0.8906],
        [0.6914, 0.1377, 0.2480, ..., 0.6406, 0.0109, 0.2080],
        [0.6992, 0.8750, 0.6133, ..., 0.3086, 0.6562, 0.6016],
        ...,
        [0.1455, 0.8672, 0.0221, ..., 0.3926, 0.1074, 0.9414],
        [0.5859, 0.1426, 0.8906, ..., 0.5820, 0.0182, 0.7031],
        [0.8711, 0.1377, 0.7305, ..., 0.4102, 0.2812, 0.6836]],
       dtype=torch.bfloat16)
2025-06-23 09:47:12.120 | INFO | __main__:main:51 - From expanded NumPy (TT-NN):
tensor([[5., 5., 5., ..., 6., 6., 6.],
        [5., 5., 5., ..., 6., 6., 6.],
        [5., 5., 5., ..., 6., 6., 6.],
        ...,
        [7., 7., 7., ..., 8., 8., 8.],
        [7., 7., 7., ..., 8., 8., 8.],
        [7., 7., 7., ..., 8., 8., 8.]])
2025-06-23 09:47:12.120 | INFO | __main__:main:53 -
--- TT-NN Tensor Operations on (32x32) Tiles ---
2025-06-23 09:47:18.928 | INFO | __main__:main:59 - Addition:
tensor([[1.1406, 1.3359, 1.8125, ..., 1.7969, 1.6250, 1.8906],
        [1.6953, 1.1406, 1.2500, ..., 1.6406, 1.0078, 1.2109],
        [1.7031, 1.8750, 1.6172, ..., 1.3125, 1.6562, 1.6016],
        ...,
        [1.1484, 1.8672, 1.0234, ..., 1.3906, 1.1094, 1.9453],
        [1.5859, 1.1406, 1.8906, ..., 1.5859, 1.0156, 1.7031],
        [1.8750, 1.1406, 1.7344, ..., 1.4141, 1.2812, 1.6875]],
       dtype=torch.bfloat16)
2025-06-23 09:47:18.929 | INFO | __main__:main:62 - Element-wise Multiplication:
tensor([[0.6836, 1.6641, 4.0625, ..., 4.7812, 3.7500, 5.3438],
        [3.4531, 0.6875, 1.2422, ..., 3.8438, 0.0654, 1.2500],
        [3.5000, 4.3750, 3.0625, ..., 1.8516, 3.9375, 3.6094],
        ...,
        [1.0156, 6.0625, 0.1543, ..., 3.1406, 0.8594, 7.5312],
        [4.0938, 1.0000, 6.2500, ..., 4.6562, 0.1455, 5.6250],
        [6.0938, 0.9648, 5.1250, ..., 3.2812, 2.2500, 5.4688]],
       dtype=torch.bfloat16)
2025-06-23 09:47:18.930 | INFO | __main__:main:65 - Matrix Multiplication:
tensor([[17.5000, 13.4375, 16.7500, ..., 15.2500, 13.0625, 17.2500],
        [17.5000, 13.4375, 16.7500, ..., 15.2500, 13.0625, 17.2500],
        [17.5000, 13.4375, 16.7500, ..., 15.2500, 13.0625, 17.2500],
        ...,
        [17.5000, 13.4375, 16.7500, ..., 15.2500, 13.0625, 17.2500],
        [17.5000, 13.4375, 16.7500, ..., 15.2500, 13.0625, 17.2500],
        [17.5000, 13.4375, 16.7500, ..., 15.2500, 13.0625, 17.2500]],
       dtype=torch.bfloat16)
2025-06-23 09:47:18.930 | INFO | __main__:main:67 -
--- Simulated Broadcasting (32x32 + Broadcasted Row Vector) ---
2025-06-23 09:47:18.932 | INFO | __main__:main:71 - Broadcast Add Result (TT-NN):
tensor([[1.1406, 1.3359, 1.8125, ..., 1.7969, 1.6250, 1.8906],
        [1.6953, 1.1406, 1.2500, ..., 1.6406, 1.0078, 1.2109],
        [1.7031, 1.8750, 1.6172, ..., 1.3125, 1.6562, 1.6016],
        ...,
        [1.1484, 1.8672, 1.0234, ..., 1.3906, 1.1094, 1.9453],
        [1.5859, 1.1406, 1.8906, ..., 1.5859, 1.0156, 1.7031],
        [1.8750, 1.1406, 1.7344, ..., 1.4141, 1.2812, 1.6875]],
       dtype=torch.bfloat16)