Adding New TT-NN Operation

Note

This document is meant for contributors to TT-NN.

Some operations may not be functional on every Tenstorrent hardware platform (Grayskull, Wormhole, or others).

FAQ

What is a TT-NN operation?

A TT-NN operation is a function that takes in one or more input tensors and produces one or more output tensors. It is implemented in C++ and can be called from Python.

What steps are needed to add a TT-NN operation in C++?

  1. There are two options for writing a new operation: write a device operation (option a) or write an operation that calls other operations (option b).

     a. Implement a device operation in C++. A device operation is a struct that satisfies DeviceOperationConcept and specifies how to create output tensors and a program to run on the device.

     b. Implement an operation in C++ that calls other operations. This type of operation simply defines an invoke() method that calls other operations.

  2. Expose the operation as a free function under the ttnn or ttnn::experimental namespace (e.g. ttnn::tilize or ttnn::experimental::dropout) that invokes the corresponding ttnn::prim operation (e.g. ttnn::prim::dropout).
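The layering described above can be pictured with a small conceptual analogy. This is only a sketch in plain Python: lists stand in for tensors, and every name in it (ExampleDeviceOperation, prim_example, example, composite_example and their methods) is a hypothetical stand-in rather than a TT-NN API. The real implementation is written in C++.

```python
# Conceptual analogy only: plain Python lists stand in for tensors, and none
# of these names are TT-NN APIs.


class ExampleDeviceOperation:
    """Stand-in for a device operation (option a)."""

    @staticmethod
    def create_output_tensors(input_tensor):
        # Allocate an output with the same length as the input.
        return [None] * len(input_tensor)

    @staticmethod
    def create_program(input_tensor, output_tensor):
        # Return a callable that plays the role of the program run on the device.
        def program():
            output_tensor[:] = input_tensor

        return program


def prim_example(input_tensor):
    """Plays the role of a ttnn::prim operation: allocate outputs, run the program."""
    output_tensor = ExampleDeviceOperation.create_output_tensors(input_tensor)
    ExampleDeviceOperation.create_program(input_tensor, output_tensor)()
    return output_tensor


def example(input_tensor):
    """Plays the role of the free function that forwards to the prim operation."""
    return prim_example(input_tensor)


def composite_example(input_tensor):
    """Option b: an operation built purely by calling other operations."""
    first_copy = example(input_tensor)
    return example(first_copy)
```

The point of the analogy is the split of responsibilities: the device operation knows how to allocate outputs and build a program, while the free function and the composite operation only call into other operations.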

What steps are needed to add a TT-NN operation in Python?

  1. Take an existing C++ operation and add a nanobind Python binding for it using ttnn::bind_function. If the operation is called ttnn::add in C++, then the Python binding will be ttnn.add.

  2. (Optional) Attach a golden function to the operation using ttnn.attach_golden_function. This is useful for debugging and testing.

Example of Adding a new Device Operation

Let’s implement ttnn.example, an operation that simply copies the input tensor to the output tensor on the device.

C++ Implementation

Step 1: Implement device operation

In order to add a new device operation, follow the directory structure shown below:

ttnn/cpp/ttnn/operations/<category>/<operation_name>/device/<operation_name>_device_operation.hpp
ttnn/cpp/ttnn/operations/<category>/<operation_name>/device/<operation_name>_device_operation.cpp
ttnn/cpp/ttnn/operations/<category>/<operation_name>/device/<program_factory_0>_program_factory.cpp

Note

Add as many program factories as needed; at least one program factory is required.
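As a quick way to see the layout for a concrete operation, the sketch below expands the path templates above for a given category, operation name, and list of program factories. The helper name device_operation_files and the factory name used in the usage note are hypothetical; only the path templates come from this document.

```python
from pathlib import PurePosixPath

# Root of the operations tree, taken from the path templates in this document.
OPERATIONS_ROOT = PurePosixPath("ttnn/cpp/ttnn/operations")


def device_operation_files(category, operation_name, program_factories):
    """Return the files expected for a device operation (hypothetical helper)."""
    if not program_factories:
        # The layout requires at least one program factory.
        raise ValueError("at least one program factory is required")

    device_dir = OPERATIONS_ROOT / category / operation_name / "device"
    files = [
        device_dir / f"{operation_name}_device_operation.hpp",
        device_dir / f"{operation_name}_device_operation.cpp",
    ]
    # One .cpp file per program factory.
    files += [device_dir / f"{name}_program_factory.cpp" for name in program_factories]
    return [str(path) for path in files]
```

For instance, device_operation_files("examples", "example", ["single_core"]) lists the header, the source file, and one program factory file under ttnn/cpp/ttnn/operations/examples/example/device, where "single_core" is just an illustrative factory name.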

Note

All new operations must use the ProgramDescriptor pattern (see below). The old CachedProgram / shared_variables_t pattern is legacy and should not be used for new operations.

A concrete example of a device operation can be found in ttnn/cpp/ttnn/operations/examples/example/device

Step 2: Implement the operation in C++

In order to add a new operation, add the following file:

ttnn/cpp/ttnn/operations/<category>/<operation_name>/<operation_name>.hpp

A concrete example:

ttnn/cpp/ttnn/operations/examples/example/example.hpp
// SPDX-FileCopyrightText: © 2023 Tenstorrent USA, Inc.
//
// SPDX-License-Identifier: Apache-2.0

#pragma once

#include "device/example_device_operation.hpp"

namespace ttnn {

// A composite operation is an operation that calls multiple operations in sequence
// It is written using invoke and can be used to call multiple primitive and/or composite operations
Tensor composite_example(const Tensor& input_tensor);

}  // namespace ttnn

Python Implementation

Step 1: Add Python binding

In order to add a Python binding for the operation, follow the directory structure shown below:

ttnn/python/ttnn/operations/<category>/<operation_name>/<operation_name>_nanobind.hpp
ttnn/python/ttnn/operations/<category>/<category>_nanobind.hpp

A concrete example:

ttnn/cpp/ttnn/operations/examples/example/example_nanobind.hpp
// SPDX-FileCopyrightText: © 2025 Tenstorrent USA, Inc.
//
// SPDX-License-Identifier: Apache-2.0

#pragma once

#include "ttnn-nanobind/nanobind_fwd.hpp"

namespace ttnn::operations::examples {
namespace nb = nanobind;
void bind_example_operation(nb::module_& mod);
}  // namespace ttnn::operations::examples

ttnn/cpp/ttnn/operations/examples/examples_nanobind.hpp
// SPDX-FileCopyrightText: © 2025 Tenstorrent USA, Inc.
//
// SPDX-License-Identifier: Apache-2.0

#pragma once

#include "ttnn-nanobind/nanobind_fwd.hpp"

namespace ttnn::operations::examples {

namespace nb = nanobind;
void py_module(nb::module_& mod);

}  // namespace ttnn::operations::examples

Finally, call bind_example_operation, declared in examples/example/example_nanobind.hpp, from whichever module should expose the operation.

Step 2: (Optional) Add golden function for the operation in Python

A golden function can be added to an operation in order to compare its output with an equivalent torch implementation.

Add the following code to a Python file:

import ttnn

# For the golden function, use the same signature as the operation.
# Keep in mind that all `ttnn.Tensor`s are converted to `torch.Tensor`s,
# and arguments not needed by torch can be ignored using `*args` and `**kwargs`.
def golden_function(input_tensor: "torch.Tensor", *args, **kwargs):
    output_tensor: "torch.Tensor" = ...
    return output_tensor

# TT-NN tensors are automatically converted to torch tensors before the golden function is called,
# and the outputs are converted back to TT-NN tensors.
# In some cases, however, you may need to preprocess the inputs and postprocess the outputs manually.

# In order to preprocess the inputs manually, use the following signature.
# Note that the arguments are NOT packed into *args and **kwargs as in the golden function!
def preprocess_golden_function_inputs(args, kwargs):
    # e.g.
    ttnn_input_tensor = args[0]
    return ttnn.to_torch(ttnn_input_tensor)

# In order to postprocess the outputs manually, use the following signature.
# Note that the arguments are NOT packed into *args and **kwargs as in the golden function!
def postprocess_golden_function_outputs(args, kwargs, output):
    # e.g.
    ttnn_input_tensor = args[0]
    torch_output_tensor = output[0]
    return ttnn.from_torch(torch_output_tensor, dtype=ttnn_input_tensor.dtype, device=ttnn_input_tensor.device)

ttnn.attach_golden_function(
    ttnn.example,
    golden_function=golden_function,
    preprocess_golden_function_inputs=preprocess_golden_function_inputs, # Optional
    postprocess_golden_function_outputs=postprocess_golden_function_outputs # Optional
)

Note

ttnn.example is the name of the operation in Python because the operation was registered as ttnn::example in C++.
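Because ttnn.example only copies its input, a golden function for it can be very small. The following is a minimal sketch under stated assumptions: plain Python lists stand in for torch tensors so that the snippet runs without torch or ttnn installed, and example_golden_function and elementwise_close are hypothetical names, not part of TT-NN.

```python
import copy
import math


def example_golden_function(input_tensor, *args, **kwargs):
    # ttnn.example copies its input, so the reference result is a copy of the input.
    # Extra arguments are accepted and ignored, mirroring the operation's signature.
    return copy.deepcopy(input_tensor)


def elementwise_close(actual, expected, abs_tol=1e-2):
    # Hypothetical helper: compare two flat sequences of numbers within a tolerance.
    return len(actual) == len(expected) and all(
        math.isclose(a, e, abs_tol=abs_tol) for a, e in zip(actual, expected)
    )


# Plain lists stand in for tensors here.
device_output = [1.0, 2.0, 3.0]  # pretend this came from running the operation
golden_output = example_golden_function([1.0, 2.0, 3.0])
outputs_match = elementwise_close(device_output, golden_output)
```

With real torch tensors, returning input_tensor.clone() would be the more usual way to write the same golden function.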

Step 3: (Optional) Add example usage to docs

It is good practice to include an example demonstrating how to use the new function. The simplest method is to add an Example section directly in the documentation passed to the ttnn::bind_function function. However, this approach makes it difficult to keep the example up to date and prevents the snippet from being tested.

A better approach is to place the example code in a test file and have it included automatically during the documentation build process.

In the file examples_mapping.py, each function is mapped to an example usage snippet that will appear in its documentation.

Add the new operation to the FUNCTION_TO_EXAMPLES_MAPPING_DICT dictionary, as shown below:

FUNCTION_TO_EXAMPLES_MAPPING_DICT = {
    ...
    "ttnn.example": example.test_example,
    ...
}

Place the example usage function in a new file named test_example_examples.py (or an existing file, if appropriate). Make sure the file is imported at the top of examples_mapping.py:

# ...
from . import test_data_movement_examples as data_movement
from . import test_core_examples as core

# Import the new file
from . import test_example_examples as example
# ...

Implement the example as a standard ttnn pytest:

import ttnn


def test_example(device):
    # Create tensor
    tensor = ttnn.rand((2, 3), ttnn.bfloat16, layout=ttnn.ROW_MAJOR_LAYOUT, device=device)

    # Call the new operation
    output_tensor = ttnn.example(tensor)

This ensures that all example code snippets are executed and validated in the TT-NN CI pipeline.
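When adding entries to the mapping, a small sanity check can catch typos before the documentation build does. The sketch below is not part of the TT-NN tooling: check_examples_mapping is a hypothetical helper, example_snippet is a stand-in for a real example test, and the rule that keys start with "ttnn." is an assumption based on the "ttnn.example" key shown earlier.

```python
def example_snippet(device):
    """Stand-in for an example test function; a real one would call ttnn.example."""


def check_examples_mapping(mapping):
    """Return a list of human-readable problems found in the mapping (hypothetical helper)."""
    problems = []
    for name, example_function in mapping.items():
        # Assumption: keys are fully qualified Python names such as "ttnn.example".
        if not name.startswith("ttnn."):
            problems.append(f"{name}: key should start with 'ttnn.'")
        # Each value should be the function whose body becomes the documented example.
        if not callable(example_function):
            problems.append(f"{name}: value should be a function")
    return problems


# A local stand-in for the dictionary defined in examples_mapping.py.
FUNCTION_TO_EXAMPLES_MAPPING_DICT = {"ttnn.example": example_snippet}
problems = check_examples_mapping(FUNCTION_TO_EXAMPLES_MAPPING_DICT)
```

An empty problems list means every entry has the expected shape; it does not replace running the example tests themselves.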