APIs
Device
int, l1_small_size: int, trace_region_size: int, dispatch_core_config: ttnn.device.DispatchCoreConfig, worker_l1_size: int)

Close the device and remove it from the device cache.

Context manager for opening and closing a device.

ttnn._ttnn.multi_device.MeshDevice, cq_id: ttnn._ttnn.types.QueueId | None = None, sub_device_ids: collections.abc.Sequence[ttnn._ttnn.device.SubDeviceId] = []) -> None

ttnn._ttnn.multi_device.MeshDevice | None = None) -> None

GetDefaultDevice() -> ttnn._ttnn.multi_device.MeshDevice

collections.abc.Sequence[int]) -> list[int]
Memory Config
Creates a MemoryConfig object with a sharding spec, required for sharded ops.
Operations
Core
Converts the torch.Tensor tensor into a ttnn.Tensor.

copy_device_to_host_tensor(device_tensor: ttnn._ttnn.tensor.Tensor, host_tensor: ttnn._ttnn.tensor.Tensor, blocking: bool = True, cq_id: ttnn._ttnn.types.QueueId | None = None) -> None

copy_host_to_device_tensor(host_tensor: ttnn._ttnn.tensor.Tensor, device_tensor: ttnn._ttnn.tensor.Tensor, cq_id: ttnn._ttnn.types.QueueId | None = None) -> None

Releases the resources held by the ttnn.Tensor.

Dump a tensor to a file.

Copies the ttnn.Tensor.

Converts the torch.Tensor tensor into a ttnn.Tensor.

ttnn._ttnn.tensor.Tensor) -> list[ttnn._ttnn.tensor.Tensor]

Load a tensor from a file.

reallocate(tensor: ttnn._ttnn.tensor.Tensor, memory_config: ttnn._ttnn.tensor.MemoryConfig | None = None) -> ttnn._ttnn.tensor.Tensor

split_work_to_cores(core_grid: ttnn._ttnn.tensor.CoreCoord, units_to_divide: int, row_wise: bool = False) -> tuple[int, ttnn._ttnn.tensor.CoreRangeSet, ttnn._ttnn.tensor.CoreRangeSet, ttnn._ttnn.tensor.CoreRangeSet, int, int]

split_work_to_cores(core_grid: ttnn._ttnn.tensor.CoreRangeSet, units_to_divide: int, row_wise: bool = False) -> tuple[int, ttnn._ttnn.tensor.CoreRangeSet, ttnn._ttnn.tensor.CoreRangeSet, ttnn._ttnn.tensor.CoreRangeSet, int, int]
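The split_work_to_cores overloads above divide a number of work units across a core grid, returning two core groups when the division is uneven. A plain-Python sketch of that division logic (the helper name and return layout here are illustrative assumptions, not the actual ttnn implementation):

```python
# Illustrative sketch: dividing N work units over C cores, as a
# split_work_to_cores-style helper might. When N is not divisible by C,
# one group of cores takes an extra unit each.
def split_work(num_cores: int, units_to_divide: int):
    """Return (units_per_core_group_1, cores_in_group_1,
               units_per_core_group_2, cores_in_group_2)."""
    base = units_to_divide // num_cores
    remainder = units_to_divide % num_cores
    if remainder == 0:
        return base, num_cores, 0, 0
    # 'remainder' cores take one extra unit each; the rest take 'base'.
    return base + 1, remainder, base, num_cores - remainder

print(split_work(4, 10))  # (3, 2, 2, 2): 2 cores do 3 units, 2 cores do 2
```

The real API returns CoreRangeSets rather than core counts, but the balancing idea is the same.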
|
Copies the ttnn.Tensor.

Converts a tensor to the desired dtype.

Organizes the ttnn.Tensor tensor into either ttnn.ROW_MAJOR_LAYOUT or ttnn.TILE_LAYOUT.

Converts a tensor to the desired memory configuration.

Converts the ttnn.Tensor tensor into a torch.Tensor.

Applies typecast to the input tensor.
Tensor Creation
Creates a tensor with values ranging from start (inclusive) to end (exclusive) with a specified step size.

Generates a tensor of binary random numbers (0 or 1) drawn from a Bernoulli distribution.

Creates a complex tensor from real and imaginary part tensors.

Creates a device tensor with uninitialized values of the specified shape, data type, layout, and memory configuration.

Creates a new tensor with the same shape as the given reference, but without initializing its values.

Creates a device tensor with values from a buffer, with the specified data type, layout, and memory configuration.

Creates a tensor of the specified shape and fills it with the specified scalar value.

Creates a tensor of the same shape as the input tensor and fills it with the specified scalar value.

index_fill(input: Tensor, dim: uint32, index: Tensor, value: int or float, memory_config: MemoryConfig) -> Tensor
Fills the input tensor with the given value at the positions selected by the index tensor along the specified dimension, with the specified memory_config.

Creates a tensor with the specified shape and fills it with the value of 1.0.

Creates a tensor of the same shape as the input tensor and fills it with the value of 1.0.

Generates a tensor with the given shape, filled with random values from a uniform distribution.

Updates the input tensor in place with values drawn from the continuous uniform distribution on [from, to), which has density 1 / (to - from).

Creates a tensor with the specified shape and fills it with the value of 0.0.

Creates a tensor of the same shape as the input tensor and fills it with the value of 0.0.
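As a reference for the arange semantics described above (start inclusive, end exclusive, fixed step), a plain-Python illustration, not device code:

```python
# Plain-Python model of arange semantics: values from start (inclusive)
# to end (exclusive), advancing by step. Supports negative steps too.
def arange(start, end, step=1):
    out, v = [], start
    while (step > 0 and v < end) or (step < 0 and v > end):
        out.append(v)
        v += step
    return out

print(arange(0, 5))       # [0, 1, 2, 3, 4]
print(arange(1, 10, 3))   # [1, 4, 7]
print(arange(5, 0, -2))   # [5, 3, 1]
```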
Matrix Multiplication
Returns the matrix product of two tensors.

Returns the linear transformation of the inputs.

Returns the matrix product of tensors mat1_tensor and mat2_tensor.

Returns the matrix product of two tensors.

Configuration class for multi-core reusable matmul operations.

The "2D" matmul program config is used for block-sharded tensors and general interleaved tensors.

Configuration class for 1D multicast matmul operations with advanced features.

This program config is a specialized config for very narrow tensors stored in DRAM.
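For reference, the matrix product these operations compute has the usual definition: out[i][j] is the dot product of row i of the first operand with column j of the second. A small plain-Python illustration:

```python
# Reference definition of the matrix product (not the ttnn kernel):
# out[i][j] = sum over k of a[i][k] * b[k][j].
def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```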
Pointwise Unary
Applies abs to the input tensor.

Applies acos to the input tensor.

Applies acosh to the input tensor.

Applies alt_complex_rotate90 to the input tensor.

Computes the complex angle of the input tensor.

Applies asin to the input tensor.

Applies asinh to the input tensor.

Applies atan to the input tensor.

Applies the atanh function to the input tensor.

Bitcast reinterprets the bit pattern without conversion (unlike typecast, which converts values).
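The bitcast/typecast distinction can be illustrated in plain Python with the struct module: typecast converts the value (1.0 becomes integer 1), while bitcast reinterprets the 32 raw bits of the float as an integer:

```python
import struct

def typecast_f32_to_i32(x: float) -> int:
    # Value-converting cast: 1.0 -> 1
    return int(x)

def bitcast_f32_to_i32(x: float) -> int:
    # Reinterpret the raw IEEE 754 bits of a float32 as an int32:
    # 1.0 -> 0x3F800000
    return struct.unpack("<i", struct.pack("<f", x))[0]

print(typecast_f32_to_i32(1.0))        # 1
print(hex(bitcast_f32_to_i32(1.0)))    # 0x3f800000
```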
|
Performs the bitwise_left_shift operation on the input tensor.

Applies bitwise_not to the input tensor.

Performs the bitwise_right_shift operation on the input tensor.

Applies cbrt to the input tensor.

Applies ceil to the input tensor.

Applies celu to the input tensor.

Applies clamp to the input tensor.

Applies the clip function to the input tensor.

Clones the input tensor, creating a copy with the specified memory configuration and converting its data type to dtype.

Returns the complex conjugate of a complex tensor.

Applies cos to the input tensor.

Applies the cosh function to the input tensor.

Applies deg2rad to the input tensor.

Applies the digamma function to the input tensor.

Applies eqz to the input tensor.

Applies erf to the input tensor.

Applies erfc to the input tensor.

Applies erfinv to the input tensor.

Applies exp to the input tensor.

Applies exp2 to the input tensor.

Applies dropout to the input tensor.

Applies the elu function to the input tensor.

Applies expm1 to the input tensor.

Applies fill to the input tensor.

Applies floor to the input tensor.

Applies frac to the input tensor.

Applies geglu to the input tensor.

Applies gelu to the input tensor.

Applies gez to the input tensor.

Applies glu to the input tensor.

Applies gtz to the input tensor.

Applies hardmish to the input tensor.

Applies the hardshrink function to the input tensor.

Applies hardsigmoid to the input tensor.

Applies hardswish to the input tensor.

Applies the hardtanh function to the input tensor.

Applies heaviside to the input tensor.

Applies i0 to the input tensor.

Applies i1 to the input tensor.

Returns a copy of the input tensor.

Computes the imaginary part of the complex input tensor.
|
Returns a boolean tensor indicating which values of the input tensor satisfy the tested condition.

Returns a boolean tensor indicating which values of the input tensor satisfy the tested condition.

Applies isfinite to the input tensor.

Applies isinf to the input tensor.

Applies isnan to the input tensor.

Applies isneginf to the input tensor.

Applies isposinf to the input tensor.

Applies leaky_relu to the input tensor.

Applies lez to the input tensor.

Applies the lgamma function to the input tensor.

Applies log to the input tensor.

Applies log10 to the input tensor.

Applies log1p to the input tensor.

Applies log2 to the input tensor.

Applies log_sigmoid to the input tensor.

Performs the logical_left_shift operation on the input tensor.

Applies logical_not to the input tensor.

Performs logical_not in place on the input tensor.

Performs the logical_right_shift operation on the input tensor.

Applies the logit function to the input tensor.

Applies ltz to the input tensor.

Applies mish to the input tensor.

Applies the multigammaln function to the input tensor.

Applies neg to the input tensor.

Applies nez to the input tensor.

Applies the normalize_global function to the input tensor.

Applies the normalize_hw function to the input tensor.

Performs a polar-to-Cartesian transformation on the input tensor.

Applies the polygamma function to the input tensor.

Performs an element-wise prelu operation.

Applies rad2deg to the input tensor.

Performs the element-wise division of a scalar by the input tensor.

Computes the real part of the complex input tensor.

Applies reciprocal to the input tensor.

Applies reglu to the input tensor.

Applies relu to the input tensor.

Applies relu6 to the input tensor.

Applies relu_max to the input tensor.

Applies relu_min to the input tensor.

Performs an element-wise modulus operation.

Applies round to the input tensor.

Applies rsqrt to the input tensor.

Applies the selu function to the input tensor.

Applies sigmoid to the input tensor.

Applies sigmoid_accurate to the input tensor.

Applies sign to the input tensor.

Applies signbit to the input tensor.

Applies silu to the input tensor.

Applies sin to the input tensor.

Applies the sinh function to the input tensor.

Applies softplus to the input tensor.

Applies the softshrink function to the input tensor.

Applies softsign to the input tensor.

Applies sqrt to the input tensor.

Applies square to the input tensor.

Computes the standard deviation across the height (H) and width (W) dimensions for each batch and channel.

Applies swiglu to the input tensor.

Applies swish to the input tensor.

Applies tan to the input tensor.

Applies tanh to the input tensor.

Applies tanhshrink to the input tensor.

Applies the threshold function to the input tensor.

Applies the tril function to the input tensor.

Applies the triu function to the input tensor.

Applies trunc to the input tensor.

Applies unary_chain to the input tensor.

Computes the variance across the height (H) and width (W) dimensions for each batch and channel.
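Many of the activations above are simple scalar formulas applied element-wise. As one example, softplus is log(1 + exp(beta * x)) / beta; the beta/threshold handling sketched here mirrors the common PyTorch convention and is an assumption about the ttnn parameters:

```python
import math

def softplus(x, beta=1.0, threshold=20.0):
    # softplus(x) = (1/beta) * log(1 + exp(beta * x)).
    # For beta * x > threshold the function is effectively linear,
    # so return x directly to avoid overflowing exp().
    if beta * x > threshold:
        return x
    return math.log1p(math.exp(beta * x)) / beta

print(round(softplus(0.0), 4))  # 0.6931 (= ln 2)
print(softplus(100.0))          # 100.0 (linear regime)
```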
Pointwise Binary
Adds the two input tensors element-wise.

Adds the two input tensors element-wise.

Computes addalpha for the input tensors.

Computes atan2 of the input tensors.

Computes bias_gelu of the input tensors.

Performs bias_gelu in place on the input tensors.

Performs the bitwise_and operation on the input tensors.

Performs the bitwise_or operation on the input tensors.

Performs the bitwise_xor operation on the input tensors.

Divides the first input tensor by the second, element-wise.

Computes div_no_nan for the input tensors.

Divides the first input tensor by the second, element-wise.

Performs division in place on the input tensors.

Compares whether the first input tensor is equal to the second, element-wise.

Performs the equal-to comparison in place on the input tensors.

Computes floor division for the input tensors.

Performs an element-wise fmod operation.

Computes the greatest common divisor of the input tensors.

Compares whether the first input tensor is greater than or equal to the second, element-wise.

Performs the greater-than-or-equal-to comparison in place on the input tensors.

Compares whether the first input tensor is greater than the second, element-wise.

Performs the greater-than comparison in place on the input tensors.

Computes hypot of the input tensors.

Computes isclose for the input tensors.

Computes the least common multiple of the input tensors.

Computes ldexp of the input tensors.

Performs ldexp in place on the input tensors.

Compares whether the first input tensor is less than or equal to the second, element-wise.

Performs the less-than-or-equal-to comparison in place on the input tensors.

Computes logaddexp of the input tensors.

Computes logaddexp2 of the input tensors.

Performs logaddexp2 in place on the input tensors.

Performs logaddexp in place on the input tensors.

Computes the logical AND of the input tensors.

Computes the logical AND of the input tensors in place.

Computes the logical OR of the input tensors.

Computes the logical OR of the input tensors in place.

Computes logical_xor of the input tensors.

Computes the logical XOR of the input tensors in place.

Compares whether the first input tensor is less than the second, element-wise.

Performs the less-than comparison in place on the input tensors.

Computes the maximum of the input tensors.

Computes the minimum of the input tensors.

Multiplies the two input tensors element-wise.

Multiplies the two input tensors element-wise.

Compares whether the first input tensor is not equal to the second, element-wise.

Performs the not-equal-to comparison in place on the input tensors.

Computes nextafter of the input tensors.

Computes the outer product of the input tensors.

Computes polyval of all elements of the input tensor.

Performs an element-wise pow operation on the input tensors.

Performs an element-wise modulus operation.

Applies rpow to the input tensor.

Subtracts one input tensor from the other, element-wise.

Subtracts one input tensor from the other, element-wise.

Scatters the source tensor's values along a given dimension according to the index tensor.

Scatters the source tensor's values along a given dimension according to the index tensor, adding together source values that map to the same index.

Computes the squared difference of the input tensors.

Performs squared_difference in place on the input tensors.

Computes subalpha for the input tensors.

Subtracts the second input tensor from the first, element-wise.

Subtracts the second input tensor from the first, element-wise.

Computes xlogy of the input tensors.
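logaddexp above exists because computing log(exp(a) + exp(b)) naively overflows for large inputs. The standard stable formulation, shown in plain Python for reference:

```python
import math

def logaddexp(a, b):
    # log(exp(a) + exp(b)), computed without overflowing exp():
    # factor out the larger argument m, so the remaining exponents are <= 0.
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# Naive math.exp(1000.0) would raise OverflowError; this stays finite.
print(logaddexp(1000.0, 1000.0))  # 1000 + ln 2
```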
Pointwise Ternary
Computes addcdiv on the input tensors.

Computes addcmul: output = input_a + value * input_b * input_c

Computes lerp on the input tensors.

Computes mac on the input tensors.

Selects elements from one of two input tensors based on a condition tensor.
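The ternary composite ops follow simple element-wise formulas; addcmul is documented above as output = input_a + value * input_b * input_c, and addcdiv is the analogous division form (stated here by analogy, as an assumption). In plain Python over flat lists:

```python
def addcmul(a, b, c, value=1.0):
    # output = a + value * b * c, element-wise
    return [x + value * y * z for x, y, z in zip(a, b, c)]

def addcdiv(a, b, c, value=1.0):
    # output = a + value * b / c, element-wise (assumed analogous form)
    return [x + value * y / z for x, y, z in zip(a, b, c)]

print(addcmul([1.0, 2.0], [3.0, 4.0], [5.0, 6.0], value=2.0))  # [31.0, 50.0]
```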
Quantization
De-quantize operation.

Quantize operation.

Re-quantize operation.
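A plain-Python sketch of the usual affine quantize/dequantize round trip (the scale/zero-point scheme shown is the common convention; the exact parameters and ranges ttnn uses may differ):

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    # q = clamp(round(x / scale) + zero_point, qmin, qmax)
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    # x is recovered approximately: x ~ (q - zero_point) * scale
    return (q - zero_point) * scale

q = quantize(0.5, scale=0.1, zero_point=0)
print(q)                        # 5
print(dequantize(q, 0.1, 0))    # ~0.5 (round-trip within one quantum)
```

Re-quantize composes the two steps to move a tensor from one (scale, zero_point) pair to another without going back to floating point precision losses beyond one rounding step.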
Losses
Returns the mean absolute error loss for input_reference and input_prediction.

Returns the mean squared error loss for input_reference and input_prediction.
Reduction
Returns the indices of the maximum value of the elements in the input tensor.

Returns the cumulative product of the input along dimension dim. For an input of size N, the output also contains N elements, where y_i = x_1 * x_2 * ... * x_i.

Returns the cumulative sum of the input tensor along the specified dimension.

Sets a seed for the pseudo-random number generators (PRNGs) on the specified device.

Computes the max of the input tensor.

Computes the mean of the input tensor.

Computes the min of the input tensor.

Returns the weight of the zero-th MoE expert.

Computes the product of all elements along the specified dimension.

Samples from the input distribution tensor.

Computes the std of the input tensor.

Computes the sum of the input tensor.

Returns the k largest elements of the input tensor along the given dimension.

Computes the var of the input tensor.
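The cumulative reductions above (cumsum, cumprod) produce an output the same length as the input, where element i accumulates all elements up to and including i. A plain-Python reference:

```python
def cumsum(xs):
    # out[i] = xs[0] + xs[1] + ... + xs[i]
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out

def cumprod(xs):
    # out[i] = xs[0] * xs[1] * ... * xs[i]
    out, total = [], 1
    for x in xs:
        total *= x
        out.append(total)
    return out

print(cumsum([1, 2, 3, 4]))   # [1, 3, 6, 10]
print(cumprod([1, 2, 3, 4]))  # [1, 2, 6, 24]
```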
Data Movement
Returns a new tensor which is a copy of the input tensor.

Performs a binary element-wise operation on the input tensors.

Splits a tensor into multiple chunks along a specified dimension.

Copies the elements from the source tensor into the destination tensor.

Returns a new tensor where singleton dimensions are expanded to a larger size.

Fills the implicit padding of a tiled input tensor with the specified value.

Same as

Generates an NCHW row-major tensor and fills it with high values up to hOnes, wOnes in each HW tile, with the rest padded with low values.

Folds the TT tensor.

The gather operation extracts values from the input tensor based on indices provided in the index tensor along a specified dimension.
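Gather's indexing rule (matching the torch.gather convention) can be stated concisely for the 2-D, dim=1 case: out[i][j] = input[i][ index[i][j] ]. A plain-Python illustration:

```python
def gather(input_rows, index_rows):
    # 2-D gather along dim=1: out[i][j] = input[i][ index[i][j] ].
    # The output has the shape of the index tensor.
    return [[row[k] for k in idx] for row, idx in zip(input_rows, index_rows)]

print(gather([[10, 20, 30]], [[2, 0, 0]]))  # [[30, 10, 10]]
```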
|
Replaces the batches of input_a indicated by batch_ids with the corresponding batches from input_b.

Converts a tensor from interleaved to sharded memory layout.

Converts a partial tensor from interleaved to sharded memory layout.

Remaps MoE CCL metadata from global experts to local device experts.

Remaps MoE routing weights to local device routing weights.

Moves the elements of the input tensor to a new memory location, deallocating the original buffer.

Returns the number of elements (N) that are non-zero, as well as a tensor of the same shape as the input where the first N elements are the indices of the non-zero elements.

Returns a padded tensor, with a specified value at the specified location.

Permutes the dimensions of the input tensor according to the specified permutation.

Returns a new tensor filled with repetitions of the input tensor.

Repeats elements of a tensor along a given dimension.

Note: for a 0-cost view, the following conditions must be met:

Returns a tensor with the new shape of the input tensor.

Converts a tensor from one sharded layout to another sharded layout.

Performs circular shifting of elements along the specified dimension(s).

Converts a tensor from sharded to interleaved memory layout.

Converts a partial tensor from sharded to interleaved memory layout.

Returns a sliced tensor.

Sorts the elements of the input tensor along the specified dimension, in ascending order by default.

Splits the tensor into num_splits parts along the given dimension.

Returns a tensor with the specified dimensions squeezed.

Stacks tensors along a new dimension.

Changes the data layout of the input tensor to TILE.

Changes the data layout of the input tensor to TILE.

Changes the data layout of the input tensor to TILE.

Returns a tensor that is transposed along dimensions dim1 and dim2.

Returns a tensor unsqueezed at the specified dimension.

unsqueeze_to_4D(tensor: ttnn._ttnn.tensor.Tensor) -> ttnn._ttnn.tensor.Tensor
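unsqueeze_to_4D pads a tensor's rank up to 4 by prepending singleton dimensions. The shape arithmetic, sketched in plain Python (an illustration of the shape transform, not the ttnn implementation):

```python
def unsqueeze_to_4d_shape(shape):
    # Left-pad the shape with 1s until it is 4-dimensional,
    # e.g. (32, 64) -> (1, 1, 32, 64). Ranks above 4 are not supported.
    assert len(shape) <= 4, "rank must be at most 4"
    return (1,) * (4 - len(shape)) + tuple(shape)

print(unsqueeze_to_4d_shape((32, 64)))  # (1, 1, 32, 64)
```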
|
Changes the data layout of the input tensor to ROW_MAJOR.

Changes the data layout of the input tensor to ROW_MAJOR and unpads/removes elements from the tensor.

This is a 0-cost view operation that returns the same tensor that was passed to it, but with a new shape.
Normalization
Applies batch norm over each channel of the input tensor.

Computes group_norm over the input tensor.

Computes layer norm over the input tensor.

This operation is used in conjunction with a companion distributed layer-norm operation.

This operation is used in conjunction with a companion distributed layer-norm operation.

Computes RMS norm over the input tensor.

This operation is used in conjunction with a companion distributed RMS-norm operation.

This operation is used in conjunction with a companion distributed RMS-norm operation.

Specialized in-place operation for causal masked softmax with height-width dimension constraints.

Computes a fused scale-mask-softmax operation along the last dimension of the input tensor.

Computes a fused scale-mask-softmax operation along the last dimension in place.

Computes the softmax function over the specified dimension of the input tensor.

Computes the softmax function along the last dimension of the input tensor in place.
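For reference, softmax normalizes a vector into a probability distribution, and the standard numerically stable form subtracts the maximum before exponentiating. In plain Python:

```python
import math

def softmax(xs):
    # Subtract the max first so exp() cannot overflow for large inputs;
    # this does not change the result because the factor cancels.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(sum(probs))  # 1.0 (up to rounding)
```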
Normalization Program Configs
Default program configuration for Softmax operations.

Base program configuration variant for Softmax operations.

Multi-core sharded program configuration for Softmax operations.
Transformer
Divides the input tensor by the square root of the head size, optionally adds an attention mask, and computes softmax.

In-place variant: divides the input tensor by the square root of the head size, optionally adds an attention mask, and computes softmax.

Chunked causal scaled dot product attention for processing long sequences in chunks.

Chunked causal scaled dot product attention for processing long sequences in chunks.

Takes in a tensor of shape [batch, num_heads, sequence, head_size] and concatenates the heads back into a single width dimension.

Causal MLA attention.

A version of scaled dot product attention specifically for decode.

JointAttention operation that efficiently performs non-causal attention over two sets of query, key, and value tensors.

A version of scaled dot product attention specifically for decode.

A version of scaled dot product attention specifically for decode.

Ring-distributed causal scaled dot product attention for multi-device execution.

RingJointAttention operation that efficiently performs non-causal attention over two sets of query, key, and value tensors, where the first set is sharded across devices in the sequence dimension.

Causal scaled dot product attention.

A version of scaled dot product attention specifically for decode.

Splits a fused query-key-value tensor into separate query, key, and value tensors, splitting out the heads.

Windowed scaled dot product attention.
CCL
All-broadcast operation across devices.

All-gather operation across devices along a selected dimension and optional cluster axis.

All-reduce operation across devices with sum reduction.

All-to-all combine operation for combining the output tokens from the experts, based on the expert metadata and expert mapping tensors.

All-to-all dispatch operation for dispatching the input tokens to devices with the selected experts, based on the expert indices and expert mapping tensors.

Performs a broadcast operation from a sender device to all other mesh devices across a cluster axis.

Partitions the input tensor across the mesh such that each device holds the i-th of num_devices partitions of the input tensor along the specified dimension.

Point-to-point send and receive operation. Sends a tensor from one device to another.

Reduce-scatter operation across devices along a selected dimension and optional cluster axis.

Reduce-to-root operation. Performs an SDPA tree reduction across 4 devices and stores the output on the root device only.
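The collectives above are easiest to reason about by their postconditions. A plain-Python model of two of them, treating each device's data as a list (illustrative only; real CCL ops move data over the mesh fabric):

```python
def all_gather(per_device_shards):
    # Postcondition: every device holds the concatenation of all shards.
    gathered = [x for shard in per_device_shards for x in shard]
    return [list(gathered) for _ in per_device_shards]

def all_reduce_sum(per_device_values):
    # Postcondition: every device holds the element-wise sum of all inputs.
    total = [sum(col) for col in zip(*per_device_values)]
    return [list(total) for _ in per_device_values]

print(all_gather([[1, 2], [3, 4]]))      # [[1, 2, 3, 4], [1, 2, 3, 4]]
print(all_reduce_sum([[1, 2], [3, 4]]))  # [[4, 6], [4, 6]]
```

Reduce-scatter is the complementary postcondition: each device ends up with one reduced partition rather than the whole reduced tensor.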
Embedding
Retrieves word embeddings using input_tensor.
Convolution
Applies a 1D convolution over an input signal composed of several input planes.

Applies a 2D convolution over an input signal composed of several input planes.

Applies a 2D transposed convolution operator over an input image composed of several input planes.

Applies a 3D convolution over an input signal composed of several input planes.

TTNN Conv2D applies preprocessing to the bias tensor before performing the convolution operation, to convert the bias into a format suitable for the operation.

TTNN ConvTranspose2D applies preprocessing to the bias tensor before performing the convolution operation, to convert the bias into a format suitable for the operation.

TTNN ConvTranspose2D applies preprocessing to the weights tensor before performing the conv_transpose2d operation, to convert the weights into a format suitable for the operation.

TTNN Conv2D applies preprocessing to the weights tensor before performing the convolution operation, to convert the weights into a format suitable for the operation.

Conv2DConfig is a structure that contains all the Tenstorrent device-specific and implementation-specific flags for the convolution operation.

alias of
Pooling
Applies experimental adaptive average pooling to the input tensor.

Applies experimental adaptive max pooling to the input tensor.

Applies an average pool operation to the input tensor.

Applies global_avg_pool2d to the input tensor.

Applies a max pool operation to the input tensor.
Prefetcher
Asynchronously pre-fetches tensors from DRAM into the L1 memory of neighbouring cores.
Vision
Performs grid sampling on the input tensor using the provided sampling grid.

Upsamples the given multi-channel 2D (spatial) data.
Generic
Executes a custom operation with user-defined kernels on the device.
KV Cache
Populates the cache tensor with values from the input tensor.

Updates the cache tensor with values from the input tensor.

Fills the cache tensor in place with the values from input at the specified batch_idx.

Updates the cache tensor in place with the values from input at the specified update_idx.
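The fill/update distinction above is essentially "write a whole sequence for one batch" versus "write one sequence position for every batch". A plain-Python model of the update case, treating the KV cache as a [batch, seq_len] list of lists (illustrative shape; the real cache also has head and feature dimensions):

```python
def update_cache(cache, update_idx, new_values):
    # In-place update of one sequence position per batch entry:
    # cache[b][update_idx] = new_values[b]
    for b, v in enumerate(new_values):
        cache[b][update_idx] = v
    return cache

cache = [[0, 0, 0], [0, 0, 0]]  # [batch=2, seq_len=3]
update_cache(cache, 1, [7, 9])
print(cache)  # [[0, 7, 0], [0, 9, 0]]
```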
Backward operations
Performs backward operations for abs on the input tensor.

Performs backward operations for inverse cosine (acos) on the input tensor.

Performs backward operations for inverse hyperbolic cosine (acosh) on the input tensor.

Performs backward operations for add of the input tensors.

Performs backward operations for addalpha on the input tensors.

Performs backward operations for addcdiv of the input tensors.

Performs backward operations for addcmul of the input tensors.

Performs backward operations for the complex angle function on the input tensor.

Performs backward operations for inverse sine (asin) on the input tensor.

Performs backward operations for inverse hyperbolic sine (asinh) on the input tensor.

Performs backward operations for assign of the input tensors.

Performs backward operations for atan2 of the input tensors.

Performs backward operations for inverse tangent (atan) on the input tensor.

Performs backward operations for inverse hyperbolic tangent (atanh) on the input tensor.

Performs backward operations for bias_gelu on the input tensors.

Performs backward operations for ceil on the input tensor.

Performs backward operations for celu on the input tensor.

Performs backward operations for clamp on the input tensor.

Performs backward operations for clip on the input tensor.

Performs backward operations for concat on the input tensors.

Performs backward operations for the complex conj function on the input tensor.

Performs backward operations for cosine on the input tensor.

Performs backward operations for hyperbolic cosine (cosh) on the input tensor.

Performs backward operations for degree-to-radian conversion (deg2rad) on the input tensor.

Performs backward operations for digamma on the input tensor.

Performs backward operations for divide on the input tensors.

Performs backward operations for div_no_nan on the input tensors.

Performs backward operations for elu on the input tensor.

Returns the input gradients of the output gradients tensor with respect to the input indices.

Performs backward operations for erf on the input tensor.

Performs backward operations for erfc on the input tensor.

Performs backward operations for erfinv on the input tensor.

Performs backward operations for exp2 on the input tensor.

Performs backward operations for the exponential function on the input tensor.

Applies the backward pass of the GELU function using ttnn experimental kernels.

Performs backward operations for expm1 on the input tensor.

Performs backward operations for fill on the input tensor.

Performs backward operations for fill zero on the input tensor.

Performs backward operations for floor on the input tensor.

Performs backward operations for fmod of the input tensors.

Performs backward operations for frac on the input tensor.

Performs backward operations for gelu on the input tensor.

Performs backward operations for hardshrink on the input tensor.

Performs backward operations for hardsigmoid on the input tensor.

Performs backward operations for hardswish on the input tensor.

Performs backward operations for the hardtanh activation function on the input tensor.

Performs backward operations for hypot of the input tensors.

Performs backward operations for i0 on the input tensor.

Performs backward operations for the complex imaginary function on the input tensor.

Performs backward operations for ldexp of the input tensors.

Performs backward operations for leaky_relu on the input tensor.

Performs backward operations for lerp of the input tensors.

Performs backward operations for lgamma on the input tensor.
|
Performs backward operations for log10 on the input tensor.

Performs backward operations for log1p on the input tensor.

Performs backward operations for log2 on the input tensor.

Performs backward operations for logarithm on the input tensor.

Performs backward operations for log sigmoid on the input tensor.

Performs backward operations for logaddexp2 of the input tensors.

Performs backward operations for logaddexp of the input tensors.

Performs backward operations for logit on the input tensor.

Performs backward operations for logiteps on the input tensor.

Performs backward operations for maximum of the input tensors.

Performs backward operations for minimum of the input tensors.

Performs backward operations for multiply on the input tensors.

Performs backward operations for the multivariate logarithmic gamma function (also referred to as mvlgamma) on the input tensor.

Performs backward operations for neg on the input tensor.

Performs backward operations for the complex polar function on the input tensor.

Performs backward operations for polygamma on the input tensor.

Performs backward operations for power on the input tensor.

Performs backward operations for prod on the input tensor.

Performs backward operations for radian-to-degree conversion (rad2deg) on the input tensor.

Performs backward operations for unary rdiv on the input tensor.

Performs backward operations for the complex real function on the input tensor.

Performs backward operations for reciprocal on the input tensor.

Performs backward operations for relu6 on the input tensor.

Performs backward operations for relu on the input tensor.

Performs backward operations for remainder of the input tensors.

Performs backward operations for repeat on the input tensor.

Performs backward operations for round on the input tensor.

Performs backward operations for rpow on the input tensor.

Performs backward operations for reciprocal of square-root on the input tensor.

Performs backward operations for subtraction of the input tensors.

Performs backward operations for selu on the input tensor.

Performs backward operations for sigmoid on the input tensor.

Performs backward operations for sign on the input tensor.

Performs backward operations for silu on the input tensor.

Performs backward operations for sin on the input tensor.

Performs backward operations for hyperbolic sine (sinh) on the input tensor.

Performs backward operations for softplus on the input tensor.

Performs backward operations for softshrink on the input tensor.

Performs backward operations for softsign on the input tensor.

Performs backward operations for square-root on the input tensor.

Performs backward operations for square on the input tensor.

Performs backward operations for squared_difference of the input tensors.

Performs backward operations for subtract of the input tensors.

Performs backward operations for subalpha of the input tensors.

Performs backward operations for tan on the input tensor.

Performs backward operations for the hyperbolic tangent (tanh) function on the input tensor.

Performs backward operations for tanhshrink on the input tensor.

Performs backward operations for threshold on the input tensor.

Performs backward operations for truncation on the input tensor.

Performs backward operations for where of the input tensors.

Performs backward operations for xlogy of the input tensors.
Model Conversion
Preprocesses modules and parameters of a given model.

Preprocesses parameters of a given model.
Reports
str, sci_mode: Optional[str|bool], precision: Optional[int]
Operation Hooks
register_pre_operation_hook is a context manager that registers a pre-operation hook.

register_post_operation_hook is a context manager that registers a post-operation hook.