ttnn.requantize

ttnn.requantize(input_tensor: ttnn.Tensor, in_scale: ttnn.Tensor or Number, in_zero_point: ttnn.Tensor or Number, out_scale: ttnn.Tensor or Number, out_zero_point: ttnn.Tensor or Number, *, axis: Number | None = None, memory_config: ttnn.MemoryConfig = None) → ttnn.Tensor

Re-quantize operation: converts a tensor quantized with one (scale, zero-point) pair to a tensor quantized with another, without an intermediate round trip through the original floating-point values.

Parameters:
  • input_tensor (ttnn.Tensor) – the input tensor.

  • in_scale (ttnn.Tensor or Number) – the input quantization scale.

  • in_zero_point (ttnn.Tensor or Number) – the input quantization zero point.

  • out_scale (ttnn.Tensor or Number) – the output quantization scale.

  • out_zero_point (ttnn.Tensor or Number) – the output quantization zero point.

Keyword Arguments:
  • axis (Number, optional) – the axis along which per-channel quantization parameters are applied.

  • memory_config (ttnn.MemoryConfig, optional) – memory configuration for the operation. Defaults to None.

Returns:

ttnn.Tensor – the requantized output tensor.

Note

Supported dtypes, layouts, and ranks:

  Dtypes   | Layouts | Ranks
  ---------|---------|--------
  BFLOAT16 | TILE    | 2, 3, 4

bfloat8_b/bfloat4_b are supported only with TILE_LAYOUT.

Mixed Quantization Support:

This operation supports mixed quantization schemes:

  • Per-tensor to Per-channel: Convert from global quantization parameters to per-channel parameters along the specified axis.

  • Per-channel to Per-tensor: Convert from per-channel quantization parameters to global parameters.

  • Per-tensor to Per-tensor: Standard requantization with scalar parameters.

  • Per-channel to Per-channel: Requantization with per-channel parameters along the same axis.
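
The per-tensor to per-channel case can be sketched in NumPy. This is an illustration of the arithmetic, not the ttnn kernel; the function name, shapes, and parameter values are assumptions chosen for the example.

```python
import numpy as np

def requantize_per_tensor_to_per_channel(q, in_scale, in_zero_point,
                                         out_scales, out_zero_points, axis):
    """q: integer tensor quantized with one global (in_scale, in_zero_point);
    out_scales/out_zero_points: 1-D arrays with one entry per channel along `axis`."""
    # Reshape the per-channel parameters so they broadcast along `axis`.
    shape = [1] * q.ndim
    shape[axis] = -1
    s_out = np.asarray(out_scales, dtype=np.float64).reshape(shape)
    z_out = np.asarray(out_zero_points, dtype=np.float64).reshape(shape)
    ratio = in_scale / s_out
    # q' = q*(s_in/s_out) + (z_out - z_in*s_in/s_out), rounded and typecast
    return np.rint(q * ratio + (z_out - in_zero_point * ratio)).astype(np.int32)

q = np.array([[100, 200], [300, 400]], dtype=np.int32)
out = requantize_per_tensor_to_per_channel(
    q, 0.01, 0, [0.02, 0.05], [10, -5], axis=1)
# Each column is rescaled by its own output scale and shifted by its own zero point.
```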

Execution Paths:

When all four parameters (in_scale, in_zero_point, out_scale, out_zero_point) are provided as tensors and an axis is specified:

  • The operation uses a path with explicit shape expansion and broadcasting.

  • Per-tensor parameters (scalar tensors) are broadcast to match the input tensor shape.

  • Per-channel parameters (1-D tensors) are reshaped and expanded along the specified axis.

  • The implementation performs the mathematical requantization in floating point and typecasts to the output dtype: q' = q * (s_in / s_out) + (z_out - z_in * s_in / s_out).
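
The shape expansion described above can be sketched in NumPy (a stand-in for the device op, under the assumption that per-tensor parameters are 0-D and per-channel parameters are 1-D):

```python
import numpy as np

def expand(param, ndim, axis):
    # A per-tensor scalar (0-D) broadcasts as-is; a 1-D per-channel vector
    # is reshaped so its length lies along `axis` before broadcasting.
    p = np.asarray(param, dtype=np.float64)
    if p.ndim == 0:
        return p
    shape = [1] * ndim
    shape[axis] = -1
    return p.reshape(shape)

q = np.array([[40, 80], [120, 160]], dtype=np.int32)
axis = 1
s_in  = expand([0.01, 0.02], q.ndim, axis)   # per-channel input scales
z_in  = expand([0.0, 0.0],   q.ndim, axis)
s_out = expand([0.04, 0.04], q.ndim, axis)   # per-channel output scales
z_out = expand([5.0, -5.0],  q.ndim, axis)

# q' = q*(s_in/s_out) + (z_out - z_in*s_in/s_out), then typecast
ratio = s_in / s_out
q_out = np.rint(q * ratio + (z_out - z_in * ratio)).astype(np.int32)
```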

When all four parameters are provided as scalar values (float/int32):

  • Uses a path with a specialized kernel operation.

  • Computes the requantization directly in a single fused operation.
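
Per element, the fused scalar path computes something like the following (a pure-Python sketch, not the ttnn kernel; the function name is illustrative):

```python
def requantize_scalar(q, in_scale, in_zero_point, out_scale, out_zero_point):
    # Fold both scales into one ratio so a single multiply-add suffices.
    ratio = in_scale / out_scale
    return round(q * ratio + (out_zero_point - in_zero_point * ratio))

# Identical parameters leave the quantized value unchanged:
requantize_scalar(100, 0.5, 0, 0.5, 0)  # -> 100
```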

When there is a mix of scalar and tensor parameters:

  • Falls back to a composite operation path.

  • Decomposes the requantization into separate dequantize and quantize operations.
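
The decomposition is algebraically equivalent to the direct formula, which the following pure-Python sketch checks (the helper names are illustrative, not ttnn APIs):

```python
def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

def quantize(x, scale, zero_point):
    return x / scale + zero_point

q = 87
s_in, z_in, s_out, z_out = 0.001173, -213, 0.002727, -73

# Composite path: dequantize with the input params, quantize with the output params.
composite = quantize(dequantize(q, s_in, z_in), s_out, z_out)
# Direct path: q' = q*(s_in/s_out) + (z_out - z_in*s_in/s_out).
direct = q * (s_in / s_out) + (z_out - z_in * s_in / s_out)
assert abs(composite - direct) < 1e-9  # same result up to rounding of floats
```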

Example

>>> input_tensor = ttnn.from_torch(torch.tensor([[0.1, 0.2], [0.3, 0.4]], dtype=torch.bfloat16), layout=ttnn.TILE_LAYOUT, device=device)
>>> in_scale = 0.001173
>>> in_zero_point = -213
>>> out_scale = 0.002727
>>> out_zero_point = -73
>>> output = ttnn.requantize(input_tensor, in_scale, in_zero_point, out_scale, out_zero_point)