ttnn.quantize

ttnn.quantize(input_tensor: ttnn.Tensor, scale: ttnn.Tensor or Number, zero_point: ttnn.Tensor or Number, *, axis: int | None = None, memory_config: ttnn.MemoryConfig = None) → ttnn.Tensor

Quantizes the input tensor using the given scale and zero point.

Parameters:
  • input_tensor (ttnn.Tensor) – the input tensor.

  • scale (ttnn.Tensor or Number) – the quantization scale.

  • zero_point (ttnn.Tensor or Number) – the quantization zero point.

Keyword Arguments:
  • axis (int, optional) – the axis along which the quantization parameters vary, for per-axis quantization. Defaults to None (per-tensor quantization).

  • memory_config (ttnn.MemoryConfig, optional) – memory configuration for the operation. Defaults to None.

Returns:

ttnn.Tensor – the output tensor.

Note

Supported dtypes, layouts, and ranks:

  Dtypes      Layouts    Ranks
  BFLOAT16    TILE       2, 3, 4

bfloat8_b/bfloat4_b are supported only with TILE_LAYOUT.

Example

>>> input_tensor = ttnn.from_torch(torch.tensor([[0.1, 0.2], [0.3, 0.4]], dtype=torch.bfloat16), layout=ttnn.TILE_LAYOUT, device=device)
>>> scale = 0.001173
>>> zero_point = -213
>>> output = ttnn.quantize(input_tensor, scale, zero_point)
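The arithmetic behind the example can be sketched in plain Python. This is a minimal reference sketch, assuming the common affine convention q = round(x / scale) + zero_point; the exact rounding and saturation behavior of ttnn.quantize may differ, so treat it as an illustration rather than the operation's definition.

```python
# Reference sketch of affine quantization (assumed convention:
# q = round(x / scale) + zero_point; rounding/clamping details of
# ttnn.quantize itself may differ).
def quantize_ref(values, scale, zero_point):
    return [round(v / scale) + zero_point for v in values]

inputs = [0.1, 0.2, 0.3, 0.4]          # same values as the example above
scale, zero_point = 0.001173, -213     # same parameters as the example above
print(quantize_ref(inputs, scale, zero_point))  # → [-128, -42, 43, 128]
```

Note how this scale/zero-point pair maps the inputs onto roughly the int8 range, which is why values like 0.001173 and -213 appear in the example.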