ttnn.quantize
- ttnn.quantize(input_tensor: ttnn.Tensor, scale: ttnn.Tensor or Number, zero_point: ttnn.Tensor or Number, *, axis: Number | None = None, memory_config: ttnn.MemoryConfig = None) → ttnn.Tensor
Quantize Operation
- Parameters:
input_tensor (ttnn.Tensor) – the input tensor.
scale (ttnn.Tensor or Number) – the quantization scale.
zero_point (ttnn.Tensor or Number) – the quantization zero point.
- Keyword Arguments:
axis (Number, optional) – the axis of the input tensor along which quantization is applied. Defaults to None.
memory_config (ttnn.MemoryConfig, optional) – memory configuration for the operation. Defaults to None.
- Returns:
ttnn.Tensor – the output tensor.
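The operation applies affine quantization: each element x is mapped to round(x / scale) + zero_point. A minimal NumPy sketch of that mapping, for intuition only (the helper name and dtype choice here are illustrative, not part of ttnn; the real op runs on-device and returns a ttnn.Tensor):

```python
import numpy as np

def affine_quantize(x, scale, zero_point, dtype=np.int32):
    # Affine quantization: q = round(x / scale) + zero_point
    # e.g. 0.1 -> round(0.1 / 0.001173) + (-213) = 85 - 213 = -128
    q = np.rint(x / scale).astype(np.int64) + zero_point
    return q.astype(dtype)

x = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=np.float32)
q = affine_quantize(x, scale=0.001173, zero_point=-213)
```

With this scale and zero point, the sample inputs land near the edges of the signed 8-bit range, which is why the example below uses those particular values.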
Note

Supported dtypes, layouts, and ranks:

Dtypes      Layouts   Ranks
BFLOAT16    TILE      2, 3, 4

bfloat8_b/bfloat4_b are supported only with TILE_LAYOUT.
Example
>>> input_tensor = ttnn.from_torch(torch.tensor([[0.1, 0.2], [0.3, 0.4]], dtype=torch.bfloat16), layout=ttnn.TILE_LAYOUT, device=device)
>>> scale = 0.001173
>>> zero_point = -213
>>> output = ttnn.quantize(input_tensor, scale, zero_point)
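When axis is given, scale and zero_point may be tensors carrying one value per slice along that axis (per-channel quantization). A hedged NumPy sketch of that behavior, assuming this per-channel interpretation of the axis keyword (quantize_per_axis is a hypothetical helper, not a ttnn API):

```python
import numpy as np

def quantize_per_axis(x, scale, zero_point, axis):
    # Per-channel affine quantization: scale and zero_point are 1-D arrays
    # with one entry per slice of `x` along `axis`, broadcast accordingly.
    shape = [1] * x.ndim
    shape[axis] = -1
    s = np.asarray(scale, dtype=np.float64).reshape(shape)
    z = np.asarray(zero_point, dtype=np.int64).reshape(shape)
    return np.rint(x / s).astype(np.int64) + z

x = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
# Column 0 uses scale 0.5 / zero point 0; column 1 uses scale 1.0 / zero point 10.
q = quantize_per_axis(x, scale=[0.5, 1.0], zero_point=[0, 10], axis=1)
```

Per-channel parameters are typically used for weight tensors, where value ranges differ substantially between output channels.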