ttnn.bias_gelu_bw
- ttnn.bias_gelu_bw(grad_tensor: ttnn.Tensor, input_tensor_a: ttnn.Tensor, input_tensor_b: ttnn.Tensor or Number, *, approximate: string = none, memory_config: ttnn.MemoryConfig | None = None) → List of ttnn.Tensor
-
Performs the backward operation for bias_gelu on input_tensor_a and input_tensor_b (or on input_tensor and bias) with the given grad_tensor, using the given approximate mode. The approximate mode can be ‘none’ or ‘tanh’.
- Parameters:
-
grad_tensor (ttnn.Tensor) – the input gradient tensor.
input_tensor_a (ttnn.Tensor) – the input tensor.
input_tensor_b (ttnn.Tensor or Number) – the second input tensor, or a scalar, used as the bias.
- Keyword Arguments:
-
approximate (string) – Approximation type, either ‘none’ or ‘tanh’. Defaults to ‘none’.
memory_config (ttnn.MemoryConfig, optional) – Memory configuration for the operation. Defaults to None.
- Returns:
-
List of ttnn.Tensor – the list of output gradient tensors.
Note
Supported dtypes, layouts, and ranks:
Dtypes    Layouts  Ranks
BFLOAT16  TILE     2, 3, 4
bfloat8_b/bfloat4_b are only supported with TILE_LAYOUT.
For more details about BFLOAT8_B, refer to the BFLOAT8_B limitations.
Example
>>> import torch
>>> import ttnn
>>> device = ttnn.open_device(device_id=0)  # assumes a device with ID 0 is available
>>> grad_tensor = ttnn.from_torch(torch.tensor([[1, 2], [3, 4]], dtype=torch.bfloat16), layout=ttnn.TILE_LAYOUT, device=device)
>>> tensor1 = ttnn.from_torch(torch.tensor([[1, 2], [3, 4]], dtype=torch.bfloat16, requires_grad=True), layout=ttnn.TILE_LAYOUT, device=device)
>>> tensor2 = ttnn.from_torch(torch.tensor([[1, 2], [3, 4]], dtype=torch.bfloat16, requires_grad=True), layout=ttnn.TILE_LAYOUT, device=device)
>>> approximate = "none"
>>> # approximate is keyword-only in the signature above, so pass it by name
>>> output = ttnn.bias_gelu_bw(grad_tensor, tensor1, tensor2, approximate=approximate)
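To make the two approximate modes concrete, here is a minimal scalar sketch in plain Python. It assumes that bias_gelu computes GELU(input_tensor_a + input_tensor_b) and that the backward pass multiplies grad_tensor by the GELU derivative evaluated at that sum. The page does not spell out either point, and the helper names gelu, gelu_grad and bias_gelu_bw_reference are hypothetical, not part of ttnn. It illustrates the math only and says nothing about how the device kernel is implemented.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)
TANH_COEFF = 0.044715  # cubic coefficient of the standard tanh GELU approximation


def gelu(x, approximate="none"):
    """Scalar GELU: exact (erf-based) or tanh approximation."""
    if approximate == "tanh":
        inner = SQRT_2_OVER_PI * (x + TANH_COEFF * x ** 3)
        return 0.5 * x * (1.0 + math.tanh(inner))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_grad(x, approximate="none"):
    """Derivative of gelu(x) with respect to x for the chosen mode."""
    if approximate == "tanh":
        inner = SQRT_2_OVER_PI * (x + TANH_COEFF * x ** 3)
        t = math.tanh(inner)
        d_inner = SQRT_2_OVER_PI * (1.0 + 3.0 * TANH_COEFF * x * x)
        return 0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * d_inner
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))       # standard normal CDF
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return cdf + x * pdf


def bias_gelu_bw_reference(grad, a, b, approximate="none"):
    """Hypothetical scalar reference, assuming forward = gelu(a + b).

    Because d(a + b)/da = d(a + b)/db = 1, both inputs receive the same
    gradient under this assumption. Returns [grad_a, grad_b].
    """
    if approximate not in ("none", "tanh"):
        raise ValueError("approximate must be 'none' or 'tanh'")
    g = grad * gelu_grad(a + b, approximate)
    return [g, g]


if __name__ == "__main__":
    for mode in ("none", "tanh"):
        print(mode, bias_gelu_bw_reference(1.0, 1.0, 1.0, approximate=mode))
```

For typical inputs the two modes give close but not identical gradients, so results produced with ‘tanh’ should be compared against a reference that uses the same approximation.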