Multi-Head Attention
- Multi-Head Attention
- Enable program cache
- Write Multi-Head Attention using ttnn
- Configuration
- Initialize activations and weights using torch
- Convert activations and weights to ttnn
- Run the first iteration of Multi-Head Attention
- Run a subsequent iteration of Multi-Head Attention
- Write optimized version of Multi-Head Attention
- Pre-process the parameters of the optimized model
- Run the first iteration of the optimized Multi-Head Attention
- Run a subsequent iteration of the optimized Multi-Head Attention
- Check that the output of the optimized version matches the output of the original implementation
- Close the device