long-context-attention
Implementation of Block Recurrent Transformer - Pytorch
Flash Attention - in Jax