Attention Python Packages

sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

461.1M 30K 7K

flashinfer-python

FlashInfer: Kernel Library for LLM Serving

4.5M 6K 1K

nvidia-cudnn-frontend

cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.

3.4M 862 197

flashinfer-cubin

FlashInfer: Kernel Library for LLM Serving

3M 6K 1K

sglang-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

468K 30K 7K

sgl-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

280K 30K 7K

sageattention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

144K 3K 441

dreamer4

Implementation of Danijar's latest iteration for his Dreamer line of work

34K 200 19

performer-pytorch

An implementation of Performer, a linear attention-based transformer, in PyTorch

15K 1K 149

keras-cv-attention-models

Keras beit, caformer, CMT, CoAtNet, convnext, davit, dino, efficientdet, edgenext, efficientformer, efficientnet, eva, fasternet, fastervit, fastvit, flexivit, gcvit, ghostnet, gpvit, hornet, hiera, iformer, inceptionnext, lcnet, levit, maxvit, mobilevit, moganet, nat, nfnets, pvt, swin, tinynet, tinyvit, uniformer, volo, vanillanet, yolor, yolov7, yolov8, yolox, gpt2, llama2, alias kecam

9K 627 97

keras-transformer

Transformer implemented in Keras

7K 368 94

activations

AReLU: Attention-based-Rectified-Linear-Unit

6K 62 8

keras-position-wise-feed-forward

Feed forward layer implemented in Keras

5K 8 5

native-sparse-attention-pytorch

Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper

4K 810 53

transfusion-pytorch

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI

4K 1K 73

optimumai

Unlock the math behind AI — run any operation with explain=True for a step-by-step trace, terminal visualization, and the intuition of why AI uses it.

4K 0 0

rotary-spatial-embeddings

PyTorch implementation of Rotary Spatial Embeddings

3K 7 1

sglang-kt

SGLang is a high-performance serving framework for large language models and multimodal models.

3K 30K 7K

qwen

My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities"; they haven't released the model code yet.

3K 13 3

nexusquant-kv

Training-free KV cache compression via E8 lattice quantization; fits longer context in the same VRAM

3K 20 0

labml-nn

🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠

3K 67K 7K

nmn

Not the neurons we want, but the neurons we need

3K 2 0

h-transformer-1d

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning

2K 167 23

rl4co

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)

2K 888 151