Flash Attention Python Packages

nvidia-cudnn-frontend

cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.

3.4M 862 197

ffpa-attn

🤖FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5x~3x speedup🎉 vs SDPA, up to 430T🎉 on H200.

7K 313 22

morphottention

Mathematical Morphology-based self-attention module for PyTorch (CUDA) using Flash-style kernel fusion.

2K 0 0

flash-sparse-attn

Trainable fast and memory-efficient sparse attention

1K 721 52

inf-cl

A highly memory-efficient contrastive loss.

694 288 12

flash-sinkhorn

The official repository of FlashSinkhorn [ICML 2026 Oral]

393 203 20

flashmha

A simple PyTorch implementation of Flash MultiHead Attention

309 22 4

flash-attention-triton

Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with custom configuration mode

271 26 0

flash-dmattn

Trainable fast and memory-efficient sparse attention

257 721 52

gpkg

GPU package manager: find prebuilt CUDA wheels, build missing ones, and generate a uv pyproject.toml

212 0 0

easywheels

Smart GPU wheel installer. Auto-detects CUDA, GPU, torch, and Python to install the right pre-built wheel.

189 0 0

jax-flash-attn2

Flash Attention implementation with multiple backend support and sharding. This module provides a flexible implementation of Flash Attention with support for different backends (GPU, TPU, CPU) and platforms (Triton, Pallas, JAX).

166 34 1

scree

Cross-framework ragged tensor primitive with reference varlen kernels

149 0 0