Triton Python Packages

liger-kernel

Efficient Triton Kernels for LLM Training

582K 6K 552

sageattention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

144K 3K 441

liger-kernel-nightly

Efficient Triton Kernels for LLM Training

51K 6K 552

dsalt

Official implementation of 'Noise Accumulation and Rank Collapse in Dense Self-Attention: DSALT'

29K 2 0

tritonparse

TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels

10K 212 26

clearml-serving

ClearML - Model-Serving Orchestration and Repository Solution

4K 167 50

late-interaction-kernels

Fused Triton kernels for late-interaction (MaxSim) scoring

4K 23 0

tokenspeed-iris

AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

3K 191 42

fastdeploy

Deploy DL/ML inference pipelines with minimal extra code.

3K 105 17

hilbertsfc

Ultra-fast 2D & 3D Hilbert curve kernels in Python. JIT compiled, branchless, L1-cache-friendly lookup tables, loop unrolling, SIMD, and multi-threading.

3K 8 1

kernelmeter

Query every CUDA device attribute without profiling, and benchmark kernels against your hardware's theoretical peak.

3K 4 0

turboquant-vllm

TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs

2K 48 6

flashspec

Adaptive speculative-decoding inference engine with Triton-optimised verification and online bandit draft selection.

2K 11 0

tridec

Vendor-portable GPU decoders for quantum LDPC codes — Triton min-sum BP & Relay-BP on NVIDIA (CUDA), AMD (ROCm), and Apple-silicon (Metal), consuming any stim DetectorErrorModel.

2K 0 0

flash-sparse-attn

Trainable fast and memory-efficient sparse attention

1K 721 52

transformersplus

Adds extra features to transformers

962 0 0

flag-gems

FlagGems is an operator library for large language models implemented in the Triton Language.

848 1K 436

grammared-language

Adding Grammarly (and other) open source ML models to LanguageTool

828 18 0

yolo26-kit

YOLO26 ↔ YOLOv8-pipeline bridge — shape adapters, decoder, letterbox, ORT wrapper. Python + TypeScript.

762 0 0

fdclient

Deploy DL/ML inference pipelines with minimal extra code.

707 105 17

hip-attn

HiP Attention

705 152 15

triton-msl

Apple Silicon (Metal) backend for OpenAI Triton — write standard @triton.jit kernels on your Mac GPU. The same source runs on NVIDIA (fp32 verified bit-identical), so you develop kernel logic locally and rent a GPU only for the perf pass.

675 0 0

dlblas

611 42 27

atlas-quantum

GPU-accelerated quantum tensor network simulator with adaptive MPS

528 0 0