PyRank

Triton Python Packages

Python packages with the GitHub topic triton. Sorted by relevance, with stars and monthly downloads.
linkedin
liger-kernel

Efficient Triton Kernels for LLM Training

807K 6K 528
thu-ml
sageattention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

160K 3K 416
linkedin
liger-kernel-nightly

Efficient Triton Kernels for LLM Training

47K 6K 528
meta-pytorch
tritonparse

TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels

14K 207 27
clearml
clearml-serving

ClearML - Model-Serving Orchestration and Repository Solution

10K 164 50
notAI-tech
fastdeploy

Deploy DL/ML inference pipelines with minimal extra code.

4K 104 17
Alberto-Codes
turboquant-vllm

TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs

4K 48 5
ROCm
tokenspeed-iris

AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

3K 189 39
remcofl
hilbertsfc

Ultra-fast 2D & 3D Hilbert curve kernels in Python. JIT compiled, branchless, L1-cache-friendly lookup tables, loop unrolling, SIMD, and multi-threading.

3K 7 1
DeepLink-org
dlblas

DLBlas: clean and efficient kernels

1K 40 14
MohibShaikh
yolo26-kit

YOLO26 ↔ YOLOv8-pipeline bridge: shape adapters, decoder, letterbox, and ORT wrappers.

878 0 0
msclock
transformersplus

Add some extra features to transformers

862 0 0
flagos-ai
flag-gems

FlagGems is a function library written in Triton language.

805 995 362
HKUSTDial
flash-sparse-attn

Trainable fast and memory-efficient sparse attention

595 681 56
notAI-tech
fdclient

Deploy DL/ML inference pipelines with minimal extra code.

577 104 17
dame-cell
triformer

Transformer components in Triton

549 34 1
rayliuca
grammared-language

Adding Grammarly (and other) open source ML models to LanguageTool

547 8 0
toyaix
triton-runner

Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.

546 96 5
followthesapper
atlas-quantum

GPU-accelerated quantum tensor network simulator with coherence-aware VQE, adaptive MPS, molecular chemistry, VRA integration, and cuQuantum backend

451 0 0
Argonaut790
fused-turboquant

Fused Triton kernels for TurboQuant KV cache compression — 2-4 bit quantization with RHT rotation. Drop-in HuggingFace & vLLM integration. Up to 4.9x KV cache compression for Llama, Qwen, Mistral, and more.

441 8 1
chansigit
torchgw

TorchGW — Fast Sampled Gromov-Wasserstein optimal transport in pure PyTorch. GPU-accelerated with Triton fused Sinkhorn kernels. 3-175x faster than POT.

411 2 0
DeepAuto-AI
hip-attn

Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.

352 152 15
ot-triton-lab
flash-sinkhorn

Sinkhorn optimal transport kernels in PyTorch + Triton (squared Euclidean, no cost matrix materialization).

302 194 20
toyaix
tritonllm

LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model

178 116 5
Data from PyPI, GitHub, ClickHouse, and BigQuery