PyRank

Inference Acceleration Python Packages

Python packages tagged with the GitHub topic inference-acceleration, sorted by relevance and shown with stars and monthly downloads.
thu-ml
sageattention

[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention that achieves a 2–5× speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.

160K 3K 416
thu-ml
turbodiffusion

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

2K 3K 256
autonomi-ai
autonomi-nos

⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud, or on any AI hardware.

754 147 12
autonomi-ai
torch-nos

⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud, or on any AI hardware.

618 147 12
Data from PyPI, GitHub, ClickHouse, and BigQuery.