PyRank

Quantization Python Packages

Python packages tagged with the GitHub topic "quantization". Sorted by relevance, with stars and monthly downloads.
OpenNMT
ctranslate2

Fast inference engine for Transformer models

8.3M 4K 487
SYSTRAN
faster-whisper

Faster Whisper transcription with CTranslate2

7.1M 23K 2K
bitsandbytes-foundation
bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

6.4M 8K 851
pytorch
torchao

PyTorch native quantization and sparsity for training and inference

3.8M 3K 505
huggingface
optimum

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

1.9M 3K 640
PINTO0309
onnx2tf

A tool for converting ONNX files to LiteRT/TFLite/TensorFlow, PyTorch native code (nn.Module), TorchScript (.pt), state_dict (.pt), Exported Program (.pt2), and Dynamo ONNX. It also supports direct conversion from LiteRT to PyTorch.

1.4M 956 100
openvinotoolkit
nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference

482K 1K 294
vllm-project
llmcompressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

402K 3K 515
huggingface
optimum-quanto

A PyTorch quantization backend for Optimum

274K 1K 86
hiyouga
llamafactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

235K 71K 9K
thu-ml
sageattention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

160K 3K 416
tensorflow
tensorflow-model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

111K 2K 347
intel
auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

79K 1K 135
PanQiWei
auto-gptq

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

70K 5K 543
natasha
navec

Compact high quality word embeddings for Russian language

51K 219 19
ModelCloud
gptqmodel

LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

35K 1K 184
intel
intel-extension-for-pytorch

A Python package that extends official PyTorch to improve performance on Intel platforms

32K 2K 315
intel
neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

32K 3K 305
quic
aimet-torch

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

30K 3K 451
fastmachinelearning
qonnx

QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX

26K 187 57
Xilinx
brevitas

Brevitas: neural network quantization in PyTorch

24K 2K 242
jyunming
tqdb

Embedded vector database using the TurboQuant algorithm (arXiv:2504.19874) — zero training, 2-4 bit compression, fast inner-product search

24K 2 0
foundation-model-stack
fms-model-optimizer

FMS Model Optimizer is a framework for developing reduced precision neural network models.

24K 21 20
quic
aimet-onnx

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

20K 3K 451
    • Data from PyPI, GitHub, ClickHouse, and BigQuery