Model Compression Python Packages

tensorflow-model-optimization

A toolkit for optimizing Keras and TensorFlow ML models for deployment, including quantization and pruning.

87K 2K 348

deepcache

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

53K 969 54

nni

An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.

29K 14K 2K

torch-pruning

[CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.

29K 3K 383

tf-model-optimization-nightly

A toolkit for optimizing Keras and TensorFlow ML models for deployment, including quantization and pruning.

11K 2K 348

kxy

A toolkit to boost the productivity of machine learning engineers.

4K 49 12

dnaty

Compress PyTorch models for edge devices — CPU-only, no GPU, no retraining. One function call.

4K 1 0

bigsmall

Lossless AI model compression - ~34% smaller with bit-identical weights; the autopilot profiles your machine, picks the highest fidelity that runs, and streams models bigger than your RAM.

3K 1 0

picollm

On-device LLM Inference Powered by X-Bit Quantization

2K 313 26

picollmdemo

On-device LLM Inference Powered by X-Bit Quantization

2K 313 26

model-quantizer

Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.

1K 2 0

aquvitae

Knowledge Distillation Toolkit

964 88 10

micronet

A model compression and deployment library.

929 2K 471

torch-optim

PyTorch model optimization through neural network pruning

901 3 1

fasterai

FasterAI: Prune and Distill your models with FastAI and PyTorch

837 265 19

distill-anything

Distill any AI model into one you own — a teacher (Claude/GPT/HF/Ollama) generates your dataset, a student trains on its logits or its words, a judge scores it blind, a benchmark prices it. Runs on a MacBook.

654 2 0

only-train-once

OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM

580 311 47

uni-layer

A Universal Framework for Layer Contribution Analysis

539 0 0

nni-daily

An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.

534 14K 2K

musco-pytorch

MUSCO: MUlti-Stage COmpression of neural networks

439 72 16

kd-lib

A PyTorch knowledge distillation library for benchmarking and extending work in knowledge distillation, pruning, and quantization.

427 649 61

prismkv

3-D stacked-plane KV cache quantization + RAG framework. Defensive prior-art publication extending TurboQuant to conditional 3-D polar cells with adaptive bit allocation.

421 1 1

fused-turboquant

Fused Triton kernels for TurboQuant KV cache compression — 2-4 bit quantization with RHT rotation. Drop-in HuggingFace & vLLM integration. Up to 4.9x KV cache compression for Llama, Qwen, Mistral, and more.

398 8 1

octopus-ml

A collection of handy tools for ML, data visualization, and validation. Train, evaluate, and validate your ML models and data with minimal effort.

394 23 5