Pruning Python Packages

nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference

851K 1K 299

tensorflow-model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

87K 2K 348
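Several of these toolkits offer magnitude-based pruning. The library-independent sketch below shows unstructured magnitude pruning in plain Python: it zeroes the given fraction of weights with the smallest absolute value. It is not the tensorflow-model-optimization API. The function name `magnitude_prune` and the flat-list weight representation are illustrative assumptions.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Hypothetical helper: zero out the `sparsity` fraction of weights
    with the smallest absolute value (unstructured magnitude pruning)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered by |w|, smallest first (stable for ties).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    # Keep the tensor shape; pruned entries become exact zeros.
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


w = [0.5, -0.05, 0.3, -0.9, 0.01, 0.2]
print(magnitude_prune(w, 0.5))  # [0.5, 0.0, 0.3, -0.9, 0.0, 0.0]
```

The zeros save memory and compute only when a sparse format or a sparsity-aware runtime exploits them.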

aimet-torch

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

36K 3K 457

cozempic

Context cleaning for Claude Code — prune bloated sessions, protect Agent Teams from context loss, auto-guard with tiered pruning

32K 345 31

neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

31K 3K 313

torch-pruning

[CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.

29K 3K 383
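Structural pruning removes whole units or channels rather than individual weights. Every layer that consumes a removed unit must then shrink in lockstep, and that coupling is the dependency problem DepGraph addresses. The sketch below shows the idea on a two-layer MLP stored as nested lists. It ranks hidden units by the L2 norm of their incoming weights, which is an assumed criterion. It is not torch-pruning's API or algorithm, and `prune_hidden_units` is a hypothetical name.

```python
import math

Matrix = list[list[float]]


def prune_hidden_units(w1: Matrix, b1: list[float], w2: Matrix, keep: int):
    """Hypothetical helper: keep the `keep` hidden units whose incoming
    weight rows in w1 have the largest L2 norm.

    w1: hidden x in   (row i = incoming weights of hidden unit i)
    b1: hidden        (bias of hidden unit i)
    w2: out x hidden  (column i = outgoing weights of hidden unit i)
    """
    norms = [math.sqrt(sum(x * x for x in row)) for row in w1]
    ranked = sorted(range(len(w1)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:keep])  # preserve the original unit order
    new_w1 = [w1[i] for i in kept]
    new_b1 = [b1[i] for i in kept]
    # Dependency: the next layer's input columns are removed for the same units.
    new_w2 = [[row[i] for i in kept] for row in w2]
    return new_w1, new_b1, new_w2, kept


w1 = [[1.0, 0.0], [0.1, 0.1], [0.0, 2.0]]
b1 = [0.0, 0.5, -0.5]
w2 = [[1.0, 2.0, 3.0]]
print(prune_hidden_units(w1, b1, w2, keep=2))
```

Here unit 1 has the smallest incoming norm. Its row in `w1`, its bias, and its column in `w2` are all dropped together, so the smaller network stays shape-consistent.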

aimet-onnx

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

28K 3K 457

tf-model-optimization-nightly

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

11K 2K 348

sparsezoo

Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes

5K 389 28

deepsparse

Sparsity-aware deep learning inference runtime for CPUs

4K 3K 191

sparseml

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models

3K 2K 155

retinal-thin-vessels

A Python package for computing recall and precision scores specifically on thin vessels in retinal images and for generating weight masks for BCE loss to improve model performance on segmenting these fine structures, as detailed in the paper "Vessel-Width-Based Metrics and Weight Masks for Retinal Blood Vessel Segmentation".

2K 4 1
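A per-pixel weight mask plugs into binary cross-entropy by scaling each pixel's loss term, so upweighted pixels such as thin-vessel pixels contribute more to the gradient. The sketch below is a plain-Python weighted BCE over flattened images. The weight values are arbitrary placeholders and do not reproduce the package's width-based masks. `weighted_bce` is a hypothetical name, and averaging over all pixels is an assumed reduction.

```python
import math


def weighted_bce(probs, targets, weights, eps=1e-7):
    """Hypothetical helper: mean binary cross-entropy over pixels,
    with each pixel's term multiplied by its weight-mask value."""
    if not (len(probs) == len(targets) == len(weights)) or not probs:
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)  # average over all pixels


probs = [0.9, 0.2]   # predicted vessel probabilities (flattened image)
targets = [1, 1]     # both pixels are vessel in the ground truth
uniform = weighted_bce(probs, targets, [1.0, 1.0])
thin_up = weighted_bce(probs, targets, [1.0, 3.0])  # pixel 1 treated as a thin vessel
print(uniform < thin_up)  # True
```

Raising the weight on the poorly predicted thin-vessel pixel increases the loss, which pushes training toward those fine structures.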

neural-compressor-pt

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

1K 3K 313

sconce

E2E AutoML Model Compression Package

1K 45 4

optipfair

Structured pruning, knowledge distillation, and fairness analysis for Large Language Models.

1K 41 10

mmrazor

OpenMMLab Model Compression Toolbox and Benchmark.

1K 2K 245

deepsparse-ent

Sparsity-aware deep learning inference runtime for CPUs

1K 3K 191

thekaveh-nnx

Lightweight PyTorch toolkit for training, fine-tuning, and exporting modern neural nets. FFN + GNN + decoder-only LM + diffusion + JEPA + MoE; PEFT (LoRA/DoRA/IA3/Prefix/Prompt), PTQ/QAT, pruning, surgery; ONNX/GGUF/Ollama/HuggingFace Hub interop. Dataclass-configured runs with fluent builders, automatic checkpointing, and Plotly viz.

1K 2 0

neural-compressor-tf

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

1K 3K 313