PyRank

AWQ Python Packages

Python packages with the GitHub topic awq. Sorted by relevance, with stars and monthly downloads.
intel
neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

32K 3K 305
lpalbou
model-quantizer

Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.

3K 2 0
intel
neural-compressor-pt

Repository of Intel® Neural Compressor

1K 3K 305
intel
neural-compressor-tf

Repository of Intel® Neural Compressor

1K 3K 305
intel
neural-compressor-full

Repository of Intel® Neural Compressor

454 3K 305
intel
neural-solution

Repository of Intel® Neural Compressor

416 3K 305
intel
neural-insights

Repository of Intel® Neural Compressor

349 3K 305
intel
lpot

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

273 3K 306
chris-colinsky
zorac

Interactive CLI chat client for vLLM inference servers with persistent sessions and automatic context management

251 1 0
stef41
quantbenchx

Quantization quality analyzer - benchmark GGUF/GPTQ/AWQ quantization accuracy.

204 1 0
ShipItAndPray
quantcrush

Compress Any LLM Up to 6x in One Command. Unified CLI for GGUF, GPTQ, and AWQ quantization.

148 4 0
intel
ilit

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

98 3K 306
intel
neural-compressor-3x-pt

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

17 3K 305
intel
neural-compressor-3x-ort

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

3 3K 305
intel
neural-compressor-3x-tf

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

3 3K 305
Data from PyPI, GitHub, ClickHouse, and BigQuery