PyRank

GPTQ Python Packages

Python packages tagged with the GitHub topic gptq, sorted by relevance and listed with their stars and monthly downloads.
ModelCloud
gptqmodel

LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

35K 1K 184
intel
neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

32K 3K 305
lpalbou
model-quantizer

Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.

3K 2 0
intel
neural-compressor-pt

Repository of Intel® Neural Compressor

1K 3K 305
intel
neural-compressor-tf

Repository of Intel® Neural Compressor

1K 3K 305
matlok-ai
bampe-weights

An alternative approach to building foundational generative AI models with visualizations using Blender

932 10 0
bobazooba
xllm

🦖 X—LLM: Cutting Edge & Easy LLM Finetuning

583 408 21
intel
neural-compressor-full

Repository of Intel® Neural Compressor

454 3K 305
intel
neural-solution

Repository of Intel® Neural Compressor

416 3K 305
intel
neural-insights

Repository of Intel® Neural Compressor

349 3K 305
intel
lpot

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

273 3K 306
stef41
quantbenchx

Quantization quality analyzer - benchmark GGUF/GPTQ/AWQ quantization accuracy.

204 1 0
ShipItAndPray
quantcrush

Compress Any LLM Up to 6x in One Command. Unified CLI for GGUF, GPTQ, and AWQ quantization.

148 4 0
intel
ilit

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

98 3K 306
intel
neural-compressor-3x-pt

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

17 3K 305
intel
neural-compressor-3x-ort

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

3 3K 305
intel
neural-compressor-3x-tf

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

3 3K 305
Data from PyPI, GitHub, ClickHouse, and BigQuery.