PyRank

GGUF Python Packages

Python packages with the GitHub topic gguf. Sorted by relevance, with stars and monthly downloads.
intel
auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

79K 1K 135
Andyyyy64
whichllm

Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.

23K 897 34
MakazhanAlpamys
soup-cli

Soup turns the pain of LLM fine-tuning into a simple workflow. One config, one command, done.

23K 60 11
AlexsJones
llmfit

Hundreds of models & providers. One command to find what runs on your hardware.

22K 26K 2K
intel
auto-round-nightly

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

18K 1K 135
alvarobartt
hf-mem

A CLI to estimate inference memory requirements for Hugging Face models, written in Python.

17K 916 84
calcuis
gguf-connector

gguf (GPT-Generated Unified Format) connector

15K 56 11
jjang-ai
jang

JANG — GGUF for MLX. YOU MUST USE JANG_Q RUNTIME. Adaptive Mixed-Precision Quantization + Runtime for Apple Silicon

13K 160 22
intel
auto-round-lib

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

12K 1K 135
OEvortex
webscout

Webscout is the all-in-one search and AI toolkit you need. Discover insights with Yep.com, DuckDuckGo, and Phind; access cutting-edge AI models; transcribe YouTube videos; generate temporary emails and phone numbers; perform text-to-speech conversions; and much more!

9K 345 65
quantumaikr
quantcpp

LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.

9K 387 42
rohitgarg19
opencode-llmstack

Cursor-Auto / Claude-tier-style serving for local GGUF models on Mac (M4 Max, 64 GB). FastAPI router fronts llama-swap + llama.cpp, classifying each request into a coder, planner, or uncensored-planner tier. OpenAI-compatible API, opencode integration, per-project subshell, one `llmstack` console-script.

6K 0 0
edwko
outetts

Interface for OuteTTS models.

6K 1K 116
FarisZahrani
llama-cpp-py-sync

Auto-synced CFFI ABI python bindings for llama.cpp with prebuilt wheels (CPU/CUDA/Vulkan/Metal).

6K 4 1
calcuis
llama-core

solo connector core built on llama.cpp

2K 1 1
LoSealL
onnxifier

Convert ANY IR to ONNX format

2K 28 4
thilomichael
llama-buddy

CLI wrapper for llama.cpp providing an ollama-like experience

2K 8 0
calcuis
gguf-core

a simple way to interact with llama via gguf

2K 5 1
calcuis
gguf-node

gguf node for comfyui

2K 224 15
TigreGotico
ovos-gguf-embeddings-plugin

A gguf embeddings plugin for OVOS

2K 0 0
LoSealL
openvino2onnx

Convert ANY IR to ONNX format

1K 28 4
calcuis
cgg

cgg is short for "call gguf model/file"; it is a command-line app built on gguf-connector that lets users interact with a large language model (e.g., chatgpt) via a simple command, without writing lengthy syntax

1K 7 1
intel
auto-round-hpu

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

1K 1K 135
calcuis
ctransformer-core

solo connector core built on ctransformers

1K 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery