PyRank

VRAM Python Packages

Python packages tagged with the GitHub topic vram, sorted by relevance and listed with their stars and monthly downloads.
Andyyyy64
whichllm

Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.

23K 897 34
back2matching
turboquant

First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.

2K 36 7
FlyTOmeLight
llm-cal

LLM inference hardware calculator — architecture-aware, engine-version-aware, honest-labeled.

2K 1 0
angelnicolasc
gemma4-adaptive-router

Adaptive dual-tier serving for Gemma 4 on consumer 16GB GPUs. Complexity + real-time VRAM routing between vLLM E4B and llama.cpp 27B. Production stack with OpenWebUI, monitoring, and more.

634 1 1
HFerrahoglu
llm-neofetch-plus

LLM-Neofetch++ is an advanced system information tool designed specifically for local LLM (Large Language Model) usage. It provides detailed hardware detection with personalized recommendations for running AI models on your system.

198 1 0
back2matching
quantsim-bench

Which quantization should I use? One command benchmarks every quant level on YOUR GPU.

164 0 0
Hkshoonya
spectral-kv

Up to 28x KV cache compression for LLMs via spectral SVD projection

136 0 0
CastelDazur
gpu-memory-guard

CLI tool to check GPU VRAM before loading AI models. Prevent OOM crashes.

131 10 0
back2matching
kvcache-bench

Benchmark every KV cache compression method on your GPU. One command, real numbers. Supports Ollama + llama.cpp.

126 0 0
mnisperuza
hcgk-kernel

Hardware Control GateKeeper Kernels for AI inference within frameworks.

89 0 0
Data from PyPI, GitHub, ClickHouse, and BigQuery.