VRAM Python Packages

whichllm

Find the local LLM that actually runs and performs best on your hardware, ranked by real, recency-aware benchmarks rather than parameter count. One command, instant results.

53K 6K 293
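The core idea of "runs on your hardware, ranked by benchmarks, not size" can be sketched as filter-then-rank. This is a minimal illustration, not whichllm's implementation: the `Candidate` class, its VRAM estimates, and its scores are all hypothetical inputs.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    vram_gib: float  # estimated footprint on your GPU (hypothetical input)
    score: float     # benchmark score, higher is better (hypothetical input)


def rank_runnable(candidates, free_vram_gib):
    """Drop models that do not fit, then rank the rest by score, not by size."""
    runnable = [c for c in candidates if c.vram_gib <= free_vram_gib]
    return sorted(runnable, key=lambda c: c.score, reverse=True)


models = [
    Candidate("big-model", vram_gib=20.0, score=80.0),
    Candidate("mid-model", vram_gib=7.0, score=70.0),
    Candidate("small-model", vram_gib=5.0, score=72.0),
]
print([c.name for c in rank_runnable(models, free_vram_gib=12.0)])
# ['small-model', 'mid-model']
```

Note that the smaller model outranks the larger one here because it scores higher, which is the point of ranking by benchmarks instead of parameter count.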

turboquant

The first open-source implementation of TurboQuant KV cache compression for LLM inference. A drop-in for Hugging Face; install with pip install turboquant.

2K 41 8
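Why KV cache compression matters follows from the uncompressed cache size, which grows linearly with context length. The sketch below uses a hypothetical grouped-query-attention config; the 3-bit figure is purely illustrative, ignores per-block scales and metadata, and is not a statement about TurboQuant's actual format or ratio.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2.0):
    """Uncompressed KV cache size: keys and values for every layer and token."""
    # Leading factor 2: one cached tensor for keys, one for values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


GIB = 2**30
# Hypothetical GQA config: 32 layers, 8 KV heads, head_dim 128, 32K context.
fp16 = kv_cache_bytes(32, 8, 128, seq_len=32_768)                      # 2 bytes/elem
q3 = kv_cache_bytes(32, 8, 128, seq_len=32_768, bytes_per_elem=3 / 8)  # 3 bits/elem
print(fp16 / GIB, q3 / GIB)  # 4.0 0.75
```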

canifinetune

Estimate whether a Hugging Face model fits and fine-tunes on your local GPU.

990 788 106
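A rough "does it fine-tune?" check can be done with back-of-the-envelope arithmetic. This sketch assumes the common rule of thumb of about 16 bytes per trainable parameter for mixed-precision Adam; it does not model activations, and the 10% headroom is an arbitrary placeholder. The function names are hypothetical and are not canifinetune's API.

```python
GIB = 2**30


def full_finetune_bytes(num_params):
    # Rule of thumb for mixed-precision Adam: 2 B 16-bit weights + 2 B grads
    # + 4 B fp32 master weights + 8 B for Adam's two fp32 moments = 16 B/param.
    return 16 * num_params


def lora_bytes(num_params, trainable_params, weight_bytes=2):
    # Frozen 16-bit base weights; the 16 B/param cost applies only to adapters.
    return weight_bytes * num_params + 16 * trainable_params


def fits(required_bytes, vram_bytes, headroom=0.9):
    # Placeholder headroom; real activation memory can be far larger.
    return required_bytes <= headroom * vram_bytes


n = 7_000_000_000  # a 7B-parameter model
print(fits(full_finetune_bytes(n), 24 * GIB))     # False (112 GB)
print(fits(lora_bytes(n, 40_000_000), 24 * GIB))  # True  (~14.6 GB)
```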

llm-cal

LLM inference hardware calculator: architecture-aware (MLA/NSA/MoE), engine-aware (vLLM/SGLang), and honestly labeled. Reads real safetensors bytes; supports 53 GPUs (NVIDIA / AMD / Huawei Ascend / MetaX / Kunlunxin / Biren / Cambricon / Hygon).

619 1 0
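"Reading real safetensors bytes" is possible without loading any weights, because a .safetensors file starts with an 8-byte little-endian header length followed by a JSON header giving each tensor's byte range. A minimal sketch of that idea (not llm-cal's code; the function names are hypothetical); for sharded checkpoints you would sum over every shard.

```python
import io
import json
import struct


def tensor_bytes(f):
    """Sum tensor payload sizes listed in a safetensors header (binary file object)."""
    # Layout: 8-byte little-endian unsigned header length, then a UTF-8 JSON header.
    (header_len,) = struct.unpack("<Q", f.read(8))
    header = json.loads(f.read(header_len))
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional string-to-string metadata, not a tensor
            continue
        begin, end = info["data_offsets"]  # byte range within the data section
        total += end - begin
    return total


def safetensors_file_bytes(path):
    with open(path, "rb") as f:
        return tensor_bytes(f)


# In-memory demo: one F16 tensor of shape [2, 3] occupies 12 bytes.
hdr = json.dumps({"w": {"dtype": "F16", "shape": [2, 3], "data_offsets": [0, 12]}}).encode()
demo = io.BytesIO(struct.pack("<Q", len(hdr)) + hdr + bytes(12))
print(tensor_bytes(demo))  # 12
```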

superclean-cli

An agentic-dev garbage collector for Windows, macOS, and Linux: a tiered RAM, VRAM, and disk cleanup ladder that never kills your active editors, terminals, or AI tools.

446 0 1

gemma4-adaptive-router

Complexity- and VRAM-aware routing for local dual-tier LLM deployments.

307 1 1
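Dual-tier routing that considers both prompt complexity and VRAM can be sketched as a two-condition check. This is a hypothetical illustration, not gemma4-adaptive-router's logic: how the complexity score in [0, 1] is computed, the threshold, and the tier memory figures are all assumptions.

```python
from dataclasses import dataclass

GIB = 2**30


@dataclass
class Tier:
    name: str
    vram_bytes: int  # memory the tier needs when loaded (hypothetical figures below)


def route(complexity, free_vram_bytes, small, large, threshold=0.6):
    """Send a request to the large tier only if it looks hard AND the large tier fits."""
    if complexity >= threshold and large.vram_bytes <= free_vram_bytes:
        return large
    return small  # default: cheaper tier for easy prompts or when VRAM is tight


small = Tier("small", 6 * GIB)
large = Tier("large", 18 * GIB)
print(route(0.9, 20 * GIB, small, large).name)  # large
print(route(0.9, 10 * GIB, small, large).name)  # small (large tier would not fit)
print(route(0.2, 20 * GIB, small, large).name)  # small (easy prompt)
```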

llm-neofetch-plus

LLM-Neofetch++ is an advanced system information tool designed specifically for running large language models (LLMs) locally. It provides detailed hardware detection and personalized recommendations for running AI models on your system.

230 1 0

kvcache-bench

Benchmark every KV cache compression method on your GPU. One command, real numbers.

182 0 0

quantsim-bench

Which quantization should I use? One command benchmarks every quant level on YOUR GPU.

165 0 0
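The memory side of "which quantization should I use?" is simple arithmetic on bits per weight; the quality and speed side is what a benchmark has to measure. A minimal sketch, assuming an 8B-parameter model and, for grouped int4, one fp16 scale per 128 weights. Real formats (for example GGUF k-quants) carry their own, differing overhead, so treat the results as ballpark figures.

```python
GIB = 2**30


def weight_gib(num_params, bits, scale_overhead_bits=0.0):
    """Approximate weight storage in GiB at a given bits-per-weight."""
    return num_params * (bits + scale_overhead_bits) / 8 / GIB


n = 8_000_000_000  # hypothetical 8B-parameter model
for label, bits, extra in [("fp16", 16, 0.0), ("int8", 8, 0.0), ("int4, g128", 4, 16 / 128)]:
    # For grouped int4, one fp16 scale per 128 weights adds 16/128 = 0.125 bits/weight.
    print(f"{label:>11}: {weight_gib(n, bits, extra):.1f} GiB")
```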

hcgk-kernel

Hardware Control Gatekeeper Kernel: an authorization system for heavy AI models.

164 0 0

spectral-kv

Up to 28x KV cache compression for LLMs via spectral SVD projection. Practically lossless on modern architectures.

134 0 0
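Where a compression ratio for an SVD-style projection comes from can be shown with storage arithmetic. This sketch assumes a generic truncated-SVD scheme that stores a (seq_len x rank) coefficient matrix plus one (head_dim x rank) basis per head, all in the same dtype; whether spectral-kv works exactly this way is not stated, and the sketch does not reproduce its reported ratios. Accuracy depends on how quickly the singular values decay, which arithmetic alone cannot show.

```python
def low_rank_ratio(seq_len, head_dim, rank):
    """Storage ratio of a dense (seq_len x head_dim) cache vs. a rank-r factorization.

    Dense: seq_len * head_dim values.
    Low-rank: seq_len * rank coefficients plus a head_dim * rank basis.
    """
    dense = seq_len * head_dim
    factored = rank * (seq_len + head_dim)
    return dense / factored


for r in (4, 8, 16):
    print(r, round(low_rank_ratio(4096, 128, r), 1))
```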

gpu-memory-guard

A CLI tool that checks available GPU VRAM before you load AI models.

125 10 0
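A pre-load VRAM check reduces to comparing an estimated footprint against free memory minus a safety margin. A minimal sketch with hypothetical names, not gpu-memory-guard's interface; on a PyTorch setup the free-bytes input could come from torch.cuda.mem_get_info(), which returns (free, total) in bytes. The 0.5 GiB default margin is an arbitrary assumption.

```python
GIB = 2**30


def check_before_load(required_bytes, free_bytes, safety_margin_bytes=GIB // 2):
    """Compare an estimated model footprint against free VRAM minus a safety margin."""
    usable = free_bytes - safety_margin_bytes
    if required_bytes <= usable:
        return True, f"OK: need {required_bytes / GIB:.1f} GiB, {usable / GIB:.1f} GiB usable"
    short = required_bytes - usable
    return False, f"Refusing to load: {short / GIB:.1f} GiB short"


ok, msg = check_before_load(required_bytes=14 * GIB, free_bytes=12 * GIB)
print(ok, msg)  # False Refusing to load: 2.5 GiB short
```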