PyRank

KV Cache Python Packages

Python packages with the GitHub topic kv-cache. Sorted by relevance, with stars and monthly downloads.
LMCache
lmcache

Supercharge Your LLM with the Fastest KV Cache Layer

64K 8K 1K
manav8498
processfork

git for AI agents — snapshot, fork, and merge live LLM sessions in 8 ms. Drop-in for Claude Code, LangGraph, vLLM, SGLang.

10K 1 0
quantumaikr
quantcpp

LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.

9K 387 42
tanavc1
llm-autotune

Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Reduces TTFT by 40%, wall time for local agents by 46%, and RAM usage by 3x.

8K 25 1
Omodaka9375
fade-kv

Frequency-Adaptive Decay Encoding: Attention-aware tiered KV cache compression for LLM inference.

4K 0 0
Alberto-Codes
turboquant-vllm

TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs

4K 48 5
manjunathshiva
turboquant-mlx-full

Extreme weight + KV cache compression for LLMs on Apple Silicon (MLX implementation of Google's TurboQuant)

3K 23 3
back2matching
turboquant

First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.

2K 36 7
VectorArc
avp

Python SDK for Agent Vector Protocol – transfer KV-cache between LLM agents instead of text

2K 19 1
AlphaWaveSystems
tqai

TurboQuant KV cache compression for local LLM inference

2K 1 0
pythongiant
kvboost

Make LLM inference faster with chunk-level KV cache reuse

1K 7 0
kddubey
cappr

Completion After Prompt Probability. Make your LLM make a choice

1K 82 3
basnetlachu
memopt-engine

MEMOPT: Universal Memory Fabric for AI Infrastructure. Open-sourced by Sophisticates.

1K 4 1
FluffyAIcode
kakeyalattice

Discrete Kakeya cover for LLM KV cache: D4/E8 nested-lattice quantisation realising a Kakeya-style tube-cover over the direction sphere. 2.4x-2.8x compression at <1% perplexity loss on Qwen3, Llama-3, DeepSeek, GLM-4, Gemma. Drop-in transformers.DynamicCache. pip install kakeyalattice.

458 8 2
LMCache
lmcache-cli

Supercharge Your LLM with the Fastest KV Cache Layer

456 8K 1K
Argonaut790
fused-turboquant

Fused Triton kernels for TurboQuant KV cache compression — 2-4 bit quantization with RHT rotation. Drop-in HuggingFace & vLLM integration. Up to 4.9x KV cache compression for Llama, Qwen, Mistral, and more.

441 8 1
danhicks96
prismkv

3-D stacked-plane KV cache quantization + RAG framework. Defensive prior-art publication extending TurboQuant to conditional 3-D polar cells with adaptive bit allocation.

379 1 1
l3tchupkt
adaptq

High-performance CPU KV-cache quantization engine for LLM inference (~10× speedup, 4× memory reduction) with Python & PyTorch support.

355 1 0
vivekvar-dl
turbokv

First open-source implementation of TurboQuant (arXiv 2504.19874) — 4-7x LLM KV cache compression. pip install turbokv

341 0 0
adwantg
kvfleet

KV-cache-aware intelligent routing for self-hosted and hybrid LLM fleets. Route requests using model quality, latency, cost, policy, and live GPU state.

319 0 0
Keyvanhardani
kvat

Automatic KV-Cache optimization for HuggingFace Transformers. Find the optimal cache strategy, attention backend, and dtype for your LLM inference workload.

230 1 0
wjddusrb03
langchain-turboquant

LangChain VectorStore with TurboQuant compression (ICLR 2026) - 6x memory reduction, training-free, no GPU required. The first LangChain integration for Google Research's TurboQuant algorithm.

191 1 2
jagmarques
nexusquant-kv

Training-free KV cache compression via E8 lattice quantization and attention-aware token eviction

182 13 0
Mmorgan-ML
phase-slip-sampler

An entropic perturbation sampler for LLMs using orthonormal vector rotation and logit anchoring.

177 6 0

Data from PyPI, GitHub, ClickHouse, and BigQuery