TurboQuant Python Packages

turbovec

A vector index built on TurboQuant, written in Rust with Python bindings

44K 13K 1K

quantcpp

LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.

6K 395 43

turboquant-mlx-full

Extreme weight + KV cache compression for LLMs on Apple Silicon (MLX implementation of Google's TurboQuant)

4K 56 11

turboquant-vllm

TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs

2K 48 6

turboquant

First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.

2K 41 8

turboquant-vectors

Compress embeddings 6x instantly with TurboQuant. First pip package using Google's TurboQuant (ICLR 2026) for vector search. 71.9% recall vs. 13.3% for FAISS PQ.

1K 1 1
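The recall comparison above is presumably recall@k against exact (brute-force) search. The benchmark's k, dataset, and similarity metric are not given here, so that reading is an assumption. The minimal pure-Python sketch below shows how such a number is typically computed: find the true top-k by inner product, search again over compressed vectors, and report the overlap. The crude 16-level uniform quantizer is a hypothetical stand-in, not TurboQuant and not the package's API.

```python
import random

def exact_top_k(query, database, k):
    """Indices of the k database vectors with the largest inner product to query."""
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)[:k]

def recall_at_k(true_ids, approx_ids):
    """Fraction of the exact top-k neighbours that the approximate search also returned."""
    return len(set(true_ids) & set(approx_ids)) / len(true_ids)

def quantize(vec, levels=16, lo=-1.0, hi=1.0):
    """Hypothetical stand-in compressor: snap each coordinate to one of `levels` uniform steps."""
    step = (hi - lo) / (levels - 1)
    return [lo + round((min(max(x, lo), hi) - lo) / step) * step for x in vec]

random.seed(0)
dim, n, k = 32, 1000, 10
database = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
query = [random.uniform(-1, 1) for _ in range(dim)]

truth = exact_top_k(query, database, k)                            # ground truth on full vectors
approx = exact_top_k(query, [quantize(v) for v in database], k)    # search over compressed copies
print(f"recall@{k} = {recall_at_k(truth, approx):.2f}")
```

A recall of 1.0 means the compressed index returned exactly the same k neighbours as brute-force search; the figures quoted in the listing are the package author's own and are not reproduced by this toy setup.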

tqai

TurboQuant KV cache compression for local LLM inference — 80% memory savings, near-zero quality loss on 8B+ models. PyTorch + MLX (Apple Silicon). Based on arXiv:2504.19874 (Google Research, ICLR 2026).

936 0 0
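Savings figures like "80% memory savings" follow from simple KV cache arithmetic. The sketch below assumes an fp16 baseline, a shape resembling an 8B-class model with grouped-query attention (32 layers, 8 KV heads, head_dim 128; illustrative, not taken from tqai), and 3 bits per stored value. It ignores per-vector scales, norms, and other metadata that real schemes add, so actual savings would be somewhat lower.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits_per_value):
    """Bytes for keys + values across all layers for one sequence (batch size 1)."""
    values = 2 * layers * kv_heads * head_dim * seq_len  # factor 2: one K and one V tensor
    return values * bits_per_value / 8

# Hypothetical model shape; not tied to any specific package or checkpoint.
cfg = dict(layers=32, kv_heads=8, head_dim=128, seq_len=8192)

fp16 = kv_cache_bytes(**cfg, bits_per_value=16)
q3 = kv_cache_bytes(**cfg, bits_per_value=3)
savings = 1 - q3 / fp16

print(f"fp16: {fp16 / 2**30:.2f} GiB, 3-bit: {q3 / 2**30:.2f} GiB, savings: {savings:.1%}")
```

Read the other way, at a fixed memory budget the context that fits grows by the ratio fp16 / q3 (16 / 3 here), which is how KV compression translates into longer usable context.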

turborag-ahx47

Offline, CPU-only, in-RAM RAG engine with quantized vectors (TurboVec), GGUF embedding and LLM models, a REST API, an MCP server, and multi-language SDKs.

636 1 0

fused-turboquant

Fused Triton kernels for TurboQuant KV cache compression — 2- to 4-bit quantization with randomized Hadamard transform (RHT) rotation. Drop-in HuggingFace & vLLM integration. Up to 4.9x KV cache compression for Llama, Qwen, Mistral, and more.

398 8 1
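The "quantization with RHT rotation" idea can be illustrated without GPU kernels: flip signs at random, apply an orthonormal Hadamard transform so that a large outlier coordinate is spread across all coordinates, quantize, then invert. This is a minimal pure-Python sketch under stated assumptions. TurboQuant pairs the rotation with quantizers designed for the rotated coordinates, and fused-turboquant runs it as fused Triton kernels; here a plain uniform quantizer with one max-abs scale per vector is swapped in, and all function names are hypothetical.

```python
import math
import random

def fwht(x):
    """Unnormalised fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def rht(x, signs):
    """Randomized Hadamard transform: random sign flips, then orthonormal Hadamard."""
    n = len(x)
    return [v / math.sqrt(n) for v in fwht([s * v for s, v in zip(signs, x)])]

def inverse_rht(y, signs):
    """H/sqrt(n) is symmetric and orthogonal (its own inverse); then undo the sign flips."""
    n = len(y)
    return [s * v / math.sqrt(n) for s, v in zip(signs, fwht(y))]

def quantize(y, bits):
    """Uniform b-bit quantization with one per-vector max-abs scale (stand-in quantizer)."""
    levels = 2 ** bits - 1
    scale = max(abs(v) for v in y) or 1.0
    codes = [round((v / scale + 1) / 2 * levels) for v in y]  # ints in [0, levels]
    return codes, scale

def dequantize(codes, scale, bits):
    levels = 2 ** bits - 1
    return [(c / levels * 2 - 1) * scale for c in codes]

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

random.seed(0)
n, bits = 64, 4
x = [random.gauss(0, 1) for _ in range(n)]
x[3] = 25.0  # a single large outlier coordinate
signs = [random.choice((-1, 1)) for _ in range(n)]

plain = dequantize(*quantize(x, bits), bits)                      # quantize directly
codes, scale = quantize(rht(x, signs), bits)                      # quantize after rotation
rotated = inverse_rht(dequantize(codes, scale, bits), signs)      # rotate back
print(f"MSE without rotation: {mse(x, plain):.4f}, with RHT: {mse(x, rotated):.4f}")
```

Without rotation the outlier sets the scale, so every ordinary coordinate lands on a very coarse grid; after the rotation its energy is shared across all 64 coordinates, the max-abs scale shrinks, and the same 4 bits give a finer grid. Because the transform is orthogonal, the error measured after inverting it is directly comparable.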

turboquant-space

TurboQuant (ICLR 2026) — SIMD-accelerated 4/8-bit quantization Space for ANN

384 6 0

turbokv

First open-source implementation of TurboQuant (arXiv 2504.19874) — 4-7x LLM KV cache compression. pip install turbokv

317 0 0

langchain-turboquant

LangChain VectorStore with TurboQuant compression (ICLR 2026): 6x memory reduction, training-free, no GPU required. The first LangChain integration for Google Research's TurboQuant algorithm.

241 1 2

turboquant-hf

Near-optimal weight quantization for LLMs using Google's TurboQuant algorithm

223 0 0

commitmind

CommitMind: Semantic search for Git commit history powered by TurboQuant vector compression (ICLR 2026). Search commits by meaning, not just keywords.

146 0 0

turboquant-impl

First open-source implementation of TurboQuant (arXiv 2504.19874) — 4-7x LLM KV cache compression. pip install turbokv

141 0 0