Token Optimization Python Packages

headroom-ai

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

1.3M 57K 4K

jcodemunch-mcp

Cut AI token costs 95%+ on code exploration. The leading MCP server for precise, symbol-level GitHub code retrieval via tree-sitter AST. Works with Claude Code, Cursor & any MCP client. 313B+ tokens saved.

93K 2K 300

distil-llm

Compression with a quality contract — cache-aware, causally-pruned LLM context compression for agentic runtimes, certified non-inferior across 7 domains. Works with any SDK.

16K 1 0

entroly-core

Cut your Claude / OpenAI / Gemini bill 70–95% on AI coding. Local proxy that compresses context, keeps provider caches hot, and verifies LLM output ($0 hallucination guard). Drop-in for Cursor, Claude Code, Codex, Aider, 34 more tools, and custom providers: 30 seconds, no code changes.

15K 419 66

gcf-proxy

MCP proxy: zero-code GCF adoption. Wraps any MCP server, converts JSON to GCF mid-flight. 53-71% fewer tokens. Works with any structured data.

13K 5 1

entroly

10K 419 66

gcf-python

GCF Python implementation. 100% LLM comprehension on every frontier model. 50-92% fewer tokens than JSON. 43B+ round-trips verified. Zero dependencies.

10K 4 0

agent-friend

MCP schema linter & quality grader — validate, audit, optimize, grade (A+ to F). Also: @tool exports to OpenAI, Claude, Gemini, MCP.

10K 4 0

llmtrim

Local proxy that compresses your LLM API requests so you pay less, with no change to the answers. Trims wasted tokens from prompts, history, tool output, and code before they're sent: -31% input / -74% output, measured live. Any provider, no extra model calls. Also an MCP server and embeddable library (Rust, Python, Ruby, Kotlin, Swift, JS/TS).

9K 140 7

token-goat

Token burn reducer and focus keeper for Claude Code, Codex, Gemini CLI, Cline, Windsurf, Aider, Cursor, Copilot, and more: surgical read hints, PDF/Office/CSV/markdown file interception, 160+ filter & interception rules, compact manifest injection, image shrinking, cache and compact skills, cache MCP calls, and much more.

9K 44 3

guardian-runtime

A zero-latency, local-first runtime firewall for LLMs. Intercept every prompt and response locally to stop data leaks and runaway token costs.

8K 20 1

simplicio-loop

🔁 Finishes your entire backlog while you sleep. The AI orchestrator that DOES the work end-to-end on ANY LLM — discover → implement → verify → merge → 24/7 — behind safety gates, at up to 90% fewer tokens. 48 extension points. Not a chatbot. A worker.

7K 11 2

llm-zip

Self-hosted HTTP sidecar for LLM context compression. Reduce token costs 3–5× before calling any AI API — powered by LLMLingua-2 and MarkItDown. No proxy, no API keys, no GPU required.

7K 0 0

tokenpak

Drop-in HTTP proxy that compresses LLM context, optimizes cache hits, routes smart, and tracks every dollar. Zero SDK changes required.

6K 1 1

rtk-hermes

RTK plugin for Hermes: rewrites shell commands for 60-90% LLM token savings.

5K 236 14

cortyxia

Cortyxia SDK and CLI Guide

5K 0 0

graph-tool-call

Graph-based tool retrieval for LLM agents — 248 tools → 82% accuracy, 79% fewer tokens. Zero dependencies. OpenAPI / MCP / LangChain.

4K 7 2

octave-mcp

OCTAVE protocol - structured AI communication with 3-20x token reduction. MCP server with lenient-to-canonical pipeline and schema validation.

3K 52 4

leancontext

Trim the redundant tool output your AI agent re-sends every turn. Deterministic, type-aware token reduction with a fidelity score — never breaks the agent.

3K 0 0

ratel-ai

Context engineering for AI agents. ~80% fewer tokens. Fix tool overload. Skills and memory with in-process BM25 retrieval. No vector DB. No embeddings.

3K 168 9

precis-mcp

MCP server giving LLM agents a seven-verb API over papers, documents, code, state, patents, and cached web/Wolfram/YouTube tool calls.

3K 1 0

ssk

SolidSKeleton is an open data format for solids (CSG) that is easy to type and to generate.

3K 3 0

latent-gate

VL-JEPA inspired pipeline — compress images/text locally via Ollama, send compact payloads to any LLM API. Cut token costs by ~80%.

2K 22 0

agy-headless-bridge

Call the Google Antigravity CLI (agy) headlessly from any non-TTY context — Windows ConPTY + POSIX pty + MCP server. Fixes the empty-output bug (#76).

2K 9 2