PyRank

Token Optimization Python Packages

Python packages with the GitHub topic token-optimization. Sorted by relevance, with stars and monthly downloads.
chopratejas
headroom-ai

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

176K 2K 160
tokenpak
tokenpak

Drop-in HTTP proxy that compresses LLM context, optimizes cache hits, routes smart, and tracks every dollar. Zero SDK changes required.

15K 1 1
juyterman1000
entroly

Open-source context engine that catches AI hallucinations and cuts your token bill 70–95%. The only AI helper that shows its work. Claude · Cursor · Codex, GPT & Custom Providers

13K 381 62
juyterman1000
entroly-core

Open-source context engine that catches AI hallucinations and cuts your token bill 70–95%. The only AI helper that shows its work. Claude · Cursor · Codex, GPT & Custom Providers

13K 381 62
0-co
agent-friend

Find the MCP schema issues eating your context window. 156 checks. Grade A+ through F. ESLint for MCP.

13K 4 0
ogallotti
rtk-hermes

RTK plugin for Hermes — rewrites shell commands for 60-90% LLM token savings

9K 36K 2K
retospect
precis-mcp

MCP server giving LLM agents a seven-verb API over papers, documents, code, state, patents, and cached web/Wolfram/YouTube tool calls

5K 1 1
juanlumanmx29
anvl-monitor

Session monitor for Claude Code — detects inflated sessions, saves quota with smart rotation and handoffs. Growth-aware health tracking with dual-signal waste detection.

4K 0 0
tokmacher
tok-protocol

tok is an invisible bridge

3K 0 0
mohankrishnaalavala
context-router-cli

Local-first CLI and MCP server for AI coding context — minimum useful context, maximum agent performance

3K 9 3
elevanaltd
octave-mcp

OCTAVE protocol - structured AI communication with 3-20x token reduction. MCP server with lenient-to-canonical pipeline and schema validation.

2K 51 4
SonAIengine
graph-tool-call

Graph-based tool retrieval for LLM agents — 248 tools → 82% accuracy, 79% fewer tokens. Zero dependencies. OpenAPI / MCP / LangChain.

2K 6 1
ojuschugh1
sqz

Compress LLM context to save tokens and reduce costs

2K 269 11
z3knayr0
tailspin-ai

Token optimization and compression for Claude API requests

1K 1K 123
bhanuprasadthota
agentatlas

Shared browser interaction schema registry for AI agents. 80-100% token reduction.

1K 0 0
smartass-4ever
mnemon-ai

Cut LLM agent token costs by 93%. Execution cache for LangChain, CrewAI, AutoGen — 2.66ms vs 20 seconds, zero tokens on repeat runs.

1K 3 1
sayeem3051
ctxeng

Build perfect LLM context from your Python codebase — automatically

952 2 0
jakeefr
prism-cc

Session intelligence for Claude Code - find extra token usage, why your sessions fail, and how to fix it.

899 19 1
JacobHuang91
llm-prompt-refiner

🚀 Lightweight Python library for building production LLM applications with smart context management and automatic token optimization. Save 10-20% on API costs while fitting RAG docs, chat history, and prompts into your token budget.

778 37 3
h2cker
vecr-compress

Deterministic, auditable LLM context compression — regex whitelist guarantees structured facts (IDs, URLs, dates, code) survive. Two layers: retention + heuristic knapsack.

772 0 0
entroplain
entroplain

Entropy-based early exit for efficient agent reasoning. Stop burning tokens. Know when your agent has finished thinking.

715 2 0
h2cker
llama-index-postprocessor-vecr

Deterministic, auditable LLM context compression — regex whitelist guarantees structured facts (IDs, URLs, dates, code) survive. Two layers: retention + heuristic knapsack.

556 0 0
Harsh-Daga
lattice-transport

A complete transport layer for LLM traffic

536 0 0
castnettech
mnemosyne-engine

State aware knowledge compression, ingestion, and hybrid retrieval engine. Zero dependencies. Sub-100ms queries.

504 58 9
Data from PyPI, GitHub, ClickHouse, and BigQuery