PyRank

Inference Optimization Python Packages

Python packages with the GitHub topic inference-optimization. Sorted by relevance, with stars and monthly downloads.
tanavc1
llm-autotune

Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Reduces TTFT by 40%, wall time for local agents by 46%, and RAM usage by 3x.

9K 25 1
Alberto-Codes
turboquant-vllm

TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs

4K 48 5
brontoguana
krasis

Krasis is a hybrid LLM runtime focused on running larger models efficiently on consumer-grade, VRAM-limited hardware.

3K 452 26
BenevolentJoker-JohnL
sollol

Super Ollama Load Balancer - Performance-aware routing for distributed Ollama deployments with Ray, Dask, and adaptive metrics

2K 4 2
EfficientContext
contextpilot

Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM, llama.cpp, OpenClaw, RAG, and Agentic AI.

1K 104 6
alibaba
torch-quant

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

451 925 168
ManuelSLemos
rabbitllm

Run 70B+ LLMs on a single 4GB GPU — no quantization required.

403 58 9
vnmoorthy
pavo-bench

A 50K-turn voice pipeline benchmark and an 85K-param meta-controller that cuts P95 latency 10.3% and energy 71% vs fixed cloud. TMLR 2026.

360 1 0
wild-edge
wildedge-sdk

Python SDK for WildEdge

305 14 1
ZFTurbo
kito

Optimizes the layer structure of a Keras model to reduce computation time.

263 157 18
saikoushiknalubola
thinkrouter

Cut LLM reasoning-token costs by 60% with one line of code

229 2 0
fabriziopfannl
llm-autobatch

Turn single LLM calls into fast micro-batches. Rust core, Python API.

189 5 0
Data from PyPI, GitHub, ClickHouse, and BigQuery