ttft
Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Reduces TTFT by 40%, wall time for local agents by 46%, and RAM usage by 3x.
Benchmark any OpenAI-compatible LLM endpoint. TTFT, inter-token latency, throughput, P50-P99 — in one command.
The only voice agent context manager with a TTFT feedback loop