thinking-tokens
Adaptive inference budget controller - caps how much your LLM thinks per query, tracks GPU cost in real-time.