Local Llm Python Packages

rapid-mlx

The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.

63K 3K 370

whichllm

Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.

53K 6K 293

lilbee

A local AI search engine: it runs and manages local AI models, searches your files and code, and crawls the web, all in one program. Cited answers, local-first, with an MCP server for your coding agent. TUI, CLI, REST API, and Python library. Works with Ollama and LM Studio.

28K 36 4

local-deep-research

~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.cpp, Ollama, Google, ...). 10+ search engines - arXiv, PubMed, your private documents. Everything Local & Encrypted.

16K 9K 763

soup-cli

Soup turns the pain of LLM fine-tuning into a simple workflow. One config, one command, done.

14K 72 21

mcp-client-for-ollama

Harness the power of local LLMs with this TUI MCP Client for Ollama. Featuring all core MCP primitives (tools, prompts, resources), agent mode, multi-server, model switching, streaming responses, human-in-the-loop, thinking mode, model params config, system prompts, and saved preferences.

12K 766 108

ollmcp

11K 766 108

mlx-tune

Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.

9K 1K 88

m3-memory

Local-first Memory Framework for AI Agents · 99.2% LongMemEval-S retrieval @ k=10 · Supports Claude · Gemini · Antigravity · OpenCode · OpenClaw · Hermes · MCP-native and plugins · Hybrid search (FTS5 + vector + MMR) · GDPR · FIPS 140-3 ready · 100% local (fully offline) or cloud capable

9K 14 2

squish-ai

⚡️ The fastest way to run local LLMs on Apple Silicon — sub-second model loads, beats Ollama on throughput, tail latency, and full-response time. OpenAI/Ollama-compatible. No cloud, no API keys.

9K 10 0

bazinga-indeed

BAZINGA - Distributed AI that belongs to everyone

8K 3 0

scribe-llm

Universal TUI & web agent that connects to any llama.cpp server and uses RAG + semantic memory to research, write, and remember across sessions.

8K 0 0

hive-agent

Local-first agent OS. Spawn persistent AI agents that collaborate, write code, and use tools autonomously. Multi-model (Claude, Codex, LM Studio). Config-driven with pre-built agent presets.

6K 6 0

mlx-mcp-server

Offload-first MCP server that routes Claude's eligible work (summarize, extract, refactor, review) to a free local MLX model — with self-correcting structural/executable gates and automatic local → bigger-local → Claude escalation to cut token cost.

5K 1 0

llcat

LLM infradebugging and diagnostic tool

5K 36 1

ayder-cli

Autonomous multi-agent software development harness, powered by local LLMs

4K 5 1

velune-cli

VELUNE CLI is an open-source AI engineering CLI that unifies local LLMs (Ollama), cloud AI providers, MCP servers, tools, memory, and project context into a single developer workflow. Build, code, automate, and orchestrate AI with one extensible, provider-agnostic command-line interface.

4K 9 0

cafitac-hermit-agent

Quiet MCP executor for Claude Code and Codex. Offload edits, tests, refactors, and commits to cheaper local or flat-rate models.

4K 3 1

kritrim-smriti

A local-first, hybrid AI study assistant that refactors technical documentation into VARK learning tracks and automated Anki flashcards.

3K 2 1

synto

More than just Karpathy’s LLM Wiki, 100% local with Ollama. Drop Markdown notes → AI extracts concepts → your Obsidian wiki auto-links and grows. Zero sharing. Your notes stay yours.

3K 192 15

switchboard-local

Privacy-aware, local-first router for your CLI coding agents (Codex, Claude Code) and local LLMs (Ollama) — keeps sensitive prompts on-device and cuts premium-model usage.

3K 3 0

consolidation-memory

Local-first persistent memory for AI agents — store, recall, and consolidate knowledge across sessions using FAISS, SQLite, and any LLM. MCP server for coding agents and any MCP client.

3K 6 2

llm-autotune

Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Reduces TTFT by 40%, wall time for local agents by 46%, and RAM usage by 3x.

3K 31 3

ollama-mcp-bridge

Extend the Ollama API with dynamic AI tool integration from multiple MCP (Model Context Protocol) servers. Fully compatible, transparent, and developer-friendly, ideal for building powerful local LLM applications, AI agents, and custom chatbots

3K 95 34