PyRank

Local LLM Python Packages

Python packages with the GitHub topic local-llm. Sorted by relevance, with stars and monthly downloads.
raullenchai
rapid-mlx

The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.

48K 2K 287
tobocop2
lilbee

Terminal-first local search and AI chat over your documents, code, and crawled websites. Semantic + hybrid search, vision OCR, auto-built wiki, browsable GGUF model catalog. Works as CLI, TUI, MCP server, REST API, or Python library. Offline by default, no sidecar services.

37K 18 3
cafitac
cafitac-hermit-agent

Quiet MCP executor for Claude Code and Codex. Offload edits, tests, refactors, and commits to cheaper local or flat-rate models.

28K 3 1
LearningCircuit
local-deep-research

~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.cpp, Ollama, Google, ...). 10+ search engines - arXiv, PubMed, your private documents. Everything Local & Encrypted.

24K 8K 670
Andyyyy64
whichllm

Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.

23K 897 34
MakazhanAlpamys
soup-cli

Soup turns the pain of LLM fine-tuning into a simple workflow. One config, one command, done.

23K 60 11
skynetcmd
m3-memory

Local-first Agentic Memory Layer Framework for MCP Agents and Multiple Computers • Over 60 tools • Hybrid search (FTS5 + vector + MMR) • GDPR • 100% local • FIPS 140-3 ready

9K 11 2
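m3-memory lists MMR (Maximal Marginal Relevance) as one stage of its hybrid search. The sketch below illustrates only the generic MMR re-ranking idea, not m3-memory's code: each pick maximizes λ·sim(query, d) − (1 − λ)·max sim(d, s) over the already-selected results s, so near-duplicate hits are penalized. The function names, the pure-Python cosine similarity, and the default λ = 0.5 are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query: Sequence[float], docs: List[Sequence[float]], k: int, lam: float = 0.5) -> List[int]:
    """Greedily pick up to k document indices by Maximal Marginal Relevance.

    score(d) = lam * sim(query, d) - (1 - lam) * max(sim(d, s) for s already selected)
    lam = 1.0 reduces to plain relevance ranking; smaller values favour diversity.
    """
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, docs[i])
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)  # max() keeps the earliest index on ties
        selected.append(best)
        candidates.remove(best)
    return selected


# Two identical hits plus one less relevant but distinct hit.
docs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(mmr([1.0, 0.5], docs, k=2))           # [0, 2]: the duplicate is skipped
print(mmr([1.0, 0.5], docs, k=2, lam=1.0))  # [0, 1]: pure relevance keeps it
```

In a hybrid pipeline, the candidate list would typically come from merging keyword (e.g. FTS5) and vector hits first, with MMR applied last to diversify the top-k.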
ARahim3
mlx-tune

Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.

9K 1K 80
tanavc1
llm-autotune

Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Reduces TTFT by 40%, wall time for local agents by 46%, and RAM usage by 3x.

8K 25 1
0x-auth
bazinga-indeed

BAZINGA - The first AI you actually own. Free, private, works offline. Multi-AI consensus through φ-coherence. TrD + TD = 1.

7K 2 0
jonigl
mcp-client-for-ollama

A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-loop, thinking mode, model params config, MCP prompts, custom system prompt and saved preferences. Built for developers working with local LLMs.

7K 699 93
kytmanov
obsidian-llm-wiki

Karpathy’s LLM Wiki, 100% local with Ollama. Drop Markdown notes → AI extracts concepts → your Obsidian wiki auto-links and grows. Zero sharing. Your notes stay yours.

7K 614 102
rohitgarg19
opencode-llmstack

Cursor-Auto / Claude-tier-style serving for local GGUF models on Mac (M4 Max, 64 GB). FastAPI router fronts llama-swap + llama.cpp, classifying each request into a coder, planner, or uncensored-planner tier. OpenAI-compatible API, opencode integration, per-project subshell, one `llmstack` console-script.

6K 0 0
jonigl
ollmcp

A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-loop, thinking mode, model params config, MCP prompts, custom system prompt and saved preferences. Built for developers working with local LLMs.

5K 699 93
tolitius
cupel

discover LLMs punching above their weight

5K 49 0
outsourc-e
benchloop-cli

Local-first CLI for benchmarking LLMs on real hardware — quality, speed, reliability, and a real multi-turn agent loop.

4K 18 3
day50-dev
llcat

Easily pipe anything into an LLM

4K 35 1
anzalks
loctran

Local translator with OCR and AI-assisted translation for images and PDFs

3K 0 0
rahulsiiitm
vidchain

✅ A Lightweight Video RAG Framework for Multimodal Reasoning

3K 1 0
MRWillisT
pullnexus

Pull from the Nexus. Give back to the Nexus. Keep local AI smart.

3K 1 0
RNBBarrett
thought-mcp

Local MCP memory server for LLMs. Bi-temporal graph + vector + Cypher queries + auto-write/auto-recall hooks for Claude Code. Works with Ollama, LM Studio, Anthropic, OpenAI.

2K 0 0
HZYAI
ragscore

⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or CLI. Privacy-first, async, visual reports.

2K 31 5
geeks-accelerator
ollama-herd

Local AI load balancer for Ollama fleets — auto-discovery, smart routing, OpenAI-compatible API, zero config. Perfect for Mac Minis & Studios.

2K 7 0
youngharold
tightwad

Mixed-vendor GPU inference cluster manager with speculative decoding

2K 22 2
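tightwad lists speculative decoding as a feature. As a rough illustration of the technique in general (not tightwad's implementation), the sketch below shows the greedy variant: a cheap draft model proposes k tokens, and the target model keeps the longest prefix it agrees with plus one token of its own, so the output is identical to plain greedy decoding by the target. The toy "models" are hypothetical functions from a token list to the next token; real engines verify all k positions in one batched forward pass and use probabilistic acceptance when sampling.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "model": maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    """Baseline: one sequential model step per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(model(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, verification rounds)."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n:
        rounds += 1
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed position in order.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)  # equals tok if accepted, else the target's correction
            if tok != expected:
                break  # discard the rest of the draft after the first mismatch
        else:
            accepted.append(target(seq + accepted))  # all k accepted: one bonus token
        seq.extend(accepted)
    return seq[: len(prompt) + n], rounds


def target(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 10


def draft(seq: List[int]) -> int:
    # Agrees with the target except when the prefix length is a multiple of 3.
    guess = target(seq)
    return guess if len(seq) % 3 else (guess + 1) % 10


tokens, rounds = speculative_decode(target, draft, [1], n=20)
assert tokens == greedy_decode(target, [1], n=20)
print(rounds)  # 7 verification rounds for 20 tokens
```

The speed-up in practice depends on how often the draft agrees with the target; each round always yields at least one correct token, so a poor draft costs extra draft work but never changes the output.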
    • Data from PyPI, GitHub, ClickHouse, and BigQuery