Apple Silicon Python Packages

mlx-vlm

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

883K 5K 663

mlx-audio

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

300K 7K 656

rapid-mlx

The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.

63K 3K 370

whichllm

Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.

53K 6K 293

mflux

MLX-native implementations of state-of-the-art generative image models.

39K 2K 156

jax-mps

A JAX backend for Apple Metal Performance Shaders (MPS), enabling GPU-accelerated JAX computations on Apple Silicon.

17K 180 19

mlx-openai-server

A high-performance API server that provides OpenAI-compatible endpoints for MLX models. Written in Python and built on FastAPI, it runs MLX-based vision and language models locally.

16K 349 66

mlx-tune

Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.

9K 1K 88

squish-ai

⚡️ The fastest way to run local LLMs on Apple Silicon — sub-second model loads, beats Ollama on throughput, tail latency, and full-response time. OpenAI/Ollama-compatible. No cloud, no API keys.

9K 10 0

pyroboframes

Fast ML dataloader for robot learning. LeRobot datasets, hardware video decode, multi-output formats (NumPy/MLX/PyTorch/JAX).

7K 0 0

actop

Apple Silicon (M1–M4) power, GPU, ANE & memory-bandwidth monitor — sudoless TUI + Python API for profiling local LLM / MLX / CoreML inference

7K 0 0

mlx-model-doctor

Validate an MLX / Hugging Face model repository before you load it.

6K 2 1

mtplx

Native MTP speculative decoding on Apple Silicon with no external drafter. 2.24x decode TPS increase on Qwen 3.6 27B at temperature 0.6.

6K 911 60

jang

JANG: GGUF for MLX. Adaptive mixed-precision quantization and runtime for Apple Silicon. Requires the JANG_Q runtime.

6K 201 24

asiai

Multi-engine LLM benchmark & monitoring CLI for Apple Silicon

5K 10 1

mlx-taef

Tiny AutoEncoders (TAESD family) for diffusion latents on Apple Silicon — pure MLX. Live previews + low-memory decode for FLUX.1, FLUX.2 Klein, SD1.x, SDXL.

5K 9 2

asitop

Performance monitoring CLI tool for Apple Silicon.

5K 5K 213

mlx-mcp-server

Offload-first MCP server that routes Claude's eligible work (summarize, extract, refactor, review) to a free local MLX model — with self-correcting structural/executable gates and automatic local → bigger-local → Claude escalation to cut token cost.

5K 1 0

mlx-memo

Persistent semantic memory for AI agents — 100% local on Apple Silicon (MLX) or Linux/Ubuntu (CPU). Markdown source of truth, sqlite-vec + BM25 hybrid search, a codegraph-backed knowledge graph, MCP server + CLI. No cloud, no keys.

4K 2 2

mlx-teacache

TeaCache step-skipping for FLUX, Qwen-Image & Z-Image diffusion on Apple Silicon, in pure MLX

4K 3 0

libstreamvbyte

A C++ implementation of StreamVByte, with Python bindings.

4K 10 1

turboquant-mlx-full

Extreme weight + KV cache compression for LLMs on Apple Silicon (MLX implementation of Google's TurboQuant)

4K 56 11

mlx-speech

Pure-MLX speech synthesis, voice cloning, dialogue, sound-effects, and ASR for Apple Silicon: Fish S2 Pro, VibeVoice, LongCat, MOSS, Step-Audio, Cohere ASR.

3K 29 5

mlx-chronos

Community-driven benchmark suite for MLX inference engines on Apple Silicon

3K 17 3