Qwen Python Packages

sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

461.1M 30K 7K

transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

170.3M 162K 34K

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

5.8M 85K 19K

unsloth

Unsloth Studio is a web UI for training and running open models such as Gemma 4, Qwen3.6, DeepSeek, and gpt-oss locally.

2.5M 68K 6K

tokenspeed-mla

TokenSpeed is a speed-of-light LLM inference engine.

1.5M 2K 184

unsloth-zoo

Unsloth Studio is a web UI for training and running open models such as Gemma 4, Qwen3.6, DeepSeek, and gpt-oss locally.

1.3M 68K 6K

llamafactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

863K 73K 9K

sglang-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

468K 30K 7K

sgl-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

280K 30K 7K

pytorch-pretrained-bert

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

87K 162K 34K

rapid-mlx

The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.

63K 3K 370

vllm-tpu

A high-throughput and memory-efficient inference and serving engine for LLMs

58K 85K 19K

hud-python

RL environments + evals for AI agents. Define once, train anything.

56K 271 61

pytorch-transformers-pvt-nightly

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

53K 162K 34K

mflux

MLX native implementations of state-of-the-art generative image models

39K 2K 156

pytorch-transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

32K 162K 34K

xinference

Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.

28K 9K 844

tokenspeed-smg

TokenSpeed is a speed-of-light LLM inference engine.

23K 2K 184

vllm-cpu-nightly

A high-throughput and memory-efficient inference and serving engine for LLMs

16K 85K 19K

mlx-code

Git-Native Coding Agent for Mac

9K 51 12

squish-ai

⚡️ The fastest way to run local LLMs on Apple Silicon — sub-second model loads, beats Ollama on throughput, tail latency, and full-response time. OpenAI/Ollama-compatible. No cloud, no API keys.

9K 10 0

angelslim

Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.

9K 1K 155

agent6

A coding agent that jails model commands and uses editable state machines for long-running tasks

7K 1 0

coding-proxy

A High-Availability, Transparent, and Smart Multi-Vendor Proxy for Claude Code. Supports Claude Plans, GitHub Copilot, Google Antigravity, ZAI/GLM, MiniMax, Qwen, Xiaomi, Kimi, Doubao...

6K 18 2