PyRank

vLLM Python Packages

Python packages with the GitHub topic vllm. Sorted by relevance, with stars and monthly downloads.
lightseekorg
smg-grpc-proto

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

1.1M 271 78
lightseekorg
smg-grpc-servicer

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

795K 271 78
kvcache-ai
mooncake-transfer-engine

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

344K 5K 753
kserve
kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

113K 5K 1K
intel
auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

79K 1K 135
LMCache
lmcache

Supercharge Your LLM with the Fastest KV Cache Layer

64K 8K 1K
stackav-oss
conch-triton-kernels

A "standard library" of Triton kernels.

50K 26 3
xorbitsai
xinference

Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.

41K 9K 824
ModelCloud
gptqmodel

LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

35K 1K 184
MekayelAnik
vllm-cpu

Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets

30K 6 0
rustakka
atomr-infer

Multi-runtime GPU + remote inference as a supervised actor system on atomr. OpenAI / Anthropic / Gemini / LiteLLM remote runtimes + vLLM / TensorRT / ORT / mistral.rs local; remote-only build compiles zero GPU deps.

22K 0 0
intel
auto-round-nightly

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

18K 1K 135
kvcache-ai
mooncake-transfer-engine-cuda13

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

15K 5K 753
intel
auto-round-lib

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

12K 1K 135
containers
ramalama

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

11K 3K 338
manav8498
processfork

git for AI agents — snapshot, fork, and merge live LLM sessions in 8 ms. Drop-in for Claude Code, LangGraph, vLLM, SGLang.

10K 1 0
santhoshkammari
kivi-ai

🥝 Kivi — Unified AI Chat Interface. Provider-agnostic streaming chat with tools, sessions & auto-compaction. Supports OpenAI, vLLM, Copilot SDK, Claude SDK.

10K 0 0
ModelEngine-Group
uc-manager

Persist and reuse KV Cache to speed up your LLM.

8K 277 74
vllm-project
vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

7K 2K 1K
district-solutions
oats-coder

Enables small-to-large self-hosted AI models to use local source code when running tool-calling agentic workloads. We actively data mine 20,900+ (2+ TB) popular GitHub repos using large and small AI models to create reusable JSON, Markdown, and Parquet files for local-first tool-calling models.

5K 2 1
Gavin-Qiao
vserve

CLI for managing vLLM inference on GPU workstations — download, tune, serve, fan control

5K 0 0
guoqingbao
vllm-rs

Minimalist vLLM implementation in Rust

5K 203 27
lightseekorg
tokenspeed-smg-grpc-proto

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

5K 1K 94
lightseekorg
tokenspeed-smg-grpc-servicer

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

5K 1K 94

Data from PyPI, GitHub, ClickHouse, and BigQuery