SGLang Python Packages

smg-grpc-proto

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across vLLM, TRT-LLM, TokenSpeed, SGLang, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

1.2M 376 111

smg-grpc-servicer

887K 376 111

mooncake-transfer-engine

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

176K 6K 926

mooncake-transfer-engine-cuda13

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

79K 6K 926

auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

76K 2K 149

ai-dynamo-runtime

A Datacenter Scale Distributed Inference Serving Framework

50K 7K 1K

ai-dynamo

A Datacenter Scale Distributed Inference Serving Framework

38K 7K 1K

gptqmodel

LLM quantization (compression) toolkit with hardware acceleration support for NVIDIA, AMD, and Intel GPUs and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.

34K 1K 191

auto-round-lib

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

23K 2K 149

tokenspeed-smg-grpc-proto

22K 376 111

tokenspeed-smg-grpc-servicer

22K 376 111

terradev-cli

An imperative command-line interface for AI workload orchestration

20K 21 3

auto-round-nightly

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

19K 2K 149

tokenspeed-mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

12K 6K 926

dgxarley

Ansible playbooks for a 4-node K3s cluster with NVIDIA DGX Spark nodes for distributed LLM inference

5K 6 4

strands-sglang

SGLang model provider for Strands Agents, for on-policy agentic RL training.

5K 72 13

smg

4K 376 111

mooncake-transfer-engine-non-cuda

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

4K 6K 926

processfork

Git for AI agents: snapshot, fork, and merge live LLM sessions in 8 ms. Drop-in for Claude Code, LangGraph, vLLM, and SGLang.

2K 2 0

thaw-vllm

The fork primitive for LLM inference. Snapshot a running session (weights, KV cache, and scheduler state) and hydrate it into N divergent children that skip prefill. For RL rollouts, parallel coding agents, and agent branching. Supports vLLM and SGLang.

2K 6 1

auto-round-hpu

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

2K 2K 149

mooncake-transfer-engine-npu

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

1K 6K 926

kvbm

A Datacenter Scale Distributed Inference Serving Framework

1K 7K 1K

kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

1K 1K 122