PyRank

SGLang Python Packages

Python packages tagged with the GitHub topic sglang, sorted by relevance and shown with GitHub stars and monthly downloads.
lightseekorg
smg-grpc-proto

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

1.1M 271 78
lightseekorg
smg-grpc-servicer

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

795K 271 78
kvcache-ai
mooncake-transfer-engine

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

344K 5K 753
intel
auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

79K 1K 135
ModelCloud
gptqmodel

LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

35K 1K 184
intel
auto-round-nightly

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

18K 1K 135
kvcache-ai
mooncake-transfer-engine-cuda13

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

15K 5K 753
intel
auto-round-lib

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

12K 1K 135
manav8498
processfork

git for AI agents — snapshot, fork, and merge live LLM sessions in 8 ms. Drop-in for Claude Code, LangGraph, vLLM, SGLang.

10K 1 0
lightseekorg
tokenspeed-smg-grpc-proto

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

5K 1K 94
lightseekorg
tokenspeed-smg-grpc-servicer

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

5K 1K 94
theoddden
terradev-cli

NUMA-aware GPU provisioning and orchestration for stateless MoE workloads of all sizes

4K 11 2
vroomfondel
dgxarley

Integration testing, streaming utilities, and repetition detection for distributed LLM inference on DGX Spark clusters

4K 1 0
kvcache-ai
mooncake-transfer-engine-non-cuda

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

2K 5K 753
lightseekorg
smg

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

2K 271 78
FlyTOmeLight
llm-cal

LLM inference hardware calculator — architecture-aware, engine-version-aware, honest-labeled.

2K 1 0
horizon-rl
strands-sglang

SGLang model provider for Strands Agents for on-policy agentic RL training.

2K 52 9
coconut-labs
kvwarden

Tenant-fair LLM inference orchestration on a single GPU. No Kubernetes.

2K 2 1
intel
auto-round-hpu

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

1K 1K 135
Touchdown-Labs
inferguard

Read-only disaggregated-serving diagnostics for vLLM, SGLang, Dynamo, and llm-d.

1K 3 2
ovg-project
kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

1K 1K 112
recursia-lab
anchor-vision

Python client for Anchor — PaliGemma2 multi-LoRA vision inference

900 0 0
fahmiaziz98
docvision

Production-ready document parsing with Vision Language Models

895 1 0
intel
auto-round-kernel

Auto Round Kernel binary package

813 1K 135

Data from PyPI, GitHub, ClickHouse, and BigQuery.