gpt-oss Python Packages

sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

461.1M 30K 7K

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

5.8M 85K 19K

unsloth

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.

2.5M 68K 6K

tokenspeed-mla

TokenSpeed is a speed-of-light LLM inference engine.

1.5M 2K 185

unsloth-zoo

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.

1.3M 68K 6K

sglang-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

468K 30K 7K

sgl-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

280K 30K 7K

vllm-tpu

A high-throughput and memory-efficient inference and serving engine for LLMs

58K 85K 19K

mcore-bridge

MCore-Bridge provides Megatron-Core model definitions for state-of-the-art large models and makes Megatron training as simple as Transformers, with support for 300+ large language models (Qwen3-Next, GLM-5.1, DeepSeek-V4, MiniMax-2.7, ...) and 200+ multimodal large models (Qwen3.5, Qwen3-Omni, Gemma4, ...).

32K 82 22

nemo-automodel

🚀 PyTorch Distributed-native training library for LLMs/VLMs with out-of-the-box Hugging Face support

25K 677 200

tokenspeed-smg

TokenSpeed is a speed-of-light LLM inference engine.

23K 2K 185

vllm-cpu-nightly

A high-throughput and memory-efficient inference and serving engine for LLMs

16K 85K 19K

leann

[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

6K 13K 1K

sglang-kt

SGLang is a high-performance serving framework for large language models and multimodal models.

3K 30K 7K

oumi

Easily fine-tune, evaluate and deploy Gemma 4, Qwen3.5, Qwen3.6, gpt-oss, DeepSeek-R1, or any open source LLM / VLM!

3K 9K 783

xtuner

A Next-Generation Training Engine Built for Ultra-Large MoE Models

2K 5K 427

tokenspeed-kernel-amd

TokenSpeed AMD-specific high-performance kernels.

1K 2K 185

lsglang

SGLang is a fast serving framework for large language models and vision language models.

1K 30K 7K

unsloth-studio

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.

943 68K 6K

indigo-print

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.

808 68K 6K

agentsculptor

AgentSculptor: Refactor, restructure & modernize codebases with natural language — powered by GPT-OSS and vLLM.

704 11 0

ai-dynamo-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

606 85K 19K

vllm-acc

A high-throughput and memory-efficient inference and serving engine for LLMs

575 85K 19K

vllm-xft

A high-throughput and memory-efficient inference and serving engine for LLMs

506 85K 19K