Blackwell Python Packages

sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

461.1M 30K 7K

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

5.8M 85K 19K

nvidia-cudnn-frontend

cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.

3.4M 862 197

tokenspeed-mla

TokenSpeed is a speed-of-light LLM inference engine.

1.5M 2K 184

sglang-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

468K 30K 7K

sgl-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

280K 30K 7K

vllm-tpu

A high-throughput and memory-efficient inference and serving engine for LLMs

58K 85K 19K

tokenspeed-smg

TokenSpeed is a speed-of-light LLM inference engine.

23K 2K 184

vllm-cpu-nightly

A high-throughput and memory-efficient inference and serving engine for LLMs

16K 85K 19K

tensorrt-llm

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

14K 14K 3K

sglang-kt

SGLang is a high-performance serving framework for large language models and multimodal models.

3K 30K 7K

pygpukit

Minimal GPU runtime for Python: high-performance CUDA kernels, memory management, and LLM inference without heavy dependencies

2K 3 0

tokenspeed-kernel-amd

TokenSpeed AMD-specific high-performance kernels.

1K 2K 184

lsglang

SGLang is a fast serving framework for large language models and vision language models.

1K 30K 7K

taiwan-asr-toolkit

Production-grade Traditional Chinese / Taiwan Mandarin speech-to-text. Qwen3-ASR + MediaTek Breeze-ASR-25, hot-word injection, LLM polish, speaker diarization. RTF up to 1554x on RTX 5090, 56 TDD tests.

873 2 0

pyptx

A Python DSL for writing NVIDIA PTX for Hopper and Blackwell in JAX and PyTorch

726 315 27

ai-dynamo-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

606 85K 19K

vllm-acc

A high-throughput and memory-efficient inference and serving engine for LLMs

575 85K 19K

vllm-xft

A high-throughput and memory-efficient inference and serving engine for LLMs

506 85K 19K

dblcsgen

SGLang is a high-performance serving framework for large language models and multimodal models.

487 30K 7K

vllm-musa

vLLM platform plugin for Moore Threads MUSA GPUs

426 85K 19K

vllm-consul

A high-throughput and memory-efficient inference and serving engine for LLMs

407 85K 19K

vllm-npu

A high-throughput and memory-efficient inference and serving engine for LLMs

401 85K 19K

nextai-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

399 85K 19K