PyRank

Blackwell Python Packages

Python packages tagged with the GitHub topic blackwell, sorted by relevance. The figures under each entry begin with monthly downloads, followed by GitHub stars.
sgl-project
sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

303.2M 28K 6K
vllm-project
vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

6.2M 81K 17K
NVIDIA
nvidia-cudnn-frontend

cudnn_frontend provides a C++ wrapper for the cuDNN backend API, along with samples showing how to use it

3.5M 723 153
sgl-project
sglang-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

330K 28K 6K
sgl-project
sgl-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

295K 28K 6K
lightseekorg
tokenspeed-mla

TokenSpeed is a speed-of-light LLM inference engine.

236K 1K 94
vllm-project
vllm-tpu

A high-throughput and memory-efficient inference and serving engine for LLMs

170K 81K 17K
NVIDIA
tensorrt-llm

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

16K 14K 2K
jpietek
penguin-burner

The ultimate NVIDIA undervolting companion for Linux, now with a nice UI. Supports MSI Afterburner profile imports and LACT profile exports. Can automatically scan for the optimal GPU VF curve and generate silent fan curves.

6K 42 0
lightseekorg
tokenspeed-smg

TokenSpeed is a speed-of-light LLM inference engine.

5K 1K 94
sgl-project
sglang-kt

SGLang is a high-performance serving framework for large language models and multimodal models.

4K 28K 6K
patrick-toulme
pyptx

A Python DSL for writing NVIDIA PTX for Hopper and Blackwell in JAX and PyTorch

4K 295 24
m96-chan
pygpukit

Minimal GPU runtime for Python - high-performance CUDA kernels, memory management, and LLM inference without heavy dependencies

3K 2 0
thc1006
taiwan-asr-toolkit

Production-grade Traditional Chinese / Taiwan Mandarin speech-to-text. Qwen3-ASR + MediaTek Breeze-ASR-25, hot-word injection, LLM polish, speaker diarization. RTF up to 1554x on RTX 5090, 56 TDD tests.

1K 2 0
vllm-project
ai-dynamo-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

620 81K 17K
sgl-project
dblcsgen

SGLang is a high-performance serving framework for large language models and multimodal models.

571 28K 6K
vllm-project
vllm-acc

A high-throughput and memory-efficient inference and serving engine for LLMs

536 81K 17K
vllm-project
vllm-xft

A high-throughput and memory-efficient inference and serving engine for LLMs

510 81K 17K
vllm-project
wxy-test

A high-throughput and memory-efficient inference and serving engine for LLMs

407 2K 1K
vllm-project
nextai-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

406 81K 17K
vllm-project
vllm-consul

A high-throughput and memory-efficient inference and serving engine for LLMs

384 81K 17K
vllm-project
vllm-musa

A high-throughput and memory-efficient inference and serving engine for LLMs

364 81K 17K
vllm-project
vllm-npu

A high-throughput and memory-efficient inference and serving engine for LLMs

354 81K 17K
vllm-project
vllm-hust

A high-throughput and memory-efficient inference and serving engine for LLMs

296 81K 17K

Data from PyPI, GitHub, ClickHouse, and BigQuery