PyRank

MoE Python Packages

Python packages tagged with the GitHub topic moe, sorted by relevance and shown with their stars and monthly downloads.
sgl-project
sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

303.2M 28K 6K
vllm-project
vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

6.2M 81K 17K
flashinfer-ai
flashinfer-python

FlashInfer: Kernel Library for LLM Serving

5M 6K 977
flashinfer-ai
flashinfer-cubin

FlashInfer: Kernel Library for LLM Serving

3.5M 6K 977
NVIDIA
nvidia-cudnn-frontend

cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it

3.5M 723 153
sgl-project
sglang-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

330K 28K 6K
sgl-project
sgl-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

295K 28K 6K
hiyouga
llamafactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

235K 71K 9K
vllm-project
vllm-tpu

A high-throughput and memory-efficient inference and serving engine for LLMs

170K 81K 17K
modelscope
ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-R1, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, Phi4, ...) (AAAI 2025).

141K 14K 1K
NVIDIA
tensorrt-llm

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

16K 14K 2K
theoddden
terradev-cli

NUMA-aware GPU provisioning and orchestration for stateless MoE workloads of all sizes

4K 11 2
sgl-project
sglang-kt

SGLang is a high-performance serving framework for large language models and multimodal models.

4K 28K 6K
inclusionAI
awex

A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from training to inference in RL workflows

4K 150 17
uccl-project
uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

2K 1K 146
wuwangzhang1216
abliterix

Automated alignment adjustment for LLMs — direct steering, LoRA, and MoE expert-granular abliteration, optimized via multi-objective Optuna TPE.

2K 220 40
kyegomez
switch-transformers

Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"

1K 141 18
szibis
mlx-flash

Run AI models too large for your Mac's memory — expert caching, speculative execution, and 15+ research techniques for MoE inference on Apple Silicon

1K 2 0
hiyouga
llmtuner

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

1K 71K 9K
SuperInstance
plato-edge

Edge-optimized Cocapn fleet packages for ARM64 devices with limited resources

840 2 0
hiyouga
lazyllm-llamafactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

685 71K 9K
vllm-project
ai-dynamo-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

620 81K 17K
sgl-project
dblcsgen

SGLang is a high-performance serving framework for large language models and multimodal models.

571 28K 6K
vllm-project
vllm-acc

A high-throughput and memory-efficient inference and serving engine for LLMs

536 81K 17K
Data from PyPI, GitHub, ClickHouse, and BigQuery