PyRank

Inference Python Packages

Python packages with the GitHub topic "inference", sorted by relevance, with stars and monthly downloads.
sgl-project
sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

304.1M 28K 6K
h2non
filetype

Small, dependency-free, fast Python package to infer binary file types checking the magic numbers signature

36M 765 120
OpenNMT
ctranslate2

Fast inference engine for Transformer models

8.4M 4K 487
SYSTRAN
faster-whisper

Faster Whisper transcription with CTranslate2

7.1M 23K 2K
vllm-project
vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

6.3M 81K 17K
pytorch
torchao

PyTorch native quantization and sparsity for training and inference

3.7M 3K 505
google
mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

3.1M 35K 6K
huggingface
optimum

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

1.9M 3K 640
openvinotoolkit
openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

1.4M 10K 3K
deepspeedai
deepspeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

1.2M 42K 5K
roboflow
inference-gpu

Turn any computer or edge device into a command center for your computer vision projects.

977K 2K 265
roboflow
inference-cli

Turn any computer or edge device into a command center for your computer vision projects.

906K 2K 265
NVIDIA
onnx-graphsurgeon

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

742K 13K 2K
xaviviro
python-toon

🐍 TOON for Python (Token-Oriented Object Notation) Encoder/Decoder - Reduce LLM token costs by 30-60% with structured data.

395K 343 13
kvcache-ai
mooncake-transfer-engine

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

342K 5K 753
mozilla-ai
any-llm-sdk

Communicate with an LLM provider using a single interface

339K 2K 167
sgl-project
sglang-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

326K 28K 6K
nvidia
tensorrt-cu12-bindings

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

316K 13K 2K
awslabs
multi-model-server

Multi Model Server is a tool for serving neural net models for inference

316K 1K 229
awslabs
model-archiver

Multi Model Server is a tool for serving neural net models for inference

306K 1K 229
sgl-project
sgl-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

302K 28K 6K
aws
sagemaker-inference

Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.

261K 413 81
nvidia
tensorrt-cu12

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

256K 13K 2K
Tencent
pnnx

ncnn is a high-performance neural network inference framework optimized for the mobile platform

245K 23K 4K
    • Data from PyPI, GitHub, ClickHouse, and BigQuery