Inference Python Packages

sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

461.1M 30K 7K

filetype

Small, dependency-free, fast Python package that infers binary file types by checking their magic-number signatures

43.7M 768 120

ctranslate2

Fast inference engine for Transformer models

10.8M 5K 496

faster-whisper

Faster Whisper transcription with CTranslate2

9M 24K 2K

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

5.8M 86K 19K

torchao

PyTorch native quantization and sparsity for training and inference

3.3M 3K 553

mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

2.9M 36K 6K

optimum

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools

2M 3K 658

openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

1.5M 10K 3K

deepspeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

1.4M 43K 5K

onnx-graphsurgeon

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

767K 13K 2K

owlrl

A simple implementation of the OWL 2 RL profile on top of RDFLib: it expands the graph with all possible triples that OWL RL defines. It can be used together with RDFLib to expand an RDFLib Graph object, or as a standalone service with its own serialization.

710K 173 30

tensorrt-cu12-bindings

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

531K 13K 2K

inference-gpu

Turn any computer or edge device into a command center for your computer vision projects.

468K 2K 285

sglang-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

468K 30K 7K

python-toon

🐍 TOON for Python (Token-Oriented Object Notation) Encoder/Decoder - Reduce LLM token costs by 30-60% with structured data.

455K 343 13

multi-model-server

Multi Model Server is a tool for serving neural net models for inference

419K 1K 229

model-archiver

Multi Model Server is a tool for serving neural net models for inference

405K 1K 229

inference-cli

Turn any computer or edge device into a command center for your computer vision projects.

387K 2K 285

tensorrt-cu12

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

368K 13K 2K

tensorrt-cu12-libs

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

349K 13K 2K

sagemaker-inference

Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.

302K 413 81

text-generation

Large Language Model Text Generation Inference

299K 11K 1K

inference-sdk

Turn any computer or edge device into a command center for your computer vision projects.

292K 2K 285