PyRank

Model Serving Python Packages

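Python packages tagged with the GitHub topic model-serving, sorted by relevance. Each entry lists the repository owner, the package name, its description, and a line of figures that includes monthly downloads and GitHub stars.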
vllm-project
vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

6.3M 81K 17K
basetenlabs
truss

The simplest way to serve AI/ML models in production

717K 1K 103
basetenlabs
truss-transfer

The simplest way to serve AI/ML models in production

315K 1K 103
basetenlabs
baseten-performance-client

The simplest way to serve AI/ML models in production

219K 1K 103
bentoml
bentoml

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

188K 9K 965
vllm-project
vllm-tpu

A high-throughput and memory-efficient inference and serving engine for LLMs

176K 81K 17K
kserve
kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

113K 5K 1K
mlrun
mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

40K 2K 306
mlrun
mlrun-pipelines-kfp-common

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

33K 2K 306
mlrun
mlrun-pipelines-kfp-v1-8

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

33K 2K 306
tensorchord
envd

🏕️ Reproducible development environment for humans and agents

28K 2K 167
vllm-project
vllm-omni

A framework for efficient model inference with omni-modality models

19K 5K 933
mosecorg
mosec

A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

11K 900 72
openvinotoolkit
ovmsclient

A scalable inference server for models optimized with OpenVINO™

11K 875 253
clearml
clearml-serving

ClearML - Model-Serving Orchestration and Repository Solution

10K 164 50
predibase
lorax-client

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

8K 4K 312
vllm-project
vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

7K 2K 1K
NimbleBoxAI
nbox

The official python package for NimbleBox. Exposes all APIs as CLIs and contains modules to make ML 🌸

7K 87 13
google
google-jetstream

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

4K 434 63
notAI-tech
fastdeploy

Deploy DL/ML inference pipelines with minimal extra code.

4K 104 17
logicalclocks
hsml

Hopsworks Machine Learning Api 🚀 Model management with a model registry and model serving

3K 8 20
aniketmaurya
chitra

A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.

3K 234 37
FedML-AI
fedml

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

3K 4K 766
JRJSolutions
jrjmodelregistry

Simple and secure model registry for storing, versioning, and serving ML models using S3 and MongoDB.

2K 1 0
Data from PyPI, GitHub, ClickHouse, and BigQuery