Model Serving Python Packages

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

5.8M 86K 19K

truss

The simplest way to serve AI/ML models in production

1.3M 1K 109

baseten-performance-client

The simplest way to serve AI/ML models in production

330K 1K 109

truss-transfer

The simplest way to serve AI/ML models in production

310K 1K 109

bentoml

The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!

213K 9K 980

kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

123K 6K 2K

vllm-tpu

A high-throughput and memory-efficient inference and serving engine for LLMs

58K 86K 19K

predikit

The missing bridge between your ML models and your AI agents.

46K 446 137

mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

42K 2K 308

mosec

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

39K 901 73

flama

The production framework for Predictive and Generative AI. Serve any model as an API in one line, with OpenAI/Anthropic/Ollama-compatible endpoints, a built-in chat UI, and native MCP.

38K 291 17

mlrun-pipelines-kfp-v1-8

36K 2K 308

mlrun-pipelines-kfp-common

36K 2K 308