PyRank

LLM Inference Python Packages

Python packages with the GitHub topic llm-inference. Sorted by relevance, with stars and monthly downloads.
ray-project
ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

56.1M 43K 8K
flashinfer-ai
flashinfer-python

FlashInfer: Kernel Library for LLM Serving

5M 6K 977
flashinfer-ai
flashinfer-cubin

FlashInfer: Kernel Library for LLM Serving

3.5M 6K 977
openvinotoolkit
openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

1.4M 10K 3K
bentoml
bentoml

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

186K 9K 965
openvinotoolkit
openvino-dev

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

181K 10K 3K
kserve
kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

113K 5K 1K
nomic-ai
gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

67K 77K 8K
ray-project
ant-ray-cpp-nightly

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

43K 43K 8K
ray-project
ray-cpp

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

42K 43K 8K
feifeibear
yunchang

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

39K 667 78
MekayelAnik
vllm-cpu

Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets

30K 6 0
character-ai
prompt-poet

Streamlines and simplifies prompt design for both developers and non-technical users with a low code approach.

18K 1K 95
lightning-AI
litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

15K 13K 1K
monocle2ai
monocle-apptrace

Monocle is a framework for tracing GenAI app code. This repo contains implementation of Monocle for GenAI apps written in Python.

12K 122 35
bentoml
openllm

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.

11K 12K 811
quantumaikr
quantcpp

LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.

9K 387 42
predibase
lorax-client

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

8K 4K 312
ray-project
ant-ray-nightly

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

7K 43K 8K
codelion
optillm

Optimizing inference proxy for LLMs

7K 4K 319
stratusadv
dandy

Dandy is an intelligence framework for developing programmatic solutions using artificial intelligence.

5K 4 1
intel
intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

5K 2K 217
Omodaka9375
fade-kv

Frequency-Adaptive Decay Encoding: Attention-aware tiered KV cache compression for LLM inference.

4K 0 0
friendliai
friendli-client

Friendli Suite Client

4K 50 7
    • Data from PyPI, GitHub, ClickHouse, and BigQuery