PyRank

Inference Engine Python Packages

Python packages tagged with the GitHub topic inference-engine, sorted by relevance and listed with stars and monthly downloads.
pylint-dev
astroid

A common base representation of python source code for pylint and other projects

50.9M 575 326
quic
qai-hub-models

Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

40K 1K 179
youssofal
mtplx

2.24x decode TPS increase On Qwen 3.6 27B @ temp 0.6 | Native MTP Speculative Decoding On Apple Silicon With No External Drafter.

16K 492 19
nilp0inter
experta

Expert Systems for Python

13K 189 46
qualcomm
qai-hub-models-cli

Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

8K 1K 179
aphrodite-engine
aphrodite-engine

Large-scale LLM inference engine

5K 2K 197
friendliai
friendli-client

Friendli Suite Client

4K 50 7
siliconflow
onediff

An out-of-the-box acceleration library for diffusion models.

4K 2K 129
nobodywho-ooo
nobodywho

NobodyWho is an inference engine that lets you run LLMs locally and efficiently on any device.

3K 914 60
siliconflow
onediffx

OneDiff: An out-of-the-box acceleration library for diffusion models.

3K 2K 129
FedML-AI
fedml

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

3K 4K 766
brontoguana
krasis

Krasis is a hybrid LLM runtime that focuses on running larger models efficiently on consumer-grade, VRAM-limited hardware.

3K 452 26
Auctalis
nocturnusai

Python SDK for NocturnusAI — a logic-based inference engine and knowledge database

2K 2 0
dualform-labs
m5-infer

Extraordinary speed, extraordinary quality — MLX-based inference engine for Apple Silicon.

2K 0 1
Hexa08
nef2

NEF2: A high-performance, unified multi-backend AI infrastructure stack. Native support for CUDA, ROCm, Metal, and Distributed Fabric.

2K 0 0
NeuroBrix
neurobrix

Universal AI Runtime — Execute any model on any hardware

2K 75 1
kyegomez
exxa

Exa - PyTorch

1K 26 4
ovg-project
kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

1K 1K 112
jithin8mathew
yolomosaic

A Python library for visualizing YOLO detections and segmented instances on large orthomosaic images, with the ability to generate shapefiles for GIS integration

980 0 0
chengzeyi
para-attn

https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching

969 426 45
Zyora-Dev
zllm-zse

The inference engine the open-source world built for itself.

935 151 2
iBz-04
quaynor

Local GGUF inference, tool calling, and streaming chat for Python (Quaynor bindings).

722 3 0
banderlog
opencv-python-inference-engine

Wrapper package for OpenCV with Inference Engine python bindings.

648 34 6
qualcomm
qai-hub-apps

The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

485 409 99
    • Data from PyPI, GitHub, ClickHouse, and BigQuery