Inference Engine Python Packages

astroid

A common base representation of Python source code for pylint and other projects

53.5M 580 339

qai-hub-models

Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.

35K 1K 204

tessera-hypernetwork

Generate per-session LoRA adapters in <1s for agentic inference efficiency.

26K 4 2

qai-hub-models-cli

Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.

10K 1K 204

experta

Expert Systems for Python

9K 190 45

squish-ai

⚡️ The fastest way to run local LLMs on Apple Silicon — sub-second model loads, beats Ollama on throughput, tail latency, and full-response time. OpenAI/Ollama-compatible. No cloud, no API keys.

9K 10 0

sigit-code

A local-LLM-first coding agent.

7K 17 1

mtplx

2.24x decode TPS increase on Qwen 3.6 27B at temperature 0.6. Native MTP speculative decoding on Apple Silicon with no external drafter.

6K 911 60

krasis

Krasis is a hybrid LLM runtime that focuses on running larger models efficiently on consumer-grade, VRAM-limited hardware.

6K 477 27

friendli-client

Friendli Suite Client

5K 50 7

onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.

4K 2K 129

onediffx

OneDiff: An out-of-the-box acceleration library for diffusion models.

3K 2K 129

ai4i-core

A comprehensive microservices-based platform for handling language-based AI model inference at scale.

3K 3 3

aphrodite-engine

Large-scale LLM inference engine

3K 2K 201

fedml

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

2K 4K 766

neurobrix

Universal AI Runtime — Execute any model on any hardware

2K 47 1

nobodywho

NobodyWho is an inference engine that lets you run LLMs locally and efficiently on any device.

2K 1K 70

nocturnusai

Verified knowledge for AI agents. Compress context, extract and store facts, define rules, and ask questions — get deterministic answers with proof, not LLM guesses. Connect agents via MCP, Python SDK, TypeSc

2K 3 0

tala-locale

Phone number → country, currency, and language. One call. Zero dependencies.

2K 1 0

kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

1K 1K 122

ai4icore-core

A comprehensive microservices-based platform for handling language-based AI model inference at scale.

1K 3 3

zllm-zse

The inference engine the open-source world built for itself.

1K 153 3

exxa

Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and minimal learning curve.

1K 27 4

qai-hub-apps

The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.

1K 431 109