gpu-inference
Krasis is a hybrid LLM runtime focused on running larger models efficiently on consumer-grade, VRAM-limited hardware
Mixed-vendor GPU inference cluster manager with speculative decoding
GPU-accelerated LLaMA inference wrapper for legacy Vulkan-capable systems: a Pythonic way to run AI with knowledge (Ilm) on fire (Vulkan).