Kernel Fusion Python Packages

tokenspeed-iris

AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

3K 191 42

mhc-mlx

MLX + Metal implementation of mHC: Manifold-Constrained Hyper-Connections by DeepSeek-AI.

1K 3 1

alia-molten

Write the math. Get the kernel. Fused CUDA kernel generation from mathematical specifications.

515 1 0

fused-turboquant

Fused Triton kernels for TurboQuant KV cache compression — 2-4 bit quantization with RHT rotation. Drop-in HuggingFace & vLLM integration. Up to 4.9x KV cache compression for Llama, Qwen, Mistral, and more.

398 8 1