disaggregated-inference
NUMA-aware GPU provisioning and orchestration for stateless Mixture-of-Experts (MoE) workloads of all sizes
An imperative command-line interface for AI workload orchestration