UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and expert parallelism (EP; e.g., GPU-driven communication).
Pool Apple Silicon Macs into a distributed ML training cluster. Zero-config discovery, ring AllReduce gradient sync, HMAC-authenticated fleet isolation with TLS encryption. PyTorch + MLX engines.
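
To make "ring AllReduce gradient sync" concrete, here is a minimal in-process sketch of the general ring all-reduce technique (a reduce-scatter phase followed by an all-gather phase). It is not taken from the project: the function name `ring_allreduce` and the list-of-lists representation of per-worker gradients are assumptions made for illustration, and a real cluster would exchange the chunks over the network rather than between Python lists.

```python
def ring_allreduce(grads):
    """Simulate ring all-reduce (sum) over per-worker gradient vectors.

    `grads` is a list with one equal-length list of numbers per worker.
    Returns a new list in which every worker holds the element-wise sum.
    Hypothetical helper for illustration only.
    """
    n = len(grads)
    length = len(grads[0])
    # Split each vector into n contiguous chunks (some may be empty).
    bounds = [(k * length // n, (k + 1) * length // n) for k in range(n)]
    data = [list(g) for g in grads]  # work on copies

    # Phase 1: reduce-scatter. At each step, worker `rank` sends chunk
    # (rank - step) mod n to its right neighbour, which adds it in place.
    for step in range(n - 1):
        # Snapshot outgoing chunks so all sends happen "simultaneously".
        sends = []
        for rank in range(n):
            c = (rank - step) % n
            lo, hi = bounds[c]
            sends.append((c, data[rank][lo:hi]))
        for rank in range(n):
            c, payload = sends[rank]
            dst = (rank + 1) % n
            lo, _ = bounds[c]
            for j, value in enumerate(payload):
                data[dst][lo + j] += value
    # Now worker `rank` holds the fully reduced chunk (rank + 1) mod n.

    # Phase 2: all-gather. Each worker forwards the reduced chunk it
    # most recently completed or received; the neighbour overwrites its copy.
    for step in range(n - 1):
        sends = []
        for rank in range(n):
            c = (rank + 1 - step) % n
            lo, hi = bounds[c]
            sends.append((c, data[rank][lo:hi]))
        for rank in range(n):
            c, payload = sends[rank]
            dst = (rank + 1) % n
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload
    return data


if __name__ == "__main__":
    workers = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, synced in enumerate(ring_allreduce(workers)):
        print(rank, synced)
```

Each worker sends and receives only one chunk per step, so per-worker traffic stays close to twice the gradient size regardless of how many machines join the ring. Dividing the result by the number of workers would give averaged gradients instead of sums.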