token-eviction
Training-free KV cache compression via E8 lattice quantization and attention-aware token eviction
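
The two techniques named in the description can be illustrated with a minimal sketch. It is not taken from this repository: the function names `nearest_d8`, `nearest_e8`, and `evict_tokens` are hypothetical, the E8 step uses the standard nearest-point decoder (E8 as the union of D8 and D8 shifted by 1/2), and the eviction step assumes a simple "keep the top-scoring tokens" rule based on per-token accumulated attention. The actual scaling, scoring, and cache layout used by the project may differ.

```python
import math


def nearest_d8(x):
    """Nearest point of D8 (integer vectors with even coordinate sum) to x."""
    f = [math.floor(v + 0.5) for v in x]  # round each coordinate to nearest integer
    if sum(f) % 2 == 0:
        return f
    # Odd sum: re-round the coordinate with the largest rounding error
    # in the other direction, which restores even parity at minimal cost.
    i = max(range(len(x)), key=lambda k: abs(x[k] - f[k]))
    f[i] += 1 if x[i] > f[i] else -1
    return f


def nearest_e8(x):
    """Nearest point of E8 = D8 ∪ (D8 + 1/2) to an 8-dimensional vector x."""
    assert len(x) == 8
    a = [float(v) for v in nearest_d8(x)]
    b = [v + 0.5 for v in nearest_d8([v - 0.5 for v in x])]
    dist_a = sum((p - q) ** 2 for p, q in zip(x, a))
    dist_b = sum((p - q) ** 2 for p, q in zip(x, b))
    return a if dist_a <= dist_b else b


def evict_tokens(scores, budget):
    """Indices of the `budget` tokens with the highest attention score.

    `scores` is assumed to hold one accumulated attention weight per cached
    token; the returned indices are in original sequence order.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    # Quantize one 8-dimensional chunk of a key/value vector.
    print(nearest_e8([0.4] * 8))
    # Keep the 2 most-attended tokens out of 4.
    print(evict_tokens([0.1, 0.5, 0.05, 0.3], 2))
```

In this sketch a key or value vector would be split into 8-dimensional chunks, each scaled and snapped to its nearest E8 point, while eviction drops the cache entries whose indices are not returned by `evict_tokens`.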