llm-decoding
[Pytorch] Efficient tokenization library for recommendations and generative retrieval using pre-trained language models. Inspired by "Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators"