byte-pair-encoding
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
This summarizer attempts to leverage Byte Pair Encoding (BPE) tokenization and the Bart vocabulary to filter text by semantic meaningfulness.
Feature extraction from sequential data
PGN Tokenizer, a Byte Pair Encoding (BPE) tokenizer for Chess Portable Game Notiation (PGN).
Text Tokenizer in C++