document-representation
Interface for easier topic modelling.
Pydantic models for representing a text document as a hierarchical structure.
Wrapper for the PageXML C++ library to ease handling of Page XML files from Python.
A Python implementation of Bag-of-Concepts, as proposed in the paper "Bag-of-Concepts: Comprehending Document Representation through Clustering Words in Distributed Representation".
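To make the Bag-of-Concepts idea concrete, here is a minimal sketch that uses only the standard library. It is not the code from the repository above. It assumes word embeddings have already been clustered into "concepts", so the centroids are given as fixed toy values. Each word is mapped to its nearest centroid by cosine similarity, each document becomes a vector of concept counts, and a TF-IDF-style weighting is then applied over concepts. All function names, the toy vectors, and the centroids are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_concepts(word_vectors, centroids):
    """Map each word to the index of its most similar centroid (its concept)."""
    return {
        word: max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
        for word, vec in word_vectors.items()
    }


def bag_of_concepts(tokens, word_to_concept, n_concepts):
    """Count how often each concept occurs in a tokenized document."""
    counts = Counter(word_to_concept[t] for t in tokens if t in word_to_concept)
    return [counts.get(k, 0) for k in range(n_concepts)]


def concept_idf_weighting(doc_vectors):
    """Weight concept counts TF-IDF style: relative frequency times log(N / df)."""
    n_docs = len(doc_vectors)
    n_concepts = len(doc_vectors[0])
    # Document frequency: number of documents in which each concept appears.
    df = [sum(1 for d in doc_vectors if d[k] > 0) for k in range(n_concepts)]
    weighted = []
    for d in doc_vectors:
        total = sum(d)
        weighted.append([
            (d[k] / total) * math.log(n_docs / df[k]) if total and df[k] else 0.0
            for k in range(n_concepts)
        ])
    return weighted


# Toy 2-d "embeddings" and two pre-computed centroids (assumed, not learned here).
word_vectors = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.1, 1.0],
    "bus": [0.2, 0.9],
}
centroids = [[1.0, 0.0], [0.0, 1.0]]

word_to_concept = assign_concepts(word_vectors, centroids)
docs = [["cat", "dog", "car"], ["car", "bus"]]
counts = [bag_of_concepts(doc, word_to_concept, len(centroids)) for doc in docs]
weighted = concept_idf_weighting(counts)
```

In practice the centroids would come from clustering real word embeddings (for example with a k-means variant) rather than being fixed by hand. A concept that appears in every document gets weight zero under this weighting, just as a ubiquitous term does under TF-IDF.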