Computational Linguistics Python Packages

pythainlp

Thai natural language processing in Python

1M 1K 299

pycantonese

Cantonese Linguistics and NLP

38K 411 41

rustling

A high-performance library for computational linguistics

25K 3 0

botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

20K 80 15

pylangacq

Language Acquisition Research Tools

17K 45 18

wordseg

Word segmentation models

13K 5 1

arpa

🐍 Python library for n-gram models in ARPA format

6K 40 14

pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as extracting n-grams and frequency lists, and for building simple language models. It also provides more complex data types and algorithms, parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL), and clients for interfacing with various NLP-specific servers. Most notably, PyNLPl features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

5K 475 66
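The basic tasks named above (n-gram extraction, frequency lists, a simple language model) can be illustrated with the standard library alone. This is a minimal sketch and does not use or mirror PyNLPl's API; the function names `ngrams`, `frequency_list` and `mle_bigram_prob` are hypothetical, and the "language model" is a plain maximum-likelihood bigram estimate with no smoothing.

```python
from collections import Counter
from typing import Iterable, Iterator, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous n-gram of `tokens` as a tuple (hypothetical helper)."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def frequency_list(sentences: Iterable[Sequence[str]], n: int) -> Counter:
    """Count n-gram occurrences across already-tokenised sentences."""
    counts: Counter = Counter()
    for tokens in sentences:
        counts.update(ngrams(tokens, n))
    return counts


def mle_bigram_prob(bigrams: Counter, prev: str, word: str) -> float:
    """Unsmoothed MLE estimate of P(word | prev) from bigram counts."""
    # Denominator: how often `prev` occurs as the first element of a bigram.
    history = sum(c for (first, _), c in bigrams.items() if first == prev)
    if history == 0:
        return 0.0  # unseen history: no estimate without smoothing
    return bigrams[(prev, word)] / history


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "cat", "ran"]]
    bigrams = frequency_list(corpus, 2)
    for gram, count in bigrams.most_common():
        print(" ".join(gram), count)
    print(mle_bigram_prob(bigrams, "cat", "sat"))
```

A real language model would add sentence-boundary markers and smoothing or backoff; the sketch omits both to keep the counting step visible.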

wikipron

Massively multilingual pronunciation mining

4K 368 77

linguaf

Python package for calculating well-known measures in computational linguistics

3K 15 5

python-ucto

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).

2K 31 5
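To illustrate what "regular-expression based" tokenisation means, here is a minimal sketch using Python's built-in `re` module. It is not Ucto and not the python-ucto API; the pattern and the `tokenize` function are hypothetical, and the rule set is far smaller than a real language-specific tokeniser configuration.

```python
import re
from typing import List

# Hypothetical rule set. Alternatives are tried left to right at each position,
# so numbers come before words (\w also matches digits).
TOKEN_PATTERN = re.compile(
    r"""
    \d+(?:[.,]\d+)*        # numbers such as 3.14 or 1,000
    | \w+(?:[-']\w+)*      # words, keeping internal hyphens/apostrophes
    | [^\w\s]              # any other single non-space symbol (punctuation)
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> List[str]:
    """Split `text` into tokens; whitespace is discarded."""
    # The pattern has only non-capturing groups, so findall returns whole matches.
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Don't panic: it costs 3.14 euros."))
```

Even this toy shows why tokenisation is less trivial than it looks: the order of the alternatives decides whether "3.14" stays one token, and abbreviations, URLs or clitics would each need further rules.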

folia-linguistic-annotation-tool

FoLiA Linguistic Annotation Tool. Flat is a web-based linguistic annotation environment built around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat lets users view annotated FoLiA documents and enrich them with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.

1K 113 15

folia

An extensive Python library for working with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation used in Natural Language Processing (NLP). This library was formerly part of PyNLPl.

1K 18 5

rutextnorm

Fast Russian text normalization for text-to-speech (TTS), using only regular expressions.

1K 33 3

chandassu

Chandassu: First Python Library for Global Metrical Poetry

1K 15 1

finntk

Some simple high level tools for processing Finnish text

1K 7 0

russian-tts-normalization

Fast Russian text normalization for text-to-speech (TTS), using only regular expressions.

1K 33 3

calamancy

NLP pipelines for Tagalog using spaCy

1K 71 6

pybo

🦜 NLP for Tibetan, in Python.

1K 42 13

linguistica

Linguistica 5: Unsupervised Learning of Linguistic Structure

976 32 16

morphseg

An efficient and easy-to-use morpheme segmentation library

950 3 0

fonetika

Russian/English/Estonian/Finnish/Swedish phonetic algorithm based on Soundex and Metaphone

917 54 6

bllipparser

BLLIP reranking parser (also known as the Charniak-Johnson parser, Charniak parser or Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.

876 227 53

glazing

Unified data models and interfaces for syntactic and semantic frame ontologies.

874 7 0