PyRank

Computational Linguistics Python Packages

Python packages with the GitHub topic computational-linguistics. Sorted by relevance, with stars and monthly downloads.
PyThaiNLP
pythainlp

Thai natural language processing in Python

1.1M 1K 297
OpenPecha
botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

24K 80 15
jacksonllee
pycantonese

Cantonese Linguistics and NLP

23K 405 42
jacksonllee
rustling

A high-performance library for computational linguistics

15K 2 0
jacksonllee
pylangacq

Language Acquisition Research Tools

12K 45 18
jacksonllee
wordseg

Word segmentation models

8K 5 1
sfischer13
arpa

🐍 Python library for n-gram models in ARPA format

7K 40 14
proycon
pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

6K 476 66
CUNY-CL
wikipron

Massively multilingual pronunciation mining

3K 368 77
Perevalov
linguaf

Python package for calculating well-known measures in computational linguistics

3K 15 5
proycon
folia

An extensive library for processing FoLiA documents. FoLiA stands for Format for Linguistic Annotation and is a very rich XML-based format used by various Natural Language Processing tools.

2K 18 5
proycon
python-ucto

This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).

2K 31 5
incrementaliser
dynamicsyntax

DyLan, the Dynamic Syntax parser in Python

2K 2 0
proycon
folia-linguistic-annotation-tool

FLAT is a web-based linguistic annotation environment based around the FoLiA format (https://proycon.github.io/folia), a rich XML-based format for linguistic annotation. FLAT allows users to view annotated FoLiA documents and enrich these documents with new annotations. A wide variety of linguistic annotation types is supported through the FoLiA paradigm.

2K 113 15
frankier
finntk

Finnish NLP toolkit

2K 7 0
ljvmiranda921
calamancy

NLP pipelines for Tagalog using spaCy

1K 70 6
Esukhia
pybo

Python utils for processing Tibetan

1K 40 13
TheWelcomer
morphseg

A multilingual package for segmenting text into morphemes using supervised deep learning.

1K 2 0
roddar92
fonetika

Russian/English/Estonian/Finnish/Swedish phonetic algorithm based on Soundex and Metaphone

769 54 6
BLLIP
bllipparser

BLLIP reranking parser (also known as the Charniak-Johnson parser, Charniak parser, or Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.

591 228 53
factslab
glazing

Unified data models and interfaces for syntactic and semantic frame ontologies.

470 7 0
alex-rusakevich
ramonak

Universal library for working with Belarusian-language text in Python

430 0 0
craigtrim
pystylometry

Comprehensive Python toolkit for stylometry

402 2 0
delph-in
pydmrs

A library for manipulating DMRS structures

394 14 5
Data from PyPI, GitHub, ClickHouse, and BigQuery