PyRank

Corpus Python Packages

Python packages tagged with the GitHub topic "corpus", sorted by relevance and shown with their stars and monthly downloads.
neocl
speach

🐍🍑 Python 3 library for managing, annotating, and converting natural language corpora in popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.)

34K 21 6
flairNLP
fundus

A very simple news crawler with a funny name

8K 455 109
gunthercox
chatterbot-corpus

A multilingual dialog corpus

8K 1K 1K
gambolputty
german-nouns

A list of ~100,000 German nouns and their grammatical properties, compiled from WiktionaryDE as a CSV file, plus a module to look up the data and parse compound words.

4K 170 23
johentsch
ms3

A parser for annotated MuseScore 3 files.

4K 56 6
MozillaSecurity
corpus-replicator

A corpus generation tool

3K 27 3
ko-nlp
korpora

This package makes it easy to download and use various Korean corpora.

2K 748 79
affjljoo3581
expanda

Integrated Corpus-Building Environment

1K 33 7
lovit
krwordrank

Korean corpus repository

1K 748 79
eggplants
aovec

Build a Word2Vec model from aozorabunko/aozorabunko

967 3 0
GlobalMaksimum
sadedegel

A General Purpose NLP library for Turkish

967 95 14
tanloong
neosca

Another syntactic complexity analyzer of written English language samples

865 43 14
NationalLibraryOfNorway
maalfrid-toolkit

Toolkit for the Målfrid project

856 1 0
entelecheia
ekorpkit

ekorpkit: NLP Library for Social Science Research

846 6 2
kunansy
rnc

API for Russian National Corpus

663 9 1
yonkornilov
opus-api

OPUS (opus.nlpl.eu) Python API

653 18 5
NetherlandsForensicInstitute
demeuk

Demeuk is a simple tool to clean up corpora (like dictionaries) or any dataset containing plain text strings.

647 22 4
asshatter
keywords

This is a simple library for extracting keywords from data, with or without a corpus.

505 8 3
CLUEBenchmark
pyclue

Python toolkit for the Chinese Language Understanding Evaluation (CLUE) benchmark

504 133 15
chatopera
efaqa-corpus-zh

❤️ Emotional First Aid Dataset: psychological counseling Q&A and chatbot corpus

491 752 88
IngoKl
textdirectory

TextDirectory combines multiple text files into one and can apply filters and transformations along the way.

487 11 2
letuananh
texttaglib

Python library for managing and annotating text corpora in different formats (ELAN, TIG, TTL, etc.)

468 0 0
chatopera
insuranceqa-data

🚁 Insurance industry corpus, chatbot

431 1K 341
tarepan
npvcc2016

Python loader for the npVCC2016 corpus

413 0 0
Data from PyPI, GitHub, ClickHouse, and BigQuery