PyRank

Unstructured Data Python Packages

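Python packages tagged with the GitHub topic unstructured-data, sorted by relevance. Each entry lists the GitHub owner, the package name, its description, and its counts, with monthly downloads first and GitHub stars second.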
treeverse
dvc

🦉 Data Versioning and ML Experiments

3.3M 16K 1K
voxel51
fiftyone-db

Refine high-quality datasets and visual AI models

170K 11K 761
nuclia
nucliadb-telemetry

NucliaDB, The AI Search database for RAG

164K 719 58
nuclia
nucliadb-protos

NucliaDB, The AI Search database for RAG

147K 719 58
nuclia
nucliadb-utils

NucliaDB, The AI Search database for RAG

144K 719 58
voxel51
fiftyone

Refine high-quality datasets and visual AI models

136K 11K 761
garyelephant
pygrok

Python implementation of jordansissel's grok regular expression library

99K 284 74
nuclia
nucliadb-models

NucliaDB, The AI Search database for RAG

76K 719 58
nuclia
nucliadb-sdk

NucliaDB, The AI Search database for RAG

59K 719 58
nuclia
nucliadb-dataset

NucliaDB, The AI Search database for RAG

50K 719 58
nuclia
nucliadb

NucliaDB, The AI Search database for RAG

49K 719 58
kodexa-ai
kodexa

Kodexa Python Client

46K 5 1
datachain-ai
datachain

The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, Azure

43K 3K 143
nomic-ai
nomic

Nomic Developer API SDK

37K 2K 197
yobix-ai
extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

34K 2K 95
nuclia
nidx-protos

NucliaDB, The AI Search database for RAG

24K 719 58
shcherbak-ai
contextgem

ContextGem: Effortless LLM extraction from documents

13K 2K 156
nuclia
nidx-binding

NucliaDB, The AI Search database for RAG

10K 719 58
mitdbg
palimpzest

A System for Optimized Semantic Computation

6K 217 43
Zipstack
unstract-sdk

A framework for writing Unstract Tools/Apps

3K 23 1
voxel51
fiftyone-db-ubuntu2204

Refine high-quality datasets and visual AI models

3K 11K 761
osllmai
indox

The Indox Ecosystem offers integrated AI tools for data workflows. Our four components (IndoxArcg, IndoxMiner, IndoxJudge, and IndoxGen) enhance AI applications with advanced retrieval, extraction, evaluation, and generation capabilities, supporting multiple document formats and LLM providers.

2K 19 2
emcf
thepipe-api

Get clean data from tricky documents, powered by VLMs.

2K 2K 99
amphi-ai
jupyterlab-amphi

Visual data prep powered by Python

2K 1K 105

Data from PyPI, GitHub, ClickHouse, and BigQuery