PyRank

Data Wrangling Python Packages

Python packages with the GitHub topic data-wrangling. Sorted by relevance, with stars and monthly downloads.
skrub-data
skrub

Machine learning with dataframes

181K 2K 218
ironmussa
optimuspyspark

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

4K 2K 232
desbordante
desbordante

Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

3K 478 100
fairtracks
omnipy

Omnipy is a high-level Python library for type-driven data wrangling and scalable workflow orchestration (under development)

2K 26 1
grzegorzme
data-toolz

Simple Python library for handling data I/O tasks

2K 7 0
pmgraham
datagrunt

Datagrunt is a Python library designed to simplify the way you work with CSV files. It provides a streamlined approach to reading, processing, and transforming your data into various formats, making data manipulation efficient and intuitive.

1K 10 2
vikrantdeshpande09876
anonymized-fraud-detection

A small package to parse anonymized credit card transactions and train an ML model on them. Refer to the GitHub wiki for more details. The package was built for PythonVirtualenvOperator() on GCP Airflow.

1K 2 1
ContextLab
hypertools

A Python toolbox for gaining geometric insights into high-dimensional data

1K 2K 162
whythawk
whyqd

data wrangling simplicity, complete audit transparency, and at speed

1K 35 1
LucaCappelletti94
csv-trimming

Python package to remove common ugliness from a CSV-like file

736 106 0
VianneyMI
monggregate

MongoDB aggregation pipelines made easy. Joins, grouping, counting and much more...

682 22 3
LukasHedegaard
datasetops

Fluent dataset operations, compatible with your favorite libraries

663 11 0
hi-primus
pyoptimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

620 2K 233
laoluadewoye
skloverlay

This repository is the official location of the SKLOverlay Project. It holds everything used for the package on PyPI, including source files.

486 0 0
ContextLab
pydata-wrangler

Wrangle messy numerical, image, and text data into consistent well-organized formats

394 10 2
AliiiBenn
excel-toolkit-cwd

Command-line toolkit for Excel data manipulation and analysis

282 0 0
onlozanoo
databroom

Databroom is a cross-language data cleaning tool with CLI, GUI, and API. Clean CSV, Excel, or JSON files and generate reproducible scripts in Python (pandas) or R (tidyverse). Now supports saving and loading cleaning pipelines as JSON for fully automated, shareable workflows.

248 7 0
DataPreprocessing
data-cleaning

Data Cleaning is a Python package for data preprocessing. It cleans a CSV file and returns the cleaned data frame, handling imputation, duplicate removal, special-character replacement, and more.

245 9 4
nxank4
loclean

High-performance, local-first semantic data cleaning library

213 10 1
PavelGrigoryevDS
frameon

🐼✨ Frameon - enhances pandas DataFrames with analysis methods while preserving all native functionality

201 4 2
fititnt
gis-conflation-toolchain

gis-conflation-toolchain

182 0 0
nateify
ics-fixer

Fix slightly broken iCalendar files

181 0 0
asavinov
prosto

Data processing toolkit radically changing the way data is processed

180 93 5
fburic
pandance

Advanced relational operations for pandas DataFrames

121 5 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery