PyRank

Parquet Python Packages

Python packages tagged with the GitHub topic "parquet", sorted by relevance and listed with stars and monthly downloads.
apache
pyarrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

412.7M 17K 4K
Eventual-Inc
daft

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

810K 5K 474
InfluxCommunity
influxdb3-python

Python module that provides a simple and convenient way to interact with InfluxDB 3.0.

667K 100 17
uber
petastorm

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

273K 2K 284
scikit-hep
awkward0

Manipulate arrays of complex data structures as easily as Numpy.

262K 214 39
cldellow
parquet-metadata

Dump metadata about a Parquet file.

205K 11 2
ktrueda
parquet-tools

easy install parquet-tools

143K 183 24
developmentseed
lonboard

Fast, interactive geospatial data visualization in Jupyter.

40K 945 52
dask-contrib
dask-deltatable

A Delta Lake reader for Dask

35K 54 17
lakehq
pysail

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

29K 3K 146
quiltdata
quilt3

Quilt is a Scientific Data Management Platform on AWS that helps teams and AI find, trust, and reuse data through deeply versioned, context-rich data packages.

27K 1K 90
andreax79
airflow-provider-xlsx

Airflow operators for converting XLSX files from/to Parquet/CSV/JSON

19K 7 1
ryan-evans-git
ematix-flow

Move data between databases, files, and streams from Python. 5.87× faster than PySpark.

10K 0 0
lmmx
polars-config-meta

A Polars plugin for persistent DataFrame-level metadata

8K 20 2
godalida
koala-diff

Blazingly fast data comparison tool for Python, powered by Rust. Compare massive CSV/Parquet datasets instantly.

7K 5 0
Eventual-Inc
daft-lts

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

6K 5K 474
sqllocks
sqllocks-spindle

Multi-domain, schema-aware synthetic data generator for Microsoft Fabric. 13 domains, billion-row scale, statistically calibrated. Lakehouse · Warehouse · SQL DB · Eventhouse writers.

6K 0 0
zachspar
parquet-py

A simple command-line interface & Python API for parquet

5K 1 0
SouravRoy-ETL
slothdb

SlothDB is an embedded SQL database that runs everywhere: on your laptop, on a server, and in the browser. Built from scratch. Up to 5x faster where it counts.

5K 852 4
quiltdata
quilt

Quilt is a Scientific Data Management Platform on AWS that helps teams and AI find, trust, and reuse data through deeply versioned, context-rich data packages.

5K 1K 90
MorePET
mat-vis-client

PBR texture data factory — ~3000 materials from ambientcg, polyhaven, gpuopen, physicallybased.info baked to Parquet on GitHub Releases. pip install mat-vis-client

4K 0 0
paradigmxyz
cryo

cryo is the easiest way to extract blockchain data to parquet, csv, json, or python dataframes

4K 2K 182
RecordEvolution
imctermite

Enables extraction of measurement data from binary files with extension 'raw' used by proprietary software imcFAMOS/imcSTUDIO and facilitates its storage in open source file formats

4K 33 11
mabel-dev
rugo

📒 Python Parquet & JSON Lines reader

3K 3 0
Data from PyPI, GitHub, ClickHouse, and BigQuery