PyRank

Pyarrow Python Packages

Python packages with the GitHub topic pyarrow, sorted by relevance, with monthly downloads and GitHub stars.
narwhals-dev
narwhals

Lightweight and extensible compatibility layer between dataframe libraries!

86.5M 2K 193
zen-xu
pyarrow-stubs

Type annotations for pyarrow

3.1M 50 26
ibis-project
ibis-framework

the portable Python dataframe library

1.9M 7K 722
uber
petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.

273K 2K 284
shloktech
keyedstablehash

Stable, keyed hashing for Python objects and columnar data. Think `stablehash`, but with SipHash-like keyed PRF semantics so hashes are deterministic for a given key and resistant to adversarial inputs.

7K 1 0
vertti
daffy

Lightweight DataFrame validation decorators for Pandas, Polars, Modin, and PyArrow. No custom types required.

4K 58 5
rebase-energy
timedatamodel

A lightweight Pythonic data model for time series data, interoperable with NumPy, Pandas and Polars.

4K 8 2
andree0
fast-xml-flattener

Rust-powered Python library for flattening nested XML into JSON, dict, CSV and Parquet. 4–7× faster than xmltodict.

3K 3 0
pmgraham
datagrunt

Datagrunt is a Python library designed to simplify the way you work with CSV files. It provides a streamlined approach to reading, processing, and transforming your data into various formats, making data manipulation efficient and intuitive.

1K 10 2
legout
fsspec-utils

fsspec utils

1K 2 0
trustedshops-public
schema2pyarrow

Converts AsyncAPI and JSON Schema definitions to PyArrow schemas

1K 12 0
ismailhammounou
db2ixf

db2ixf is a Python package with a CLI that simplifies the parsing and processing of IBM Integration eXchange Format (IXF) files.

937 16 1
Genentech
pysummaries

Generate beautiful summary tables from pandas, polars or pyarrow dataframes

729 34 3
terrylica
exness-data-preprocess

Professional forex tick data preprocessing with unified DuckDB storage, Phase7 OHLC schema, and sub-15ms query performance

705 4 0
icaropires
pdf2dataset

Converts a whole directory of PDF documents, large or small, into a dataset (a pandas DataFrame), with error tracking and a choice of features

674 19 5
thread53
pqviewer

View Apache Parquet Files In Your Terminal

648 21 0
jaysnm
dremio-arrow

Dremio Arrow Flight Client

501 4 4
itsbigspark
pymetagen

Metadata Generator

498 0 0
goalzz85
sql2arrow

A Python library that provides convenient, high-performance methods for parsing SQL INSERT statements into Arrow arrays.

479 7 0
stefur
swemaps

Maps of Sweden in GeoParquet

435 2 1
psmyth94
biosets

A bioinformatics extension of the 🤗 Datasets library, built for ML applications on biological and omics data, offering easy integration of metadata and low-code data management tools.

369 3 0
ibis-project
turntable-spoonbill

the portable Python dataframe library

305 7K 722
xbrianh
xdlake

A loose implementation of the deltalake protocol, written in Python on top of pyarrow, focused on extensibility, customizability, and distributed data.

304 4 0
SaelKimberly
rxls

Reads both XLSX and XLSB files into PyArrow, quickly and memory-safely.

200 12 3
Data from PyPI, GitHub, ClickHouse, and BigQuery.