PyRank

Bigdata Python Packages

Python packages with the GitHub topic bigdata. Sorted by relevance, with stars and monthly downloads.
chatnoir-eu
fastwarc

A robust web archive analytics toolkit

2.3M 138 18
chatnoir-eu
resiliparse

A robust web archive analytics toolkit

2.2M 138 18
scikit-hep
uproot

ROOT I/O in pure Python and NumPy.

1.3M 267 94
joseph-fox
pybloom-live

Scalable Bloom Filter implemented in Python

493K 165 25
scikit-hep
uproot3

ROOT I/O in pure Python and NumPy.

221K 313 66
canimus
cuallee

Possibly the fastest DataFrame-agnostic quality check library in town.

103K 246 22
abeusher
timehash

An algorithm for creating user configurable, variable-precision sliding windows of time. Useful for binning time values in large collections of data.

27K 45 14
legend-exp
legend-pydataobj

LEGEND Python Data Objects

12K 1 13
scikit-hep
uproot4

ROOT I/O in pure Python and NumPy.

8K 267 94
ironmussa
optimuspyspark

Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

4K 2K 232
RayforceDB
rayforce-py

Python bindings for RayforceDB

4K 14 0
abronte
pysparkgateway

Connect to remote Spark clusters seamlessly.

4K 3 4
dbcli
athenacli

AthenaCLI is a CLI tool for AWS Athena service that can do auto-completion and syntax highlighting.

3K 226 36
visualpython
visualpython

Visual Python is a GUI-based Python code generator, developed on the Jupyter Notebook as an extension.

3K 919 119
apache
airavata-django-portal-sdk

The Airavata Django Portal SDK is a library that makes it easier to develop Airavata Django Portal customizations.

2K 0 2
bigbio
quantms-utils

Python scripts and helpers for the quantMS workflow

1K 6 5
visualpython
jupyterlab-visualpython

GUI-based Python code generator for data science, extension to Jupyter Lab, Jupyter Notebook and Google Colab.

1K 919 119
databendlabs
databend

Data Agent Ready Warehouse: One for Analytics, Search, AI, Python Sandbox. Rebuilt from scratch. Unified architecture on your S3.

1K 9K 872
bigartm
bigartm

Fast topic modeling platform

1K 673 121
BROADSoftware
hadeploy

A Hadoop application deployment tool

981 9 4
apache
airavata-jupyter-magic

Jupyter magics for running notebook cells on remote HPC resources. Powered by Apache Airavata.

883 151 141
douban
dpark

Python clone of Spark, a MapReduce-like framework in Python

796 3K 519
arvados
arvados-pam

Arvados PAM module

773 418 126
anovos
anovos

An Open Source tool for Feature Engineering in Machine Learning

761 74 24
Data from PyPI, GitHub, ClickHouse, and BigQuery