PyRank

Hadoop Python Packages

Python packages with the GitHub topic hadoop. Sorted by relevance, with stars and monthly downloads.
spotify
luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

3.5M 19K 2K
CODAIT
yarn-api-client

Python client for Hadoop® YARN API

384K 109 49
h2oai
h2o

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

235K 7K 2K
jcrist
skein

A tool and library for easily deploying applications on Apache YARN

56K 146 39
iterative
dvc-hdfs

HDFS/WebHDFS plugin for dvc

33K 2 1
SneaksAndData
hadoop-fs-wrapper

Python Wrappers for Hadoop FileSystem

22K 4 0
jingw
pyhdfs

Python HDFS client

5K 97 23
Breaka84
spooq

Spooq is a PySpark-based helper library for ETL data ingestion pipelines in data lakes.

4K 10 2
h2oai
h2o-client

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

2K 7K 2K
developer-sdk
hadoop-yarn-rest-api

A Python library for the YARN REST API

2K 0 0
splitlog
splitlog

Utility to split aggregated logs from Apache Hadoop YARN applications into a folder hierarchy

2K 0 0
BROADSoftware
hadeploy

A Hadoop application deployment tool

1K 9 4
h2oai
h2o-mlflow-flavor

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

1K 7K 2K
eecs485staff
madoop

A lightweight MapReduce framework for education

1K 10 5
szilard-nemeth
yarn-dev-tools

Various scripts to automate and ease Apache Hadoop YARN development.

708 2 0
SvenskaSpel
cobra-policytool

Manage Apache Atlas and Ranger configuration for your Hadoop environment.

691 16 6
IBMStreams
streamsx-hdfs

This toolkit provides operators and functions for interacting with the Hadoop File System.

538 9 20
clusterdock
clusterdock

clusterdock is a framework for creating Docker-based container clusters

532 30 8
criteo
tf-yarn

Train TensorFlow models on YARN in just a few lines of code!

328 93 28
ab2dridi
lakekeeper

A configurable PySpark package to identify fragmented external tables and perform safe in-place compaction

262 0 0
deeplearning4j
jumpy

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...

251 14K 4K
canimus
aiowebhdfs

A modern, async implementation of the WebHDFS API in Python

236 7 1
dask
knit

Deprecated; please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead.

224 54 10
deeplearning4j
pydatavec

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...

195 14K 4K
    • Data from PyPI, GitHub, ClickHouse, and BigQuery