PyRank

HDFS Python Packages

Python packages with the GitHub topic hdfs. Sorted by relevance, with stars and monthly downloads.
piskvorky
smart-open

Utils for streaming large files (S3, HDFS, gzip, bz2...)

69.6M 3K 387
TileDB-Inc
tiledb

Python interface to the TileDB storage engine

98K 201 38
jcrist
skein

A tool and library for easily deploying applications on Apache YARN

56K 146 39
iterative
dvc-hdfs

HDFS/WebHDFS plugin for dvc

33K 2 1
megvii-research
megfile

Megvii FILE Library - Working with Files in Python same as the standard library

26K 174 20
wradlib
wradlib

weather radar data processing - python package

13K 309 88
spotify
snakebite

A pure python HDFS client

6K 859 213
jingw
pyhdfs

Python HDFS client

5K 97 23
criteo
cluster-pack

A library on top of either pex or conda-pack to make your Python code easily available on a cluster

1K 47 23
BROADSoftware
hadeploy

A Hadoop application deployment tool

1K 9 4
tks18
pyquery-polars

PyQuery is a local-first data operating system built on lazy execution that processes 100GB+ files while you doomscroll. No cap. 🧢

765 1 0
IBMStreams
streamsx-hdfs

This toolkit provides operators and functions for interacting with Hadoop File System.

538 9 20
fasouto
webhdfspy

A Python wrapper library to access Hadoop WebHDFS REST API

526 8 5
canimus
alphareader

A reader for large files with custom delimiters and encodings

517 6 1
tks18
pyquery-core

PyQuery is a local-first data operating system built on lazy execution that processes 100GB+ files while you doomscroll. No cap. 🧢

412 1 0
ab2dridi
lakekeeper

A configurable PySpark package to identify fragmented external tables and perform safe in-place compaction

262 0 0
qiyangduan
schemaindex

SchemaIndex is designed for data scientists to index and search metadata more efficiently.

117 3 1
yassineazzouz
kraken-pyds

Kraken - A distributed data transfer tool.

95 2 1
silkway-ai
dfspy

Distributed File System written in Python

95 14 0
yassineazzouz
pydistcp

A Python WebHDFS-based tool for inter/intra-cluster data copying.

94 9 3
ceph
test-cephadm

cephadm

93 17K 6K
marco-gallegos
sqoopit

A simple package to let you Sqoop into HDFS/Hive/HBase with python

62 0 0
piskvorky
srcd-smart-open

Utils for streaming large files (S3, HDFS, gzip, bz2...)

53 3K 387
s8sg
spark-yarn-submit

Library to handle Spark job submission on a YARN cluster in different environments

40 3 0
Data from PyPI, GitHub, ClickHouse, and BigQuery