PyRank

Data Lake Python Packages

Python packages tagged with the GitHub topic data-lake, sorted by relevance. Each entry lists monthly downloads followed by GitHub stars.
dlt-hub
dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

5.7M 5K 508
treeverse
lakefs-sdk

lakeFS - Data version control for your data lake | Git for data

1M 5K 447
treeverse
lakefs

lakeFS - Data version control for your data lake | Git for data

931K 5K 447
treeverse
lakefs-client

lakeFS - Data version control for your data lake | Git for data

89K 5K 447
MatsMoll
aligned

The dbt of ML: Aligned describes data dependencies in ML systems and reduces technical data debt

8K 61 2
crate
dlt-cratedb

dlt destination adapter for CrateDB

7K 0 0
nodestream-proj
nodestream

A declarative framework for building, maintaining, and analyzing graph data

3K 62 17
dlt-hub
dlt-core

dlt is an open-source, Python-first, scalable data loading library that does not require any backend to run.

927 5K 508
mag1cfrog
timeseries-table-format

Rust-native time-series table format with gap/overlap tracking and SQL queries

750 15 1
nodestream-proj
nodestream-plugin-dotenv

A nodestream plugin for loading environment variables from a .env file

490 2 0
nodestream-proj
nodestream-plugin-semantic

A plugin for nodestream to ingest semantic data with embeddings

421 0 0
nodestream-proj
nodestream-plugin-meta

A meta plugin for nodestream that lets you build a graph of your graph schema

303 0 0
nodestream-proj
nodestream-plugin-pedantic

A nodestream plugin that provides a series of audits to ensure high-quality, consistent nodestream projects.

289 2 0
arpe-io
lakexpress-mcp

MCP server for LakeXpress

286 1 0
realdatadriven
etlx-wrapper

Python wrapper for ETLX CLI to run ETL workflows from Python

246 43 3
treeverse
lakefs-sdk-async

lakeFS API

241 5K 447
utndatasystems
virtual-parquet

🗜️ Compressing Parquet files using functions

220 0 1
dlt-hub
dlt-dataops

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

143 5K 508
Canner
vulcan-sql

Data API Framework for AI Agents and Data Apps

123 793 42
SRRC-1334
ztract

Extract mainframe EBCDIC data using COBOL copybooks. Zero MIPS. Pure Python + Cobrix engine.

91 0 0
Data from PyPI, GitHub, ClickHouse, and BigQuery