PyRank

Delta Lake Python Packages

Python packages with the GitHub topic delta-lake. Sorted by relevance, with stars and monthly downloads.
delta-io
delta-spark

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

36M 9K 2K
delta-io
deltalake

A native Rust library for Delta Lake, with bindings into Python

24.2M 3K 619
delta-io
delta-sharing

An open protocol for secure data sharing

1.4M 941 227
Nike-Inc
koheesio

Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

879K 650 39
starrocks
starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

562K 12K 2K
apache
pydoris-custom

Apache Doris is an easy-to-use, high performance and unified analytics database.

229K 15K 4K
apache
pydoris

Apache Doris is an easy-to-use, high performance and unified analytics database.

92K 15K 4K
delta-io
hops-deltalake

A native Rust library for Delta Lake, with bindings into Python

56K 3K 619
dask-contrib
dask-deltatable

A Delta Lake reader for Dask

35K 54 17
lakehq
pysail

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

28K 3K 146
HsiehShuJeng
cdk-emrserverless-with-delta-lake

This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you can also launch an EMR notebook via a cluster template to check the outcome from the EMR Serverless application.

22K 11 5
ryan-evans-git
ematix-flow

Move data between databases, files, and streams from Python. 5.87× faster than PySpark.

12K 0 0
jeppe742
delta-lake-reader

Read Delta tables without any Spark

11K 47 14
sqllocks
sqllocks-spindle

Multi-domain, schema-aware synthetic data generator for Microsoft Fabric. 13 domains, billion-row scale, statistically calibrated. Lakehouse · Warehouse · SQL DB · Eventhouse writers.

6K 0 0
adidas
lakehouse-engine

The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

4K 289 50
apache
dbt-doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

3K 15K 4K
legout
duckalog

Build DuckDB catalogs from declarative YAML/JSON configuration files

2K 1 0
lhrick
nosql-delta-bridge

NoSQL to Delta Lake ingestion with schema enforcement, type coercion, and a dead-letter queue. Nothing silently crashes your pipeline.

2K 0 0
roapi
roapi

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

1K 3K 211
roapi
columnq-cli

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

989 3K 211
datacoolie
datacoolie

Metadata-driven ETL framework for portable data pipelines across Polars, Spark, Fabric, Databricks, and AWS.

950 8 0
oyvinrog
sqlshell

A powerful SQL shell with GUI interface for data analysis

893 1 1
roapi
roapi-http

No description available

816 3K 211
PFund-Software-Ltd
pfeed

Data pipeline for algo-trading, getting and storing both real-time and historical data made easy.

416 32 7
Data from PyPI, GitHub, ClickHouse, and BigQuery