PyRank

Lakehouse Python Packages

Python packages with the GitHub topic lakehouse. Sorted by relevance, with stars and monthly downloads.
starrocks
starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

554K 12K 2K
apache
pydoris-custom

Apache Doris is an easy-to-use, high performance and unified analytics database.

228K 15K 4K
apache
pydoris

Apache Doris is an easy-to-use, high performance and unified analytics database.

92K 15K 4K
lakehq
pysail

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

29K 3K 146
sdebruyn
dbt-fabric-samdebruyn

Maintained and extended fork combining dbt-fabric and dbt-fabricspark

7K 9 2
sqllocks
sqllocks-spindle

Multi-domain, schema-aware synthetic data generator for Microsoft Fabric. 13 domains, billion-row scale, statistically calibrated. Lakehouse · Warehouse · SQL DB · Eventhouse writers.

6K 0 0
adidas
lakehouse-engine

The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows, and utilities for Data Products.

4K 289 50
Mmodarre
lakehouse-plumber

A metadata-driven framework for Databricks Lakeflow Declarative Pipelines (formerly Delta Live Tables) that generates production-ready PySpark code.

3K 59 10
apache
dbt-doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

3K 15K 4K
apache
apache-gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

3K 3K 835
ytsaurus
ytsaurus-spyt

YTsaurus is a scalable and fault-tolerant open-source big data platform.

3K 2K 206
apache
redpanda-polaris-catalog-python

Apache Polaris, the interoperable, open source catalog for Apache Iceberg

3K 2K 444
datalpia
laketower

Oversee your lakehouse

3K 12 0
databendlabs
databend

Data Agent Ready Warehouse: one for analytics, search, AI, and Python sandbox. Rebuilt from scratch, with a unified architecture on your S3.

1K 9K 872
datacoolie
datacoolie

Metadata-driven ETL framework for portable data pipelines across Polars, Spark, Fabric, Databricks, and AWS.

947 8 0
mwc360
lakebench

A multi-modal Python library for benchmarking lakehouse engines and ELT scenarios, supporting both industry-standard and novel benchmarks.

945 52 18
apache
doris-mcp-server

Apache Doris MCP Server

888 292 79
IBM
ibm-watsonxdata-mcp-server

Model Context Protocol (MCP) server for IBM watsonx.data - enables AI assistants to query and explore lakehouse data.

786 7 3
mag1cfrog
timeseries-table-format

Rust-native time-series table format with gap/overlap tracking and SQL queries

750 15 1
google
space-datasets

Unified storage framework for the entire machine learning lifecycle

747 155 8
openaleph
ftm-lakehouse

Data standard, storage and retrieval for structured and unstructured FollowTheMoney data

699 5 1
apache
apache-polaris

Apache Polaris, the interoperable, open source catalog for Apache Iceberg

668 2K 444
Org-EthereaLogic
etherealogic-aetheriaforge

Databricks-native intelligent data transformation engine — coherence-scored Bronze/Silver/Gold with entity resolution and temporal reconciliation in a single deployable product.

665 1 0
apache
pyfluss

Rust Client for Apache Fluss (Incubating)

525 51 41
    • Data from PyPI, GitHub, ClickHouse, and BigQuery