PyRank

Datalake Python Packages

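Python packages with the GitHub topic datalake, sorted by relevance. Each entry gives the GitHub owner, the package name, the repository description, and a line of counts that begins with monthly downloads followed by GitHub stars.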
treeverse
lakefs-sdk

lakeFS - Data version control for your data lake | Git for data

1M 5K 447
treeverse
lakefs

lakeFS - Data version control for your data lake | Git for data

927K 5K 447
starrocks
starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

562K 12K 2K
activeloopai
deeplake

Deeplake is AI Data Runtime for Agents. It provides serverless postgres with a multimodal datalake, enabling scalable retrieval and training.

230K 9K 710
sinaptik-ai
pandasai

Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

175K 24K 2K
treeverse
lakefs-client

lakeFS - Data version control for your data lake | Git for data

81K 5K 447
sinaptik-ai
pandasai-openai

Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

12K 24K 2K
PaloAltoNetworks
pan-cortex-data-lake

Python idiomatic SDK for Cortex™ Data Lake.

11K 48 22
sinaptik-ai
pandasai-litellm

Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

7K 24K 2K
zinggAI
zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

5K 1K 168
apache
apache-gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

3K 3K 835
PaloAltoNetworks
pancloud

Python idiomatic SDK for Cortex™ Data Lake.

3K 48 22
sidequery
dlt-iceberg

An Iceberg destination for DLT that supports REST catalogs

2K 10 5
awslabs
aws-orbit-overprovisioning

Launch a Pod for the team space that executes a script given by the user

2K 147 92
neuro-ml
tarn

An insanely customizable framework for key-value storage 💾

2K 3 0
aws-samples
aws-insurancelake-etl

A CDK Python app for deploying ETL jobs that operate data pipelines for InsuranceLake in AWS

1K 35 16
awslabs
aws-orbit

Data & ML Unified Development and Production Environment.

1K 147 92
awslabs
aws-orbit-team-script-launcher

Launch a Pod for the team space that executes a script given by the user

1K 147 92
awslabs
aws-orbit-code-commit

Orbit Workbench CodeCommit Plugin.

1K 147 92
awslabs
aws-orbit-custom-cfn

Launch a CloudFormation stack for the team space

999 147 92
awslabs
aws-orbit-redshift

Orbit Workbench Redshift Plugin.

998 147 92
awslabs
aws-orbit-sdk

AWS Orbit Workbench SDK

989 147 92
awslabs
aws-orbit-hello-world

Minimal Orbit Workbench Plugin.

987 147 92
awslabs
aws-orbit-emr-on-eks

Allow users to run EMR jobs on their EKS namespace

931 147 92
Data from PyPI, GitHub, ClickHouse, and BigQuery.