PyRank

Spark Python Packages

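Python packages with the GitHub topic "spark", sorted by relevance. The three figures under each package are monthly PyPI downloads, GitHub stars, and GitHub forks. Packages published from the same repository (for example sqlglot, sqlglotrs, and sqlglotc) show that repository's star and fork counts.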
tobymao
sqlglot

Python SQL Parser and Transpiler

63.5M 9K 1K
apache
pyspark

Apache Spark - A unified analytics engine for large-scale data processing

51.2M 43K 29K
delta-io
delta-spark

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

36M 9K 2K
tobymao
sqlglotrs

Python SQL Parser and Transpiler

5.8M 9K 1K
databrickslabs
databricks-labs-dqx

Databricks framework to validate Data Quality of pySpark DataFrames and Tables

5.5M 414 113
graphframes
graphframes

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs

3.2M 1K 268
capitalone
datacompy

Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!

2.6M 643 160
fugue-project
fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

2.6M 2K 100
combust
mleap

MLeap: Deploy ML Pipelines to Production

2.5M 2K 316
graphframes
graphframes-py

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs

2.2M 1K 268
Microsoft
synapseml

Simple and Distributed Machine Learning

2.1M 5K 861
lucacanali
sparkmeasure

This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simplifies collecting, aggregating, and exporting Spark task/stage metrics, and is designed for practical use by developers and data engineers in interactive analysis, testing, and production monitoring workflows.

1.8M 822 160
apache
pyspark-client

Apache Spark - A unified analytics engine for large-scale data processing

1.7M 43K 29K
dylan-profiler
visions

Type System for Data Analysis in Python

1.6M 218 20
jupyter-incubator
hdijupyterutils

Jupyter magics and kernels for working with remote Spark clusters

1.6M 1K 447
jupyter-incubator
autovizwidget

Jupyter magics and kernels for working with remote Spark clusters

1.6M 1K 447
delta-io
delta-sharing

An open protocol for secure data sharing

1.4M 941 227
databricks
koalas

Koalas: pandas API on Apache Spark

1.4M 3K 370
jelmerk
pyspark-hnsw

Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs

1.3M 303 59
JohnSnowLabs
spark-nlp

State of the Art Natural Language Processing

1.1M 4K 743
tobymao
sqlglotc

Python SQL Parser and Transpiler

1.1M 9K 1K
fugue-project
fugue-sql-antlr

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

989K 2K 100
moj-analytical-services
splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

782K 2K 236
flyteorg
flytekit

Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started and learn and highly extensible.

629K 315 335
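The figures in the listing above are abbreviated with K and M suffixes. A minimal sketch for turning them back into integers, for instance to re-sort the packages by stars instead of relevance, assuming K = 1,000 and M = 1,000,000 and that each stats line lists downloads, stars, and forks in that order. The helper names are hypothetical and not part of any PyRank API. Because the abbreviations are rounded, the parsed values are only approximate.

```python
# Hypothetical helpers: the page does not define its abbreviations,
# so K = 1,000 and M = 1,000,000 are assumptions.
SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_count(text: str) -> int:
    """Convert an abbreviated count such as '63.5M', '9K', or '414' to an int."""
    text = text.strip()
    multiplier = SUFFIXES.get(text[-1:].upper(), 1)
    # Drop the suffix only when one was recognised.
    number = text[:-1] if multiplier != 1 else text
    return round(float(number) * multiplier)


def parse_stats(line: str) -> dict:
    """Split a stats line like '63.5M 9K 1K' into downloads, stars, and forks."""
    downloads, stars, forks = (parse_count(part) for part in line.split())
    return {"monthly_downloads": downloads, "stars": stars, "forks": forks}


if __name__ == "__main__":
    # A few stats lines copied from the listing.
    rows = {
        "sqlglot": "63.5M 9K 1K",
        "pyspark": "51.2M 43K 29K",
        "flytekit": "629K 315 335",
    }
    parsed = {name: parse_stats(stats) for name, stats in rows.items()}
    by_stars = sorted(parsed, key=lambda name: parsed[name]["stars"], reverse=True)
    print(by_stars)  # → ['pyspark', 'sqlglot', 'flytekit']
```

Sorting by stars here reorders the same three packages that the page ranks by relevance, which illustrates that the two orderings can differ.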
Data from PyPI, GitHub, ClickHouse, and BigQuery.