PyRank

Apache Spark Python Packages

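Python packages whose GitHub repositories carry the apache-spark topic, sorted by relevance. Each entry gives the repository owner, the package name, a description, and a line of counts that includes monthly downloads and GitHub stars.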
mlflow
mlflow-skinny

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

38.2M 26K 6K
mlflow
mlflow

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

37.1M 26K 6K
mlflow
mlflow-tracing

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

17M 26K 6K
graphframes
graphframes

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs

3.2M 1K 268
graphframes
graphframes-py

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs

2.2M 1K 268
Microsoft
synapseml

Simple and Distributed Machine Learning

2.1M 5K 861
LucaCanali
sparkmeasure

This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simplifies collecting, aggregating, and exporting Spark task/stage metrics, and is designed for practical use by developers and data engineers in interactive analysis, testing, and production monitoring workflows.

1.8M 822 160
treeverse
lakefs-sdk

lakeFS - Data version control for your data lake | Git for data

1M 5K 447
treeverse
lakefs

lakeFS - Data version control for your data lake | Git for data

931K 5K 447
MrPowers
quinn

pyspark methods to enhance developer productivity 📣 👯 🎉

615K 687 95
zero323
pyspark-stubs

Apache (Py)Spark type annotations (stub files).

320K 118 36
svenkreiss
pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

160K 270 45
databricks
spark-sklearn

(Deprecated) Scikit-learn integration package for Apache Spark

136K 1K 224
treeverse
lakefs-client

lakeFS - Data version control for your data lake | Git for data

89K 5K 447
lakehq
pysail

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

29K 3K 146
mattjw
sparkql

sparkql: Apache Spark SQL DataFrame schema management for sensible humans

17K 12 4
G-Research
fasttrackml

Experiment tracking server focused on speed and scalability

9K 118 18
graphframes
graphframes-latest

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs

6K 1K 268
abronte
pysparkgateway

Connect to remote Spark clusters seamlessly.

4K 3 4
LucaCanali
sparkhistogram

Includes notes on using Apache Spark, with drill down on Spark for Physics, how to run TPCDS on PySpark, how to create histograms with Spark. Also tools for stress testing, measuring CPUs' performance, and I/O latency heat maps. Jupyter notebooks examples for using various DB systems.

3K 460 154
kubeflow
mcp-apache-spark-history-server

MCP Server and CLI for Apache Spark History Server. Debug Spark applications from AI agents, scripts, or the terminal.

3K 170 60
zero323
pyspark-asyncactions

Asynchronous actions for PySpark

3K 47 2
maxpoint
spylon

Utilities to work with Scala/Java code with py4j

2K 40 17
dsgrid
dsgrid-toolkit

Python package for working with demand-side grid projects, datasets and queries

2K 33 5
Data from PyPI, GitHub, ClickHouse, and BigQuery.