PyRank

Scala Python Packages

Python packages tagged with the GitHub topic scala, sorted by relevance and shown with stars and monthly downloads.
apache
pyspark

Apache Spark - A unified analytics engine for large-scale data processing

51.2M 43K 29K
nteract
papermill

📚 Parameterize, execute, and analyze notebooks

9.3M 6K 450
databrickslabs
dbl-tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation

3.2M 342 59
combust
mleap

MLeap: Deploy ML Pipelines to Production

2.5M 2K 316
apache
apache-sedona

A cluster computing framework for processing large-scale geospatial data

2.2M 2K 760
Microsoft
synapseml

Simple and Distributed Machine Learning

2.1M 5K 861
lucacanali
sparkmeasure

This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simplifies collecting, aggregating, and exporting Spark task/stage metrics, and is designed for practical use by developers and data engineers in interactive analysis, testing, and production monitoring workflows.

1.8M 822 160
apache
pyspark-client

Apache Spark - A unified analytics engine for large-scale data processing

1.7M 43K 29K
jelmerk
pyspark-hnsw

Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs

1.3M 303 59
tree-sitter
tree-sitter-scala

Scala grammar for tree-sitter

615K 187 69
G-Research
pyspark-extension

A library that provides useful extensions to Apache Spark and PySpark.

91K 237 30
h2oai
h2o-pysparkling-3-1

Sparkling Water provides H2O functionality inside Spark cluster

79K 977 361
aws
sagemaker-pyspark

A Spark library for Amazon SageMaker.

55K 301 128
h2oai
h2o-pysparkling-3-5

Sparkling Water provides H2O functionality inside Spark cluster

27K 977 361
h2oai
h2o-pysparkling-3-4

Sparkling Water provides H2O functionality inside Spark cluster

22K 977 361
pantsbuild
pantsbuild-pants

The Pants Build System

20K 4K 703
h2oai
h2o-pysparkling-2-4

Sparkling Water provides H2O functionality inside Spark cluster

17K 977 361
h2oai
h2o-pysparkling-3-3

Sparkling Water provides H2O functionality inside Spark cluster

17K 977 361
logicalclocks
hsfs

Python - Java/Scala API for the Hopsworks feature store

12K 55 42
yahoo
tensorflowonspark

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

12K 4K 940
pantsbuild
pantsbuild-pants-testutil

The Pants Build System

10K 4K 703
DataSystemsLab
geospark

A cluster computing framework for processing large-scale geospatial data

6K 2K 760
apache
pyspark-connect

Apache Spark - A unified analytics engine for large-scale data processing

4K 43K 29K
maxpoint
spylon

Utilities to work with Scala/Java code with py4j

2K 40 17
    • Data from PyPI, GitHub, ClickHouse, and BigQuery