PyRank

Multimodal Python Packages

Python packages tagged with the GitHub topic multimodal, sorted by relevance and listed with stars and monthly downloads.
embeddings-benchmark
mteb

MTEB: Massive Text Embedding Benchmark

2.8M 3K 614
yzhao062
pyod

A Python library for anomaly detection across tabular, time series, graph, text, and image data. 60+ detectors, benchmark-backed ADEngine orchestration, and an agentic workflow for AI agents.

2.7M 10K 1K
rerun-io
rerun-sdk

Visualize, query, and stream to train on multimodal robotics data.

2.4M 11K 734
Eventual-Inc
daft

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale.

810K 5K 474
vortex-data
vortex-data

An extensible, state-of-the-art framework for columnar compression, and the fastest FOSS columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LFAI&Data, part of the Linux Foundation.

340K 3K 153
activeloopai
deeplake

Deeplake is an AI Data Runtime for Agents. It provides serverless Postgres with a multimodal data lake, enabling scalable retrieval and training.

227K 9K 710
vlm-run
vlmrun-hub

A hub for various industry-specific schemas to be used with VLMs.

197K 544 24
bentoml
bentoml

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

186K 9K 965
Blaizzy
mlx-audio

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

186K 7K 594
modelscope
ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-R1, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, Phi4, ...) (AAAI 2025).

141K 14K 1K
docarray
docarray

Represent, send, store and search multimodal data

129K 3K 243
jina-ai
jina

☁️ Build multimodal AI applications with cloud-native stack

97K 22K 2K
microsoft
torchscale

Foundation Architecture for (M)LLMs

72K 3K 225
rom1504
img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.

61K 4K 374
datachain-ai
datachain

The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, Azure

43K 3K 143
vllm-project
vllm-omni

A framework for efficient model inference with omni-modality models

40K 5K 933
open-mmlab
mmcls

OpenMMLab Pre-training Toolbox and Benchmark

27K 4K 1K
Stability-AI
stability-sdk

SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)

25K 2K 344
McGill-NLP
weblinx

WebLINX is a benchmark for building web navigation agents with conversational capabilities

21K 160 17
open-mmlab
mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

21K 4K 1K
ai-bot-pro
achatbot

An open-source chatbot architecture for voice/vision (and multimodal) assistants that can run locally (CPU/GPU bound) or remotely (I/O bound).

18K 89 18
anam-org
metaxy

Pluggable sample-level metadata versioning for incremental multimodal pipelines.

18K 97 6
predict-idlab
tsflex

Flexible time series feature extraction & processing

17K 438 28
video-db
videodb

VideoDB Python SDK

14K 95 15
Data from PyPI, GitHub, ClickHouse, and BigQuery