PyRank

Vision Language Model Python Packages

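Python packages tagged with the GitHub topic vision-language-model, sorted by relevance. Each entry gives the repository owner, the package name, a description, and a line of figures that begins with monthly downloads followed by GitHub stars.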
Blaizzy
mlx-vlm

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

383K 5K 539
illuin-tech
colpali-engine

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

149K 3K 251
EvolvingLMMs-Lab
lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

9K 4K 585
ARahim3
mlx-tune

Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.

9K 1K 80
CVHub520
x-anylabeling-cvhub

Effortless data labeling with AI support from Segment Anything and other awesome models.

6K 9K 994
illuin-tech
vidore-benchmark

Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.

4K 272 35
emcf
thepipe-api

Get clean data from tricky documents, powered by VLMs.

2K 2K 99
billbillbilly
urban-worm

Workflow of reproducible multimodal inference for urban environment evaluation.

2K 5 4
CVHub520
x-anylabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

1K 9K 994
lica-world
lica-gdb

GDB: GraphicDesignBench - A real-world benchmark for evaluating AI on graphic design tasks

1K 6 1
zhudotexe
kani-vision

Kani extension for supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.

1K 7 0
dvlab-research
visionzip

Official repository for VisionZip (CVPR 2025)

1K 431 27
haotian-liu
llava-torch

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

968 25K 3K
mbodiai
mbodied

Seamlessly integrate state-of-the-art transformer models into robotics stacks

948 286 32
Nerif-AI
nerif

LLM powered Python

752 15 5
lica-world
lica-gdb-helm

GDB: GraphicDesignBench - A real-world benchmark for evaluating AI on graphic design tasks

644 6 1
gptscript-ai
gptparse

Document parser for RAG

618 28 2
ymrohit
openscenesense-ollama

Offline video analysis using Ollama models and local Whisper

550 46 9
NVlabs
ps3-torch

Scaling Vision Pre-Training to 4K Resolution

540 227 10
zhudotexe
kani-multimodal-core

Core shared libraries for multimodal Kani extensions.

418 2 0
lhzn-io
kanoa

AI-powered interpretation of data science outputs with multi-backend support (Molmo, Gemini, Claude, OpenAI)

397 1 0
ARahim3
unsloth-mlx

Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.

389 1K 81
Keyvanhardani
german-ocr

High-performance German document OCR - Local & Cloud with GPU/CPU support

389 107 6
Blaizzy
mlx-vlm-nell

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

373 5K 539
Data from PyPI, GitHub, ClickHouse, and BigQuery.