PyRank

Vision Transformer Python Packages

Python packages with the GitHub topic vision-transformer. Sorted by relevance, with stars and monthly downloads.
open-mmlab
mmdet

OpenMMLab Detection Toolbox and Benchmark

392K 33K 10K
Blaizzy
mlx-vlm

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

383K 5K 539
open-mmlab
mmcls

OpenMMLab Pre-training Toolbox and Benchmark

27K 4K 1K
open-mmlab
mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

21K 4K 1K
lukas-blecher
pix2tex

pix2tex: Using a ViT to convert images of equations into LaTeX code.

12K 16K 1K
NVlabs
mambavision

[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone

3K 2K 139
emcf
thepipe-api

Get clean data from tricky documents, powered by VLMs.

2K 2K 99
towhee-io
towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

2K 3K 261
veb-101
attention-and-transformers

Transformers goes brrr... Attention and Transformers from scratch in TensorFlow. Currently contains Vision transformers, MobileViT-v1, MobileViT-v2, MobileViT-v3

2K 14 2
sovit-123
vision-transformers

Vision Transformers for image classification, image segmentation, and object detection.

2K 68 9
kyegomez
clipq

A simple implementation of CLIP that splits an image into quadrants and then gets the embeddings for each quadrant

2K 7 1
fmegahed
conformal-clip

Few-shot CLIP classification with conformal prediction, probability calibration, and reliability metrics.

2K 0 0
towhee-io
towhee-models

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

1K 3K 261
mist-medical
mist-medical

MIST is a simple and scalable end-to-end framework for medical imaging segmentation.

1K 53 15
NVlabs
fastervit

[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention

1K 915 69
alibaba
pai-easycv

An all-in-one toolkit for computer vision

1K 2K 225
evanatyourservice
image-classification-jax

Image classification in JAX with ViT, resnet, cifar10, cifar100, imagenette, and imagenet

1K 3 0
rwightman
timme

timm, evolved

1K 49 0
mit-han-lab
efficientvit-gml

open-set object detector

998 3K 244
DavidLandup0
deepvision-toolkit

PyTorch and TensorFlow/Keras image models with automatic weight conversions and equal API/implementations - Vision Transformer (ViT), ResNetV2, EfficientNetV2, NeRF, SegFormer, MixTransformer, (planned...) DeepLabV3+, ConvNeXtV2, YOLO, etc.

906 42 7
martinsbruveris
tfimm

TensorFlow port of PyTorch Image Models (timm) - image models with pretrained weights

780 291 25
autodistill
autodistill-vit

ViT module for use with autodistill.

720 4 0
TheoCoombes
clipcap

Using pretrained encoder and language models to generate captions from multimedia inputs.

676 100 14
vballoli
vit-flax

Implementation of Vision Transformers in Flax

667 18 2
    • Data from PyPI, GitHub, ClickHouse, and BigQuery