PyRank

Information Retrieval Python Packages

Python packages with the GitHub topic information-retrieval. Sorted by relevance, with stars and monthly downloads.
dorianbrown
rank-bm25

A Collection of BM25 Algorithms in Python

6.5M 1K 107
Unstructured-IO
unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

5.4M 15K 1K
RaRe-Technologies
gensim

Topic Modelling for Humans

5M 16K 4K
ashvardanian
stringzilla

Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and memory ops 🦖

4.9M 3K 125
ashvardanian
simsimd

SIMD-accelerated distances, dot products, matrix ops, geospatial & geometric kernels for 16 numeric types — from 6-bit floats to 64-bit complex — across x86, Arm, RISC-V, and WASM, with bindings for Python, Rust, C, C++, Swift, JS, and Go 📐

4.3M 2K 119
jaidedai
easyocr

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

3M 29K 4K
embeddings-benchmark
mteb

MTEB: Massive Text Embedding Benchmark

2.8M 3K 614
xhluca
bm25s

Fast BM25 search in Python, powered by Numpy and Numba

1.5M 2K 99
deepset-ai
haystack-ai

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

893K 25K 3K
allenai
ir-datasets

Provides a common interface to many IR ranking datasets.

645K 390 52
FlagOpen
flagembedding

Retrieval and Retrieval-augmented LLMs

529K 12K 876
HKUNLP
instructorembedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

439K 2K 157
ashvardanian
numkong

SIMD-accelerated distances, dot products, matrix ops, geospatial & geometric kernels for 16 numeric types — from 6-bit floats to 64-bit complex — across x86, Arm, RISC-V, and WASM, with bindings for Python, Rust, C, C++, Swift, JS, and Go 📐

387K 2K 119
oeken
needle-python

Needle simplifies building RAG pipelines.

238K 45 2
rapidsai
pylibraft-cu12

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

194K 1K 232
rapidsai
libraft-cu12

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

173K 1K 232
illuin-tech
colpali-engine

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

149K 3K 251
AmenRa
ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

121K 677 31
lightonai
fast-plaid

High-Performance Engine for Multi-Vector Search

109K 256 24
cvangysel
pytrec-eval

pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.

84K 346 36
rapidsai
raft-dask-cu12

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

82K 1K 232
lightonai
pylate

Late Interaction Models Training & Retrieval

74K 811 81
deepset-ai
farm-haystack

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

73K 25K 3K
tensorflow
tensorflow-ranking

Learning to Rank in TensorFlow

70K 3K 477
Data from PyPI, GitHub, ClickHouse, and BigQuery