PyRank

Chunking Python Packages

Python packages tagged with the GitHub topic "chunking", sorted by relevance and shown with stars and monthly downloads.
isaacus-dev
semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

4M 628 40
iscc
fastcdc

FastCDC implementation in Python https://pypi.org/project/fastcdc/

56K 65 17
OpenVoiceOS
quebra-frases

Chunks strings into byte-sized pieces.

15K 1 3
yonk-labs
chunkshop

Standalone ingest-to-pgvector: source → chunker → embedder → extractor → table. int8 by default.

3K 0 0
carlosplanchon
betterhtmlchunking

BetterHTMLChunking is a Python library for intelligent HTML segmentation. It builds a DOM tree from raw HTML and extracts content-rich regions of interest, making content analysis effortless. Great for LLM based processing.

2K 56 7
speedyk-005
chunklet-py

High-fidelity context-aware chunking and interactive visualization for RAG. Advanced segmentation for code and documents, because your LLM is only as smart as the fragments you feed it.

2K 77 2
AmitoVrito
chunkrank

Model-aware text chunking and answer re-ranking for LLM pipelines. Automatically adapts chunk size to tokenizer and context window, then consolidates and ranks answers across chunks.

1K 1 0
shantanu-deshmukh
chunktuner

Benchmark chunking strategies for your RAG corpus. Get a recommended config. CLI, Python library, and MCP server.

1K - -
ENDEVSOLS
longparser

Privacy-first document intelligence engine — parse PDFs, DOCX, PPTX, XLSX & CSV into AI-ready chunks for RAG pipelines. Includes HITL review, 3-layer memory chat, and a production FastAPI server.

1K 26 2
PebbleRoad
table2rules

Convert HTML tables into flat, self-contained facts for LLMs and RAG pipelines.

1K 2 1
langformers
langformers

🚀 Unified NLP Pipelines for Language Models

1K 19 1
starthackHQ
contextinator

Filesystem tools for AI agents with optional RAG capabilities

1K 22 5
speedyk-005
chunklet

A smart multilingual text chunker for LLMs, RAG, and beyond.

681 77 2
vinerya
faiss-vector-aggregator

This Python library provides a suite of advanced methods for aggregating multiple embeddings associated with a single document or entity into a single representative embedding.

673 2 1
oguzhankir
omnichunk

Structure-aware text chunking library for code, prose, and markup files. Intelligently splits files into context-rich chunks while preserving semantic boundaries. Supports 15+ programming languages, deterministic output, and zero external dependencies. Perfect for RAG systems, code analysis, and LLM context optimization.

613 8 0
mirth
chonky

Chonky is a Python library that intelligently segments text into meaningful semantic chunks using a fine-tuned transformer model.

567 408 15
lazyFrogLOL
llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

548 271 8
imbue
django-query-chunk

Django Query Chunk splits big queries into multiple chunks to prevent excessive memory usage.

509 1 1
aimsise
seedbraid

Reference-based file reconstruction with CDC chunking, SBD1 binary seed format, and IPFS transport

478 0 0
swarmauri
swarmauri-xmp-gif

GIF handler for embedding and extracting XMP packets in Swarmauri runtimes.

451 104 47
fujiba
llm-pdf-chunker

LLM-friendly PDF splitter & image optimizer. Chunk PDFs by size and downsample images for RAG/Bedrock.

413 0 0
Kubenew
ragpipe-lite

ragpipe-lite: unified RAG ingestion pipeline (loaders, chunking, embeddings, vector store export).

355 1 0
xyb
chunksum

Print FastCDC rolling hash chunks and checksums.

349 2 0
cmlonder
notebooklm-chunker

Heading-aware PDF chunking with resumable source and Studio workflows for turning long documents into interactive NotebookLM learning kits.

322 18 3
Data from PyPI, GitHub, ClickHouse, and BigQuery.