PyRank

Layout Analysis Python Packages

Python packages with the GitHub topic layout-analysis. Sorted by relevance, with stars and monthly downloads.
opendatalab
mineru

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

305K 64K 5K
Layout-Parser
layoutparser

A Unified Toolkit for Deep Learning Based Document Image Analysis

170K 6K 534
opendatalab
magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

76K 64K 5K
u9401066
asset-aware-mcp

Asset-Aware MCP Server — AI Agent precisely accesses tables, figures, sections from PDFs + .docx round-trip editing (DFM) with 46 tools / 13 resources, segmentation export, layout overlay, OCR preprocessing, knowledge graph (LightRAG)

11K 0 1
breezedeus
pix2text

An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.

9K 3K 271
opendatalab
mineru-selfhosted-mcp

MCP bridge for a self-hosted MinerU API

5K 64K 5K
RapidAI
rapid-layout

Analysis of Chinese and English document layouts

4K 270 21
yuvaraj3855
preocr

Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.

3K 10 4
yoshihikoueno
pdf-layout-scanner

No description available

1K 7 3
mindspore-lab
mindocr

A toolbox of OCR models and algorithms based on MindSpore

395 301 62
Kubenew
pdf2struct

`pdf2struct` extracts structured JSON from PDF documents.

392 1 0
Magnet-AI
quanta-pdf

Advanced PDF layout analysis engine for extracting figures, tables, and structured content from complex engineering documents using computer vision and machine learning.

276 2 1
MBAigner
pdfsegmenter

This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned formatted as CSV.

178 23 3
opendatalab
xh-pdf-parser

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

141 64K 5K
chigwell
text2design

A new package that enables users to input textual descriptions of visual design, layout, or interface concepts and returns structured representations or annotations derived from the description. It le

135 1 0
opendatalab
lazyllm-magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

124 64K 5K
sulzbals
ocrd-gbn

Collection of OCR-D compliant tools for layout analysis and segmentation of historical German-language documents published in Brazil

117 11 0
mindspore-lab
opensourcedot-mindocr

A toolbox of OCR models and algorithms based on MindSpore

95 301 62
ixalodecte
filestruct

A Python package to structure files using visual and style information

90 1 0
Data from PyPI, GitHub, ClickHouse, and BigQuery