PyRank

Metadata Extraction Python Packages

Python packages with the GitHub topic metadata-extraction. Sorted by relevance, with stars and monthly downloads.
adbar
htmldate

Fast and robust date extraction from web pages, with Python or on the command line

11.2M 149 30
kreuzberg-dev
kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or usable via CLI, REST API, or MCP server.

174K 8K 488
oduwsdl
aiu

A library for interacting with web archive collections at Archive-It, Trove, Pandora, and more.

4K 8 1
iluvcapra
wavinfo

Probe WAVE files for all metadata

4K 43 10
kobaltcore
pymage-size

A utility package for getting image dimensions without loading files into memory. No dependencies!

3K 16 1
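The pymage-size entry describes reading image dimensions without loading the whole file. The sketch below illustrates that general technique for PNG only, using just the standard library. It is not pymage-size's API, and `png_dimensions` is a hypothetical helper name. It relies on the PNG layout, where the 8-byte signature is followed by the IHDR chunk whose first two fields are the big-endian width and height.

```python
import struct

# Fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(path):
    """Return (width, height) of a PNG by reading only its first 24 bytes.

    Hypothetical helper for illustration; it is not part of pymage-size.
    """
    with open(path, "rb") as f:
        # 8-byte signature + 4-byte chunk length + 4-byte chunk type
        # + 4-byte width + 4-byte height = 24 bytes.
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file with a leading IHDR chunk")
    # Width and height are big-endian unsigned 32-bit integers at bytes 16..23.
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

Other formats store dimensions differently; JPEG, for example, requires scanning marker segments for a start-of-frame (SOF) marker, which is why a general-purpose tool needs a separate parser per format.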
jakiki6
ruminant

Recursive metadata extraction tool

2K 5 1
tern-tools
tern

Tern is a software composition analysis tool and Python library that generates a Software Bill of Materials for container images and Dockerfiles. The SBOM that Tern generates will give you a layer-by-layer view of what's inside your container in a variety of formats, including human-readable, JSON, HTML, SPDX, and more.

2K 1K 188
lstein
photomapai

AI-based image clustering and exploration tool

1K 70 4
fvaleye
metadata-guardian

Provides an easy way, with Python, to protect your data sources by searching their metadata. 🛡️

723 18 1
DanTsai0903
namingpaper

CLI tool to rename academic papers using AI-extracted metadata

626 7 1
d3x-at
sd-parsers

A Python library to read metadata from images created by Stable Diffusion.

607 45 4
rsmvdl
metaspector

Python library to inspect and export metadata from MP4/M4V/M4A, MP3 and FLAC media files.

555 3 0
lttkgp
music-metadata-extractor

Fetch music metadata from common music APIs for a variety of data sources

546 16 7
radusuciu
traktor-nowplaying

traktor_nowplaying uses Traktor's broadcast functionality to extract metadata about the currently playing song.

518 67 8
itsbigspark
pymetagen

Metadata Generator

498 0 0
mauricelambert
spyware

This package implements a complete SpyWare tool.

481 157 32
sdsc-ordes
gimie

Extract structured metadata from git repositories.

453 14 2
chigwell
image-text-structurizer

A new package that processes user-provided text descriptions of images and returns structured, validated outputs using pattern matching. It ensures that the generated content adheres to a predefined f

371 1 0
meysam81
sitemap-harvester

Crawl the sitemap of a given website and recursively export its pages' metadata to CSV.

339 5 0
m8sec
pymetasec

Utility to download and extract document metadata from an organization. This technique can be used to identify domains, usernames, software/version numbers, and naming conventions.

316 515 88
shantanubafna
geotcha

Extract and harmonize RNA-seq metadata from NCBI GEO

305 0 0
baughmann
tikara

The metadata and text content extractor for almost every file type.

279 9 0
ymrohit
openscenesense

A video analysis toolkit using OpenAI and OpenRouter vision models

244 23 1
ankit-chaubey
surgery

Precision-focused offline CLI for viewing, editing, and stripping metadata from any media or document file

228 10 1
    • Data from PyPI, GitHub, ClickHouse, and BigQuery