PyRank

Extractor Python Packages

Python packages tagged with the GitHub topic "extractor". Sorted by relevance, with stars and monthly downloads.
lipoja
urlextract

URLExtract is a Python class for collecting (extracting) URLs from a given text by locating TLDs.

855K 277 64
nexB
extractcode

A mostly universal file extraction library and CLI tool to extract almost any archive in a reasonably safe way on Linux, macOS and Windows.

92K 38 23
offici5l
fcetool

Extract specific files from remote ROM.ZIP archives without downloading the full ROM

16K 32 8
fhamborg
news-please

news-please - an integrated web crawler and information extractor for news that just works

16K 2K 453
DanielJDufour
date-extractor

Extract dates from text

4K 66 14
mefistotelis
pylabview

Python reader of LabVIEW RSRC files (VI, CTL, LLB). File format description on the Wiki.

4K 126 30
diogo-alves
eml-extractor

A Python package to extract attachments from .eml files (email messages saved as files)

4K 20 7
tatuylonen
wiktextract

Wiktionary dump file parser and multilingual data extractor

2K 1K 116
CBICA
openpatchminer

A patch miner for large histopathology images

1K 4 6
myifeng
article-parser

A parser that extracts articles from any URL or HTML

961 50 6
mutalyzer
mutalyzer-algebra

A Boolean Algebra for Genetic Variants

829 13 2
vishaltanwar96
aadhaar-py

Extract embedded information from Aadhaar Secure QR Code.

659 15 1
rs-develop
forioccrawler

A forensic IOC crawler and parser.

482 5 2
febos
contactextractor

Contact Extractor from PDB/mmCIF coordinate files

409 0 0
aphp
edspdf-poppler

Poppler extension for EDS-PDF

405 0 0
Arech
yacce

Non-intrusive compile_commands.json extractor for Bazel

394 5 0
samaybhavsar
copyrightextractor

Copyright Detector/Extractor - Detects and Extracts Copyright Snippet from HTML

296 3 1
sgl-umons
gigawork

A tool for extracting GitHub Actions workflows

283 8 2
aboutcode-org
android-inspector

android-inspector is a library of utilities to introspect source and binary Android apps and Android device firmware. It can be used as a plugin to ScanCode.

237 1 1
Coskon
ytget

Easily get data and download YouTube videos, focused on speed and simplicity.

235 0 0
The-Nebula-Developers
complex-parser

A versatile Python package for data extraction from JSON-like structures with user-defined format keys, enhanced with synonym retrieval capabilities.

227 5 0
MikeMeliz
torcrawl

Crawl and extract (regular or onion) webpages through the Tor network

211 511 89
fbernhart
officeextractor

officeextractor extracts media files (images, videos, music) from Microsoft Office and LibreOffice files.

192 5 2
taojinmin
spparser

An async ETL tool written in Python.

188 33 1
    • Data from PyPI, GitHub, ClickHouse, and BigQuery