sapbert
Production-grade document extraction with an intelligent fallback chain: Docling -> PyMuPDF -> pdfplumber -> Tesseract
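
A fallback chain of this kind can be sketched as a loop that tries each backend in order and returns the first usable result. The sketch below is an illustration under assumptions, not this project's actual code: `extract_with_fallback`, `ExtractionError`, the `min_chars` threshold, and the stub extractors are all hypothetical names, and the real Docling, PyMuPDF, pdfplumber, and Tesseract calls would need to be wrapped in functions that take a path and return text.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: an extractor takes a file path and returns extracted text.
Extractor = Callable[[str], str]


class ExtractionError(RuntimeError):
    """Raised when every extractor in the chain fails or returns no usable text."""


def extract_with_fallback(
    path: str,
    extractors: Sequence[Tuple[str, Extractor]],
    min_chars: int = 1,
) -> Tuple[str, str]:
    """Try each (name, extractor) pair in order; return (backend_name, text).

    A backend is skipped if it raises, or if its stripped output is shorter
    than min_chars (an assumed heuristic for "this backend found nothing").
    """
    errors: List[str] = []
    for name, extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # any backend failure moves on to the next one
            errors.append(f"{name}: {exc!r}")
            continue
        if text and len(text.strip()) >= min_chars:
            return name, text
        errors.append(f"{name}: empty or too-short output")
    raise ExtractionError("all extractors failed: " + "; ".join(errors))


# Stub extractors standing in for the real backends (assumptions for the demo).
def failing_backend(path: str) -> str:
    raise ValueError("cannot parse")


def empty_backend(path: str) -> str:
    return "   "


def working_backend(path: str) -> str:
    return "hello from " + path


chain = [
    ("docling", failing_backend),     # raises, so it is skipped
    ("pymupdf", empty_backend),       # returns whitespace, so it is skipped
    ("pdfplumber", working_backend),  # first backend with usable text
    ("tesseract", working_backend),   # never reached in this run
]

backend, text = extract_with_fallback("report.pdf", chain)
print(backend, "->", text)
```

Returning the backend name alongside the text makes it possible to log which stage of the chain produced the output, which can help when tuning the order or the `min_chars` threshold.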