spark-datasource
PDF DataSource for Apache Spark. It reads PDF files directly into a DataFrame and can run OCR on them.
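
As a rough illustration of how a Spark data source is typically used, the sketch below wraps the generic `DataFrameReader` call. It is a minimal sketch under assumptions, not this project's documented API: the format short name `"pdf"` and the helper `read_pdfs` are hypothetical, so check the project's documentation for the actual format name and any options it supports.

```python
from typing import Any

# Assumption: the data source is registered under the short name "pdf".
# The real name may differ; consult the project's documentation.
PDF_FORMAT = "pdf"


def read_pdfs(spark: Any, path: str, fmt: str = PDF_FORMAT) -> Any:
    """Load the PDF files at `path` into a DataFrame (hypothetical helper)."""
    # spark.read.format(...).load(...) is Spark's standard DataFrameReader API;
    # only the format name passed to it is specific to this data source.
    return spark.read.format(fmt).load(path)
```

With an active `SparkSession` named `spark` and the data source on the classpath, `read_pdfs(spark, "/data/pdfs/*.pdf")` would return a DataFrame (the path here is a placeholder). Calling `printSchema()` on the result shows which columns the data source actually produces.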