02-20-2017, 10:20 AM   #1
isbnread
Device: kindle
Fully automated ebook file parsing, ISBN extraction, title extraction and metadata

Why is there no software that goes through a directory and converts each PDF, EPUB, or other format to text, then aggressively searches that text for the ISBN, title, etc., and corrects the ebook's metadata? It could also extract the images and run Tesseract OCR on them to check whether the title can be deduced. Library of Congress entries are also a good source of metadata.
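To make the "aggressively search the text for an ISBN" step concrete, here is a minimal sketch in Python. It assumes the book has already been converted to plain text. The names `CANDIDATE`, `is_valid_isbn10`, `is_valid_isbn13` and `find_isbns` are made up for illustration and do not come from any existing tool. The idea is to match loosely and then keep only candidates whose check digit is correct, which filters out most phone numbers and order numbers.

```python
import re

# Loose candidate pattern: 10 to 13 digits with optional spaces or hyphens
# between them; the last character may be X (the ISBN-10 check digit).
CANDIDATE = re.compile(r"(?<![\dA-Za-z])(?:\d[ -]?){9,12}[\dXx](?![\dA-Za-z])")


def is_valid_isbn10(s):
    """ISBN-10: weights 10..1, with X counting as 10; sum must divide by 11."""
    if not re.fullmatch(r"\d{9}[\dX]", s):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(s):
    """ISBN-13: prefix 978 or 979, alternating weights 1 and 3; sum must divide by 10."""
    if not re.fullmatch(r"97[89]\d{10}", s):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(s))
    return total % 10 == 0


def find_isbns(text):
    """Return the checksum-valid ISBNs found in text, in order, without duplicates."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group()).upper()
        if (is_valid_isbn13(digits) or is_valid_isbn10(digits)) and digits not in found:
            found.append(digits)
    return found
```

For example, `find_isbns("ISBN 978-0-306-40615-7 (hardcover)")` should give back the normalized number with separators stripped. A valid checksum only shows that the number is well formed, not that it belongs to this particular book, so the result would still need to be checked against a catalogue.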

Parsing PDFs can also be done with Python modules, for even more effective automatic library cleaning.
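A rough sketch of the directory walk and text conversion in Python follows. The function names (`epub_to_text`, `pdf_to_text`, `walk_library`) are hypothetical. The EPUB part uses only the standard library and strips tags crudely. The PDF part assumes the third-party `pypdf` package is installed and will return little or nothing for scanned PDFs that have no text layer, which is where OCR would have to take over.

```python
import re
import zipfile
from html import unescape
from pathlib import Path


def epub_to_text(path):
    """An EPUB is a ZIP of (X)HTML files; drop the tags and join the text."""
    parts = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                markup = zf.read(name).decode("utf-8", errors="replace")
                parts.append(unescape(re.sub(r"<[^>]+>", " ", markup)))
    return "\n".join(parts)


def pdf_to_text(path):
    """Assumption: the third-party pypdf package is available (pip install pypdf)."""
    from pypdf import PdfReader  # imported lazily so EPUB-only use still works

    reader = PdfReader(str(path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Map file extensions to extractor functions; other formats can be added here.
EXTRACTORS = {".epub": epub_to_text, ".pdf": pdf_to_text}


def walk_library(root):
    """Yield (path, text) for every supported ebook file under root."""
    for path in sorted(Path(root).rglob("*")):
        extractor = EXTRACTORS.get(path.suffix.lower())
        if extractor is None or not path.is_file():
            continue
        try:
            yield path, extractor(path)
        except Exception as exc:  # damaged files are common in messy libraries
            print(f"skipped {path}: {exc}")
```

Each `(path, text)` pair can then be searched for an ISBN or title. Writing the corrected metadata back into the file is a separate step that this sketch does not attempt.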