02-20-2017, 10:20 AM | #1 |
Junior Member
Posts: 1
Karma: 10
Join Date: Feb 2017
Device: kindle
Fully Automated ebook file parsing, ISBN extraction, Title Extraction and metadata
Why is there no software that goes through a directory, converts each PDF, EPUB, or other format to text, then aggressively searches the text for the ISBN, title, etc., and corrects the ebook's metadata? It could also extract the images and run Tesseract OCR on them to check whether the title can be deduced. Library of Congress entries are also a good source of metadata.
Parsing PDFs can also be done with Python modules, for even more effective automatic library cleaning.
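To make the "aggressively search the text for the ISBN" step concrete, here is a rough sketch in Python using only the standard library. It assumes the book has already been converted to plain text by some other tool. The names `find_isbns`, `is_valid_isbn10` and `is_valid_isbn13` are made up for this example, and the regex is only one possible way to spot candidates. The check-digit test is what weeds out random 10- or 13-digit numbers such as order numbers.

```python
import re

# Candidate: an optional 978/979 prefix, then nine digits and a final
# check character (digit or X), with optional hyphens or spaces between.
CANDIDATE = re.compile(r"(?<!\d)(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx](?!\d)")


def is_valid_isbn10(s):
    """ISBN-10 check: weights 10 down to 1, total divisible by 11."""
    if len(s) != 10 or not s[:9].isdigit() or s[9] not in "0123456789X":
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(s):
    """ISBN-13 check: alternating weights 1 and 3, total divisible by 10."""
    if len(s) != 13 or not s.isdigit():
        return False
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(s))
    return total % 10 == 0


def find_isbns(text):
    """Return checksum-valid ISBNs found in text, in order, without duplicates."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[- ]", "", match.group()).upper()
        if (is_valid_isbn10(digits) or is_valid_isbn13(digits)) and digits not in found:
            found.append(digits)
    return found


if __name__ == "__main__":
    sample = (
        "Copyright page. ISBN 978-0-306-40615-7 (hardcover), "
        "ISBN-10: 0-306-40615-2. Order no. 1234567890."
    )
    print(find_isbns(sample))
```

The validated ISBN could then be used as the lookup key against a catalogue such as the Library of Congress to fetch the title and other metadata. That lookup, the format conversion and the OCR pass are left out of this sketch.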