Old 04-12-2014, 01:48 AM   #1
Noobish
Junior Member
Posts: 4
Karma: 12584
Join Date: Apr 2014
Device: none
ISBN Extraction with OCR

Hello, is there a utility that extracts the ISBN from an e-book using OCR? The Extract ISBN plugin is good, but it won't work for books whose first pages are images.

If there is none, is anyone interested in such a utility? I can develop it in C# if there is interest.

Last edited by Noobish; 04-12-2014 at 02:45 AM.
Old 04-13-2014, 02:45 AM   #2
Noobish
ISBN Extraction with OCR Utility

After experimenting with the Extract ISBN plugin in Calibre, let me say it's an excellent add-on, but in my experience it also produces many false positives. So I developed a "little" utility that extracts the ISBN by searching the text of the first 10 pages; if that fails, it performs OCR on the first 10 pages. If a valid ISBN is found, you can rename the files, or copy them to a chosen directory, with the ISBN added to the filename.
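For anyone curious what "valid ISBN" and "text first, OCR as fallback" could look like, here is a minimal Python sketch. It is not the code of the attached program (that one is C# and its source is not posted here). The function names, the hyphen-only candidate pattern, and the `ocr` callable are all my own assumptions for illustration; only the ISBN-10 and ISBN-13 check-digit rules are standard.

```python
import re

# Assumption: candidates are runs of digits and hyphens ending in a digit or X.
# Space-separated ISBNs are deliberately not handled in this sketch.
CANDIDATE = re.compile(r"(?<![0-9-])[0-9][0-9-]{8,15}[0-9Xx](?![0-9A-Za-z-])")


def is_valid_isbn10(s):
    """ISBN-10: weights 10..1, total must be divisible by 11 (X counts as 10)."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", s):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(s):
    """ISBN-13: alternating weights 1 and 3, total must be divisible by 10."""
    if not re.fullmatch(r"[0-9]{13}", s):
        return False
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(s))
    return total % 10 == 0


def find_isbns(text):
    """Return the checksum-valid ISBNs found in text, hyphens removed."""
    found = []
    for match in CANDIDATE.finditer(text):
        s = match.group().replace("-", "").upper()
        if (is_valid_isbn10(s) or is_valid_isbn13(s)) and s not in found:
            found.append(s)
    return found


def extract_isbn(text_pages, ocr=None, max_pages=10):
    """Search the embedded text first; call the OCR fallback only if that fails.

    `ocr` is a hypothetical callable: it takes max_pages and returns the
    recognized text of each page as an iterable of strings.
    """
    for page in text_pages[:max_pages]:
        hits = find_isbns(page)
        if hits:
            return hits[0]
    if ocr is not None:
        for page in ocr(max_pages):
            hits = find_isbns(page)
            if hits:
                return hits[0]
    return None
```

The checksum step is what cuts down on false positives: a random 10-digit number such as a phone number rarely passes it. OCR is kept as the second stage because it is much slower than reading the embedded text.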



Requirements:
1- The program is attached. You also need http://www.microsoft.com/en-us/downl....aspx?id=40779
2- Download Tesseract: http://tesseract-ocr.googlecode.com/...2-portable.zip
Extract it and copy the tessdata folder to the "\Release" directory.
3- You might also need the Visual C++ runtime if the program fails to start.

Features (currently):
Works only with PDF files that are neither encrypted nor password-protected.
Should be safe if you have multiple files with the same name in different directories.
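To illustrate the same-name case, here is one way a copy step could avoid overwriting files that share a name. This is only a Python sketch under my own assumptions, not how the attached program does it: the "ISBN followed by the original name" layout, the "(1)" counter suffix, and both function names are made up for the example.

```python
import shutil
from pathlib import Path


def unique_target(out_dir, isbn, src):
    """Return a path inside out_dir that does not exist yet."""
    out_dir = Path(out_dir)
    src = Path(src)
    # Hypothetical naming scheme: "<isbn> <original name>"
    candidate = out_dir / f"{isbn} {src.name}"
    counter = 1
    while candidate.exists():
        # Same name already taken: add a counter before the extension.
        candidate = out_dir / f"{isbn} {src.stem} ({counter}){src.suffix}"
        counter += 1
    return candidate


def copy_with_isbn(src, out_dir, isbn):
    """Copy src into out_dir under a collision-free name and return that path."""
    target = unique_target(out_dir, isbn, src)
    shutil.copy2(src, target)
    return target
```

Copying instead of renaming in place keeps the originals untouched, which fits the advice to back up your files first.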

Usage:
1- Start the program, choose your output directory on the Settings tab, save the settings, restart the program, and check that the output directory is correct.
2- Click the Scan Folders button and choose the BASE DIRECTORY OF YOUR BOOKS.
3- Once the list is populated, click Start Search.
4- Once the search is finished, click the "Save and CleanUP" button.

To import the recognized books into Calibre:
1- Open Calibre Preferences -> Adding Books.
2- Copy (?P<isbn>[0-9xX]+) and paste it into the Regular Expression field.
3- Uncheck "Read metadata from file contents rather than file name".
4- Apply, save, [restart Calibre].
5- Import the books normally.
6- Download metadata in bulk for the imported books.
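If you want to see what that regular expression picks up before importing, you can try it in Python, which uses the same named-group syntax. This only demonstrates the pattern itself with Python's re module; exactly how Calibre applies it to a filename is not covered here, and the sample names and the stricter pattern are my own assumptions.

```python
import re

# Pattern quoted in the import steps above.
POSTED = re.compile(r"(?P<isbn>[0-9xX]+)")

# Hypothetical stricter variant: exactly 10 or 13 characters, optional 978/979 prefix.
STRICTER = re.compile(r"(?P<isbn>(?:97[89])?[0-9]{9}[0-9xX])")


def isbn_from_name(stem, pattern=POSTED):
    """Return the first match of the 'isbn' group in a filename stem, or None."""
    match = pattern.search(stem)
    return match.group("isbn") if match else None


print(isbn_from_name("9780306406157"))                         # 9780306406157
print(isbn_from_name("Some Title 9780306406157"))              # 9780306406157
print(isbn_from_name("Linux Basics 9780306406157"))            # x
print(isbn_from_name("Linux Basics 9780306406157", STRICTER))  # 9780306406157
```

The third line shows the catch: the posted pattern takes the first run of digits or "x" characters, so a title containing an "x" or a number ahead of the ISBN can be picked up instead. Putting the ISBN at the start of the filename, or using a stricter pattern like the one sketched, avoids that.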

To be added (not decided yet):
Support for image enhancement before OCR.
Scanning each page as an image: this produces much more accurate results at the cost of speed.
A workaround for recognizing the ISBN in text-based books that store their content in a different order from how it appears on the page (using text objects, etc.).


Comments, replies, bug reports, etc. are appreciated.

NOTE: MAKE A COPY OF YOUR FILES BEFORE USING IT.
Attached Files
File Type: rar Release.rar (4.34 MB, 37 views)

Last edited by Noobish; 04-13-2014 at 04:03 AM.
All times are GMT -4.