07-11-2009, 12:09 PM   #1
UnraisedArc
[Old Thread] Auto Extract ISBN-Feature request

From the ISBN website:
"Every ISBN consists of ten digits and whenever it is printed it is preceded by the letters ISBN. The ten-digit number is divided into four parts of variable length, each part separated by a hyphen."

I have some e-books in PDF format that are not text but scanned images of regular paper books. Most include a page that shows the ISBN. Since these PDFs are scanned images, a text-based search for the letters ISBN comes up empty. Would it be possible to use some open source OCR software to convert the first few pages of a PDF to text, search that text for ISBNs, and then use the result to auto-fill the metadata?
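
To make the idea concrete, here is a rough sketch of the text-search half only. It assumes the OCR step has already turned the first few pages into a plain string, and it only handles the ten-digit form described in the quote above. The function names (`find_isbn10s`, `is_valid_isbn10`) are made up for illustration and are not part of any existing program. Since OCR output tends to contain misread characters, the sketch also verifies the ISBN check digit before accepting a candidate.

```python
import re

# Candidate pattern: the letters ISBN (optionally "ISBN-10"), then ten
# characters' worth of digits, each optionally followed by one hyphen or
# space. The last character may be X. The lookahead rejects prefixes of
# longer digit runs. This pattern is an assumption, not a standard.
ISBN_RE = re.compile(
    r"ISBN(?:-10)?[:\s]*((?:[0-9][\- ]?){9}[0-9Xx])(?!-?[0-9])",
    re.IGNORECASE,
)


def is_valid_isbn10(digits: str) -> bool:
    """Return True if the ten characters pass the ISBN-10 check.

    Each character is weighted 10, 9, ..., 1 (X counts as 10 and is only
    allowed in the last position); the weighted sum must be divisible by 11.
    """
    if not re.fullmatch(r"[0-9]{9}[0-9X]", digits):
        return False
    total = 0
    for position, char in enumerate(digits):
        value = 10 if char == "X" else int(char)
        total += (10 - position) * value
    return total % 11 == 0


def find_isbn10s(text: str) -> list[str]:
    """Return the distinct valid ISBN-10s found in OCR text, without separators."""
    found = []
    for match in ISBN_RE.finditer(text):
        digits = re.sub(r"[\s\-]", "", match.group(1)).upper()
        if is_valid_isbn10(digits) and digits not in found:
            found.append(digits)
    return found
```

For example, calling `find_isbn10s` on the OCR text of a copyright page would return a list of normalized ISBN strings, and the first entry could then be used for a metadata lookup. The check-digit test is there because a single misread digit from the OCR would otherwise fetch metadata for the wrong book.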

Thanks.