[Old Thread] Auto Extract ISBN - Feature request
From the ISBN website:
"Every ISBN consists of ten digits and whenever it is printed it is preceded by the letters ISBN. The ten-digit number is divided into four parts of variable length, each part separated by a hyphen."
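To make the quoted format concrete, here is a rough sketch of how a ten-digit ISBN with four hyphen-separated parts might be recognized in plain text. The names `ISBN10_PATTERN`, `is_valid_isbn10` and `find_isbn10` are made up for illustration. The quote does not mention it, but the final character of an ISBN-10 is a check character (a digit or `X`) computed modulo 11, and the sketch assumes that rule to weed out false matches.

```python
import re

# Four hyphen-separated parts; the last part is the single check character.
# The part lengths are deliberately loose, the validator enforces the total.
ISBN10_PATTERN = re.compile(r"\b[0-9]{1,5}-[0-9]{1,7}-[0-9]{1,6}-[0-9Xx]\b")


def is_valid_isbn10(candidate: str) -> bool:
    """Return True if candidate has ten characters and a correct check character."""
    chars = candidate.replace("-", "").replace(" ", "").upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", chars) is None:
        return False
    # Weights run from 10 down to 1; "X" stands for the value 10.
    total = sum(
        (10 - position) * (10 if char == "X" else int(char))
        for position, char in enumerate(chars)
    )
    return total % 11 == 0


def find_isbn10(text: str) -> list:
    """Return every hyphenated ISBN-10 in text that passes the checksum."""
    return [m.group(0) for m in ISBN10_PATTERN.finditer(text) if is_valid_isbn10(m.group(0))]
```

For example, `find_isbn10("ISBN 0-306-40615-2")` should return the one number, while a string with a wrong check digit is dropped.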
I have some e-books in PDF format that contain no text, only scanned images of regular paper books. Most include a page showing the ISBN. Since these PDFs are scanned images, a text-based search for the letters ISBN comes up empty. Would it be possible to use some open source OCR software to convert the first few pages of a PDF to text, search that text for an ISBN, and use it to auto-fill the metadata?
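Something along these lines is what I have in mind. This is only a sketch, not a tested implementation: it assumes the third-party `pdf2image` and `pytesseract` packages (with Poppler and Tesseract installed) for the OCR step, and the function names `first_isbn_in_text`, `ocr_first_pages` and `extract_isbn` are hypothetical. It only looks for ten-digit ISBNs that follow the letters ISBN, as in the quote above.

```python
import re
from typing import Callable, Iterable, Optional

# The letters ISBN, optional punctuation, then ten characters (the last may be X)
# separated by hyphens or spaces, since OCR often loses or changes the hyphens.
# The trailing lookahead skips longer digit runs such as 13-digit numbers.
ISBN_AFTER_LABEL = re.compile(
    r"ISBN[-: ]*((?:[0-9][- ]?){9}[0-9Xx])(?!-?[0-9])", re.IGNORECASE
)


def first_isbn_in_text(text: str) -> Optional[str]:
    """Return the first labelled ten-digit ISBN in text, without separators."""
    match = ISBN_AFTER_LABEL.search(text)
    if match is None:
        return None
    return re.sub(r"[- ]", "", match.group(1)).upper()


def ocr_first_pages(pdf_path: str, page_count: int = 5) -> Iterable[str]:
    """Yield OCR text for the first page_count pages of a scanned PDF."""
    # Imported here so the text-search part works without the OCR packages.
    from pdf2image import convert_from_path  # assumption: pdf2image is installed
    import pytesseract  # assumption: pytesseract and Tesseract are installed

    images = convert_from_path(pdf_path, dpi=300, first_page=1, last_page=page_count)
    for image in images:
        yield pytesseract.image_to_string(image)


def extract_isbn(
    pdf_path: str,
    page_count: int = 5,
    ocr: Callable[[str, int], Iterable[str]] = ocr_first_pages,
) -> Optional[str]:
    """Return the first ISBN found in the first few pages, or None."""
    for page_text in ocr(pdf_path, page_count):
        isbn = first_isbn_in_text(page_text)
        if isbn is not None:
            return isbn
    return None
```

The `ocr` parameter is there so a different OCR engine could be swapped in. The returned number could then be used to look up and fill in the metadata.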
Thanks.