MobileRead Forums - View Single Post - [Old Thread] Auto Extract ISBN-Feature request

UnraisedArc · 07-11-2009, 12:09 PM

From the ISBN website:
"Every ISBN consists of ten digits and whenever it is printed it is preceded by the letters ISBN. The ten-digit number is divided into four parts of variable length, each part separated by a hyphen."

I have some e-books in PDF format that are not text but scanned images of regular paper books. Most include a page that shows the ISBN. Since these PDFs are scanned images, a text-based search for the letters ISBN comes up empty. Would it be possible to use some open-source OCR software to convert the first few pages of a PDF to text, search that text for an ISBN, and then use it to auto-fill the metadata?

Thanks.
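
To illustrate the search step of the idea above, here is a rough sketch in Python. It assumes the OCR stage has already turned the first few pages into a plain string; the OCR itself is not shown. The names `ISBN10_RE`, `is_valid_isbn10` and `find_isbn10s` are made up for this example. The pattern follows the description quoted above (the letters ISBN, then ten characters in four hyphen-separated parts), and the check-digit test is an extra step not mentioned in the post, added because OCR output often misreads digits.

```python
import re

# The letters "ISBN" (optionally "ISBN-10"), an optional colon/space,
# then four hyphen-separated parts. The last part is the check
# character, which may be a digit or X.
ISBN10_RE = re.compile(
    r"ISBN(?:-10)?[:\s]*([0-9]{1,5}-[0-9]{1,7}-[0-9]{1,7}-[0-9Xx])"
)


def is_valid_isbn10(candidate):
    """Return True if candidate passes the ISBN-10 check-digit rule."""
    chars = candidate.replace("-", "").upper()
    # Exactly nine digits followed by a digit or X.
    if not re.fullmatch(r"[0-9]{9}[0-9X]", chars):
        return False
    # Weights run 10, 9, ..., 1; X in the last position counts as 10.
    total = sum(
        (10 - i) * (10 if c == "X" else int(c))
        for i, c in enumerate(chars)
    )
    return total % 11 == 0


def find_isbn10s(text):
    """Return every hyphenated ISBN-10 in text that passes the check."""
    return [
        m.group(1)
        for m in ISBN10_RE.finditer(text)
        if is_valid_isbn10(m.group(1))
    ]


# Hypothetical OCR output from a copyright page.
ocr_text = "Printed in the USA. ISBN 0-306-40615-2 First edition."
print(find_isbn10s(ocr_text))
```

Rejecting candidates that fail the check digit means a misread digit yields no match rather than a wrong metadata lookup. A real implementation would likely also need to tolerate common OCR slips such as spaces in place of hyphens, which this sketch does not attempt.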
