Is a barebones commercial scan/OCR PDF file adequately converted by Send-to-Kindle?

scanewbie · 07-18-2015, 04:54 PM

Hi,

Newbie with a varied print library, but first mostly interested in scanning 20-60 year-old softcover fiction books that may not last forever. My home equipment is and will remain inadequate for some time, so looking into commercial services.

From what I see the two "lowest cost" scan services mentioned here repeatedly - 1dollarscan.com and bookscan.us - both offer a basic scan to PDF with OCR/text overlay (before adding more costly prep and conversion options).

I've noted threads on problems with further conversion to eBook formats - perhaps my forum search-terms have been inadequate - but no comment on simply mailing such an OCRd-PDF to a Send-to-Kindle e-mail address for conversion.

Does that conversion not work as well (or as poorly) as other options?

Thanks.

Toxaris · 07-19-2015, 06:52 AM

Scanning is easy. OCR is easy. The part that comes next is hard and takes time: fixing the OCR errors and creating a good book out of it. The scanning and OCR can be largely automated and can be offered at low cost. The rest is very much 'you get what you pay for'. So good quality means bigger bills. The end format is actually not that important, ePUB or Kindle. For a good Kindle book you also need a good ePUB (or so I am told).

PDF is absolutely the worst format to create an ePUB or Kindle book from. Most of these 'services' use a variant of Calibre to create their conversion, most without any form of post-processing. Personally I am not too fond of the Calibre conversions. Mind you, that is my personal opinion. Calibre is a great product in itself.

scanewbie · 07-19-2015, 08:31 PM

Appreciate the comments. As a Noob I'm still absorbing it all.

Yes, I was wondering why the package of PDF with OCR text overlay was "standard" rather than an image file and a text file. Although OCR/PDF conversion issues help explain the "experience" of reading an out-of-print out-of-copyright Kindle book an unknown seller offers for $0.99.

My minuscule past OCR experience has been single pages on a flatbed scanner with either OmniPage back in the 20th century, or Acrobat Pro later. Guess I was hoping that OCR software had gotten much smarter.

If I were a touch typist, re-typing always would have been faster than locating and correcting all the OCR errors - every time. Now that I think of it, I wouldn't be surprised if manual re-typing were cost-effective for the big guys taking advantage of exchange rates and labor costs in other countries.

Toxaris · 07-20-2015, 03:10 AM

Oh, OCR software has gotten a whole lot smarter since you worked with it. The error rate is way down, but there will always be typical OCR errors. Also GIGO (garbage in, garbage out) plays a big role here. The better the source, the better the results. The main OCR player nowadays is ABBYY FineReader.

Re-typing is not cost-effective. It introduces new errors of its own, which again will only be spotted by proof-reading.

It is not without reason that I made my Word add-in. It is designed to take the output from the OCR process and either fix errors automatically or give you the tools to fix them. It saves me an enormous amount of time in digitizing a text.
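To give a rough idea of what "fix errors automatically or give you the tools to fix them" can mean in practice, here is a minimal Python sketch. It is not the Word add-in mentioned above and makes no claim about how that add-in works; the function names `clean_ocr_text` and `suspicious_words` are made up for illustration, and the rules are assumptions about typical OCR output (words hyphenated across line breaks, hard line breaks inside paragraphs, digits misread into words).

```python
import re
from typing import List


def clean_ocr_text(text: str) -> str:
    """Apply a few safe, mechanical fixes to raw OCR text (hypothetical helper)."""
    # Rejoin words split by a hyphen at a line break: "conver-\nsion" -> "conversion"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks inside a paragraph into spaces,
    # leaving blank-line paragraph breaks alone
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces or tabs into one space
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()


def suspicious_words(text: str) -> List[str]:
    """List tokens that mix letters and digits, for a human to review."""
    return re.findall(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b", text)


raw = "The conver-\nsion was\npoor.\n\nHe c0uld not see the h0use in 1950."
print(clean_ocr_text(raw))
print(suspicious_words(raw))
```

The first function only does fixes that are nearly always right; the second just flags candidates, because something like "c0uld" still needs a person (or the page image) to confirm what the word should be. Note that the de-hyphenation rule will also join genuinely hyphenated words that happen to break at a line end, which is one more reason proof-reading stays necessary.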

The PDF with OCR text overlay is useful. I use it as well. If I find some strange text where I think there is an error but I am not quite sure what it should be, I use that one. It enables me to search quickly to the correct point and then see the original.

scanewbie · 07-20-2015, 05:54 PM

Thank you again for sharing your expertise. I've taken a quick look through the links to your website, and definitely will spend more time there to understand the capabilities of the useful tools you've created.
