05-04-2011, 05:28 AM   #6
DSpider

You don't have to OCR. But if you'd like to search, highlight, cross-reference and so on, OCR is the way to go. Otherwise you could run Scan Tailor on the scans, pack them in a PDF and you'd be done. But OCR-ing usually results in much higher quality output - and quality trumps quantity every time.

Pros:
- cleaner text (free from printing flaws)
- lower filesize
- faster rendering and page flipping on portable devices (which are usually slower)
- fully searchable
- highlighting text is possible
- dictionary look-up
- reflowable text (ePUB, MOBI, etc.)
- body fonts can be replaced if the user wants to
- www and email links are clickable
- footnotes can be added to the end of the document instead of getting in your face
- in-document references (for instance you could simply click "See page 91")
- text-to-speech (for the visually impaired)

...and maybe more.

Cons:
- proof-reading takes time
- layout takes time
- vectorizing the cover takes time (optional)
- font matching takes time (again, optional), and that's if the font is even available. If not, you'd have to edit a similar font, which would take even more time (at least until you get the hang of it)

Is it worth it? Oh yeah. Like I said, quality trumps quantity. Always. Especially if it's a good book. It's always more of a pleasure to read a book with smooth text than one with jagged, partial, half-formed characters.


Think about it. Out of those 3000 books, which are the top, say, 30 you'd like to keep? Those are the ones worth OCR-ing. The rest I would probably just archive with Scan Tailor (grayscale), keeping the correct layout, etc. Also, while black and white TIFFs can cut filesize dramatically (especially in .djvu format), they could prove difficult to OCR in the future, as most OCR software has filters that were tweaked to work better with grayscale images. B&W TIFFs can sometimes remove details that would help the OCR differentiate "tl" from a "d", for example.
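
To make that last point a bit more concrete, here's a toy sketch in Python. It is not how any real OCR engine works. The pixel values, the 128 threshold and the helper names (binarize, count_strokes) are all made up for illustration. It just shows how a faint gray gap between two strokes can disappear once a scan is forced to pure black and white.

```python
# Hypothetical 1-D slice of pixel intensities across two adjacent letters
# (0 = black ink, 255 = white paper). The 110 in the middle is a faint
# gray gap separating the two strokes.
row = [250, 30, 20, 110, 25, 15, 245]


def binarize(pixels, threshold=128):
    """Force every pixel to pure black (0) or pure white (255)."""
    return [0 if p < threshold else 255 for p in pixels]


def count_strokes(pixels, ink_below=64):
    """Count runs of consecutive pixels darker than ink_below."""
    strokes = 0
    in_stroke = False
    for p in pixels:
        is_ink = p < ink_below
        if is_ink and not in_stroke:
            strokes += 1
        in_stroke = is_ink
    return strokes


gray_strokes = count_strokes(row)           # gap is still visible
bw_strokes = count_strokes(binarize(row))   # gap was thresholded to black

print(gray_strokes, bw_strokes)  # prints: 2 1
```

In grayscale the two strokes are still separate. After thresholding, the gap falls on the "black" side and they merge into a single blob, which is the kind of detail loss that can turn two thin letters into something that looks like a "d".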

Last edited by DSpider; 05-04-2011 at 05:45 AM.