11-14-2011, 08:56 PM   #3
taylor3456
Member
Posts: 21
Karma: 10
Join Date: Jul 2011
Device: hanvon n800
Quote:
Originally Posted by DDHarriman
Hello

You are right, ClearScan has the lowest impact.
Basically, ClearScan gives the result closest to what you would get if you had written and formatted the text and images of the PDF yourself.
The catch is that all the errors (incorrect or unrecognized characters, words, or phrases) will show, and one must correct them by hand (“proofreading” is the term for it). Acrobat is not the best tool for this correcting.

The other two options are what one calls a “two-layer” PDF: one layer is the original (or compressed) image and the other is the text (the result of the OCR processing). Such a file is as large as a PDF made from the scanned images alone, and puts at least as much load on the eBook reader.
In practice, for your problem, running a searchable-image OCR (exact or not) in Acrobat is useless.

Best regards,
Thank you!

I usually keep the ClearScan output as it is and skip the "proofreading", because the unrecognized characters, words, and images do not bother me much. I also agree with you about page-turning speed with searchable images (exact or not): these files should slow down page turning on e-ink devices because of their large size.

Actually, I want to share some results with you and other PDF readers:

I tested 50 pages of the same PDF document, turning pages 50 times on the version before ClearScan and 50 times on the version after ClearScan.

When the pages contain many graphs or images, the reader turns pages faster with the pre-ClearScan file.
When the pages do not contain many graphs or images, there is no big difference between the two files.

Sometimes the reader drops some characters or part of a graph when turning pages in the post-ClearScan file.
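
If anyone wants to repeat this kind of comparison with numbers instead of by eye, here is a minimal Python sketch of how it could be timed. It is only an illustration, not what I ran on the device: `render_page` is a hypothetical callback standing in for whatever draws one page of a given file, and the `clock` parameter is my own addition so the timing logic can be checked without a real reader.

```python
import time
from statistics import mean


def time_page_turns(render_page, n_pages, clock=time.perf_counter):
    """Return the seconds taken by each of n_pages calls to render_page(i).

    render_page is a hypothetical callback that draws page i of one file.
    """
    timings = []
    for page in range(n_pages):
        start = clock()
        render_page(page)  # assumption: this call blocks until the page is drawn
        timings.append(clock() - start)
    return timings


def summarize(timings):
    """Reduce a list of per-page timings to a mean and a worst case."""
    return {"mean": mean(timings), "slowest": max(timings)}
```

Running `time_page_turns` once per file over the same 50 pages and comparing the two `summarize` results would give a per-page average and a worst case for each version.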