Quote:
Originally Posted by xristy
O'Reilly is a publisher which does charge a one-time fee for the set of mobi/ePub/PDF/Daisy with no DRM! Certainly it is a pricing choice and nothing inherent in the use of multiple sized PDFs.
|
I really haven't paid much attention to the technical book market since I finished college.
I now just get outraged when books are over $30!

I can't imagine myself paying hundreds of dollars for any books any more.
Especially with a lot of the technical fields I am interested in (programming, math, economics, physics), you can find perfectly good material FOR FREE. If I ever did go purchase a physical book on the topic, there is sure as hell no way I would go for the latest/"greatest" edition.
Quote:
Originally Posted by xristy
As I have mentioned, I get very good results with Acrobat X and good quality scans.
|
And again, the key here is "good quality scans". In practice, this is the exception, not the rule.
In many cases, you cannot get the good quality scan!
Either they paid a crappy scanning company to scan the book (as you can see, crappy/cheap solutions bring headaches later), the book itself is so old that it is degraded (water stains), the book is rare (so this is the only copy that you have), or someone wrote in the book (this one makes me want to pull my hair out! NEVER WRITE IN YOUR BOOKS OR YOU WILL SUFFER MY WRATH!).
For example, here is one of the most egregious cases: ~50 out of 576 pages were marked up BADLY, and a few more had minor markings (I was able to fix those before OCR):
No matter what amazing PDF reader you are using on your tablet, there is no way you can get that scan to look as good as that EPUB.
But yes, having a great scan goes a long way toward speeding up the OCR process and making it more accurate. It can cut a process that would take me a few hours down to less than an hour (this is with me double-checking the areas marked as "unsure" by the OCR).
Quote:
Originally Posted by xristy
I don't know what Archive.org is doing but their results are not very uplifting as far as OCR'd PDFs and searching.
|
Well, most of their stuff is in the "not great scan" category (mostly because the books are so old). They run it through OCR with no human intervention (I believe they use the FineReader engine, but I'm not sure), and while it is "99.8%" accurate (or something like that), there are still a bunch of errors (which is why you pay for a human to look through it and fix it).
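To put that "99.8%" in perspective, here is a rough back-of-the-envelope sketch. The 576-page count is the book I mentioned above; the 1,800 characters per page is purely my assumption, and so is treating "99.8%" as per-character accuracy (I don't know how that figure is actually measured).

```python
def expected_ocr_errors(pages, chars_per_page, accuracy):
    """Estimate how many characters an OCR pass gets wrong."""
    total_chars = pages * chars_per_page
    return total_chars * (1.0 - accuracy)


PAGES = 576            # page count of the book mentioned above
CHARS_PER_PAGE = 1800  # assumption: a typical page of dense text
ACCURACY = 0.998       # assumption: "99.8%" means per-character accuracy

errors = expected_ocr_errors(PAGES, CHARS_PER_PAGE, ACCURACY)
print(f"~{errors:.0f} wrong characters in the book, ~{errors / PAGES:.1f} per page")
# prints "~2074 wrong characters in the book, ~3.6 per page"
```

So even at a number that sounds great, you could be looking at a few typos on every single page, which is exactly what a human proofreader ends up cleaning.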