09-25-2006, 01:07 PM   #9
bowerbird
there's no better combination than the plustek opticbook 3600 scanner and abbyy finereader.
if you want to do the scanning and the o.c.r. yourself, that is...

but since you specified an "out of copyright" book, you should
look around cyberspace to see if it has already been scanned.
google has already scanned 100,000+ books, and is doing more every day.

if you can get the scans from somewhere, then the easiest
thing to do is to wrap them into a .pdf and just start reading.
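a minimal sketch of that "wrap them into a .pdf" step, assuming the scans are jpeg files (an assumption; your source may hand you other image formats). it uses only the python standard library: it reads each jpeg's pixel size from its frame header and embeds the jpeg bytes unchanged, one image per page, using pdf's DCTDecode filter. the page size is simply the pixel size in points, so your viewer will zoom; dedicated tools handle many more cases than this does.

```python
import struct
from pathlib import Path

# number of jpeg colour components -> pdf colour space
COLOR_SPACES = {1: "/DeviceGray", 3: "/DeviceRGB", 4: "/DeviceCMYK"}


def jpeg_size(data: bytes) -> tuple[int, int, int]:
    """Return (width, height, components) read from a jpeg's SOF frame header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a jpeg file")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError("corrupt jpeg marker stream")
        marker = data[i + 1]
        if marker == 0xFF:  # padding byte before a marker
            i += 1
            continue
        seg_len = struct.unpack(">H", data[i + 2:i + 4])[0]
        # SOF0..SOF15 hold the frame size; C4, C8 and CC are other segment types
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height, data[i + 9]
        i += 2 + seg_len  # skip the 2-byte marker plus its segment
    raise ValueError("no SOF frame header found")


def jpegs_to_pdf(pages: list[bytes]) -> bytes:
    """Build a pdf with one page per jpeg; the jpeg bytes are embedded unchanged."""
    # object 1 = catalog, 2 = page tree, then 3 objects per page: page, image, drawing
    kids = " ".join(f"{3 + 3 * k} 0 R" for k in range(len(pages)))
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        f"<< /Type /Pages /Kids [{kids}] /Count {len(pages)} >>".encode(),
    ]
    for k, data in enumerate(pages):
        width, height, components = jpeg_size(data)
        image_obj, content_obj = 4 + 3 * k, 5 + 3 * k
        objects.append(
            f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
            f"/Resources << /XObject << /Im0 {image_obj} 0 R >> >> "
            f"/Contents {content_obj} 0 R >>".encode()
        )
        objects.append(
            f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
            f"/ColorSpace {COLOR_SPACES[components]} /BitsPerComponent 8 "
            f"/Filter /DCTDecode /Length {len(data)} >>\nstream\n".encode()
            + data + b"\nendstream"
        )
        # scale the image's 1x1 unit square up to cover the whole page
        draw = f"q {width} 0 0 {height} 0 0 cm /Im0 Do Q".encode()
        objects.append(f"<< /Length {len(draw)} >>\nstream\n".encode() + draw + b"\nendstream")

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed by the xref table
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_at = len(out)
    out += f"xref\n0 {len(objects) + 1}\n0000000000 65535 f \n".encode()
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode()  # each entry is exactly 20 bytes
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_at}\n%%EOF\n").encode()
    return bytes(out)


# e.g. (hypothetical folder of zero-padded page scans):
# scans = sorted(Path("scans").glob("*.jpg"))
# Path("book.pdf").write_bytes(jpegs_to_pdf([p.read_bytes() for p in scans]))
```

because the jpeg data is copied as-is, there's no recompression and no quality loss; the pdf is just a thin wrapper around the original page-images.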

of course, the text from such a "book" cannot be _searched_,
or _copied_, or _resized_ for greater readability, nor can it be
_reflowed_ to better fit varying screen sizes. but if all that
doesn't bother you, then there's no reason to do any more work.

and remember that if you got the scans from google, you can
always return to google whenever you want to search the book.
plus, if you want to copy the text from a page or two, you can
o.c.r. just those page-images; you don't need the whole book.
so you might be able to live comfortably with those limitations.
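if you only need the text of a page or two, here's a rough sketch of o.c.r.-ing just those page-images. it swaps in the free tesseract command-line o.c.r. engine (not finereader, which is what the post recommends) only because tesseract is easy to script. it assumes tesseract is installed and that the page-images use zero-padded names like page001.png (a hypothetical naming scheme), so sorting by filename gives reading order.

```python
import shutil
import subprocess
from pathlib import Path


def select_pages(page_images: list[Path], first: int, last: int) -> list[Path]:
    """Return pages first..last (1-based, inclusive), ordered by filename.

    Assumes zero-padded names such as page001.png, so that sorting by
    name gives reading order.
    """
    ordered = sorted(page_images, key=lambda p: p.name)
    return ordered[first - 1:last]


def ocr_pages(pages: list[Path], lang: str = "eng") -> str:
    """Run the tesseract CLI on just these page images and join their text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    texts = []
    for page in pages:
        # an output base of "stdout" makes tesseract print the text instead of writing a file
        result = subprocess.run(
            ["tesseract", str(page), "stdout", "-l", lang],
            capture_output=True, text=True, check=True,
        )
        texts.append(result.stdout)
    return "\n".join(texts)


# e.g. (hypothetical folder and page numbers):
# text = ocr_pages(select_pages(list(Path("scans").glob("*.png")), 41, 42))
```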

but if you do want to do the o.c.r., you should know that it is
_not_ that difficult to clean the results and format the e-book.
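the post doesn't spell out the cleanup steps, so this is only a rough sketch of a few common ones, assuming plain-text o.c.r. output with blank lines between paragraphs: drop lines that are nothing but a page number, rejoin words hyphenated across a line break, and unwrap each paragraph onto one line so the text can reflow. the heuristics are hypothetical and will misfire on some books (a real hyphenated compound that happens to break at a line end loses its hyphen, for instance).

```python
import re

# a line holding only a page number, e.g. "  127 "
PAGE_NUMBER = re.compile(r"^\s*\d{1,4}\s*$")


def clean_ocr_text(raw: str) -> str:
    """Turn raw o.c.r. page text into reflowable paragraphs (heuristic sketch)."""
    # 1. drop lines that are nothing but a page number
    lines = [ln for ln in raw.splitlines() if not PAGE_NUMBER.match(ln)]
    text = "\n".join(lines)
    # 2. rejoin words split by an end-of-line hyphen: "exam-\nple" -> "example"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # 3. paragraphs are separated by blank lines; unwrap the lines inside each one
    paragraphs = re.split(r"\n\s*\n", text)
    unwrapped = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in unwrapped if p)


raw = ("It was a dark and stormy\nnight; the rain fell in tor-\nrents.\n\n"
       "12\n\nSecond para-\ngraph here.\n")
print(clean_ocr_text(raw))
```

once the text is unwrapped like this, it can be _searched_, _copied_, _resized_, and _reflowed_, which is exactly what the page-image .pdf couldn't do.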

i'll be posting some messages to the "bookpeople" listserv
this week that walk people through the process with an actual
scan-set that i downloaded from google. not only that, but
the university of michigan is now posting the o.c.r. _results_
on their site, so you can scrape their actual o.c.r. output too,
which means that you don't even have to do the o.c.r. yourself.
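the post doesn't say how michigan's pages are laid out, so this sketch assumes nothing about their urls or markup: fetch_ocr_text() takes whatever page url you give it (you supply it; no real michigan address is implied), downloads it with the python standard library, and strips the html down to its text lines, skipping script and style blocks. expect to adapt the parsing to the actual pages.

```python
import urllib.request
from html.parser import HTMLParser

# tags that should start a new line of text
BLOCK_TAGS = {"p", "div", "br", "pre", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collect visible text, inserting line breaks at block-level tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default: entities arrive decoded
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the page's visible text, one non-empty stripped line per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def fetch_ocr_text(url: str) -> str:
    """Download one page (the url is whatever page holds the o.c.r. text) and return its text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))
```

the scraped text will still need the same kind of cleanup as your own o.c.r. output would.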

when i post my messages, i'll come here and give you the urls.

-bowerbird