Thread: Scanning books
11-04-2007, 01:20 AM   #11
bowerbird
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
you don't need to join distributed proofreaders to snag some scans:

you can also find scan-sets in the same places that d.p. finds 'em:


nekokami said:
> Why reassemble the word images, instead of OCR?

well, believe it or not, that's one way of doing "reflow".
parc did a paper on it a while back. here's the info:
> Paper to PDA
> Thomas M. Breuel, William C. Janssen, Kris Popat, Henry S. Baird
> 11 August 2002
> TR-01-2
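
to make the "reassemble the word images" idea concrete, here's a rough python sketch. it is _not_ the method from the parc paper, just a plain greedy line-fill, and it assumes the scan has already been cut into word images. the names `WordBox` and `reflow`, and the `space` and `leading` values, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Size of one word image cropped from the page scan (hypothetical type)."""
    width: int   # pixels
    height: int  # pixels


def reflow(words, screen_width, space=6, leading=4):
    """Greedy line-fill: return one (x, y) top-left position per word image.

    space   -- assumed horizontal gap between words, in pixels
    leading -- assumed vertical gap between lines, in pixels
    """
    positions = []
    x = y = 0
    line_height = 0
    for w in words:
        # Wrap to a new line when this word would run past the screen edge.
        # A word wider than the screen still gets a line of its own.
        if x > 0 and x + w.width > screen_width:
            x = 0
            y += line_height + leading
            line_height = 0
        positions.append((x, y))
        x += w.width + space
        line_height = max(line_height, w.height)
    return positions


words = [WordBox(40, 10), WordBox(50, 12), WordBox(30, 10)]
layout = reflow(words, screen_width=100)
```

paste each word image at its new position and the text wraps to the narrower screen with no character recognition at all, which is why it sidesteps o.c.r. errors.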


nate said:
> Developing a decent OCR program from scratch
> is a Master's thesis or PhD level project
> (according to my professor).

um, he's pulling your leg. developing a decent o.c.r. program is
_immensely_ difficult. even with the head start they obtained from
adopting a project from elsewhere, google discovered it's hard...
take a look at their recent alpha of ocropus to get a rough idea:
