Thread: Scanning books
11-04-2007, 01:20 AM   #11
bowerbird
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
you don't need to join distributed proofreaders to snag some scans:
> http://www.pgdp.org/ols/

you can also find scan-sets in the same places that d.p. finds 'em:
> http://www.pgdp.net/wiki/Sources_for_Scan_Harvesting

***

nekokami said:
> Why reassemble the word images, instead of OCR?

well, believe it or not, that's one way of doing "reflow".
parc did a paper on it a while back. here's the info:
> Paper to PDA
> Thomas M. Breuel, William C. Janssen, Kris Popat, Henry S. Baird
> 11 August 2002
> TR-01-2
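
to make the idea concrete, here's a minimal sketch of reflowing word images, assuming you've already cropped each word out of the page scan. it is _not_ the method from the parc paper, just a plain greedy line-filler; `WordBox`, `reflow`, `gap`, and `leading` are names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Hypothetical record for one cropped word image (sizes in pixels)."""
    width: int
    height: int


def reflow(words, screen_width, gap=4, leading=2):
    """Greedily place word images left to right, wrapping at screen_width.

    Returns a list of (x, y) top-left positions, one per word, in order.
    `gap` is the space between words; `leading` is the space between lines.
    """
    positions = []
    x = y = 0
    line_height = 0
    for w in words:
        # wrap when the word would overflow, unless the line is still empty
        if x > 0 and x + w.width > screen_width:
            x = 0
            y += line_height + leading
            line_height = 0
        positions.append((x, y))
        x += w.width + gap
        line_height = max(line_height, w.height)
    return positions


words = [WordBox(50, 12), WordBox(30, 12), WordBox(40, 14), WordBox(60, 12)]
print(reflow(words, screen_width=100))
# -> [(0, 0), (54, 0), (0, 14), (0, 30)]
```

the point is that no character recognition happens anywhere: the word pictures are just re-laid-out for a narrower screen, so recognition errors never enter the text.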

***

nate said:
> Developing a decent OCR program from scratch
> is a Master's thesis or PhD level project
> (according to my professor).

um, he's pulling your leg. developing a decent o.c.r. program is
_immensely_ difficult. even with the head start they got by
adopting a project from elsewhere, google discovered it's hard...
take a look at their recent alpha of ocropus to get a rough idea:
> http://code.google.com/p/ocropus/

-bowerbird