MobileRead Forums - View Single Post

bowerbird · 11-04-2007, 02:20 AM

you don't need to join distributed proofreaders to snag some scans:
> http://www.pgdp.org/ols/

you can also find scan-sets in the same places that d.p. finds 'em:
> http://www.pgdp.net/wiki/Sources_for_Scan_Harvesting

***

nekokami said:
> Why reassemble the word images, instead of OCR?

well, believe it or not, that's one way of doing "reflow".
parc did a paper on it a while back. here's the info:
> Paper to PDA
> Thomas M. Breuel, William C. Janssen, Kris Popat, Henry S. Baird
> 11 August 2002
> TR-01-2
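
for anyone wondering what "reassembling the word images" might look like in practice, here's a rough sketch. it is not the method from the parc paper, just a plain greedy line-breaker under some assumptions: each word has already been cropped out of the scan, and all we know about it is its pixel width and height. the names `WordBox` and `reflow`, and the `space` and `leading` parameters, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    width: int   # pixel width of the cropped word image (assumed known)
    height: int  # pixel height of the cropped word image (assumed known)


def reflow(words, page_width, space=8, leading=4):
    """Greedy layout: return the (x, y) top-left position for each word image.

    Words are placed left to right; when the next word would overflow
    page_width, it starts a new line below the tallest word of the
    current line.
    """
    positions = []
    x = 0
    y = 0
    line_height = 0
    for word in words:
        # Wrap only if the line already holds a word, so an over-wide
        # word still gets placed instead of wrapping forever.
        if x > 0 and x + word.width > page_width:
            x = 0
            y += line_height + leading
            line_height = 0
        positions.append((x, y))
        x += word.width + space
        line_height = max(line_height, word.height)
    return positions


if __name__ == "__main__":
    boxes = [WordBox(50, 20), WordBox(60, 20), WordBox(40, 24), WordBox(30, 20)]
    print(reflow(boxes, page_width=120))
```

the point is that no character recognition happens anywhere: the word images are just pasted at new positions to fit a narrower screen.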

***

nate said:
> Developing a decent OCR program from scratch
> is a Master's thesis or PhD level project
> (according to my professor).

um, he's pulling your leg. developing a decent o.c.r. program is
_immensely_ difficult. even with a head start they obtained from
adopting a project from elsewhere, google discovered it's hard...
take a look at their recent alpha of ocropus to get a rough idea:
> http://code.google.com/p/ocropus/

-bowerbird
