Quote:
Originally Posted by bazzargh
Actually, those are the very algorithms I was trying to work on (I'm the original author of PDFRead) but then gave up on, as it required too much effort to implement them from scratch.
Happily, one of the authors of these publications (Thomas Breuel) is now leading the development of Ocropus, a document analysis and OCR system, at Google. From browsing through the code, most of the algorithms already seem to be implemented (and some advances on them, too), and I plan to integrate it into PDFRead soon. (I've already contributed some patches to get it compiling under Windows.) The library interface can be scripted via Lua, so I'm currently trying to put together the bits and pieces to get that approach working.