View Single Post
Old 05-18-2008, 07:01 AM   #57
ashkulz
Addict
ashkulz will become famous soon enoughashkulz will become famous soon enoughashkulz will become famous soon enoughashkulz will become famous soon enoughashkulz will become famous soon enoughashkulz will become famous soon enoughashkulz will become famous soon enough
 
ashkulz's Avatar
 
Posts: 350
Karma: 705
Join Date: Dec 2006
Location: Mumbai, India
Device: Kindle 1/REB 1200
Quote:
Originally Posted by bazzargh View Post
BTW reading over this thread it seems worth mentioning some previous research into reflowing scanned text for small devices:
http://pubs.iupr.org/DATA/2002-breuel-wdabook.pdf
Actually, those are the very algorithms I was trying to work on (I'm the original author of PDFRead) but then gave up on, as it required too much effort to implement them from scratch.

Happily, one of the author of these publications (Thomas Breuel) is now leading the development of Ocropus at Google, which is a document analysis and OCR system. Browsing through the code, most of the algorithms already seem to be implemented (and some advances from that, too): I plan to integrate it sometime into PDFRead soon. (I've already contributed some patches to get it compiling under windows). The library interface can be scripted via Lua, so I'm currently trying to put together the bits and pieces to get that approach working.
ashkulz is offline   Reply With Quote