MobileRead Forums - View Single Post - LaunchHack - an OCR-based companion to LaunchPad

vdp · 12-09-2011, 04:46 AM

@hawhill

Quote:

Originally Posted by hawhill

Holy crap, you got me sitting here in a mix of admiration and disbelief. This is awesome in more than one way: The engineering thought of going this route, the guts to do so and at last the "hackery" level :-)

Thanks

Quote:

Originally Posted by hawhill

Well, I think you would agree if I add as a postscriptum: There just _has_ to be a simpler way :-)

Absolutely! I also like the lsof-based trick that dpavlin committed to kpdfviewer's repo, but unfortunately it seems to require an additional step and probably takes more memory, since two different PDF readers are loaded at the same time.

Quote:

Originally Posted by hawhill

Thanks for mentioning the PDF reader. There'll be updates soon: the sources already have a file chooser, and I'll fix a few other things and do another release then.

That's great!

@h1uke

Quote:

vdp, do you think that a small part of Tesseract can be used to quickly
analyze page structure and return a set of word/line bounding boxes?

This could seriously simplify emulation of a pointing device in GUI programs
ported to Kindle.

Maybe; I am not familiar with Tesseract's internals. My understanding, from reading the Wikipedia articles and the information on the Tesseract website, is that the document analysis features were added after Google started supporting the project. There are also other projects, like Ocropus, that seem to perform the analysis themselves and use Tesseract as a backend recognition engine.
So perhaps there are options... I would also be very interested in researching and using or implementing lightweight techniques from the computer vision and OCR communities to analyse documents, and in using that information to improve navigation and presentation.
Take, for example, the two-column mode of Duokan. It is very useful in the majority of cases, but all it does is split the page into two equal parts. If the text is offset, however, or one of the columns is wider, the user is out of luck.
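
To make the two-column example concrete, here is a minimal sketch of one lightweight alternative to splitting at the page midpoint: a vertical projection profile. This is not how Duokan or Tesseract work. It is only an illustration, and it assumes the page is already binarised into rows of 0/1 pixels (1 = ink). The names `find_column_gutter`, `make_page` and the `min_gap` parameter are hypothetical.

```python
def find_column_gutter(page, min_gap=3):
    """Return the x position of the widest empty vertical band inside
    the text area, or None if no band is at least min_gap pixels wide.

    page is a list of rows; each row is a list of 0/1 values (1 = ink).
    """
    if not page or not page[0]:
        return None
    width = len(page[0])

    # Vertical projection profile: amount of ink in each pixel column.
    profile = [sum(row[x] for row in page) for x in range(width)]

    # Ignore the outer margins: only look between the first and last inked column.
    inked = [x for x, count in enumerate(profile) if count > 0]
    if not inked:
        return None
    left, right = inked[0], inked[-1]

    # Find the longest run of completely empty columns inside the text area.
    best_start, best_len = None, 0
    run_start = None
    for x in range(left, right + 1):
        if profile[x] == 0:
            if run_start is None:
                run_start = x
        elif run_start is not None:
            run_len = x - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None

    if best_len < min_gap:
        return None  # looks like a single column
    return best_start + best_len // 2


def make_page(width, height, spans):
    """Hypothetical helper: build a synthetic page with ink in the given x spans."""
    return [[1 if any(a <= x <= b for a, b in spans) else 0 for x in range(width)]
            for _ in range(height)]


# A 30-pixel-wide page with a narrow left column and a wide right column.
page = make_page(30, 4, [(2, 8), (14, 28)])
split_x = find_column_gutter(page)
```

On this synthetic page a midpoint split at x = 15 would cut through the wider right column, while the profile-based split lands in the gap between the two columns. A real scan would need at least noise filtering and a tolerance for columns that are nearly, but not entirely, empty.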