Quote:
Originally Posted by Flumine
I wish I had seen your application before; it is really one of a kind (at least I have not yet found anything better).
Actually, I had a similar idea about two years ago - to re-flow wide scanned documents at the glyph level to fit a mobile screen - so I spent some time with a friend writing an Android application. The result works fine but with some limitations - it cannot recognize complex formulas or multi-column layouts.
Here is a good page sample:
https://slack-files.com/T9YDZ38JY-FUG1J0AKA-40bce696bf
Here is a sample of an original page which failed to re-flow properly, with glyphs recognized - https://glyphs.flum.app/image?id=448&mode=glyphs.
To recognize glyphs we are using the OpenCV library, and it mostly works fine, but it is hard to get a formula recognized as a single image. Your application handles them much better, so I wonder what algorithm you are using for that?
If you look at the comments at the top of the main source file, k2pdfopt.c, they outline the high-level process used by k2pdfopt and point out some of the key C functions. The detection algorithms are just my own inventions, with a lot of trial and error to find what works well and what doesn't. The basic concept is to first look for columnar regions / large blocks of the page by scanning for horizontal and vertical blank (white) areas between the regions, then to break those columns/regions into rows of text, and then the rows of text into words.

The process has given me a deep appreciation for how easily the human brain can visually parse a page (and instantly know "that is text" and "that is an image", etc.) compared to how hard it is to write a reliable algorithm to do the same thing.
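To make the idea concrete, here is a minimal, self-contained sketch of that kind of whitespace-based recursive segmentation (sometimes called a recursive X-Y cut). This is not k2pdfopt's actual code; the fake page array, the dimensions, and the split ordering are all assumptions made just for illustration, but it shows how scanning for blank rows and columns naturally breaks a page into columns, then rows, then word-sized blocks:

/*
** Sketch of whitespace-based page segmentation (recursive "X-Y cut").
** NOT k2pdfopt's actual code -- just an illustration of the idea:
** find a blank band between regions, split there, and recurse, so the
** page falls apart into columns, then rows of text, then words.
*/
#include <stdio.h>

#define W 40
#define H 6

/* 1 = ink (dark pixel), 0 = blank; filled from the fake "page" in main(). */
static int page[H][W];

static int row_is_blank(int r, int c0, int c1)
{
    for (int c = c0; c <= c1; c++)
        if (page[r][c]) return 0;
    return 1;
}

static int col_is_blank(int c, int r0, int r1)
{
    for (int r = r0; r <= r1; r++)
        if (page[r][c]) return 0;
    return 1;
}

/* Shrink the region until ink touches all four edges. */
static void trim(int *r0, int *r1, int *c0, int *c1)
{
    while (*r0 <= *r1 && row_is_blank(*r0, *c0, *c1)) (*r0)++;
    while (*r1 >= *r0 && row_is_blank(*r1, *c0, *c1)) (*r1)--;
    while (*c0 <= *c1 && col_is_blank(*c0, *r0, *r1)) (*c0)++;
    while (*c1 >= *c0 && col_is_blank(*c1, *r0, *r1)) (*c1)--;
}

/* Recursively split the region [r0..r1] x [c0..c1] at blank bands. */
static void segment(int r0, int r1, int c0, int c1, int depth)
{
    trim(&r0, &r1, &c0, &c1);
    if (r0 > r1 || c0 > c1)
        return;                     /* nothing but white space */

    /* Vertical blank band first: separates columns (and, deeper down, words). */
    for (int c = c0 + 1; c < c1; c++)
        if (col_is_blank(c, r0, r1)) {
            segment(r0, r1, c0, c - 1, depth + 1);
            segment(r0, r1, c + 1, c1, depth + 1);
            return;
        }
    /* Then a horizontal blank band: separates rows of text within a column. */
    for (int r = r0 + 1; r < r1; r++)
        if (row_is_blank(r, c0, c1)) {
            segment(r0, r - 1, c0, c1, depth + 1);
            segment(r + 1, r1, c0, c1, depth + 1);
            return;
        }
    /* No blank band left: a leaf region (ultimately a word-sized block). */
    printf("%*sleaf: rows %d-%d, cols %d-%d\n", depth * 2, "", r0, r1, c0, c1);
}

int main(void)
{
    /* A tiny fake two-column page; '#' marks ink. */
    const char *fake[H] = {
        "                                        ",
        "  ####  ####            ####  ####      ",
        "  ####  ####            ####  ####      ",
        "                                        ",
        "  ######  ########      ########        ",
        "  ######  ########      ########        ",
    };
    for (int r = 0; r < H; r++)
        for (int c = 0; c < W; c++)
            page[r][c] = (fake[r][c] == '#');

    segment(0, H - 1, 0, W - 1, 0);
    return 0;
}

On a real scanned page the gaps are of course noisy, so in practice you can't demand perfectly white rows and columns; you end up working with ink-density profiles, minimum gap sizes, and similar thresholds rather than the strict blank test used in this toy example.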