Old 03-10-2020, 01:39 AM   #1755
willus
Fuzzball, the purple cat
willus ought to be getting tired of karma fortunes by now.
 
 
Posts: 1,303
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by Flumine View Post
I wish I had seen your application earlier; it is really unique of its kind (at least I have not yet found anything better).
Actually, I had a similar idea about two years ago - to re-flow wide scanned documents at the glyph level to fit a mobile screen - so I spent some time with my friend writing an Android application. The result works fine but with some limitations: it cannot recognize complex formulas or multi-column layouts.
Here is a good page sample:
https://slack-files.com/T9YDZ38JY-FUG1J0AKA-40bce696bf
Here is a sample of an original page which failed to re-flow properly, with glyphs recognized: https://glyphs.flum.app/image?id=448&mode=glyphs.
To recognize glyphs we are using the OpenCV library, and it mostly works fine, but it is hard to get formulas recognized as a single image. Your application handles them much better, so I wonder what algorithm you are using for that?
If you look at the comments at the top of the main source file, k2pdfopt.c, it outlines the high level process used by k2pdfopt and points out some of the key C functions. The algorithms for detection are just my own inventions, with a lot of trial and error for what works well and what doesn't. The basic concept is to first look for columnar regions / large blocks of the page by scanning for horizontal and vertical blank (white) areas between the regions, and then to break those columns/regions into rows of text, and then the rows of text into words. The process has given me a deep appreciation for how easily the human brain can visually parse a page (and instantly know "that is text" and "that is an image", etc.) compared to how hard it is to write a reliable algorithm to do the same thing.