09-13-2014, 11:18 AM   #8
willus
Fuzzball, the purple cat
Quote:
Originally Posted by HarryT
A PDF containing scanned pages obviously can't be reflowed - it's just a picture.
Not to beat a dead horse, but this does seem to be a common misconception. As Markom pointed out, if the text is regular enough and there are not too many defects, text reflow on scanned pages can be done reliably using graphical methods (not OCR) to find the text rows and words within the scanned bitmaps. The links below are from willus.com/k2pdfopt (the examples are in the middle of the page):

Scanned book pages (no OCR layer)

Scanned pages as processed by k2pdfopt (no OCR performed)
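
To give a rough idea of what "graphical methods" can mean here, below is a minimal sketch of one common approach, a projection profile: sum the ink in each pixel row to find text rows, then sum the ink in each pixel column within a row to find words. This is only an illustration and is not taken from the k2pdfopt source. The function names, the 0/1 bitmap representation, and the `word_gap` parameter are all assumptions made for the example, and real scans would also need deskewing and noise handling.

```python
def find_runs(profile, threshold=0):
    """Return (start, end) pairs (end exclusive) where profile > threshold."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the run ended at a blank position
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def merge_close_runs(runs, min_gap):
    """Merge runs separated by fewer than min_gap blank positions."""
    merged = []
    for start, end in runs:
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)   # gap is letter spacing, not word spacing
        else:
            merged.append((start, end))
    return merged


def find_rows_and_words(bitmap, word_gap=2):
    """Locate text rows and the words inside them, without any OCR.

    bitmap: list of pixel rows, each a list of 0 (white) or 1 (ink).
    word_gap: hypothetical tuning value; blank gaps narrower than this
              are treated as spacing between letters of the same word.
    Returns a list of ((top, bottom), [(left, right), ...]) entries.
    """
    row_profile = [sum(row) for row in bitmap]          # ink per pixel row
    layout = []
    for top, bottom in find_runs(row_profile):
        width = len(bitmap[top])
        col_profile = [
            sum(bitmap[y][x] for y in range(top, bottom))
            for x in range(width)
        ]                                               # ink per pixel column
        words = merge_close_runs(find_runs(col_profile), word_gap)
        layout.append(((top, bottom), words))
    return layout
```

Once the word rectangles are known, reflowing amounts to copying those rectangles, in order, onto a narrower page and wrapping to a new line whenever the next word no longer fits. No character is ever recognized.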