Old 08-21-2013, 01:19 AM   #505
willus
Fuzzball, the purple cat
Posts: 1,274
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by kundor
...By the way, it takes about 12 hours to OCR this document, which seems kind of silly when there is already a hidden text layer. Since it includes the location data, it seems like it might be possible to keep track of which words go with each chunk while you're slicing up the pages. Have you considered doing that?
This feature (using the native text in a PDF file in place of OCR) will be available in the next k2pdfopt release. I've implemented it (a la the -ttt option in the mudraw utility that comes with MuPDF) and tested it.
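The bookkeeping kundor describes (keeping track of which words go with each chunk) can be sketched roughly as below. This is only an illustration, not k2pdfopt's actual code: `Box`, `Word`, and `words_by_region` are hypothetical names, and the sketch assumes each native-text word comes with a bounding box in the same page coordinates as the cropped regions, and that a word belongs to whichever region contains its center point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical helper)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


@dataclass(frozen=True)
class Word:
    """A word from the PDF's text layer with its location (hypothetical)."""
    text: str
    box: Box


def words_by_region(words, regions):
    """Return one list of words per region, in the original word order.

    A word is assigned to the first region that contains the center of
    its bounding box; words outside every region are dropped.
    """
    assigned = [[] for _ in regions]
    for word in words:
        cx = (word.box.x0 + word.box.x1) / 2
        cy = (word.box.y0 + word.box.y1) / 2
        for i, region in enumerate(regions):
            if region.contains_point(cx, cy):
                assigned[i].append(word)
                break
    return assigned
```

Using the center point rather than requiring full containment is a design choice made for this sketch: it keeps a word in exactly one chunk even when a crop edge clips its bounding box slightly.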