11-05-2014, 06:58 AM   #2
EnergyLens
Device: Kobo Aura HD, Kindle Paperwhite
I also wanted to note that my main inspiration for this was not to make PDFs more readable on eReaders (which it does), but to streamline cleanup after OCR of PDF files. Briss does a great job of automatically cropping out the page numbers and chapter names that appear on every page, which makes the resulting text much more readable.

I use PDFOCRx on Mac, which does a great job with two-column PDFs, and produces soft line-wraps. Sometimes OCR works even better (in terms of cleanup) than extracting the text from the native PDF.

I also have a custom column in my Calibre library which sorts PDFs into three types: Scanned, Scan with embedded OCR data, and Native. I populate this column when loading new PDFs into the library, and later the value of the column helps me decide how to process the file when converting to other formats.
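
For anyone who wants to automate filling in that column, here is a rough Python sketch of how the three types could be told apart. It is only a heuristic under my own assumptions, not a description of how the column actually gets populated: `PageInfo`, `classify_pdf`, and the 50% threshold are all hypothetical names and values, and you would still need a PDF library (not shown) to work out the two per-page flags.

```python
from dataclasses import dataclass
from typing import Sequence

# The three values used in the custom column.
SCANNED = "Scanned"
SCAN_WITH_OCR = "Scan with embedded OCR data"
NATIVE = "Native"


@dataclass
class PageInfo:
    """Hypothetical per-page summary; a PDF library would have to fill this in."""
    has_text: bool             # the page has an extractable text layer
    has_full_page_image: bool  # a single image covers (nearly) the whole page


def classify_pdf(pages: Sequence[PageInfo], threshold: float = 0.5) -> str:
    """Guess the PDF type from per-page flags (heuristic, assumed threshold)."""
    if not pages:
        raise ValueError("no pages to classify")

    n = len(pages)
    text_ratio = sum(p.has_text for p in pages) / n
    image_ratio = sum(p.has_full_page_image for p in pages) / n

    # Mostly page-sized images: a scan, with or without an OCR text layer.
    if image_ratio > threshold:
        return SCAN_WITH_OCR if text_ratio > threshold else SCANNED

    # Mostly real text and no page-sized images: a native PDF.
    if text_ratio > threshold:
        return NATIVE

    # Assumption: little text and few page images still needs OCR.
    return SCANNED


if __name__ == "__main__":
    scan = [PageInfo(has_text=False, has_full_page_image=True)] * 3
    print(classify_pdf(scan))
```

Using ratios instead of requiring every page to match is deliberate: a native PDF with one scanned insert, or a scan with a blank page, should still end up in the sensible bucket.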