08-22-2012, 06:36 AM   #37
janek
Groupie
Posts: 175
Karma: 23456
Join Date: Feb 2012
Device: Boox m92
Quote:
Originally Posted by Knorke
Thanks for the helpful comments. In the two cases with bug 4 (missing spaces) the PDF files are not scanned. They were generated by pdflatex + Adobe Distiller or by Elsevier.

When I select the text in the Adobe PDF reader and copy it into an editor, the spaces are correct.

Then I took a scanned patent PDF, ran the Acrobat OCR on it and (hold on to your hat) the Onyx detected the spaces correctly!

--> In my case the space problem does not correlate with whether the file is scanned or not!
Bizarre indeed. One small correction: I did not mean that every scanned/OCR-ed document causes such problems, nor that no "generated" ones do, but that the bug is related to how the text layer is organized.
I'm able to reproduce this behaviour by using the Linux utility gscan2pdf to scan and OCR documents.
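
To illustrate what "how the text layer is organized" could mean, here is a minimal Python sketch. It is only an assumption about the kind of difference involved, not the Onyx reader's actual code: the `Glyph` class, the `layout` helper and the `gap_ratio` threshold are all hypothetical. It models two text layers that look identical on the page, one storing explicit space characters and one only positioning the words apart, and shows that a reader which just concatenates stored characters loses the spaces in the second case, while one that looks at the gaps between glyphs recovers them.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge on the line, in points
    width: float  # advance width, in points


def layout(words, with_space_glyphs, char_w=5.0, space_w=2.5):
    """Build a hypothetical text layer for one line.

    with_space_glyphs=True  -> the layer stores explicit ' ' characters.
    with_space_glyphs=False -> words are only moved apart; no ' ' is stored.
    Both variants place the letters at the same positions.
    """
    glyphs = []
    x = 0.0
    for i, word in enumerate(words):
        if i > 0:
            if with_space_glyphs:
                glyphs.append(Glyph(" ", x, space_w))
            x += space_w
        for ch in word:
            glyphs.append(Glyph(ch, x, char_w))
            x += char_w
    return glyphs


def extract_naive(glyphs):
    """Concatenate characters exactly as stored in the layer."""
    return "".join(g.char for g in glyphs)


def extract_with_gaps(glyphs, gap_ratio=0.3):
    """Insert a space wherever the gap to the previous glyph is large.

    gap_ratio is an arbitrary illustrative threshold, relative to the
    previous glyph's width.
    """
    out = []
    prev = None
    for g in glyphs:
        if prev is not None and g.char != " " and prev.char != " ":
            gap = g.x - (prev.x + prev.width)
            if gap > gap_ratio * prev.width:
                out.append(" ")
        out.append(g.char)
        prev = g
    return "".join(out)


explicit = layout(["missing", "space"], with_space_glyphs=True)
positioned = layout(["missing", "space"], with_space_glyphs=False)

print(extract_naive(explicit))
print(extract_naive(positioned))
print(extract_with_gaps(positioned))
```

If something like this is going on, it would explain why the problem does not depend on whether the file was scanned: what matters is which of the two layouts the producing tool happens to write.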