Quote:
Originally Posted by Knorke
Thanks for the helpful comments. In the two cases with bug 4 (missing spaces) the PDF files are not scanned. They were generated by pdflatex+Adobe Distiller or Elsevier.
When I select the text in the Adobe PDF reader and copy it into an editor, the spaces are correct.
Then I took a scanned patent PDF, ran the Acrobat OCR on it and - hold on to your hat - the Onyx detects the spaces correctly!
--> In my case the space problem does not correlate with whether the file is scanned or not!
|
Bizarre indeed. One small correction: what I meant was not that every scanned/OCR-ed document causes such problems, nor that no "generated" ones do, but that the bug is related to how the text layer is organized.
I'm able to reproduce this behaviour by using a Linux utility called gscan2pdf to scan and OCR documents.