08-13-2014, 03:06 AM   #10
Tex2002ans
Quote:
Originally Posted by Toxaris
That is why I usually end up with OCR and subsequent processing... Seen too many strange things with the text export...
Yep, can't trust any of these dang PDF creation programs.

Even with the same program, you don't know which settings people clicked. Did they generate this PDF in LibreOffice with "Tagged PDF" enabled? Did they generate it in InDesign with the proper (accessibility) settings? What dang "PDF Printer" did they run it through in Word (and what were the settings)?

After they generated the original PDF, did they run it through some crappy "PDF Editing" software to add a Cover/Title Page, or to do something as simple as ADD METADATA? (By the gods, those "Editing" programs absolutely mangle PDFs.)

Since the text is quite crisp (it is a purely digital file), OCR should be QUITE accurate and produce few errors.

But enough pooh-poohing of how bad PDF is as an input format! Let's remain positive!