03-15-2012, 10:37 AM   #13
murraypaul
Interested Bystander
murraypaul ought to be getting tired of karma fortunes by now.
Posts: 3,726
Karma: 19728152
Join Date: Jun 2008
Device: Note 4, Kobo One
Quote:
Originally Posted by Penforhire
Well, using 3rd party software (or full Acrobat) any time there is text in a PDF I can extract it. If I can extract it, as text, then it has to be reflowable in certain applications.
You can extract text, yes.
You cannot 100% reliably extract text in the correct order.
A two-column PDF might store its text as all of column one, then all of column two, or as the first line of both columns, then the second line of both columns, and so on.
You could have a perfectly valid PDF that displays fine on screen but draws all the letter 'a's first, then all the 'b's, and so on.
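To make the ordering point concrete, here is a rough Python sketch. It does not parse a real PDF; it models a page as a list of hypothetical `Run` fragments (a position plus some text) in the order the producer happened to draw them, and compares two naive extraction strategies: concatenating in drawing order, and sorting by position on the page.

```python
from typing import NamedTuple


class Run(NamedTuple):
    """Hypothetical positioned text fragment drawn at (x, y).
    As in PDF user space, y grows upward, so larger y is higher on the page."""
    x: float
    y: float
    text: str


def stream_order(runs, sep=""):
    """Naive extraction: concatenate fragments in the order they were drawn."""
    return sep.join(r.text for r in runs)


def position_order(runs, sep=""):
    """Sort top-to-bottom, then left-to-right; one output line per y value."""
    lines = {}
    for r in sorted(runs, key=lambda r: (-r.y, r.x)):
        lines.setdefault(r.y, []).append(r.text)
    return "\n".join(sep.join(parts) for parts in lines.values())


# The word "abba" drawn as all the 'a' glyphs first, then all the 'b' glyphs.
glyphs = [Run(0, 700, "a"), Run(18, 700, "a"), Run(6, 700, "b"), Run(12, 700, "b")]
print(stream_order(glyphs))    # aabb
print(position_order(glyphs))  # abba

# Two columns, which this made-up producer drew column by column.
two_col = [
    Run(72, 700, "PDF is a"), Run(72, 686, "display format."),
    Run(320, 700, "It is not a"), Run(320, 686, "transitional one."),
]
print(stream_order(two_col, sep=" "))
print(position_order(two_col, sep=" | "))
```

Sorting by position rescues the "abba" case but reads straight across both columns, while drawing order gets the columns right only because this producer drew them one after the other. Real extractors typically add column-detection heuristics on top, which is why the result is never 100% reliable.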
You cannot reliably extract sentence and paragraph endings.
You cannot reliably tell whether a new page should or should not start a new paragraph.
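Paragraph and page breaks are the same story: in an untagged PDF nothing marks them explicitly, so a reflowing tool has to guess from geometry. Below is a minimal sketch of one such guess, assuming a hypothetical `Line` record per extracted text line and an assumed normal line spacing of 14 pt. A larger-than-normal vertical gap starts a new paragraph; at a page break, where there is no gap to measure, a line ending in sentence punctuation is assumed to end the paragraph.

```python
from typing import NamedTuple


class Line(NamedTuple):
    """Hypothetical extracted text line: page number, baseline y, and text.
    y grows upward, so the next line down has a smaller y."""
    page: int
    y: float
    text: str


def join_paragraphs(lines, line_gap=14.0, tolerance=2.0):
    """Heuristically regroup extracted lines into paragraphs.

    line_gap and tolerance are assumed values in points, not read from the PDF.
    """
    if not lines:
        return []
    paragraphs = [[lines[0].text]]
    for prev, cur in zip(lines, lines[1:]):
        if cur.page != prev.page:
            # No vertical gap to measure across a page break: guess from punctuation.
            new_para = prev.text.rstrip().endswith((".", "!", "?"))
        else:
            # Same page: a gap noticeably larger than normal spacing starts a paragraph.
            new_para = (prev.y - cur.y) > line_gap + tolerance
        if new_para:
            paragraphs.append([cur.text])
        else:
            paragraphs[-1].append(cur.text)
    return [" ".join(p) for p in paragraphs]


lines = [
    Line(1, 700, "PDF is excellent at being"),
    Line(1, 686, "a final display format."),
    Line(1, 660, "It is poor at being a"),    # 26 pt gap: new paragraph
    Line(1, 646, "transitional format."),     # last line of page 1
    Line(2, 700, "So reflowing it is guesswork."),
]
for para in join_paragraphs(lines):
    print(para)
```

The page break shows the weakness: a sentence that happens to end at the bottom of page 1 looks exactly like a paragraph end, so if the author meant that paragraph to continue onto page 2, the heuristic still splits it.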
In short, PDF is excellent at being a final display format, and poor at being a transitional format.