MobileRead Forums - View Single Post

murraypaul · 03-15-2012, 11:37 AM

Quote:

Originally Posted by Penforhire

Well, using 3rd party software (or full Acrobat) any time there is text in a PDF I can extract it. If I can extract it, as text, then it has to be reflowable in certain applications.

You can extract text, yes.
You cannot 100% reliably extract text in the correct order.
A two-column PDF might be laid out as all of column one, then all of column two, or as the first line of both columns, then the second line of both columns...
You could have a perfectly valid PDF that displays fine on screen but draws all the letter 'a's first, then all the letter 'b's, and so on.
You cannot reliably extract sentence and paragraph endings.
You cannot reliably tell whether a new page should or should not start a new paragraph.
In short, PDF is excellent at being a final display format, and poor at being a transitional format.
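To make the ordering point concrete, here is a toy sketch in Python. It is not how any real PDF library works internally. It assumes a page has already been reduced to hypothetical `TextOp` records (a string plus its x/y position), and the names `stream_order`, `row_order` and `column_order` are made up for illustration. It shows that the order text is drawn in the file can be arbitrary, that a simple top-to-bottom, left-to-right sort repairs scrambled glyphs but interleaves two columns, and that reading column by column only works if you already know where the gutter is.

```python
from typing import Dict, List, NamedTuple


class TextOp(NamedTuple):
    """Hypothetical record: a string drawn at (x, y).

    PDF user space has its origin at the bottom-left, so larger y is higher up.
    """
    x: float
    y: float
    text: str


def stream_order(ops: List[TextOp], sep: str = "") -> str:
    """Naive extraction: join strings in the order the file draws them."""
    return sep.join(op.text for op in ops)


def row_order(ops: List[TextOp], sep: str = "") -> List[str]:
    """Group ops sharing a baseline y into rows, top-to-bottom, left-to-right.

    Assumes exact y values; real baselines jitter, so real tools need a tolerance.
    """
    rows: Dict[float, List[str]] = {}
    for op in sorted(ops, key=lambda o: (-o.y, o.x)):
        rows.setdefault(op.y, []).append(op.text)
    # dicts keep insertion order, so rows come out top-to-bottom
    return [sep.join(parts) for parts in rows.values()]


def column_order(ops: List[TextOp], gutter_x: float) -> List[str]:
    """Read the whole left column, then the whole right column.

    gutter_x is an assumed, known column boundary; a real extractor must guess it.
    """
    left = sorted((o for o in ops if o.x < gutter_x), key=lambda o: -o.y)
    right = sorted((o for o in ops if o.x >= gutter_x), key=lambda o: -o.y)
    return [o.text for o in left + right]


# 1. "banana" drawn with every 'a' first, then the 'b', then the 'n's.
banana = [TextOp(1, 700, "a"), TextOp(3, 700, "a"), TextOp(5, 700, "a"),
          TextOp(0, 700, "b"), TextOp(2, 700, "n"), TextOp(4, 700, "n")]
print(stream_order(banana))  # aaabnn
print(row_order(banana))     # ['banana']

# 2. Two columns drawn row by row: line 1 of both columns, then line 2 of both.
page = [TextOp(50, 700, "left line 1"), TextOp(320, 700, "right line 1"),
        TextOp(50, 686, "left line 2"), TextOp(320, 686, "right line 2")]
print(row_order(page, sep=" | "))
# ['left line 1 | right line 1', 'left line 2 | right line 2']
print(column_order(page, gutter_x=300))
# ['left line 1', 'left line 2', 'right line 1', 'right line 2']
```

Note that nothing in the toy data says where a sentence or paragraph ends, or whether the text at the top of the next page continues the last paragraph; a real extractor has to guess all of that from spacing and position, which is why results vary between tools and documents.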