MobileRead Forums

rkomar · 03-14-2012, 09:47 PM

Quote:

Originally Posted by Penforhire

Well, using 3rd party software (or full Acrobat) any time there is text in a PDF I can extract it. If I can extract it, as text, then it has to be reflowable in certain applications.

PDF files are programs that execute inside a state engine. They combine data with instructions, and what is done with either depends on the state of the engine at that moment. For example, the exact position at which one character is rendered may depend on where the preceding character was placed. Modifying what happens at some stage of rendering can have drastic effects on everything that comes after, since the engine state may differ from what was expected when the PDF file/program was written. I think these reflow tags work to alleviate some of that problem by breaking the text into smaller independent objects that, as far as the engine is concerned, can be relocated as a group. The main point is that you can't think of a PDF file as content and metadata in arbitrary arrangements; it is really a set of precise instructions that have to be followed in sequence according to strict rules. Applying reflow to, say, a mathematical paper will show you what happens when you start messing with the engine in arbitrary ways while it's working (i.e., you get gibberish).
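To make the "position depends on what came before" point concrete, here is a rough toy sketch in Python. It is not a real PDF interpreter: the operator names only echo PDF's Tf/Td/Tj, the semantics are deliberately simplified (here Td moves relative to the current position), and the fixed glyph width, the `render` function and the sample `program` are all assumptions made up for illustration.

```python
# Toy model only: operator names echo PDF's Tf/Td/Tj, semantics are simplified.
CHAR_WIDTH = 0.5  # assumption: every glyph is half the font size wide


def render(program):
    """Run (operator, operand) pairs in order; return a list of (char, x, y)."""
    state = {"x": 0.0, "y": 0.0, "size": 10.0}
    placed = []
    for op, arg in program:
        if op == "Tf":      # set the font size
            state["size"] = float(arg)
        elif op == "Td":    # move relative to the current position (simplified)
            dx, dy = arg
            state["x"] += dx
            state["y"] += dy
        elif op == "Tj":    # show text; each glyph advances x for the next one
            for ch in arg:
                placed.append((ch, state["x"], state["y"]))
                state["x"] += CHAR_WIDTH * state["size"]
        else:
            raise ValueError(f"unknown operator: {op}")
    return placed


# Hypothetical instruction stream for something like "x^2 + y".
program = [
    ("Tf", 10),
    ("Td", (100, 700)),
    ("Tj", "x"),
    ("Tf", 6),
    ("Td", (0, 4)),     # raise the baseline for the superscript
    ("Tj", "2"),
    ("Tf", 10),
    ("Td", (0, -4)),    # drop the baseline back down
    ("Tj", "+y"),
]

original = render(program)

# Remove the single step that restores the baseline (index 7).
tampered = render([step for i, step in enumerate(program) if i != 7])

for (ch, x, y), (_, tx, ty) in zip(original, tampered):
    print(f"{ch!r}: original=({x}, {y})  tampered=({tx}, {ty})")
```

In this sketch, dropping one instruction leaves "x" and "2" where they were, but "+" and "y" stay stuck on the raised baseline, because nothing after that point knows the state is no longer what the original program assumed. A reflow tool that rearranges text without tracking this kind of state would run into the same sort of trouble, only on a much larger scale.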