12-09-2010, 07:54 AM   #4
frabjous
You cannot.

PDF was not designed as an input format for conversion into other formats. It was designed as an output format. PDFs are always created from other source documents; the thought has always been that if you wanted to change or convert one, you'd go back and make changes to the source document. The fact that PDFs can ever be converted with usable results is somewhat surprising.

A PDF is designed to emulate a printed page, and the purpose of a PDF is to look exactly the same for everyone who views it, much like a printed page would. Typically, a PDF only contains information about the exact placement of characters, images, and vectors on a page and nothing else. It does not even maintain information from the original source document such as where one word ends and another begins, much less where paragraphs begin and end. When you convert a PDF to another format, it is up to the artificial intelligence of the converter to try to "reconstruct" this information. There is no easy way to make this work well in every case.
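
To make the reconstruction problem concrete, here is a minimal Python sketch. It is not how any particular converter works. It assumes a hypothetical `Glyph` record holding only a character, an x position, and a width, and it guesses word breaks from the horizontal gap between neighbouring glyphs. The `place` helper and the `gap_threshold` value are likewise made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical record: one character and where it sits on the line."""
    char: str
    x: float      # left edge of the glyph
    width: float  # advance width of the glyph


def rebuild_line(glyphs, gap_threshold=1.5):
    """Guess the text of one line; insert a space wherever the gap is large."""
    ordered = sorted(glyphs, key=lambda g: g.x)
    out = []
    prev = None
    for g in ordered:
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            if gap > gap_threshold:
                out.append(" ")
        out.append(g.char)
        prev = g
    return "".join(out)


def place(text, char_width=5.0, letter_gap=0.5, word_gap=3.0):
    """Hypothetical helper: lay text out as glyphs, storing no space characters."""
    glyphs = []
    x = 0.0
    for word in text.split(" "):
        for ch in word:
            glyphs.append(Glyph(ch, x, char_width))
            x += char_width + letter_gap
        # Widen the gap after the last letter of a word to word_gap.
        x += word_gap - letter_gap
    return glyphs


line = place("the cat sat")
print(rebuild_line(line))                      # threshold between the two gap sizes
print(rebuild_line(line, gap_threshold=0.25))  # too small: every letter is split off
print(rebuild_line(line, gap_threshold=4.0))   # too large: the words run together
```

With evenly spaced toy data a single threshold works, but real pages mix fonts, sizes, kerning, and justification, so no single value is likely to be right everywhere.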

You get spaces between letters, or places where words bump together, when the converter guesses wrong about where the word boundaries are. You get boxes or wrong characters when the glyphs in the PDF's fonts don't match the glyphs in the fonts you are converting into, or when the fonts' character encodings differ.
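
The encoding half of that can be sketched in a few lines of Python. Everything here is invented for illustration: `EMBEDDED_FONT` stands in for a font with its own private code-to-glyph table, and `ASSUMED` stands in for whatever table a converter might fall back on when it does not read the embedded one correctly.

```python
# Hypothetical private encoding of a font embedded in the PDF:
# byte code -> the text that the glyph actually depicts.
EMBEDDED_FONT = {0x03: "fi", 0x04: "n", 0x05: "e"}

# Hypothetical table a converter assumes instead; code 0x03 is unknown to it.
ASSUMED = {0x04: "n", 0x05: "e"}

# Another hypothetical table that maps code 0x03 to the wrong character.
MISMATCHED = {0x03: "c", 0x04: "n", 0x05: "e"}

BOX = "\u25a1"  # the empty-box character often shown for unknown glyphs


def decode(codes, table):
    """Map byte codes to text; codes missing from the table become a box."""
    return "".join(table.get(code, BOX) for code in codes)


codes = [0x03, 0x04, 0x05]        # the page draws the word "fine"
print(decode(codes, EMBEDDED_FONT))  # right table: the intended word
print(decode(codes, ASSUMED))        # missing entry: a box appears
print(decode(codes, MISMATCHED))     # wrong entry: a wrong character appears
```

The page looks identical in all three cases, because the glyph shapes are drawn correctly either way; only the extracted text differs.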

You really can't expect perfection here; you're working against the grain, trying to make the format do what it was never designed to do. I'm afraid you're going to have to live with that, or else learn a lot of programming and try to write a superior artificial intelligence algorithm.

Nevertheless, if you don't have access to the source document, converting is still often preferable to retyping everything from scratch, but do be prepared for a fair amount of manual fixing.

Last edited by frabjous; 12-09-2010 at 07:57 AM.