01-04-2009, 01:24 PM   #13
Flinx
Connoisseur
Posts: 63
Karma: 12132
Join Date: Sep 2006
Location: Germany
Device: Cybook Muse Frontlight, Cybook Odyssey
Quote:
Originally Posted by tompe
Really not true at all. You can also use the convention that two line breaks in a row indicates a new paragraph.
No, that is not really useful for most standard PDFs. A text object in a PDF file does not contain a real line break. It contains the position on the page where it has to be drawn and a sequence of characters. The result is a single line of text.
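To illustrate, here is a toy Python sketch of what a converter actually gets out of a text object. It assumes a heavily simplified content stream where every text object has the form `BT ... x y Td (string) Tj ET`; real PDFs also use `Tm`, `TJ` arrays, hex strings, escapes and compressed streams, none of which this handles. The names `TextObject` and `parse_text_objects` are made up for illustration. Each object yields only a start position and a string, with no line break anywhere.

```python
import re
from dataclasses import dataclass


@dataclass
class TextObject:
    x: float    # horizontal start position on the page
    y: float    # baseline position; PDF y grows upward from the bottom edge
    text: str   # the characters to draw; there is no line break in here


# Toy pattern: only "x y Td (string) Tj". Because the text matrix is reset at
# each BT, a single Td right after BT gives the start position directly
# (ignoring any transformation set outside the text object).
_TEXT_OBJECT = re.compile(r"([-\d.]+)\s+([-\d.]+)\s+Td\s*\(([^)]*)\)\s*Tj")


def parse_text_objects(stream: str) -> list[TextObject]:
    """Extract (x, y, text) triples from a simplified content stream."""
    return [TextObject(float(x), float(y), s)
            for x, y, s in _TEXT_OBJECT.findall(stream)]


SAMPLE_STREAM = """
BT /F1 12 Tf 72 712 Td (The text object in a PDF) Tj ET
BT /F1 12 Tf 72 698 Td (holds a position and characters.) Tj ET
"""

if __name__ == "__main__":
    for obj in parse_text_objects(SAMPLE_STREAM):
        print(obj)
```

Whether those two objects belong to the same paragraph is not recorded anywhere; only their coordinates hint at it.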
The program that does the conversion has to work out, from the positions of the text objects, the order in which the lines come. Simple converters, like most of the available ones (including Acrobat), take one text object, convert it to text and put a line break at the end, resulting in one line of output text. Better converters can try to join separate text objects if their horizontal start positions are identical and the line is long enough. But this is a difficult job, and I have not yet found a program that works well enough for me.
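The joining heuristic described above could be sketched roughly like this. It is a hypothetical illustration, not how Acrobat or any particular converter works. It assumes each line's rendered `width` is already known (a real tool would compute it from font metrics), treats start positions within `x_tol` points as "identical", and counts a line as "long enough" when it reaches `min_fill` (80% by default, an arbitrary choice) of the widest line. `Line` and `join_lines` are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Line:
    x: float      # horizontal start position
    y: float      # baseline; PDF y grows upward, so later lines have smaller y
    width: float  # rendered width (assumed known; really derived from font metrics)
    text: str


def join_lines(lines: list[Line], x_tol: float = 0.5,
               min_fill: float = 0.8) -> list[str]:
    """Hypothetical heuristic: merge a line into the previous one when both
    start at (nearly) the same x and the previous line is long enough."""
    if not lines:
        return []
    # Reading order: top of page first, then left to right.
    ordered = sorted(lines, key=lambda ln: (-ln.y, ln.x))
    max_width = max(ln.width for ln in ordered)
    paragraphs = [ordered[0].text]
    for prev, cur in zip(ordered, ordered[1:]):
        same_start = abs(cur.x - prev.x) <= x_tol
        prev_is_full = prev.width >= min_fill * max_width
        if same_start and prev_is_full:
            paragraphs[-1] += " " + cur.text   # same paragraph continues
        else:
            paragraphs.append(cur.text)        # short line or new indent: break
    return paragraphs


SAMPLE_LINES = [
    Line(72, 712, 450, "The better converters can try to join"),
    Line(72, 698, 448, "separate text objects if their start"),
    Line(72, 684, 200, "positions match."),
    Line(90, 660, 430, "An indented line starts a new paragraph."),
]

if __name__ == "__main__":
    for p in join_lines(SAMPLE_LINES):
        print(p)
```

A short final line or a shifted start position ends the paragraph. Hyphenation, multiple columns, centred headings and pages without first-line indents are all ignored here, which is part of why the job is hard in practice.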