Old 09-18-2011, 08:54 PM   #2
user_none
Sigil & calibre developer
PDF does not define that a block of text is a paragraph. In HTML, for instance, you would put the text inside <p> </p> tags to denote that the block is a paragraph.

A PDF essentially says: draw black lines in this shape at these points on the page. Each line of text is drawn independently of the next, like in a print book. The tab indent (if there is one) is your visual indicator that you have started a new paragraph. However, that tab isn't a character in the PDF. The instructions for drawing the text just start a bit further to the right than the lines above and below.

Now we get into the question: what is a paragraph? Does it always start with a tab indent? How large an indent? Are paragraphs separated by blank lines? Is a 10-character line alone that says "Chapter 10" a paragraph or something else?
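
To make those questions concrete, here is a rough sketch of the kind of guessing a converter has to do. It is not how calibre or any real tool works. The Line class, the group_paragraphs function, and the threshold values are all made up for illustration, and it assumes the text lines and their positions have already been pulled out of the PDF.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One drawn line of text: a position plus the characters (hypothetical model)."""
    x: float   # left edge of the first glyph, in points
    y: float   # vertical position, measured from the top of the page in this sketch
    text: str


def group_paragraphs(lines, left_margin, min_indent=10.0, max_gap=16.0):
    """Guess paragraph breaks from position alone.

    A line starts a new paragraph if it is indented by at least
    min_indent points, or if it sits more than max_gap points below
    the previous line. Both thresholds are arbitrary guesses.
    """
    paragraphs = []
    prev = None
    for line in lines:
        indented = line.x - left_margin >= min_indent
        big_gap = prev is not None and line.y - prev.y > max_gap
        if prev is None or indented or big_gap:
            paragraphs.append([line.text])
        else:
            paragraphs[-1].append(line.text)
        prev = line
    return [" ".join(p) for p in paragraphs]


# There is no tab character anywhere in this data, only a larger x value.
page = [
    Line(72.0, 100.0, "Chapter 10"),
    Line(90.0, 130.0, "It was a dark night, and the"),
    Line(72.0, 144.0, "rain fell without pause."),
    Line(90.0, 158.0, "Nobody came to the door."),
]

loose = group_paragraphs(page, left_margin=72.0)
strict = group_paragraphs(page, left_margin=72.0, min_indent=24.0, max_gap=40.0)
```

With the default thresholds the page comes out as three blocks (the heading and two paragraphs). With the stricter thresholds the same lines collapse into a single block, heading included. Neither answer is "right"; the PDF simply doesn't say.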

Do you see the issue? With a PDF (much like a TXT file) you don't have information that tells you what you're looking at, other than "at this point on the page, draw this". Some structural information can be there, but it's limited at best.