PDF does not define a block of text as a paragraph. In HTML, for instance, you would put the text inside <p> </p> tags to denote that the block is a paragraph.
A PDF essentially says: draw black lines in this shape at these points on the page. Each line of text is drawn independently of the next, as in a printed book. The tab indent (if there is one) is your visual indicator that a new paragraph has started. However, that tab isn't a character in the PDF. The instructions for drawing the text just start a bit further to the right than the lines above and below.
Now we get into the question: what is a paragraph? Does it always start with a tab indent? How large an indent? Are paragraphs separated by blank lines? Is a 10-character line standing alone that says "Chapter 10" a paragraph or something else?
Do you see the issue? With a PDF (much like a TXT file), the information that tells you what you're looking at is limited at best. Mostly all you get is "at this point on the page, draw this."
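To make that concrete, here is a rough sketch of the kind of guessing you end up doing. It assumes you have already pulled each line of text out of the PDF along with its position; the `Line` class, the `group_paragraphs` function, and every threshold below are made up for illustration and do not come from any particular PDF library.

```python
from dataclasses import dataclass


@dataclass
class Line:
    # Hypothetical record for one extracted line of text.
    x: float    # left edge of the first glyph, in points
    top: float  # distance from the top of the page, in points
    text: str


def group_paragraphs(lines, left_margin=72.0, indent_min=10.0,
                     line_height=14.0, gap_factor=1.5):
    """Guess paragraph breaks from indentation and vertical gaps.

    All thresholds are assumptions; real documents need tuning.
    """
    paragraphs = []
    current = []
    prev = None
    for line in sorted(lines, key=lambda l: l.top):
        # A line that starts noticeably right of the margin looks indented.
        indented = line.x - left_margin >= indent_min
        # A larger-than-normal vertical gap looks like a blank line.
        big_gap = (prev is not None
                   and line.top - prev.top > line_height * gap_factor)
        if current and (indented or big_gap):
            paragraphs.append(" ".join(current))
            current = []
        current.append(line.text)
        prev = line
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = [
    Line(72, 100, "Chapter 10"),
    Line(90, 130, "It was a dark"),
    Line(72, 144, "and stormy night."),
    Line(90, 158, "The rain fell"),
    Line(72, 172, "in torrents."),
]
print(group_paragraphs(sample))
```

Notice that "Chapter 10" comes back as a paragraph of its own. Nothing in the positions says it is a heading, and a document that uses no indent, or a different line spacing, would need different thresholds entirely.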