11-27-2007, 06:23 PM   #11
profnachos
Quote:
Originally Posted by HarryT
Could you not do it the same way that the text file clean-up tools work - treat two consecutive <br>'s as a paragraph break, and then delete all the others? That's all that springs to mind at present, I'm afraid!
Well, not all paragraphs are marked with two consecutive <br> tags. I converted Crime and Punishment from PDF to HTML with pdftohtml, and I don't see two consecutive <br>'s anywhere in the output.

I am thinking that if there is a period right before a <br> tag, that <br> marks the end of a paragraph. Of course it won't always be right, but that seems to be the best "guess."
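
To make the idea concrete, here is a rough Python sketch of that guess. It is only an illustration, not something pdftohtml provides: the function name rejoin_paragraphs, the enders parameter, and the sample text are all made up for this example, and the code assumes the HTML body has already been pulled out as one string.

```python
import re

# Matches <br>, <br/> and <br /> in any letter case.
BR = re.compile(r"<br\s*/?>", re.IGNORECASE)


def rejoin_paragraphs(html, enders=(".",)):
    """Guess paragraph breaks in <br>-separated text.

    A <br> is treated as a paragraph break only when the text right
    before it ends with one of `enders` (just the period by default,
    as in the post). Every other <br> is treated as a plain line wrap.
    """
    paragraphs = []
    current = []
    for piece in BR.split(html):
        text = " ".join(piece.split())  # collapse newlines and extra spaces
        if not text:
            continue
        current.append(text)
        if text.endswith(enders):
            # Period right before the <br>: close the paragraph here.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        # Leftover text that never hit a period still becomes a paragraph.
        paragraphs.append(" ".join(current))
    return paragraphs


# Invented sample text, only to show the input shape.
sample = (
    "The first line of a paragraph<br>\n"
    "wraps onto a second line and<br>\n"
    "ends on the third.<br>\n"
    "A new paragraph starts here<br>\n"
    "and ends here.<br>\n"
)

for paragraph in rejoin_paragraphs(sample):
    print("<p>" + paragraph + "</p>")
```

The guess goes wrong in both directions: a wrapped line that happens to end in an abbreviation such as "Mr." gets split too early, and a paragraph that ends with a question mark or a closing quote gets glued to the next one. Passing something like enders=(".", "?", "!") reduces the second kind of miss at the cost of a few more of the first.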

Last edited by profnachos; 11-27-2007 at 07:22 PM.