See pdf2lrf, html2lrf and rtf2lrf. LRF has a well-defined notion of what a paragraph is, so my mapping implicitly identifies paragraphs.
It's not perfect, in that it will treat headings as paragraphs as well, but I don't see that being a problem for reference work.
And I can't really take the credit for pdf and rtf, as I use other people's converters to convert them to html first.
Also note that these mappings are not infallible. You can produce files that humans would think contain paragraphs but the converters don't. However, as demonstrated by the wide use of these tools, they are largely successful on real-world files.
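
To make the idea of an implicit paragraph mapping concrete, here is a minimal Python sketch. It is not the actual html2lrf code; the `ParagraphExtractor` class, the `extract_paragraphs` helper and the `PARAGRAPH_TAGS` set are all hypothetical names chosen for illustration. It assumes that paragraphs are found purely from block-level markup, which is why headings come out as paragraphs too, and why text without such markup yields none.

```python
from html.parser import HTMLParser

# Hypothetical tag set (an assumption for this sketch): block-level
# elements whose contents are treated as one paragraph each.
PARAGRAPH_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}


class ParagraphExtractor(HTMLParser):
    """Collects the text of each paragraph-like element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None means we are outside a paragraph-like tag

    def handle_starttag(self, tag, attrs):
        if tag in PARAGRAPH_TAGS:
            self._flush()      # an unclosed <p> ends where the next one starts
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in PARAGRAPH_TAGS:
            self._flush()

    def handle_data(self, data):
        # Text outside any paragraph-like tag is ignored in this sketch.
        if self._buffer is not None:
            self._buffer.append(data)

    def _flush(self):
        if self._buffer is not None:
            # Collapse runs of whitespace and drop empty paragraphs.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
        self._buffer = None


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # pick up a trailing paragraph that was never closed
    return parser.paragraphs


# Headings are reported as paragraphs, as described above.
print(extract_paragraphs("<h1>Title</h1><p>First para.</p><p>Second <b>bold</b> para.</p>"))

# A human sees two paragraphs here, but there is no markup to map from.
print(extract_paragraphs("Line one<br><br>Line two"))
```

The second call shows the failure mode mentioned above: the text is visually split into two paragraphs, yet a markup-based mapping finds none.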