Calibre 0.7.16 enables the preprocessing option for lit, txt, rtf, and html input types. It will attempt to unwrap lines and also mark chapters in books where chapter headings previously weren't marked.
To have Calibre attempt to fix the markup of a badly formatted book, select "preprocess input to possibly improve structure detection" under the Structure Detection options when converting it. If you're dealing with a troublesome text file, you should also choose "Treat each line as a paragraph" under text input.
It seems to work OK on the books I've tested, but I know there is a pretty large variety of badly formatted books out there, coming from OCR sources and the like. Let me know if you find books that fail to unwrap. I'm also interested in books with text formatting that spans multiple lines. I think those should be OK, but I didn't have any test cases.
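
For anyone wondering what "unwrapping" and "marking chapters" mean in practice, here is a rough Python sketch of the general idea. To be clear, this is not Calibre's actual code: the function names, the 40-character threshold, and the chapter regex are all made up for illustration, and the real preprocessing may work quite differently.

```python
import re

# Hypothetical pattern: a line that is only "Chapter 3", "Part IV", etc.
CHAPTER_RE = re.compile(r'^(chapter|part)\s+(\d+|[ivxlc]+)\s*$', re.IGNORECASE)


def is_chapter_heading(line):
    """Return True if the line looks like a bare chapter heading."""
    return bool(CHAPTER_RE.match(line.strip()))


def unwrap_lines(text, min_length=40):
    """Join hard-wrapped lines into paragraphs (illustrative heuristic only).

    A paragraph ends at a blank line, or at a short line that ends with
    sentence punctuation. min_length is an assumed threshold, not a value
    taken from Calibre.
    """
    paragraphs = []
    current = []

    def flush():
        # Emit the lines collected so far as one paragraph.
        if current:
            paragraphs.append(" ".join(current))
            current.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()  # a blank line always closes the paragraph
            continue
        if is_chapter_heading(stripped):
            flush()  # keep the heading as its own paragraph
            paragraphs.append(stripped)
            continue
        current.append(stripped)
        # A short line ending a sentence is probably a real paragraph break.
        if len(stripped) < min_length and re.search(r'[.!?"]$', stripped):
            flush()
    flush()
    return paragraphs


sample = (
    "Chapter 1\n"
    "It was a dark and stormy night, and the rain fell in torrents\n"
    "except at occasional intervals.\n"
    "\n"
    "The next morning was bright.\n"
)
result = unwrap_lines(sample)
```

Books that break simple rules like these (for example, short lines inside a paragraph that happen to end with a period) are exactly the kind of failure cases worth reporting.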