Quote:
Originally Posted by mike_bike_kite
I can't understand why there aren't simple post processors to process the text. Take the text output and join the lines together unless they end in a full stop, a question mark or a double quote.
|
Of course there are, and they use more sophisticated algorithms than this, too. I think most of the software mentioned in the thread already does this, with greater or lesser degrees of success. There are also dedicated tools like ebook-tidy, and so on.
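For reference, the naive heuristic from the quote (join lines unless they end in a full stop, a question mark or a double quote) fits in a few lines of Python. This is only a rough sketch of that rule, not how any of the tools mentioned actually work; the function name `join_lines` and the handling of blank lines as paragraph breaks are my own assumptions.

```python
# Characters that, per the quoted heuristic, mark the end of a paragraph line.
TERMINATORS = ('.', '?', '"')

def join_lines(text):
    """Join hard-wrapped lines into paragraphs using the naive rule."""
    paragraphs = []
    current = []  # pieces of the paragraph being assembled
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # Assumption: a blank line always ends the current paragraph.
            if current:
                paragraphs.append(' '.join(current))
                current = []
            continue
        current.append(stripped)
        if stripped.endswith(TERMINATORS):
            # Line ends in a terminator, so treat it as a paragraph end.
            paragraphs.append(' '.join(current))
            current = []
    if current:
        # Flush any trailing text that never hit a terminator.
        paragraphs.append(' '.join(current))
    return '\n'.join(paragraphs)

print(join_lines("It was a dark\nand stormy night.\nWho goes\nthere?"))
```

The weakness shows up quickly: any wrapped line that happens to end with a sentence gets treated as a paragraph break, which is one reason the real tools need something smarter.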
Quote:
My aim was to finally generate HTML and then use the chapter titles to create a TOC. I got halfway there but considering how clever tools like Calibre are, it surprised me that this wasn't done automatically.
|
Calibre provides a place to enter regular expressions for deleting headers and footers, IIRC. pdfreflow attempts to do this automatically, but it is difficult to get right, and I think such software naturally tends to err on the side of keeping something when in doubt rather than deleting it.
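To give an idea of what regex-based header/footer removal looks like, here is a minimal Python sketch. It is not Calibre's or pdfreflow's code; the pattern and the function name `strip_headers_footers` are hypothetical, and the pattern only assumes that footers are bare page numbers or lines like "Page 13".

```python
import re

# Hypothetical pattern: a line holding only a page number, or "Page 12"-style text.
HEADER_FOOTER = re.compile(r'^\s*(?:page\s+)?\d+\s*$', re.IGNORECASE)

def strip_headers_footers(text, pattern=HEADER_FOOTER):
    """Drop every line that matches the header/footer pattern."""
    kept = [line for line in text.splitlines() if not pattern.match(line)]
    return '\n'.join(kept)

sample = "Chapter One\n12\nIt began.\nPage 13\nThe end."
print(strip_headers_footers(sample))
```

Note that a genuine line of text consisting only of a number (a year on its own line, say) would be deleted too, which is exactly the kind of false positive that makes automatic tools cautious.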