Quote:
Originally Posted by user_none
@frostschutz, HTMLZ with default settings will clean up a lot of poorly formatted HTML. This coupled with Heuristics will find most chapters without needing to write pattern matches.
The other problem with many poorly formatted ePubs is that they were originally converted from Lit, text, or whatever else using a version of Calibre without heuristics (or with heuristics disabled), so they end up with seemingly random split points every 260K. HTMLZ merges those split points back together, and without doing that first, heuristics wouldn't be able to find the chapters correctly.
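
For anyone who would rather script the round trip than click through the GUI, here is a rough sketch of the two passes described above. It assumes calibre's ebook-convert command-line tool is on your PATH and that its --enable-heuristics switch is what turns heuristic processing on. The function names and the "_fixed" output name are just made up for illustration, and I haven't tuned any other options.

```python
import subprocess
from pathlib import Path


def build_commands(src_epub: str) -> list[list[str]]:
    """Return the two ebook-convert invocations for the round trip.

    Hypothetical helper: the names and the "_fixed" suffix are
    illustrative, not anything calibre itself defines.
    """
    src = Path(src_epub)
    htmlz = src.with_suffix(".htmlz")
    fixed = src.with_name(src.stem + "_fixed.epub")
    return [
        # Pass 1: ePub -> HTMLZ with default settings, which should merge
        # the arbitrary split points back together.
        ["ebook-convert", str(src), str(htmlz)],
        # Pass 2: HTMLZ -> ePub with heuristics enabled, so chapter
        # detection sees the merged text.
        ["ebook-convert", str(htmlz), str(fixed), "--enable-heuristics"],
    ]


def round_trip(src_epub: str) -> None:
    """Run both passes in order, stopping if either one fails."""
    for cmd in build_commands(src_epub):
        subprocess.run(cmd, check=True)
```

Calling round_trip("book.epub") would leave book.htmlz and book_fixed.epub next to the original. Keeping the command building separate from the running part makes it easy to print the commands and check them before anything touches your files.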