Old 10-30-2011, 02:54 AM   #6
ldolse
Quote:
Originally Posted by user_none
@frostschutz, HTMLZ with default settings will clean up a lot of poorly formatted HTML. This, coupled with Heuristics, will find most chapters without needing to write pattern matches.
The other problem with many poorly formatted ePubs is that they were originally converted from Lit, text, or some other format using a version of Calibre that had no heuristics (or had heuristics disabled), so they contain seemingly random split points roughly every 260 KB. Converting to HTMLZ merges those split files back together. Without doing that first, heuristics can't find the chapters correctly.
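For anyone who wants to script the two passes, here is a rough sketch in Python that shells out to Calibre's `ebook-convert` command line tool. It assumes `ebook-convert` is on your PATH; the function names and the `_rebuilt` output suffix are made up for illustration, and the second pass only adds `--enable-heuristics` on top of the defaults, so adjust the options to suit your book.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(source, workdir="."):
    """Return the two ebook-convert invocations for the ePub -> HTMLZ -> ePub round trip.

    The helper name and the "_rebuilt" suffix are hypothetical, chosen for this sketch.
    """
    source = Path(source)
    workdir = Path(workdir)
    htmlz = workdir / (source.stem + ".htmlz")
    rebuilt = workdir / (source.stem + "_rebuilt.epub")
    return [
        # Pass 1: ePub -> HTMLZ with default settings, so the split files
        # are merged back together as described in the post above.
        ["ebook-convert", str(source), str(htmlz)],
        # Pass 2: HTMLZ -> ePub with heuristic processing switched on,
        # so chapter detection runs on the merged HTML.
        ["ebook-convert", str(htmlz), str(rebuilt), "--enable-heuristics"],
    ]


def round_trip(source, workdir="."):
    """Run both passes in order, stopping at the first failure."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH")
    for cmd in build_commands(source, workdir):
        subprocess.run(cmd, check=True)
```

Calling `round_trip("book.epub")` would write `book.htmlz` and `book_rebuilt.epub` into the current directory. Keeping the command building separate from the running makes it easy to print the commands first and check them before converting anything.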