10-18-2012, 11:17 AM #4
dvd8n
Connoisseur
dvd8n will give the Devil his due.
 
Posts: 75
Karma: 77598
Join Date: Apr 2010
Device: Sony PRS-650, Hanvon h516, Kobo Aura HD
Quote:
Originally Posted by ldolse
Heuristics can't catch all scenarios; there are infinite possibilities for the way chapters can be formatted. In your case the word 'chapter' is explicitly covered, so my guess is that it might be related to the underlying HTML code.

An approach tangential to Agama's solution would be to cleanse the HTML by converting to text with Markdown or Textile formatting, and then converting back from text to ePub/Mobi with heuristics enabled. In some cases this will eliminate whatever HTML formatting tripped up heuristics, while basic formatting like italics and bold would be retained.

I think you might be misunderstanding me. The file that I am converting is plain vanilla text, with no HTML.

And while I take your point about heuristics being difficult, I would have said that a line starting with the word 'Chapter' followed by a number (admittedly in Roman numerals) was a good candidate for a chapter break.
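To make the rule concrete, here is a minimal Python sketch, assuming each heading sits on its own line of the plain-text file. The pattern `CHAPTER_RE` and the function `find_chapter_headings` are hypothetical illustrations, not calibre's actual heuristics code.

```python
import re

# Hypothetical pattern: a line that starts with "Chapter" (any case), then
# whitespace, then an Arabic number or a Roman numeral, optionally followed
# by a separator (space, '.', ':' or '-') and a title.
CHAPTER_RE = re.compile(
    r"^\s*chapter\s+"
    # The lookahead guarantees the Roman-numeral branch is never empty.
    r"(?P<num>\d+|(?=[ivxlcdm]+\b)"
    r"m{0,3}(?:cm|cd|d?c{0,3})(?:xc|xl|l?x{0,3})(?:ix|iv|v?i{0,3}))"
    r"\b(?:[\s.:\-]+(?P<title>.*\S))?\s*$",
    re.IGNORECASE,
)


def find_chapter_headings(text):
    """Return (line_number, numeral, title) for each line that looks like a heading."""
    headings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = CHAPTER_RE.match(line)
        if m:
            headings.append((lineno, m.group("num"), m.group("title")))
    return headings


sample = """Chapter I
It was a dark night.

CHAPTER XIV. The Storm
The chapter ended abruptly.
Chapter 3: Aftermath
"""

print(find_chapter_headings(sample))
# [(1, 'I', None), (4, 'XIV', 'The Storm'), (6, '3', 'Aftermath')]
```

A line such as "The chapter ended abruptly." is skipped because it does not start with the word. The pattern would still accept false positives like a line beginning "Chapter I was ...", which is exactly the kind of ambiguity that makes general-purpose heuristics hard.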