Old 07-03-2010, 12:28 AM   #1
prky
Member
prky began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Nov 2009
Device: IPhone 3GS
HTML Conversion - Multiline Headers

Hi all,

I've noticed some HTML -> EPUB conversion funkiness with Calibre since I upgraded from 0.6.54 to 0.7.5.

I had a book that converted fine in 0.6.54, but when I reconverted it in 0.7.5 (after changing the CSS for emphasis handling), the entire book came out in large text.

Looking at the HTML, each chapter had the following type of chapter header:

Code:
<h2>1
Chapter 1 Title</h2>
It appears that in 0.6.54, this was parsed as:

Code:
<h2>1</h2>
<h2>Chapter 1 Title</h2>
i.e. each of the two lines got its own header 2. In 0.7.5, however, it was parsed as:

Code:
<h2>1
Chapter 1 Title
i.e. an unterminated header 2, so the entire chapter was rendered as header 2!

I manually edited the HTML to make it look like this:

Code:
<h2>1</h2>
<h2>Chapter 1 Title</h2>
Which fixed it.
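
For a book with a lot of chapters, the same edit could probably be scripted before feeding the HTML to Calibre. Here is a rough sketch, assuming the headers contain plain text only (no nested tags broken across lines). The function name `split_multiline_headings` is made up for illustration and is not part of Calibre:

```python
import re

# Hypothetical helper, not part of Calibre: matches <h1>..<h6> elements,
# including ones whose text spans several lines.
HEADING_RE = re.compile(
    r"<(h[1-6])(\s[^>]*)?>(.*?)</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def split_multiline_headings(html):
    """Rewrite each multi-line heading as one heading element per line."""

    def repl(match):
        tag = match.group(1)
        attrs = match.group(2) or ""
        body = match.group(3)
        # Keep only non-empty lines, trimmed of surrounding whitespace.
        lines = [line.strip() for line in body.splitlines() if line.strip()]
        if len(lines) < 2:
            return match.group(0)  # single-line heading: leave untouched
        return "\n".join(
            "<%s%s>%s</%s>" % (tag, attrs, line, tag) for line in lines
        )

    return HEADING_RE.sub(repl, html)


if __name__ == "__main__":
    sample = "<h2>1\nChapter 1 Title</h2>"
    print(split_multiline_headings(sample))
```

Run on the header above, this prints the two separate `<h2>` lines. It is a regex-based workaround only, so anything fancier inside a header (such as an `<em>` that wraps across lines) would need a real HTML parser.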

Question - is this an HTML parsing bug in 0.7.x, or was it never meant to work the way it did in 0.6.x?

Is there a way I can make it parse HTML tags across multiple lines?

Ta,

prk.