Old 04-01-2012, 05:39 PM   #1
KevinH
Sigil Developer
questions on self-closing tags and legal xhtml in epubs

Hi,

I have been playing around with html5lib and lxml in Python and libxml2 in C to write code to process epubs, and have run into difficulties parsing XHTML documents that contain the following self-closing tags. Are these legal in strict XHTML as used in epub 2? Are they still legal in epub 3?

<title />

<a id="blah" />

<div id="blah" />

<div id="blah" class="clearfix" />

When I parse XHTML containing these self-closing tags, the parsers get very confused (I believe this all ties back to libxml2, since they are all front ends to that library). They either start replacing the tag's < and > with their HTML entities, or they assume the closing tag was never found and add a new closing tag much, much farther on, which can easily change the meaning, especially for the float-clearing "clearfix" class approach.

Even modern browsers seem to have trouble dealing with these particular self-closing tags.

I know that in pure XML almost any tag can be self-closing, but I thought that under strict XHTML for epubs only specific tags like <meta /> and <hr /> were allowed to be self-closing, and that all others must be explicitly and separately closed to guarantee proper ebook viewing.
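
To illustrate the "pure XML" half of that statement, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` (not one of the parsers mentioned above). The helper name `shape` and the sample markup are made up for the example. It only shows that an XML parser builds the same tree for `<div id="blah" />` and `<div id="blah"></div>`; it says nothing about what the epub specs permit.

```python
import xml.etree.ElementTree as ET

# Same fragment written twice: once with a self-closing div,
# once with an explicit closing tag.
short_form = ET.fromstring('<body><div id="blah" /><p>text</p></body>')
long_form = ET.fromstring('<body><div id="blah"></div><p>text</p></body>')

def shape(elem):
    """Hypothetical helper: reduce an element to (tag, attributes, children)."""
    return (elem.tag, dict(elem.attrib), [shape(child) for child in elem])

# An XML parser sees an empty div followed by a sibling p in both cases.
same_tree = shape(short_form) == shape(long_form)
div_children = len(short_form[0])
```

Under XML rules the div stays empty and the paragraph stays a sibling. A parser applying HTML rules can instead treat the same `<div ... />` as an unclosed start tag, which would match the misplaced closing tags described above.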

Does anyone know the exact spec? Having to work around these bugs is quite painful, and looking for and fixing all of these before parsing the XHTML makes things quite slow at times.
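
For reference, the kind of "find and fix before parsing" workaround described here might look roughly like the sketch below. It is not the code actually in use. The function name `expand_self_closing` and the regex are illustrative, and the list of elements left alone is assumed to be the HTML void elements. It also assumes attribute values never contain `<` or `>`, and it does not skip comments or CDATA sections.

```python
import re

# Assumption: these are the elements allowed to stay in <tag /> form
# (the HTML void elements); everything else gets an explicit end tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}

# Matches <name ... /> where the attribute text contains no '<' or '>'.
SELF_CLOSING = re.compile(r"<([A-Za-z][\w:.-]*)((?:\s[^<>]*?)?)\s*/>")

def expand_self_closing(xhtml: str) -> str:
    """Rewrite <div ... /> as <div ...></div>, leaving void elements as-is."""
    def fix(match):
        name, attrs = match.group(1), match.group(2)
        if name.lower() in VOID_ELEMENTS:
            return match.group(0)  # e.g. <hr /> or <meta ... /> is kept
        return f"<{name}{attrs.rstrip()}></{name}>"
    return SELF_CLOSING.sub(fix, xhtml)

sample = '<div id="blah" class="clearfix" /><hr /><a id="blah" />'
fixed = expand_self_closing(sample)
```

A single regex pass like this is cheap but blunt, so it is only a stopgap until the parser itself can be made to handle these tags.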

Ideas anyone?

Thanks,

Kevin