MobileRead Forums - View Single Post

KevinH · 11-14-2020, 10:43 PM

Yes, technically epub is xhtml and thus xml. So to be rigorous about the epub2 spec, a doctype that specifies the version of xhtml and declares the named entities is important. My guess is that most ereaders either serve these pages as html or add the doctype on the fly if it is missing.

To be safest, a page technically should specify a doctype. Sigil has always used one and added it where needed during load. Only since Sigil 1.0, when Sigil stopped moving and updating every page on initial load, have there been pages without it inside Sigil. That is why I added it to our well-formed check on epub load, so that it can be detected and fixed automatically, as it was in the past, if the user wants that.
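As a rough illustration of the "detect and fix on load" idea, here is a minimal Python sketch. It is not Sigil's actual code; the function names are made up for this example, and the doctype string is assumed to be the XHTML 1.1 one commonly seen in epub2 content files.

```python
import re

# Assumption: the XHTML 1.1 doctype commonly seen in epub2 content files.
XHTML11_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"\n'
    '  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
)

XML_DECL = re.compile(r'^\s*<\?xml[^>]*\?>')
DOCTYPE = re.compile(r'<!DOCTYPE\s', re.IGNORECASE)


def has_doctype(source: str) -> bool:
    """Return True if a DOCTYPE appears before the root <html> element."""
    head = source.split('<html', 1)[0]
    return bool(DOCTYPE.search(head))


def ensure_doctype(source: str) -> str:
    """Hypothetical helper: add the doctype only when the page lacks one."""
    if has_doctype(source):
        return source
    m = XML_DECL.match(source)
    if m:
        # Keep the XML declaration first, then the doctype, then the rest.
        return source[:m.end()] + '\n' + XHTML11_DOCTYPE + source[m.end():]
    return XHTML11_DOCTYPE + '\n' + source
```

Running ensure_doctype twice on the same page leaves it unchanged the second time, which is what you would want from a check that runs on every load.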

If an epub does not use named entities outside of those recognized by xml, and instead uses no entities or only numeric entities, then you could probably dispense with the DOCTYPE safely. But since under epub2 Sigil supports named entities (such as &nbsp;), Sigil needs and enforces the DOCTYPE.
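To see why the named entities are the sticking point, here is a small sketch using Python's standard xml.etree.ElementTree as a stand-in for a strict xml parser. The inline entity declaration in the last sample is only an illustration of what the XHTML DTD would otherwise provide; real ereaders may well behave differently.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# No doctype: &nbsp; is unknown to plain xml, so parsing fails.
named_no_doctype = '<p>a&nbsp;b</p>'

# No doctype, but numeric or xml-predefined entities: these parse fine.
numeric_no_doctype = '<p>a&#160;b</p>'
predefined_no_doctype = '<p>a &amp; b</p>'

# Illustrative only: declaring nbsp inline, as a DTD would do for us.
named_declared = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'

for sample in (named_no_doctype, numeric_no_doctype,
               predefined_no_doctype, named_declared):
    print(parses(sample), sample)
```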

Calibre, on the other hand, removes all named entities and replaces them with the correct unicode character, so it can then remove the epub2 doctype safely.
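The replace-with-unicode approach can be sketched in a few lines of Python. This is not Calibre's actual code, just a minimal illustration under the assumption that the entity names are the standard HTML ones from html.entities; replace_named_entities is a made-up name.

```python
import re
from html.entities import name2codepoint

# The five entities xml itself recognizes; these must stay as they are.
XML_PREDEFINED = {'amp', 'lt', 'gt', 'quot', 'apos'}

NAMED_ENTITY = re.compile(r'&([A-Za-z][A-Za-z0-9]*);')


def replace_named_entities(text: str) -> str:
    """Swap named entities for their unicode characters."""
    def _sub(match):
        name = match.group(1)
        # Leave xml's own entities and any unknown names untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])
    return NAMED_ENTITY.sub(_sub, text)
```

After a pass like this, the only entities left are numeric ones and the five xml knows about, so the page no longer depends on the doctype to be parsed.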

Sigil uses the doctype as specified in the epub2 (2.0.1) specification for xhtml files.

Post #370 · KevinH (Sigil Developer) · Last edited by KevinH; 11-14-2020 at 11:15 PM.