Is the Comments field XHTML compliant?
I am looking at an issue in the calibre2opds software that relates to how we handle the Calibre Comments field when it contains characters outside the cp1252 character set. These characters end up displaying incorrectly in our generated catalogs.
At the moment we have some code that uses third-party libraries to 'tidy' the Comments field and ensure it is XHTML compliant before it is included in an OPDS catalog. This code dates from the days before Calibre properly supported HTML comments, which is why it was originally added. We have tracked our issue down to the fact that the library we are using does not seem to handle the Comments field correctly if it contains these non-cp1252 characters. Before we try to 'tidy' the Comments field, it is stored correctly in Java (which uses UTF-16 strings). If we bypass the tidy code, the problem goes away and our generated catalogs display correctly.
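To illustrate the kind of corruption involved, here is a minimal Java sketch of one way non-cp1252 characters can be lost: a string passing through a cp1252 byte encoding somewhere in the pipeline. This is only an assumption about the failure mode, not a confirmed diagnosis of the tidy library, and the class and method names (`Cp1252Probe`, `roundTrip`, `fitsCp1252`) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, for illustration only; not part of calibre2opds.
final class Cp1252Probe {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Encodes to cp1252 bytes and decodes again; unmappable characters come back as '?'. */
    static String roundTrip(String text) {
        return new String(text.getBytes(CP1252), CP1252);
    }

    /** Returns true if every character in the text can be represented in cp1252. */
    static boolean fitsCp1252(String text) {
        // A fresh encoder per call, since CharsetEncoder is not thread-safe.
        CharsetEncoder encoder = CP1252.newEncoder();
        return encoder.canEncode(text);
    }
}
```

A check like `fitsCp1252` could be used to pick out the comments that are likely to be affected, for example when building a small set of test cases.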
The question, therefore, is whether it is now safe to assume that the Calibre Comments field as read from the database is already XHTML compliant. If it is, we can avoid a lot of work in calibre2opds fixing code that is no longer needed.
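Whatever the answer, one cheap safeguard would be to test each comment for XML well-formedness and only fall back to the tidy code when that test fails. The sketch below uses the XML parser that ships with the JDK. The class name `CommentCheck` and the wrapping `<div>` element are assumptions made for illustration, and being well-formed is a weaker property than being valid XHTML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only; not part of calibre2opds.
final class CommentCheck {
    /** Returns true if the comment parses as well-formed XML once wrapped in a single root element. */
    static boolean isWellFormed(String comment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Comments are fragments, so wrap them in one root element before parsing.
            // Parsing from a Reader keeps the text as Java characters; no byte encoding is involved.
            builder.parse(new InputSource(new StringReader("<div>" + comment + "</div>")));
            return true;
        } catch (Exception e) {
            // Any parser or configuration failure is treated as "not well-formed".
            return false;
        }
    }
}
```

Note that HTML named entities such as `&nbsp;` are rejected by this check because they are not declared in plain XML, so a comment containing them would still need some processing.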