MobileRead Forums - View Single Post - Can I preserve entities when converting from html? (To avoid unicode on kindle)

krunkster · 04-07-2009, 05:07 PM

Hopefully my question is not so obtuse that it scares everyone off.

My problem is that the source material I have is an HTML TOC.
Many of the HTML files it includes contain character entity references such as:
&nbsp; and &rsquo;

When Calibre imports from HTML, it converts all the files to XHTML with UTF-8 encoding. During this process it converts those entity references into Unicode characters.

This may be fine for most situations, but I have a Kindle 2, and it does not display the converted Unicode characters correctly. It seems to show each byte of the UTF-8 sequence as a separate single-byte character, which effectively looks like garbage:
Â*and â€™
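
To illustrate where that garbage comes from, here is a minimal Python sketch. It assumes the device is reading the UTF-8 bytes as Windows-1252 (cp1252); that is my guess from the output, not something I have confirmed. The function name is made up for the example.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Windows-1252.

    Assumption: this mimics what the reader does when it ignores the
    UTF-8 encoding of the file.
    """
    return text.encode("utf-8").decode("cp1252")


# U+00A0 is the no-break space, U+2019 is the right single quotation mark.
print(repr(misread_utf8_as_cp1252("\u00a0")))
print(repr(misread_utf8_as_cp1252("\u2019")))
```

The no-break space turns into two characters and the curly apostrophe into three, which matches the pattern I see on the device.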

The Kindle does, however, recognize the entity references and renders them correctly, at least all the ones I've seen so far. So what I'd like to do is preserve the entity references when I import into Calibre instead of having them converted into Unicode characters.
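
A rough sketch of the kind of result I'm after, done as a post-processing step in plain Python rather than inside Calibre. It only uses the standard `xmlcharrefreplace` error handler; the function name is hypothetical. Note that it produces numeric references (such as `&#8217;`) rather than the original named ones, and I am assuming the Kindle handles those equally well.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference.

    ASCII characters, including existing markup, pass through unchanged.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Sample fragment with a curly apostrophe and a no-break space.
sample = "<p>it\u2019s\u00a0here</p>"
print(to_numeric_entities(sample))
```

The output is pure ASCII, so it should not matter which encoding the device assumes when it reads the file.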

If you look at "tidy", the classic XHTML parser, it in fact has an option called "preserve-entities". I'm hoping something like this might be achievable in Calibre.

Thanks for your time.
