|04-07-2009, 04:07 PM||#1|
Join Date: Feb 2009
Can I preserve entities when converting from HTML? (To avoid Unicode on Kindle)
Hopefully my question is not so obtuse that it scares everyone off.
My problem is that the source material I have is an HTML TOC.
Many of the HTML files included contain reference entities such as:
When Calibre imports from HTML, it converts all the files to XHTML with a UTF-8 encoding. During this process it converts those reference entities into Unicode characters.
This may be fine for most situations, but I have a Kindle 2, and it does not like the Unicode-converted symbols. It tries to display them as ASCII equivalents, which effectively look like garbage:
The Kindle does, however, recognize the reference entities and renders them appropriately, at least all the ones I've seen so far. So what I'd like to do is preserve the reference entities when I import into Calibre instead of converting them into Unicode.
If you look at "tidy", the classic XHTML parser, it in fact has an option called "preserve-entities"... I'm hoping something like this might be achievable in Calibre.
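To make the request concrete, here is a rough Python sketch of the reverse step: taking text that has already been converted to Unicode and turning every non-ASCII character back into an entity. This is not a Calibre feature or option, just an illustration using the standard library. The function name `to_named_entities` and the sample string are made up for the example, and whether a given device renders every resulting entity is untested.

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Return text with each non-ASCII character replaced by an entity.

    Hypothetical helper for illustration only. Uses an HTML 4 named
    entity when one exists, otherwise a numeric character reference.
    ASCII characters, including existing markup, are left untouched.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                        # plain ASCII stays as-is
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. U+2014 -> &mdash;
        else:
            out.append(f"&#{cp};")                # no named entity: numeric
    return "".join(out)


sample = "caf\u00e9 \u2014 na\u00efve"
print(to_named_entities(sample))  # caf&eacute; &mdash; na&iuml;ve
```

If only numeric references are wanted, `text.encode("ascii", "xmlcharrefreplace").decode("ascii")` does the same job in one line.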
Thanks for your time.
|04-07-2009, 05:11 PM||#2|
creator of calibre
Join Date: Oct 2006
Location: Mumbai, India
IIRC the MOBI format has support for UTF-8, and calibre generates UTF-8 MOBI files, so this shouldn't be a problem. Can you open a ticket and attach a test HTML file demonstrating this problem?