04-07-2009, 04:07 PM   #1
krunkster
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2009
Device: Kindle
Can I preserve entities when converting from HTML? (To avoid Unicode on the Kindle)

Hopefully my question is not so obtuse that it scares everyone off.

My problem is that my source material is an HTML table of contents (TOC).
Many of the HTML files it includes contain entity references such as:
&nbsp; and &rsquo;

When Calibre imports from HTML, it converts all the files to XHTML with UTF-8 encoding. During this process it replaces those entity references with the actual Unicode characters.

This may be fine in most situations, but I have a Kindle 2, and it does not handle the converted Unicode characters. It appears to display each byte as a separate character, which effectively looks like garbage:
Â*and ’
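A minimal Python sketch that reproduces both steps. It assumes, for illustration only, that Python's `html.unescape` behaves like Calibre's entity handling during import (Calibre's real conversion code may differ), and that the Kindle is reading the UTF-8 bytes as Windows-1252. That second point is a guess, but it reproduces the garbage above exactly: the `*` in `Â*` appears to be where the non-breaking space ended up.

```python
import html


def import_like_calibre(markup: str) -> bytes:
    """Replace entity references with the characters they stand for, then
    serialize as UTF-8. An approximation of the import step, not Calibre code."""
    return html.unescape(markup).encode("utf-8")


def misread_as_cp1252(data: bytes) -> str:
    """Decode UTF-8 bytes as if they were Windows-1252 (the assumed Kindle behavior).
    errors="replace" covers the few byte values cp1252 leaves undefined."""
    return data.decode("cp1252", errors="replace")


raw = import_like_calibre("&nbsp; and &rsquo;")
print(raw)                           # b'\xc2\xa0 and \xe2\x80\x99'
print(repr(misread_as_cp1252(raw)))  # 'Â\xa0 and â€™'
```

Each non-ASCII character becomes two or three bytes in UTF-8, and a single-byte decoder turns each of those bytes into its own visible character.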

The Kindle does, however, recognize the entity references and renders them correctly, at least all the ones I've seen so far. So I'd like to preserve the entity references when importing into Calibre instead of having them converted to Unicode.

If you look at "tidy", the classic (X)HTML cleanup tool, it has an option called "preserve-entities". I'm hoping something like this might be achievable in Calibre.
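If Calibre has no such option, one workaround to try is post-processing the converted XHTML so that every non-ASCII character is written back out as an entity reference. A minimal Python sketch follows; `to_named_entities` is a hypothetical helper, not part of Calibre. It assumes its input is already-serialized markup (so `&`, `<` and `>` are copied through untouched), uses the HTML 4 entity names from Python's `html.entities.codepoint2name`, and falls back to numeric references for characters without a name. Whether the Kindle renders the numeric form as well is untested here.

```python
from html.entities import codepoint2name


def to_named_entities(markup: str) -> str:
    """Rewrite every non-ASCII character as an entity reference.

    Hypothetical helper: assumes `markup` is already-serialized HTML, so
    ASCII characters (including &, < and >) are copied through unchanged.
    """
    out = []
    for ch in markup:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                         # plain ASCII: keep as-is
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. U+2019 -> &rsquo;
        else:
            out.append(f"&#{cp};")                 # no HTML 4 name: numeric
    return "".join(out)


print(to_named_entities("Tom\u2019s\u00a0book"))  # Tom&rsquo;s&nbsp;book
```

If numeric references are acceptable, Python's built-in codec error handler does the same job in one line: `markup.encode("ascii", "xmlcharrefreplace")` turns `’` into `&#8217;`.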

Thanks for your time.
04-07-2009, 05:11 PM   #2
kovidgoyal
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
IIRC the MOBI format supports UTF-8, and calibre generates UTF-8 MOBI files, so this shouldn't be a problem. Can you open a ticket and attach a test HTML file demonstrating this problem?

Similar Threads
Thread Thread Starter Forum Replies Last Post
preserve images on converting to MOBI joselitux Calibre 7 05-28-2010 05:49 AM
preserve table format when converting mobi to rtf moogoogai Calibre 4 02-26-2010 12:50 PM
Can the Kindle avoid repeating the video-game crash of '83? m-reader News 11 12-03-2009 01:35 PM
Converting from html mysweety Calibre 16 09-23-2009 08:20 AM
Converting non-ascii/non-unicode text - pictures the way to go? politicorific Workshop 5 04-02-2009 05:59 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.