MobileRead Forums > E-Book Formats > Kindle Formats

05-19-2011, 06:19 AM   #1
Stodder
Connoisseur
Stodder began at the beginning.
 
Posts: 75
Karma: 12
Join Date: Apr 2011
Device: ipad, kindle
Doctypes and charsets.

Hi guys,

I promise not to ask another question here for the next few days, but I just had to get an opinion on this:

What's the best doctype and charset to use for creating Kindle-compliant mobi files? I'm thinking of using XHTML Strict and ISO 8859-1.

Will this be compliant, and will it be able to handle named character entities (&mdash; et al.)? My validation has come up clean, but I'm never sure what to make of charsets and the associated DTD stuff.

Any help would be great :-D

Stodder
05-19-2011, 07:00 AM   #2
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,496
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
I'd recommend XHTML 1.1 or XHTML 1.0 Transitional (there's no Transitional flavour of XHTML 1.1), but be sure to limit use to XHTML that's valid in the ePub 2.0.1 spec.

And then you'll probably want to limit even more to things that convert to Mobipocket well using kindlegen.

You should definitely use UTF-8 as the character encoding, as that's a requirement of the ePub spec (well, UTF-8 or UTF-16, but UTF-16 would just be silly), and it's also handled well by kindlegen.
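As a rough illustration of those two recommendations, here is a minimal Python sketch that builds the top of an XHTML content file declared as UTF-8 and encodes it that way. The plain XHTML 1.1 doctype, the `meta` charset line and the `xhtml_skeleton` helper name are assumptions for illustration, not something kindlegen or the ePub spec requires in exactly this form.

```python
from html import escape

# Assumed doctype: plain XHTML 1.1 (ePub 2 content documents are based on it).
XHTML11_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
    '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
)


def xhtml_skeleton(title: str, body_markup: str) -> str:
    """Hypothetical helper: wrap body markup in a minimal XHTML page declared as UTF-8."""
    return "\n".join([
        '<?xml version="1.0" encoding="utf-8"?>',  # XML declaration names the encoding
        XHTML11_DOCTYPE,
        '<html xmlns="http://www.w3.org/1999/xhtml">',
        "<head>",
        '  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />',
        f"  <title>{escape(title)}</title>",  # title is plain text, so escape it
        "</head>",
        f"<body>{body_markup}</body>",  # body is already markup, left as-is
        "</html>",
    ])


# Non-ASCII characters can be written directly; no entities are needed in UTF-8.
page = xhtml_skeleton("Chapter 1", "<p>Caf\u00e9 \u2014 written directly, no entities.</p>")
data = page.encode("utf-8")  # the bytes you would save; must match the declaration
```

The point of the sketch is that the declared encoding (XML declaration and `meta` tag) and the actual bytes written agree; a mismatch between the two is what usually produces garbled characters.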
05-19-2011, 07:46 AM   #3
Stodder
@pdurrant: Thanks for the detailed answer! Yes, it's probably better for me to go with Transitional rather than Strict XHTML.

However, I thought UTF-8 was not workable in Kindle (Amazon: Formatting Your Book > Character Encoding and Supported Characters)?

This is soo complicated o_O!
05-19-2011, 07:48 AM   #4
Stodder
Ooh, another question: if I use ISO 8859-1 for the source HTML (and I have no idea if that's a good idea anyway), do I choose Western 1252 in the Mobipocket Creator encoding settings for the output MOBI file? The program only has two choices: UTF8 and Western 1252.
05-19-2011, 09:29 AM   #5
pdurrant
Quote:
Originally Posted by Stodder
However, I thought UTF-8 was not workable in Kindle (Amazon: Formatting Your Book > Character Encoding and Supported Characters)?
That page is talking about Amazon Kindle Direct Publishing, where you send them an HTML file and they convert it to a Kindle file. You don't want to do that. Use the free kindlegen locally; it'll be faster and give you more control.

Amazon Kindle ebooks most certainly can use utf-8. I have several published at the moment that use it.
05-19-2011, 09:31 AM   #6
pdurrant
Quote:
Originally Posted by Stodder
Ooh, another question: if I use ISO 8859-1 for the source html (and I have no idea if that's a good idea, anyway), do I choose Western 1252 in the Mobipocket Creator encoding settings for the output MOBI file? The program only has 2 choices: UTF8 and Western 1252.
If you were to use ISO 8859-1, you should use Western 1252.

But I'd recommend utf-8. And while Mobipocket Creator is OK, I'd recommend using kindlegen for your final uploads to Amazon. (Mobipocket Creator hasn't been updated for a long time, and doesn't permit graphics bigger than 63KB, and probably has other limitations.)
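Why Western 1252 is the right pick for ISO 8859-1 source: Windows-1252 ("Western 1252") assigns the same characters as ISO 8859-1 to bytes 0x00-0x7F and 0xA0-0xFF and differs only in 0x80-0x9F, where 8859-1 has invisible control codes and 1252 has printable punctuation such as curly quotes and the em dash. A quick check using Python's standard `cp1252` and `latin-1` codecs:

```python
# Bytes where Windows-1252 and ISO 8859-1 should agree.
shared = list(range(0x00, 0x80)) + list(range(0xA0, 0x100))
same_outside_c1 = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("latin-1") for b in shared
)

# In 0x80-0x9F they differ. Byte 0x97 is a good example:
raw = b"Well\x97maybe."
as_cp1252 = raw.decode("cp1252")   # 0x97 becomes an em dash (U+2014)
as_latin1 = raw.decode("latin-1")  # 0x97 becomes U+0097, an invisible control code

print(same_outside_c1)  # True
```

So text that really is ISO 8859-1 displays identically when read as 1252, and if a file labelled 8859-1 actually contains 0x80-0x9F punctuation (common with text saved from Windows programs), reading it as 1252 is what makes those characters show up correctly.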
05-19-2011, 10:38 AM   #7
Jellby
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
The safest option is to use plain ASCII and convert all other characters to numeric or named entities:

á -> &aacute;
ü -> &uuml;
— -> &mdash;
’ -> &rsquo; (or &#8217;)

If you do this, you won't have to worry about encodings, because the HTML text is actually just plain ASCII.
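That conversion can be automated. A minimal Python sketch, assuming the input is already HTML markup (so `&`, `<` and `>` are copied through untouched), that replaces every non-ASCII character with a named entity where HTML 4 defines one and a numeric reference otherwise. `to_ascii_entities` is a hypothetical helper name; the entity table is Python's standard `html.entities.codepoint2name`.

```python
from html.entities import codepoint2name


def to_ascii_entities(markup: str, prefer_named: bool = True) -> str:
    """Replace every non-ASCII character with a character entity reference.

    Assumes `markup` is already HTML, so ASCII characters (including &, <, >)
    are copied through unchanged.
    """
    out = []
    for ch in markup:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                         # plain ASCII: keep as-is
        elif prefer_named and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. &mdash;
        else:
            out.append(f"&#{cp};")                 # e.g. &#8212;
    return "".join(out)


sample = "\u00e1 \u00fc \u2014 \u2019"
print(to_ascii_entities(sample))                      # &aacute; &uuml; &mdash; &rsquo;
print(to_ascii_entities(sample, prefer_named=False))  # &#225; &#252; &#8212; &#8217;
```

The result is pure ASCII, so it reads the same whichever charset the file ends up declared as.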
05-19-2011, 05:32 PM   #8
Stodder
@pdurrant: Ah, I see now.

Hmm, maybe I should give kindlegen another go. I wasn't crazy about it at first, but it could be for the best to branch out from MBPC (Mobipocket Creator). I couldn't find a good manual for kindlegen, and that also discouraged me.

@Jellby: I never thought of just using ASCII. All my special characters have already been replaced with named character entities.

Last edited by Stodder; 05-19-2011 at 05:35 PM.
05-19-2011, 07:39 PM   #9
Stodder
Oh, thanks for getting me onto KindleGen, Pdurrant!

I've started using it and it's really useful. I created the HTML as XHTML Transitional with a UTF-8 charset and put in named character entities. The KindleGen output seems to read nicely, so hopefully the characters won't prove troublesome in other readers?

A part of me keeps thinking: use numeric entities. But I read once that older Kindles have trouble with numeric entities; does anyone know how true this is?

Last edited by Stodder; 05-19-2011 at 07:56 PM.
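If you do want to switch to numeric references, the conversion is mechanical. One relevant fact: XML itself predefines only `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`; a named entity such as `&mdash;` only resolves if the parser reads the XHTML DTD, whereas a numeric reference such as `&#8212;` needs no DTD. A minimal Python sketch using the standard `html.entities.name2codepoint` table; `named_to_numeric` is a hypothetical helper, and this says nothing either way about how older Kindles behave.

```python
import re
from html.entities import name2codepoint

# Entities XML defines itself; these are safe everywhere, so leave them alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references like &mdash; (numeric ones like &#8212; start with '#').
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(markup: str) -> str:
    """Rewrite HTML named entities as decimal numeric character references."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep XML built-ins and unknown names as-is
        return f"&#{name2codepoint[name]};"

    return _NAMED_REF.sub(replace, markup)


print(named_to_numeric("a&mdash;b &amp; &rsquo;&foo;"))  # a&#8212;b &amp; &#8217;&foo;
```

Unknown names are deliberately left untouched rather than guessed at, so a typo like `&mdsah;` still shows up when you validate.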
05-20-2011, 08:21 AM   #10
pdurrant
Quote:
Originally Posted by Stodder
A part of me keeps thinking: use numeric entities. But I read once that older kindles have trouble with numeric entities--don't know how true this is?
I suspect that kindlegen will convert numeric entities into UTF-8, but I haven't looked at the output to check. If you get hold of MobiUnpack.py you can test it yourself (it's available here on MobileRead somewhere).
05-20-2011, 05:26 PM   #11
Stodder
^ I've poked around in the MOBI files I made with kindlegen, and they seem to retain named entities, but I never thought to check whether they converted numeric ones.

But thinking about it, I didn't check whether the output MOBI was UTF-8 either.
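One way to check what encoding kindlegen declared, short of running MobiUnpack.py, is to read the text-encoding field of the MOBI header directly. A minimal Python sketch, assuming the commonly documented layout (as described on the MobileRead wiki): a 78-byte PalmDB header followed by a record list whose first entry gives record 0's file offset; in record 0, a 16-byte PalmDOC header followed by the `MOBI` header, with the encoding stored as a big-endian integer 28 bytes into record 0 (1252 for Western 1252, 65001 for UTF-8). `mobi_text_encoding` is a hypothetical helper, not part of MobiUnpack.py.

```python
import struct

# Text-encoding codes expected in the MOBI header (assumed per the documented layout).
KNOWN_ENCODINGS = {1252: "Western 1252 (cp1252)", 65001: "UTF-8"}


def mobi_text_encoding(data: bytes) -> str:
    """Report the text encoding declared in the MOBI header of `data`."""
    if len(data) < 82:
        raise ValueError("too short to be a PalmDB file")
    # The record list starts right after the 78-byte PalmDB header;
    # each entry begins with the record's absolute offset (big-endian uint32).
    (rec0,) = struct.unpack_from(">I", data, 78)
    # Record 0: 16-byte PalmDOC header, then the MOBI header ("MOBI" magic).
    if data[rec0 + 16:rec0 + 20] != b"MOBI":
        raise ValueError("record 0 has no MOBI header")
    # Text encoding: 4 bytes, 28 bytes into record 0.
    (code,) = struct.unpack_from(">I", data, rec0 + 28)
    return KNOWN_ENCODINGS.get(code, f"unknown ({code})")
```

On a real file this would be called as something like `mobi_text_encoding(open("book.mobi", "rb").read())`, where `book.mobi` is a placeholder name. Note it only reports what the header declares; it doesn't tell you whether entities were converted, which still needs MobiUnpack.py or similar.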
All times are GMT -4.