04-14-2011, 01:13 AM | #1 |
Read, don't parrot.
Posts: 224
Karma: 110242
Join Date: Apr 2011
Device: Kindle Fire, Kobo Touch, Aldiko for Android
|
[Old Thread] HTML to MOBI for Kindle
Hi:
I've been experimenting with Word's Save as Web Page, Filtered for upload to Kindle, but as you all probably know, the Kindle does weird things to formatting -- mainly it doesn't respect one's paragraph formatting much (see first attached file). I have found that converting the HTML file to MOBI in Calibre does respect paragraph formatting when tested in both Kindle for PC and Kindle Previewer (see attached two screenshots). I was elated at first because this seemed like a solution to the problem.
But I've been told by a Kindle forum user that: "Calibre ... is not a suitable mobi production tool, as it lacks the proper amount of spacing in the headers to allow all the "Kindle For X" apps to run (as PC, Mac, and iPad apps all require a 2nd layer of encryption)...I wouldn't want author-publishers to go haring off after Calibre like it's the Holy Grail of ebook production only to find out that buyers were returning books because they couldn't be read on anything but the Kindle Device itself."
Yet I had a mobi file that was created in Calibre converted into Kindle's azw format, and it worked fine on my Kindle for PC. This suggests the guy's information is outdated; but before I suggest to others that it WILL work, I'd like to hear from other users or developers.
Thanks, Michelle |
04-14-2011, 04:38 AM | #2 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
That information only applies to mobi files you plan to publish to Amazon with DRM. For personal use or publishing without DRM Calibre is fine.
Aside from that I haven't seen a discussion in the Calibre forums that confirms that allegation - note I'm not saying I disbelieve the info, just not sure of the technical details around it. If you do need to publish to Amazon with DRM then check out this post for a workflow which should let you get the best of both worlds: https://www.mobileread.com/forums/sho...56&postcount=7 |
04-14-2011, 11:41 AM | #3 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
I for one have never heard of this issue.
|
04-14-2011, 11:46 AM | #4 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I believe it's the ridiculous excuse Amazon offers for refusing to apply DRM to calibre created MOBI files. It has absolutely no bearing on being able to read calibre created books on non Kindle MOBI readers. The only thing it means is that Amazon will refuse to apply DRM to your book if you try to sell it through Amazon's store.
|
04-14-2011, 07:17 PM | #5 |
reader
Posts: 6,975
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
|
Since Calibre has bug reporting tools, all Amazon has to do is explain what needs to be changed in Calibre's MOBI headers and they would likely get fixed.
I don't see the point of DRM on ebooks submitted via DTP, which are likely to be bargain priced and therefore relatively piracy resistant. If you have to include DRM, I would try HTML -> MOBI (Calibre); MOBI -> ePub (Calibre again); ePub to AZW/MOBI (kindlegen). This is a bit simpler than DiapDealer's work flow referenced by ldolse, assuming it works (which it might not). |
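The three-step chain suggested above can be written down as command lines. This is only a sketch: it assumes calibre's `ebook-convert` and Amazon's `kindlegen` are both on the PATH, the file names are made up for illustration, and whether the result is accepted by Amazon is exactly the open question in this thread. The helper only builds the commands; it does not run them.

```python
from pathlib import Path


def build_conversion_commands(html_path):
    """Return the command lines for HTML -> MOBI -> ePub -> MOBI.

    Hypothetical helper: the output names are derived from the input name
    and are not anything the tools require.
    """
    src = Path(html_path)
    first_mobi = src.with_suffix(".mobi")
    epub = src.with_suffix(".epub")
    final_name = src.stem + "-kindlegen.mobi"
    return [
        # calibre: ebook-convert <input file> <output file>
        ["ebook-convert", str(src), str(first_mobi)],
        ["ebook-convert", str(first_mobi), str(epub)],
        # kindlegen: -o is given a bare file name here (assumption: it
        # writes the output next to the input file)
        ["kindlegen", str(epub), "-o", final_name],
    ]


for cmd in build_conversion_commands("book.html"):
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the tools are installed.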
04-14-2011, 07:22 PM | #6 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
They won't. Mobipocket is a proprietary format they own. They will not release any specification information regarding the MOBI format. All known information is due to reverse engineering.
|
04-15-2011, 02:06 AM | #7 |
Read, don't parrot.
Posts: 224
Karma: 110242
Join Date: Apr 2011
Device: Kindle Fire, Kobo Touch, Aldiko for Android
|
@Kovid: this isn't Amazon saying this, it's some forum user who may or may not have his own agenda.
Amazon will accept .mobi files for submission to their site; all are converted into their proprietary .azw format (which I understand is a variation of mobi). DRM is applied to Amazon books at the upload stage, not in advance, and the DRM coding is added then. Not having any experience submitting a .mobi file to Amazon, I can't say what would happen if I did and tried to apply DRM.
But DRM isn't really my issue. My issue is whether the formatting preserved in the .mobi file would get corrupted by Kindle; i.e., whether the Kindle would override the .mobi coding as Kindle does with HTML and apply things like automatic indents, which is the curse we're trying to work around. (I'd like to know which Kindle genius thought overriding someone's paragraph formatting was a good idea.) When I tested a Calibre .mobi file that was converted to .azw through Kindle's free service for Kindle owners (I don't have a Kindle, an acquaintance did this for me), the formatting was preserved.
Has anyone ever tried to upload a book in a .mobi format created by Calibre to Amazon Kindle? If so, what were the results?
Cheers, Michelle |
04-15-2011, 03:45 AM | #8 | |
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
Whether or not it's some super-secret conspiracy agenda by Amazon, or whether it is a genuine technical issue having to do with, as Amazon told me, adequate space in the headers or lack thereof, has nothing to do with me whatsoever. And since I'm having to suffer imprecatory nonsense about whether or not I have a "hidden agenda" in simply restating what Amazon stated, and reporting my own experiences with DRM'ed Kindle books not functioning on the Kindle4 apps, perhaps someone ELSE would be good enough to do their own experiments, with their own books, or their own clients' books, at their own expense, and their clients' expense, and ASK Amazon themselves, instead of libeling the messenger? Thanks. I'm sorry I bothered to tell anyone. Hitch |
|
04-15-2011, 06:28 AM | #9 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
@Hitch, I don't think anyone disbelieves you, I think the OP is just getting a bit confused between the discussion of formatting vs. DRM.
@eggheadbooks1, what you're seeing in the Kindle Previewer is exactly what you'll see on a real Kindle (note that's not true of KindleforPC/KindleforMac). People use Calibre generated mobis all the time on real Kindles, and many have noted the same thing you noted in your testing, namely that Calibre generally does a more accurate conversion from epub to mobi than Amazon's own tools. AZW and Mobi are exactly the same to the best of my knowledge, Amazon just changes the extension, and the DRM is very slightly changed over mobipocket's original DRM scheme. If you're not planning on publishing to Amazon you don't need to worry about the rest of the discussion. |
04-15-2011, 11:01 AM | #10 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
Kindle for PC/Mac uses information from the extended header to determine the encryption key. This can and does include multiple metadata values, including a token, asin, guid, kindle drm server, text to speech, and other metadata values.
So it could be possible that simply growing the size of the extended header region and leaving lots of blank space (500 bytes?) at the end, not assigned to any specific metadata value, might allow later DRM addition software to create and write its required metadata fields in the extended header region without having to rewrite all of the individual sections/offsets that make up the .mobi ebook.
I would think that using mobiunpack or mobi2mobi to print the full size and contents of the extended header region, and comparing the extra space (if any in the extended header) produced by KindleGen or KindlePreviewer to Calibre, would be enough to determine if a simple allocation of extra space there might do the trick.
My 2 Cents, KevinH |
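The comparison proposed above could be sketched roughly as follows. The layout used here (a 16-byte PalmDOC header, then a block tagged `MOBI` whose length sits 4 bytes in, then a block tagged `EXTH` with a length, an item count, and type/length/value items) comes from reverse-engineered notes such as the MobileRead wiki, not from any Amazon specification, so treat it as an assumption. The function names are made up for illustration.

```python
import struct


def parse_exth(record0):
    """Return (declared_length, [(key, value_bytes), ...]) for the EXTH block.

    Assumes record 0 is: 16-byte PalmDOC header, MOBI header, EXTH block.
    """
    if record0[16:20] != b"MOBI":
        raise ValueError("record 0 does not contain a MOBI header")
    (mobi_len,) = struct.unpack_from(">I", record0, 20)
    start = 16 + mobi_len  # offset to start of extended header
    if record0[start:start + 4] != b"EXTH":
        raise ValueError("no EXTH block after the MOBI header")
    declared_len, count = struct.unpack_from(">II", record0, start + 4)
    pos = start + 12
    items = []
    for _ in range(count):
        # each item: 4-byte key, 4-byte size (size includes these 8 bytes)
        key, size = struct.unpack_from(">II", record0, pos)
        items.append((key, record0[pos + 8:pos + size]))
        pos += size
    return declared_len, items


def trailing_nulls(record0):
    """Count the null bytes at the very end of record 0 (the spare room)."""
    return len(record0) - len(record0.rstrip(b"\x00"))
```

Running both functions on record 0 of a KindleGen file and of a calibre file would show the declared extended header length, the keys present, and how much null padding follows.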
04-15-2011, 12:14 PM | #11 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
I chose an epub I had, loaded it into calibre, and converted it to .mobi, then used mobiunpack.py to look at the structure of the resulting .mobi. I took the same .epub, converted it to a .mobi using kindlegen (for Mac), and used mobiunpack.py to examine that structure as well.
Here is what I found (any unknown metadata keys are shown as numbers and their associated values as hex strings).
Kindlegen generated .mobi:
Unpacking Book ...
number of sections 467
0 3816
1 12484
2 13577
...
length of this header 232
book title offset 456
offset to start of extended header 248
extended header length 208
extended header num_items 9
MetaData
ISBN -> 978-0-385-53313-3
Creator -> Dan Brown
Publisher -> Doubleday
Rights -> Copyright 2009
300 -> 03000000000000000000000000000080002000000000000000 00000000000000ecbef4ed01e001fc01a901b5409440934099 4098409c409d
204 -> 000000ca
205 -> 00000001
206 -> 00000002
207 -> 0000821b
Here is the same information for the Calibre generated .mobi:
Unpacking Book ...
number of sections 354
0 2912
1 3472
2 4656
...
length of this header 232
book title offset 544
offset to start of extended header 248
extended header length 292
extended header num_items 12
MetaData
Creator -> Dan Brown
Publisher -> Doubleday
ISBN -> 978-0-385-53313-3
Published -> 2011-04-15 15:25:24+00:00
Contributor -> calibre (0.7.54) [http://calibre-ebook.com]
Rights -> Copyright 2009
ASIN -> 0f7dd9a0-003a-45d8-9c87-c2adcef46ca1
CoverOffset -> 11
202 -> 0000001d
203 -> 00000000
501 -> 45424f4b
Updated Title -> The Lost Symbol
So it appears that the used-up space of the extended headers is similar, although KindleGen seems to create some unknown MetaData keys/values: 204, 205, 206, 207, and 300. I will look online to see what these keys are actually for. The biggest difference seems to be the size of section 0.
For the Kindlegen generated mobi, section 0 is 12484 - 3816 = 8668 bytes long, whereas for the Calibre generated mobi, section 0 is 3472 - 2912 = 560 bytes long. I am not sure what else is found in section 0 besides the extended header but the size difference is quite large. So perhaps, what Amazon is referring to is the size of section 0 or the fact that certain metadata key/values are missing? Hard to say without more information. |
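The section-size arithmetic in this post is just the difference between consecutive offsets. A minimal sketch, using only the three offsets quoted for each file (the function name is made up for illustration):

```python
def section_lengths(offsets):
    """Length of each section = next section's offset minus its own offset."""
    return [nxt - cur for cur, nxt in zip(offsets, offsets[1:])]


# First three section offsets as reported in the post above
kindlegen_offsets = [3816, 12484, 13577]
calibre_offsets = [2912, 3472, 4656]

print(section_lengths(kindlegen_offsets)[0])  # 8668
print(section_lengths(calibre_offsets)[0])    # 560
```

The last section of a file has no following offset, so its length would have to be taken from the file size instead.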
04-15-2011, 12:20 PM | #12 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If you can figure out what the difference is, I'll be happy to modify the calibre MOBI output code accordingly. I've never been motivated enough, personally, to do it.
|
04-15-2011, 12:22 PM | #13 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
There is info about these extra keys in the mobi document on the mobileread wiki.
|
04-15-2011, 01:08 PM | #14 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
I personally don't think the actual metakeys matter. I think the whole difference is in the size of section 0. Section 0 in the Kindlegen version is 8668 bytes whereas in the calibre version is 560 bytes. With the extra space (over 8K worth of it - and if you look it is all nulls after the title info), a single pass program could rewrite the entire set of metadata any way it wanted while not changing any of the other sections of the container file. So it would be interesting to modify Calibre code to produce a 8668 byte size section 0 (full of nulls after the last title info) thereby leaving room for the entire metadata section to be overwritten when uploaded. Then someone could try to upload that generated .mobi to see if that fixes the issues at Amazon's end. I am betting it would. My 2 cents ... |
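The padding experiment described above could be tried on an existing file without touching calibre itself. The sketch below is not calibre's writer code. It assumes the usual Palm database container layout (a 2-byte record count at byte 76, then 8-byte record entries from byte 78, each starting with a 4-byte big-endian offset), appends nulls to the end of record 0, and shifts every later record offset by the same amount. `pad_record0` is a made-up name, and whether Amazon's upload would accept the result is untested.

```python
import struct


def pad_record0(data, target_len):
    """Return a copy of a PDB/MOBI file whose record 0 is padded with nulls.

    Offsets stored inside record 0 are relative to the start of record 0,
    so only the container's record table needs adjusting.
    """
    (count,) = struct.unpack_from(">H", data, 76)
    offsets = [struct.unpack_from(">I", data, 78 + 8 * i)[0]
               for i in range(count)]
    end0 = offsets[1] if count > 1 else len(data)
    extra = target_len - (end0 - offsets[0])
    if extra <= 0:
        return data  # record 0 is already at least target_len bytes
    out = bytearray(data[:end0] + b"\x00" * extra + data[end0:])
    for i in range(1, count):
        # every record after record 0 moves down by the inserted amount
        struct.pack_into(">I", out, 78 + 8 * i, offsets[i] + extra)
    return bytes(out)
```

For the numbers in this thread, `pad_record0(data, 8668)` would grow a 560-byte section 0 to the size KindleGen produced.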
04-15-2011, 02:26 PM | #15 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
KevinH: I'm a little confused.
Looking at line 1514 in mobi/writer.py in the calibre source code, it seems like calibre always pads record zero with null bytes to ensure its length is at least 2452, so how are you getting 560?
In fact, I wrote a quick utility to get the record offsets from a MOBI file, and the offset of record 1 in a calibre created MOBI is 2580 while in a kindlegen MOBI it was 2540. IIRC, in MOBI, the length of a record is never specified; the length of record n is offset(n+1) - offset(n). So the calibre record 0 has length 2452 and the kindlegen one has length 2404.
Of course, it's a long time since I looked at MOBI, so I may be missing something. |
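A utility of the kind mentioned in this post might look roughly like the sketch below. It is not the actual utility, which is not shown in the thread. It assumes the standard Palm database header (a 2-byte record count at byte 76, followed by 8-byte record entries whose first 4 bytes are the big-endian offset) and applies the rule that the length of record n is offset(n+1) - offset(n).

```python
import struct


def record_offsets(data):
    """Read the record offsets from the container's record table."""
    (count,) = struct.unpack_from(">H", data, 76)
    return [struct.unpack_from(">I", data, 78 + 8 * i)[0]
            for i in range(count)]


def record_lengths(data):
    """Length of record n is offset(n+1) - offset(n).

    The last record is assumed to run to the end of the file.
    """
    bounds = record_offsets(data) + [len(data)]
    return [nxt - cur for cur, nxt in zip(bounds, bounds[1:])]
```

For a file on disk, something like `record_lengths(open("book.mobi", "rb").read())[0]` gives the length of record 0 (the file name is a placeholder).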
|