04-15-2011, 02:50 PM | #16 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi Kovid
I was basing that on the parsing done by mobiunpack.py to get the starting offset of each section. The difference between consecutive starting offsets gives the section length. Since section 0 contains the extended header, its size is the difference between the starting positions of section 0 and section 1.
For my test case under Calibre this gives:
    going to load section 0 now
    loading section 0
    before: 2912 and after: 3472
as the starting and ending offsets, so the extended header section (section 0) is 3472 - 2912 = 560 bytes.
For my test case under KindleGen this gives:
    loading section 0
    before: 3816 and after: 12484
as the starting and ending offsets, so section 0 is 12484 - 3816 = 8668 bytes.
Perhaps there is a bug in how mobiunpack.py handles sections, but if you actually open the KindleGen-produced book in emacs, you can see the almost 8000 bytes of nulls right where it says they should be.
Here is the code snippet that does the sectioning in mobiunpack.py (for what it is worth).
Code:
    # Python 2 code, as quoted from mobiunpack.py
    import struct

    class Sectionizer:
        def __init__(self, filename, perm):
            self.f = file(filename, perm)
            # The Palm database header is 78 bytes; the ident lives at 0x3C.
            header = self.f.read(78)
            self.ident = header[0x3C:0x3C+8]
            # The number of sections is a big-endian 16-bit value at byte 76.
            self.num_sections, = struct.unpack_from('>H', header, 76)
            print "number of sections ", self.num_sections
            # Each table entry is 8 bytes; keep only the 4-byte starting offsets.
            sections = self.f.read(self.num_sections*8)
            self.sections = struct.unpack_from('>%dL' % (self.num_sections*2), sections, 0)[::2] + (0xfffffff, )
            for z in xrange(self.num_sections):
                print z, " ", self.sections[z]

        def loadSection(self, section):
            print "loading section ", section
            # A section runs from its own offset to the next section's offset.
            before, after = self.sections[section:section+2]
            print "before: ", before, " and after: ", after
            self.f.seek(before)
            return self.f.read(after - before)
|
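For readers on Python 3, the same offset arithmetic can be sketched as follows. This is not the mobiunpack.py code: `section_offsets` and `section_lengths` are hypothetical helper names, and the sketch assumes the whole file is already in memory and treats the end of the file as the end of the last section.

```python
import struct

def section_offsets(data: bytes) -> list:
    """Return the starting offset of every section in a Palm database image."""
    # The section count is a big-endian 16-bit value at byte 76 of the header.
    (num_sections,) = struct.unpack_from(">H", data, 76)
    # Each table entry is 8 bytes; the first 4 bytes are the starting offset.
    return [struct.unpack_from(">L", data, 78 + 8 * i)[0]
            for i in range(num_sections)]

def section_lengths(data: bytes) -> list:
    """Length of a section = start of the next section minus its own start."""
    offsets = section_offsets(data)
    # Assumption: the last section runs to the end of the file.
    ends = offsets[1:] + [len(data)]
    return [end - start for start, end in zip(offsets, ends)]

# The two section 0 sizes quoted in the post, as plain arithmetic:
print(3472 - 2912)    # Calibre test case: 560
print(12484 - 3816)   # KindleGen test case: 8668
```

Calling `section_lengths(open(path, "rb").read())[0]` on a book would then give the size of section 0 directly, where `path` stands for whatever file is being inspected.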
04-15-2011, 03:02 PM | #17 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi Kovid,
Could your term "record 0" and my term "section 0" be talking about different things? I think your record 0 includes everything up to where my section 0 starts. Mobiunpack numbers sections starting from 0 and does not offset them by 1.
According to mobiunpack for my test case (a randomly chosen epub converted to mobi), my section 0 starts at 2912 (Calibre version) and at 3816 (KindleGen version), which are quite close, and both values are greater than your record 0. But the next section does not begin until 3472 for the Calibre version versus 12484 for the KindleGen version. Mobiunpack starts to read the extended header 16 bytes inside of section 0 (code provided above).
Perhaps the code just references things in different ways? When I talk about section 0, I am talking about where the extended header information is stored in the file. I think that must be record "1" in your code? I will grab the calibre src and take a look to see.
KevinH |
04-15-2011, 03:12 PM | #18 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Do you mean the length of the EXTH header?
By record 0, I just mean the first record in the Palm Database. Record 0 contains a PalmDOC header, a MOBI header and an EXTH header. For details, see: https://wiki.mobileread.com/wiki/MOBI
Last edited by kovidgoyal; 04-15-2011 at 03:17 PM. |
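As a rough illustration of that layout, the sketch below locates the EXTH block inside record 0. It is an assumption-laden sketch, not calibre or mobiunpack code: `find_exth` is a hypothetical helper, and the field positions (16-byte PalmDOC header, `MOBI` identifier at byte 16, MOBI header length at byte 20, EXTH block directly after the MOBI header) follow the wiki page linked above.

```python
import struct

def find_exth(record0: bytes):
    """Return (offset, length, item_count) of the EXTH block, or None if absent."""
    # Bytes 0-15 are the PalmDOC header; the MOBI header starts at byte 16.
    if record0[16:20] != b"MOBI":
        return None
    # The MOBI header length is counted from the start of the 'MOBI' identifier.
    (mobi_len,) = struct.unpack_from(">L", record0, 20)
    exth_start = 16 + mobi_len
    if record0[exth_start:exth_start + 4] != b"EXTH":
        return None
    # After the 'EXTH' identifier: 4-byte block length, then 4-byte item count.
    exth_len, item_count = struct.unpack_from(">LL", record0, exth_start + 4)
    return exth_start, exth_len, item_count
```

Everything in record 0 beyond `exth_start + exth_len` (the book title plus any padding) is what determines how much room is left for growing the EXTH block in place.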
04-15-2011, 03:26 PM | #19 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi Kovid,
Section 0 in mobiunpack terms is where the EXTH information is stored. I am not exactly referring to the size of the exth information itself but instead to the size of the section where the exth info is stored. I will stare at the Calibre mobi code and try to see where we differ here. |
04-15-2011, 03:43 PM | #20 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I'm attaching a trivial test.mobi file. Can you tell me what the length of your section 0 is in this file?
|
04-15-2011, 03:50 PM | #21 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi Kovid,
Here is what Mobiunpack says about your test.mobi file:
    Unpacking Book ...
    number of sections 6
    Section Number, Starting Offset
    0, 128
    1, 2580
    2, 2754
    3, 2756
    4, 2792
    5, 2836
    # Prints this just before it parses the exth header info
    going to load section 0 now
    loading section 0
    before: 128 and after: 2580
    # Offsets within this section
    book title offset 384
    offset to start of extended header 248
    extended header length 132
    extended header num_items 5
    # Metadata keys and values stored in exth
    Creator -> Unknown
    Updated Title -> t
    ASIN -> fa943046-2dcc-4be7-8ac0-888ef7e626ae
    501 -> 45424f4b
    Published -> 2011-04-15T16:50:21.482609+00:00
That means the length of my section 0 for this file is 2580 - 128 = 2452 bytes.
Last edited by KevinH; 04-15-2011 at 04:38 PM. Reason: cleaned it up to make it readable |
04-15-2011, 04:10 PM | #22 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That is what I would expect. Can you attach the MOBI you generated?
|
04-15-2011, 04:12 PM | #23 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I think the size you are referring to is the number of extra bytes in section 0 after the end of the EXTH header? Because the length of a calibre-produced section 0 is always at least 2452 bytes.
|
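The "extra bytes after the EXTH header" can be worked out from the numbers mobiunpack printed for test.mobi. The helper name `bytes_after_exth` is hypothetical; the inputs are the section 0 boundaries (128 and 2580), the EXTH offset (248) and the EXTH length (132) reported in the thread.

```python
def bytes_after_exth(section_len: int, exth_offset: int, exth_len: int) -> int:
    """Bytes of section 0 that follow the EXTH block (title plus any padding)."""
    return section_len - (exth_offset + exth_len)

# test.mobi: section 0 spans 128..2580, EXTH starts at 248 and is 132 bytes long.
print(bytes_after_exth(2580 - 128, 248, 132))   # 2072
```

Note that this count includes the book title, which mobiunpack reports at offset 384, so not all of it is free padding.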
04-15-2011, 04:22 PM | #24 | |
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
No, she's not confused. I was quite clear about the difference in the KDP-forum thread, and I'm pretty sure the "...some forum user who may or may not have his own agenda" crack is fairly precise language--and the OP is, after all, an author (who has published several discussions about her expertise in Kindle formatting, as well as blog articles, etc. all in the last 2 weeks on the KDP forums). Pile on the authoritative "I for one have never heard of this" and Kovid's comment about Amazon making a "ridiculous excuse" and I could either appear stupid, ill-informed OR someone who has an agenda, and I simply want to stop this in its tracks before it goes further.
I also told her that mobi and azw are precisely the same (along with prc, for that matter), and that azw is naught more than--literally--a MBPC-generated book with a proprietary file extension, and that the mobi/prc/azw extension had absolutely nothing to do with the issue.
The entire discussion came about because the OP does not wish to use CSS to eliminate Kindle's default first-line indent on paragraphs, and announced to the KDP that using Calibre "fixed" the issue. I didn't want some poor noob author using Calibre to produce a book and then have books returned by unhappy users (as did my client, way back when this came up), because everyone on that list is publishing to the KDP, everyone is doing commercial production, even if it is only one book at a time. So I responded on that list, in an attempt to head off someone else's misfortune.
I shouldn't have let snark get up my nose, here on MR, but it's been a long few weeks here at Booknook, so my "diplomacy-low light" is flashing. ;-)
@Kovid: and yes, I reported there as well that you were not motivated to chase it down. The OP retorted that she'd ask you herself. So I guess you've been asked, and now replied on the topic a second time.
Thanks, @Idolse! @Kovid, nice to see ya.
Hitch |
|
04-15-2011, 04:36 PM | #25 | |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Quote:
Yes, this is the extra size in section 0 after the EXTH header. However, my version of calibre did not allocate 2452 bytes at all. Is that new in 0.7.55?
I privately emailed you the test2.mobi file created via Calibre, since the book I randomly chose was a commercial ebook (and therefore should not be posted in the forum).
To be specific, to create that test2.mobi I used Calibre on a Mac (0.7.54) to import a non-DRM .epub file. I then converted it to .mobi via Calibre (using the default settings only) and used "Save to Disk" to save the .mobi and .epub to my hard drive, where I copied the resulting .mobi out and renamed it to test2.mobi, which I sent to you.
Either way, the KindleGen version of the book has a total size of 8668 bytes for section 0. This allows them to pretty much extend/change the exth section in any way they want without having to change any of the other records in the pdb format.
If you want the KindleGen version, please let me know and I will e-mail it to you (it is twice the size).
Thanks,
KevinH |
|
04-15-2011, 04:43 PM | #26 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I think what's happening is the set MOBI metadata code in calibre is truncating record 0 to the minimum possible size. You can confirm this by not using save to disk to get the MOBI out of the calibre library.
|
04-15-2011, 04:56 PM | #27 | |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Quote:
Here are the section starting offsets that mobiunpack prints if I simply copy the .mobi file out of the Calibre Library manually:
    Unpacking Book ...
    number of sections 354
    0 2912
    1 5364
    2 6548
    3 7612
    4 8676
    5 9736
    6 10799
    7 12194
    8 14307
So in this case section 0 starts at 2912 and ends at 5364, for a size of exactly the 2452 bytes you said it would be.
Perhaps the Amazon reject came because of using "Save to Disk", which made the exth region so tight that there was not enough space to easily change it without rewriting the entire file. My guess is they keep exactly one binary DRM image of each book and write the EXTH info on the fly to correspond to the version of Kindle you have (K4Mac, K4PC, etc) in order to encode the PID information needed for that reader/platform/device. |
|
04-15-2011, 05:04 PM | #28 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If that's the case, I'm really surprised at Amazon. Do they lack the ability to write a simple script to insert space into a MOBI record 0 and update the offsets table accordingly? calibre has been able to do that for *years*.
Oh well, I'll have the next release of calibre guarantee that there is always 8KB worth of null bytes immediately after the exth header in record0, when creating MOBI files and when updating mobi metadata. |
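A minimal sketch of what "insert space into record 0 and update the offsets table" could look like is given below. It is not calibre's implementation: `pad_record0` is a hypothetical helper, and as a simplification it appends the null bytes at the end of record 0 rather than immediately after the EXTH header, so nothing inside record 0 (such as the title offset) needs patching.

```python
import struct

def pad_record0(data: bytes, extra: int) -> bytes:
    """Append `extra` null bytes to record 0 and shift the later record offsets."""
    (num_records,) = struct.unpack_from(">H", data, 76)
    if num_records < 2:
        # Record 0 is also the last record, so it simply runs to end of file.
        return data + b"\0" * extra
    out = bytearray(data)
    # Record 0 ends where record 1 begins (second entry of the offsets table).
    (record1_start,) = struct.unpack_from(">L", out, 78 + 8)
    # Patch the 4-byte offset field of every record after record 0.
    for i in range(1, num_records):
        pos = 78 + 8 * i
        (offset,) = struct.unpack_from(">L", out, pos)
        struct.pack_into(">L", out, pos, offset + extra)
    # Insert the padding at the old start of record 1, i.e. the end of record 0.
    out[record1_start:record1_start] = b"\0" * extra
    return bytes(out)
```

Under these assumptions, `pad_record0(data, 8192)` would grow record 0 by 8KB while leaving record 0's own starting offset and every other record's contents untouched.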
04-15-2011, 05:19 PM | #29 | |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi Kovid,
Quote:
Assuming that encoding a book with DRM is costly on a large scale, I would only encrypt each book once and then have extra space in record 0 where I could write the real encryption key encrypted with a set of PIDs, plus the metadata values needed to figure out at least one of those PIDs based on information stored on the device itself, my own registration information, and metadata taken from the book itself. Then a simple rewrite of section 0 can handle all devices/apps/readers and no other changes need be made.
As you said, it is very easy to redo the offset tables, so all of this might just be a waste of time!
KevinH |
|
04-15-2011, 05:30 PM | #30 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi Kovid,
I ran mobiunpack on 3 encrypted Kindle for Mac books and of course it failed (since the books were encrypted) but the section 0 offsets did get printed and every one of the 3 was over 8000 bytes (8756, ...) long. Maybe they were all generated by KindleGen? So it looks like 8K of null bytes added to the end is a safe value? Then again, I may be way off base here. Take care, KevinH |
|