Generating mobi and KF8 parts of Kindle file from separate sources

PeterHatch · 01-23-2012, 02:33 PM

So I found a hack for using Kindlegen to create files for the Kindle that use different sources for the mobi part and the KF8 part - description on my blog at http://extraordinarycommons.com/crea...erent-sources/.

I'd like it if people could check files generated using the technique and see if there are any problems compared to official files; if there are no problems, a python script (or other tool) to do the same thing in a convenient manner would be very nice.

KevinH · 01-23-2012, 04:41 PM

Hi,

The key is that both versions must use all of the same support files and access them in the same order.

"Note that embedding images and fonts happens when the first version is created; the links in the kf8 version will point to the wrong object if the two sources don’t include the same files in the same order."

This is not easy to guarantee, or even to do, especially with very different mobi and mobi8 pieces. Also, since the fonts are stored alongside and intermixed with the image files, they must all be known in advance and available when the older mobi is created (even those that, like fonts, come from the KF8 side).

If you could guarantee that, then putting together a standalone python script to merge them is just not that hard to do. If however, the two sources do not access the same images in the same order (or fonts?), then this is much harder to do as it would require changing the raw ML inside of each part.

It seems to me that having/using two different css stylesheets should go a long way to getting good .mobi and .mobi8 pieces and that is supported.

DiapDealer · 01-23-2012, 05:22 PM

Quote:

It seems to me that having/using two different css stylesheets should go a long way to getting good .mobi and .mobi8 pieces and that is supported.

There's a learning curve as always, but I've found that some rather excellent results can be achieved with the "one source -- two different css files" approach. I'm sure it will be a slow process, but I don't think it will be long before MOBI is relegated to little more than the "fall-back" format.

I'd be interested in splitting the mobi from the KF8 (if only for personal use), even if the KF8-only file had to include a one-page mobi stub with a "Your device can't read this ebook" message.

PeterHatch · 01-24-2012, 10:32 PM

Quote:

Originally Posted by KevinH

This is not easy to guarantee, or even to do, especially with very different mobi and mobi8 pieces. Also, since the fonts are stored alongside and intermixed with the image files, they must all be known in advance and available when the older mobi is created (even those that, like fonts, come from the KF8 side).

If you could guarantee that, then putting together a standalone python script to merge them is just not that hard to do.

I think I would find such a script useful. I have found it possible to achieve the results I want through official methods, but I could at the very least reduce the file size a bit by removing unnecessary spans and such from the mobi code. Running kindlestrip already makes the result unofficial (and helps far more, of course).

Quote:

Originally Posted by DiapDealer

I'd be interested in splitting the mobi from the KF8 (if only for personal use), even if the KF8-only file had to include a one-page mobi stub with a "Your device can't read this ebook" message.

If the python script had an option to choose whether to use the mobi or KF8 source for the images and fonts, it would allow it to work with pairs where one includes more images and fonts than the other (as long as the ones that were shared were first and in the same order). That would allow something like a mobi stub with no images. I'd like to be able to distribute something like that, with the stub including a link to where the mobi version could be downloaded.

KevinH · 01-25-2012, 01:41 PM

Hi,

I wrote a small bash script to capture the old mobi and the separate mobi8 pieces while running kindlegen, just to see what they look like.

There are no images or fonts at all in the .mobi8 piece. There are references to them but they are not included. The .mobi8 is literally simply appended to the end of the original mobi after removing the end of section and adding a BOUNDARY section with no changes from what I can see.

So all of the smarts are in the part that creates the old mobi piece, as it now has to collect all images and fonts and store them in the right (and known) order so that the mobi8 piece can simply be appended.
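The layout described above (old mobi sections, then a BOUNDARY section, then the appended mobi8 sections) can be inspected with a few lines of Python. This is only a minimal sketch: it assumes the standard Palm database container (78-byte header, record count at offset 76, 8-byte record entries) and assumes the boundary section's payload starts with the bytes BOUNDARY. The names pdb_sections, find_boundary and build_pdb are hypothetical, and build_pdb only produces a synthetic file for demonstration.

```python
import struct

def pdb_sections(data):
    """Split a Palm database image into its raw sections (records)."""
    (count,) = struct.unpack_from(">H", data, 76)      # number of records
    offsets = [struct.unpack_from(">L", data, 78 + 8 * i)[0] for i in range(count)]
    offsets.append(len(data))                          # last section runs to end of file
    return [data[offsets[i]:offsets[i + 1]] for i in range(count)]

def find_boundary(sections):
    """Index of the first section that starts with b'BOUNDARY', or None."""
    for i, sec in enumerate(sections):
        if sec[:8] == b"BOUNDARY":                     # assumed marker payload
            return i
    return None

def build_pdb(sections):
    """Build a tiny synthetic Palm database (demonstration only)."""
    header = bytearray(78)
    struct.pack_into(">H", header, 76, len(sections))
    offset = 78 + 8 * len(sections) + 2                # 2 padding bytes after the table
    table = b""
    for i, sec in enumerate(sections):
        table += struct.pack(">LL", offset, 2 * i)     # offset, attributes + unique id
        offset += len(sec)
    return bytes(header) + table + b"\x00\x00" + b"".join(sections)

if __name__ == "__main__":
    demo = build_pdb([b"old-header", b"old-text", b"BOUNDARY", b"kf8-header"])
    secs = pdb_sections(demo)
    print(len(secs), "sections, boundary at index", find_boundary(secs))
```

On a real dual file, everything before the boundary index would be the old mobi side (including the shared images and fonts) and everything after it the mobi8 side.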

If we could determine the exact order of these resources, that would help. For example: Do fonts always come first? Are images intermixed with other non-image sections? What order do the FCIS, FLIS, DATP and FDST come in? Does the content.opf determine the order of resource files like images and fonts, or does something else?

If we knew all that then it might be possible to create a stub for the old mobi section.

I was kind of hoping the .mobi8 piece you described would be standalone and we could learn what a standalone mobi KF8 might look like, but instead it relies on all of those resources being stored in known sections in the older mobi part.

Perhaps if someone played around with the order of images in the content.opf and then looked at how they were stored in the old mobi, we could figure out what algorithm they used. If they instead parse the HTML to determine order and links and things, it will be a lot harder, and possibly impossible if you want images in a different order between the old and the new sides.

KevinH · 01-25-2012, 01:47 PM

Hi,

The more I think about it, the more it seems possible to create a KF8 mobi and then zero out the rawml for the older mobi part (or replace it with a placeholder), set the old index tables in the old header to 0xffffffff, and remove or make zero-size all of those INDX pieces, leaving pretty much just a simple header section to represent the old side, followed by the images and things and then the K8 side.

This might be doable. We will need to keep the older Mobiheader since its EXTH 121 value points to the boundary and it points to the shared images, fonts, etc. but everything else should be removable.
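For reference, reading that EXTH 121 value out of record 0 can be sketched as below. This assumes the commonly documented layout: a 16-byte PalmDOC header, then the MOBI header with its length stored at offset 20, then an EXTH block made of (type, length, data) records. The function names read_exth and kf8_pointer are hypothetical, and the sketch makes no assumption about whether the stored number is the BOUNDARY section itself or the section right after it.

```python
import struct

def read_exth(rec0, wanted):
    """Return the payloads of every EXTH record of type `wanted` in record 0."""
    (mobi_len,) = struct.unpack_from(">L", rec0, 20)   # MOBI header length
    base = 16 + mobi_len                               # EXTH follows the MOBI header
    if rec0[base:base + 4] != b"EXTH":
        return []
    (count,) = struct.unpack_from(">L", rec0, base + 8)
    pos = base + 12
    found = []
    for _ in range(count):
        rtype, rlen = struct.unpack_from(">LL", rec0, pos)
        if rtype == wanted:
            found.append(rec0[pos + 8:pos + rlen])     # rlen includes the 8-byte prefix
        pos += rlen
    return found

def kf8_pointer(rec0):
    """Section number stored in the first EXTH 121 record, or None if absent/unset."""
    values = read_exth(rec0, 121)
    if not values:
        return None
    (number,) = struct.unpack_from(">L", values[0])
    return None if number == 0xFFFFFFFF else number
```

A file whose old header returns None here would, under these assumptions, not be a dual mobi7/KF8 file.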

PeterHatch · 01-25-2012, 08:24 PM

Quote:

Originally Posted by KevinH

If we could determine the exact order of these resources, that would help. For example: Do fonts always come first? Are images intermixed with other non-image sections? What order do the FCIS, FLIS, DATP and FDST come in? Does the content.opf determine the order of resource files like images and fonts, or does something else?

Just checking using mobi_unpack, based on the number it gives to the font and the images, they are intermixed, and it seems to be based on parsing the HTML files listed in the order given in the spine section of the opf, adding images in order, and parsing the CSS after the HTML file that uses it.
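The ordering observed above can be written down as a small model, which might help when checking whether two sources would embed their resources in the same order. This is only a guess at kindlegen's behaviour based on that observation, not its actual algorithm: HTML files are visited in spine order, images are added as they are met, and the url(...) references of each stylesheet are added after the first HTML file that links it. The function guess_resource_order and its inputs are hypothetical, and the de-duplication is an assumption.

```python
import re
from html.parser import HTMLParser

class _Refs(HTMLParser):
    """Collect <img src> and <link rel=stylesheet href> in document order."""
    def __init__(self):
        super().__init__()
        self.images, self.stylesheets = [], []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img" and a.get("src"):
            self.images.append(a["src"])
        elif tag == "link" and a.get("href") and "stylesheet" in (a.get("rel") or ""):
            self.stylesheets.append(a["href"])

CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")]+)['\"]?\s*\)")

def guess_resource_order(spine_html, css_files):
    """spine_html: HTML strings in spine order; css_files: {href: css text}."""
    order, seen_css = [], set()

    def add(name):
        if name not in order:          # assumed: each resource is stored once
            order.append(name)

    for html in spine_html:
        refs = _Refs()
        refs.feed(html)
        for img in refs.images:
            add(img)
        for sheet in refs.stylesheets:  # CSS handled after the HTML that uses it
            if sheet in seen_css:
                continue
            seen_css.add(sheet)
            for url in CSS_URL.findall(css_files.get(sheet, "")):
                add(url.strip())
    return order
```

Comparing the lists produced for the mobi source and the KF8 source would be a quick sanity check before trying to combine them.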

Quote:

Originally Posted by KevinH

The more I think about it, the more it seems possible to create a KF8 mobi and then zero out the rawml for the older mobi part (or replace it with a placeholder), set the old index tables in the old header to 0xffffffff, and remove or make zero-size all of those INDX pieces, leaving pretty much just a simple header section to represent the old side, followed by the images and things and then the K8 side.

Rather than zeroing out or using a placeholder, could we substitute in that section from another .mobi file, assuming it used no images or fonts? I'd like to have a simple custom message in there.

nickredding · 01-26-2012, 07:35 PM

I have just confirmed that if you take a mobi created by kindlegen2 with both mobi7 and mobi8 parts, you can copy the RESC, FONT and image records from the mobi7 part over to the mobi8 part and delete the mobi7 part (you have to update the last content record and the FCIS and FLIS indexes in the mobi8 part to account for the additional mobi8 records). The resulting mobi8 file works fine on the Kindle Previewer and my Kindle Fire. I also checked an unencrypted file that uses fonts as well as images, and again no problem.

This is good news because it means I can generate mobi8 files without bothering with the mobi7 part that kindlegen includes (I'm looking at this from the point of view of creating a calibre writer module for KF8). I spent two days trying to figure out the interaction between the mobi7 and mobi8 parts (when both are present). There IS some interaction but it is quite opaque, and if you get it wrong either the file will not load or it will fall back to the mobi7 part even when you put it on a KF8 device (previewer or Fire).

I personally don't see much use for combined mobi7/mobi8 files. Eventually there will be calibre input and output modules for KF8 so if you want a mobi7 from a mobi8 you'll be able to convert (calibre will just throw away most of the KF8 styling for the mobi7 output which is what kindlegen2 does).

For anyone who is interested I have attached the python script I used to remove the mobi7 part (I started with the latest mobi_unpack.py so acknowledgements to the creators of that).

KevinH · 01-26-2012, 10:40 PM

Hi Nick,

Nicely done!

So as long as you keep the order the same, you can move them (RESC, FONT, and image sections) in a contiguous block to the right point in the mobi8 part, update the firstimg (firstaddl in mobi_unpack) in the mobi8 header along with the FCIS and FLIS pointers so that they point to the right places, and you get a standalone mobi8 file.

Wonderful work!

KevinH

KevinH · 01-26-2012, 11:37 PM

Hi Nick,

I think your code could be tweaked to split a new kindlegen mobi into old and new pieces. In the old piece you could look for FONT and RESC sections and make them zero length (but keeping that section) since the old mobi never would access the RESC or FONT sections.

Either that or make a standalone KF8 strip utility.
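The "make them zero length but keep the section" idea for the old piece could look roughly like this. It is a sketch under assumptions: sections are already split into a list of byte strings, and FONT and RESC sections are recognised by those four bytes at the start of the section. The name blank_font_and_resc is hypothetical, and writing the sections back out with a rebuilt offset table is not shown.

```python
def blank_font_and_resc(sections):
    """Return a copy of `sections` where FONT and RESC sections are emptied.

    The section count is unchanged, so section numbers used elsewhere
    (for example image indexes) keep pointing at the same slots.
    """
    return [b"" if sec[:4] in (b"FONT", b"RESC") else sec for sec in sections]
```

Keeping the empty slots matters because the old mobi refers to images by section number, so removing sections outright would shift every later index.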

BTW, does the new mobi_unpacker.py actually work with your stripped mobi8 files, or does it need to be fixed to properly unpack them?

Thanks!

KevinH

nickredding · 01-26-2012, 11:58 PM

Quote:

Originally Posted by KevinH

Hi Nick,

I think your code could be tweaked to split a new kindlegen mobi into old and new pieces. In the old piece you could look for FONT and RESC sections and make them zero length (but keeping that section) since the old mobi never would access the RESC or FONT sections.

Either that or make a standalone KF8 strip utility.

Yes that is easily done.

Quote:

BTW, does the new mobi_unpacker.py actually work with your stripped mobi8 files, or does it need to be fixed to properly unpack them?

No, but all it has to do is test version==8 in record 0 and, if true, proceed immediately to the kf8 code (skipping the mobi7 code). Edit: well, there's a bit more to it since the images etc. are in the kf8 segment, but it's still easy.
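The version test mentioned above is small enough to sketch. It assumes the commonly documented MOBI header layout, where the MOBI identifier sits at offset 16 of record 0 and the file version is the 32-bit big-endian value at offset 36. The names mobi_version and choose_path are hypothetical and are not taken from mobi_unpack.

```python
import struct

def mobi_version(rec0):
    """Read the MOBI file version from record 0."""
    if rec0[16:20] != b"MOBI":
        raise ValueError("record 0 does not contain a MOBI header")
    (version,) = struct.unpack_from(">L", rec0, 36)
    return version

def choose_path(rec0):
    """Pick which unpacking code to run for the file that starts with rec0."""
    return "kf8" if mobi_version(rec0) == 8 else "mobi7"
```

For a stripped mobi8-only file, choose_path would return "kf8" straight from the first header, so the mobi7 code never runs.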

Maybe what I'll do is update mobiunpack to do both--create standalone mobi7 and mobi8 files for combo files and properly handle mobi8-only files.

nickredding · 01-27-2012, 03:19 AM

I've posted an updated unpack_mobi over in the other thread (New Mobi Decoder) to perform these extra functions.

PeterHatch · 02-01-2012, 06:03 PM

Quote:

Originally Posted by PeterHatch

I have found it possible to achieve the results I want through official methods, but I could at the very least reduce the file size a bit by removing unnecessary spans and such from the mobi code.

So the methods I used relied on :after, which apparently works on the Previewer but not on the Fire.

So I, at least, would still like a tool to combine arbitrary mobi7 and mobi8 files; if the images and fonts need to match, that's okay.

For now I'll just use the crazy hack I found.

KevinH · 02-02-2012, 02:28 PM

Hi,

The key is that they cannot be "arbitrary". It is not enough for them to use the same images and fonts; they must use them in the exact same order.

For example, one way to approach this might be to:

1. adjust your input and use the latest kindlegen to create the best mobi7 but possibly a poor kf8 part, call this dual file best7.mobi

2. readjust your input and use the latest kindlegen to create a possibly sub-par mobi7 part and the best kf8 part, call this dual file best8.mobi

3. Then invoke some new python script (call it merge78.py) as follows:
python ./merge78.py best7.mobi best8.mobi bestboth.mobi

This as yet uncreated merge78.py program would

1. use the code in mobi_split.py (part of mobi_unpack.py) to split out each mobi into its two pieces

2. verify, by walking through the specific image and font sections in each and using md5sums, that all of these embedded resource files are identical (byte for byte) and in the exact same order.

3. if not, exit with error message that resource files and orders are not exact matches and so no merge is possible

4. otherwise take the first part of best7.mobi and the second part of best8.mobi and create one bestboth.mobi
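The verification in steps 2 and 3 can be sketched on its own, assuming the image and font sections of each file have already been extracted into lists of byte strings in their stored order (that extraction is not shown). The names resource_digests and first_mismatch are hypothetical.

```python
import hashlib

def resource_digests(resources):
    """MD5 hex digest of each resource, in stored order."""
    return [hashlib.md5(data).hexdigest() for data in resources]

def first_mismatch(res7, res8):
    """Index of the first resource that differs, or None if both lists match.

    A difference in length counts as a mismatch at the first missing index.
    """
    d7, d8 = resource_digests(res7), resource_digests(res8)
    for i, (a, b) in enumerate(zip(d7, d8)):
        if a != b:
            return i
    if len(d7) != len(d8):
        return min(len(d7), len(d8))
    return None
```

A merge script would exit with an error whenever first_mismatch returns anything other than None, and only then go on to step 4.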

A merge78.py Python script that does this could be created with some work. The issue is that you would have to run kindlegen twice with related but slightly different input (with differences that do not disturb the storage order), and then finally put the results together.

Is that something you are looking for?

And if so, wouldn't it be better to simply use the current mobi_unpack.py to split best7.mobi and best8.mobi, save the right standalone pieces, and sell or use those two separately instead of trying to merge them back into one dual mobi? Then you could use completely different images, fonts, CSS, etc. and produce two different but individually better ebooks.

KevinH

Quote:

Originally Posted by PeterHatch

So the methods I used relied on :after, which apparently works on the Previewer but not on the Fire.

So I, at least, would still like a tool to combine arbitrary mobi7 and mobi8 files; if the images and fonts need to match, that's okay.

For now I'll just use the crazy hack I found.

PeterHatch · 02-02-2012, 03:01 PM

Quote:

Originally Posted by KevinH

Is that something you are looking for?

Yes. Not sure if it's feasible, but if it could be made to also merge files if the resources were not identical, but one of the files had none, that would be a useful enhancement.

Quote:

Originally Posted by KevinH

And if so, wouldn't it be better to simply use the current mobi_unpack.py to split best7.mobi and best8.mobi, save the right standalone pieces, and sell or use those two separately instead of trying to merge them back into one dual mobi? Then you could use completely different images, fonts, CSS, etc. and produce two different but individually better ebooks.

Well, I'd like to have 3 versions - each separate plus a merged one. Also, I'd like to add to best8.mobi a small mobi7 part with a simple explanation that it doesn't support the old version, and the url to download a version that does - which should also let it work in Previewer, which would be nice.

(Merging the two versions I can use my hack, I'd just like a better workflow; for a small mobi7 part I think it'd be built too fast to use the hack manually, but a simple script might suffice - actually, would you be willing to share your bash script?)
