Old 01-12-2012, 12:45 PM   #241
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,492
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
Quote:
Originally Posted by lizcastro View Post
Thanks, Kevin! This is so helpful.

Can you confirm that the only thing mobi_unpack does is show what was in the mobi file? It doesn't generate anything, right?

When I convert an EPUB file to mobi with KindleGen2, and then unpack it with your latest version of mobi_unpack, I get a folder that contains a smaller version of the EPUB file than the original, an HTML file with what looks like the contents of the entire book, along with an ncx and opf file, and a folder with reduced size images.

Then, there's a K8 folder that contains a completely re-engineered set of files, all renamed, resized images, etc. of what was originally in my EPUB file.

And then there's a kindlegensrc.zip file, that when unzipped, contains my original unaltered files.

It all seems so excessive.

thanks,
Liz
The only new thing that mobi_unpack creates is the epub, which is generated from the K8 folder. The HTML, ncx, opf and folder of images are the Mobipocket version; the K8 folder is the new Kindle Format 8 version; and kindlegensrc.zip does indeed contain your original files, which are also stored in the Mobipocket file.

Yes, the output from the new KindleGen does contain the Mobipocket, KF8 and your source files, all wrapped up in one.
Old 01-12-2012, 12:48 PM   #242
KevinH
Sigil Developer
Posts: 7,630
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi Liz,

It does unpack and generate things so that the end user could edit the files and drop them back on kindlegen to recreate a modified mobi.

The new kindlegen creates mobis (palm database files) that actually have two completely different versions of the ebook inside them (and I am not referring to the kindlegensrc.zip, which may also be stored there).

The first is the original mobi format ebook, and immediately after it is the new K8 mobi ebook, all stored in the same .mobi palm database file.

So older technology can read the .mobi file from the top and see it as a normal mobi. Newer technology can then detect that this is a compound mobi file and actually open the second half, which is the K8-formatted version (html5 - basically a variation of an epub), to get all of the new features.
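
To make the "compound" layout concrete, here is a minimal Python sketch of walking the record table of a palm database file. The 78-byte header with a big-endian record count at byte 76, followed by 8-byte record entries, is the standard Palm database layout. The function names are made up for illustration and are not taken from mobi_unpack.py. The idea that the old mobi part and the K8 part are separated by a record holding exactly the bytes BOUNDARY is an assumption here and should be checked against a real file.

```python
import struct


def palm_record_offsets(data):
    """Return the start offset of every record (section) in a Palm database."""
    # Bytes 76-77 of the 78-byte header hold the record count (big-endian).
    (count,) = struct.unpack_from(">H", data, 76)
    offsets = []
    for i in range(count):
        # Each 8-byte entry: 4-byte offset, 1-byte attributes, 3-byte unique id.
        (offset,) = struct.unpack_from(">L", data, 78 + 8 * i)
        offsets.append(offset)
    return offsets


def palm_records(data):
    """Slice the file into records; the last record runs to end of file."""
    offsets = palm_record_offsets(data)
    ends = offsets[1:] + [len(data)]
    return [data[start:end] for start, end in zip(offsets, ends)]


def find_boundary(records):
    """Index of the first record that is exactly b'BOUNDARY', else None.

    ASSUMPTION: the separator record contains only this literal marker.
    """
    for index, record in enumerate(records):
        if record == b"BOUNDARY":
            return index
    return None
```

With something like this, the records before the marker would be the old mobi and the records after it the K8 version. A real unpacker still has to parse the MOBI headers inside those records, which this sketch does not attempt.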

Right now, mobi_unpack.py will create in the output folder the following:

1. from the old part of the .mobi it will create the source mobi markup (old html) and images that will allow the user to edit it any way they want and drop it back on kindlegen.

2. if the kindlegensrc.zip record is present it will unpack it so that the user can see the actual source ebook file (typically an epub) given to kindlegen. This record is typically removed by Amazon but is actually created by Kindlegen.

3. from the K8 version of the .mobi, it will create the K8 folder and inside it all of the images and fonts, and xhtml source files that were used to create it. A user who did not have access to the kindlegensrc.zip could edit this and then drop it on kindlegen to create a new/altered version of the ebook (fix typos, etc).

4. From the K8 pieces, it will actually build a complete epub, which it stores as well.

You can then compare the epub created from the K8 against the kindlegensrc.zip (typically an epub) to see what, if anything, the kindlegen processing changed.

All of this requires rebuilding and generation. The actual binary format inside of the mobi file needs to be decoded to make something that is usable in some way. If you want to see what the actual raw files look like, you can use Notepad++ or any good text editor to change one line near the top of mobi_unpack.py so that it writes out all of the raw text pieces as well.

So it is not simply something that dumps sections from the palm database file. It does do that (the raw file), but it then rebuilds things to try to get back to the original source so that authors and others can more easily edit their books and recreate mobi output using Kindlegen.

It is also useful for understanding the internal format of the new .k8 mobis and what if any tags are created and used.

If you have any other questions just ask.

Take care,

Kevin




Old 01-12-2012, 01:10 PM   #243
lizcastro
Member
Posts: 16
Karma: 148
Join Date: Apr 2010
Device: iPad, NOOK, Kindle, Kobo
Fascinating! Thanks so much for the info. And for mobi_unpack itself.

I find the fact that the mobi file contains a non-KF8 version, a KF8 version AND the original EPUB particularly interesting.

And I hate the way all the files get renamed! I assume that's KindleGen and not mobi_unpack.

Are either of you on Twitter? I'd love to follow you.

best,
Liz
Old 01-12-2012, 02:08 PM   #244
KevinH
Sigil Developer
Hi Liz,

Inside the .mobi there are no file names at all. Each font, image, etc. is just stored in a section of the database (with no name info) and referred to from the processed html (i.e. all links are converted to section numbers in the .mobi palm database).

So all "names" are created by us, either based on the title or simply numbered: img0001.jpg, font0002.ttf, part0004.xhtml, etc. We have no way of knowing what the original name was, whether it was a chapter, or section or ....

That is the main reason we need to re-generate things. Even in the older mobis, the mobi markup html that was input to kindlegen was processed to remove links, store images in sections, etc, and so we must reverse that to get back to something that can be edited by users.
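
As a rough illustration of that naming step, here is a small Python sketch. It guesses a file type from the first bytes of a section (the JPEG, PNG and GIF signatures used are the standard ones) and builds a zero-padded name in the img0001.jpg style mentioned above. The function names and the "dat" fallback are hypothetical, not how mobi_unpack.py itself is written.

```python
def sniff_extension(payload):
    """Guess a file extension from a section's leading bytes."""
    if payload.startswith(b"\xff\xd8\xff"):
        return "jpg"
    if payload.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if payload.startswith(b"GIF87a") or payload.startswith(b"GIF89a"):
        return "gif"
    # Hypothetical fallback for anything this sketch does not recognise.
    return "dat"


def numbered_name(prefix, number, extension):
    """Build a name such as img0001.jpg from a prefix and a counter."""
    return "%s%04d.%s" % (prefix, number, extension)


def name_images(sections):
    """Assign generated names to image sections, counting from 1."""
    return [
        numbered_name("img", count, sniff_extension(payload))
        for count, payload in enumerate(sections, start=1)
    ]
```

For example, name_images applied to one JPEG section followed by one PNG section gives img0001.jpg and img0002.png. The original file names cannot be recovered this way; they are simply not in the .mobi.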

As for twitter - I am too old to deal with anything new ;-)

But I am sure Paul, or DiapDealer or any of the other contributors from this forum topic (mobi_unpacker is really the joint effort of a lot of people) would be happy to answer any questions.

Take care,

KevinH


Old 01-12-2012, 02:18 PM   #245
lizcastro
Member
Whoa. I didn't realize. I sort of knew that mobi was this big mass of data, but didn't realize to what extent. So, if I understand correctly, mobi_unpack reverse engineers the mobi and then generates what the individual files would look like if they were individual files?

So it's not KindleGen that renames them, it's mobi_unpack, but it does so because it has no other choice, since the names are lost in the conversion to mobi?

But the kindlegensrc.zip file actually comes from a real, existing EPUB that's sitting there in the mobi file created by KindleGen?

Going to set WRITE_RAW_DATA to True now to see what happens.

thanks!

Liz
Old 01-12-2012, 02:31 PM   #246
lizcastro
Member
Why does mobi_unpack generate an EPUB file?
Old 01-12-2012, 02:45 PM   #247
DiapDealer
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Why does mobi_unpack generate an EPUB file?
I will defer to Kevin for the final say on this question, but for myself... mobi_unpack generates an epub because the KF8 format itself is basically nothing more than a binary representation of an epub.

So since the original source won't be part of a commercially available, DRM-Free KF8 ebook, mobi_unpack decompiles the KF8 data into a familiar, standard, editable format that can be easily modified (or examined) with existing tools/programs and then fed right back to kindlegen.

Last edited by DiapDealer; 01-12-2012 at 03:04 PM.
Old 01-12-2012, 03:07 PM   #248
KevinH
Sigil Developer
Hi,

Yes, exactly as DiapDealer said!

It is nice to have the kindlegensrc.zip but ebooks downloaded from Amazon won't have that. Amazon strips it off (and if they keep a copy, they could start selling epubs as well if they ever wanted to).

So mobi_unpacker tries to recreate the original epub as closely as it can from the K8 information. The K8 data is xhtml based with normal css: essentially an epub with the main bits merged into one file, with links replaced and a few other modifications.

Take a look at the _k8.raw file in a text editor to see what the kindlegen actually stores inside. You can find the css info stored at the end (inline) with any svg moved to there as well. You can see how they have replaced links with base 32 numbered references, added their own aid="", etc.
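
For anyone who wants to read those base 32 numbers by hand, here is a minimal Python sketch of converting them. It assumes the usual digit set 0-9 followed by A-V, which is what Python's built-in int(text, 32) accepts. The helper names, the four-digit default width and the sample values are illustrative only, not taken from the raw file format.

```python
BASE32_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def from_base32(text):
    """Decode a base 32 string (digits 0-9, A-V) into an integer."""
    return int(text, 32)


def to_base32(value, width=4):
    """Encode a non-negative integer as zero-padded base 32 text."""
    if value < 0:
        raise ValueError("value must be non-negative")
    encoded = ""
    while value:
        value, remainder = divmod(value, 32)
        encoded = BASE32_DIGITS[remainder] + encoded
    # An input of 0 leaves 'encoded' empty, so padding yields all zeros.
    return encoded.rjust(width, "0")
```

So a reference written as 000A decodes to 10, and 0010 decodes to 32. Which number maps to which piece of the book is a separate question that this sketch does not answer.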

The mobi_unpacker figures out how to reverse all of that to get back to as close to an epub as possible since that is the input format for kindlegen.

Take care,

Kevin


Old 01-12-2012, 03:11 PM   #249
lizcastro
Member
Hmm. I don't see the _k8.raw file. When I used WRITE_RAW_DATA=True, the only thing I got different was a .rawml file, but it looks a lot like the .html file on the non-kf8 side. Should I have modified some other setting?
Old 01-12-2012, 03:17 PM   #250
KevinH
Sigil Developer
Hi,

Look for a file inside the K8 directory that is named after the title of the book and ends with .rawml (I used to call it _k8.raw, but then moved it inside the K8 directory so that it would not interfere with the raw version from the older mobi part of the ebook).

You should find the css at the end, links changed, aid="" placed in tags to augment the original id="", etc.

For fun you can look at the .rawml version outside of the K8 directory. It is how the original mobi markup language got processed by kindlegen. Check out the links, how styles are inlined, etc.

Last edited by KevinH; 01-12-2012 at 03:23 PM.
Old 01-12-2012, 03:42 PM   #251
lizcastro
Member
I see. Interesting.

Here's another question. If I'm selling mobi files directly, how do I get rid of the original EPUB? It seems like it would make the file unnecessarily large.
Old 01-12-2012, 03:53 PM   #252
KevinH
Sigil Developer
Hi,

I believe Paul has a kindlegensrc stripper someplace? Try searching this Mobi forum and you should find a thread about it.

Ahh ... there is a KindleStrip program (next thread down, I believe) that does what you want, but it has not yet been updated to deal with the new kindlegen. I am sure someone here will soon patch it to make it work, and perhaps expand it to remove the K8 or older mobi parts as well. It makes no sense to ship so many copies of the ebook; it just generates bloat.

Last edited by KevinH; 01-12-2012 at 03:56 PM.
Old 01-12-2012, 04:05 PM   #253
lizcastro
Member
Thanks!
Old 01-12-2012, 05:26 PM   #254
pdurrant
The Grand Mouse 高貴的老鼠
Quote:
Originally Posted by KevinH View Post
Ahh ... there is a KindleStrip program (next thread down I believe) that does what you want but it has not yet been updated to deal with the new kindlegen. I am sure someone here will soon patch it to make it work. And perhaps expand it to remove the K8 or older mobi parts as well. It makes no sense to ship so many copies of the ebook, it is just generating bloat.
Of course, Amazon want you to use KindleGen to create files to send to them to sell. They are not really concerned about people using it for private use, so the bloat doesn't matter. I'm sure that when they come to send the book out they'll strip it down to Mobi or KF8 (depending on the device it's being sent to), not both.
Old 01-14-2012, 04:05 PM   #255
KevinH
Sigil Developer
new version of experimental K8 mobi_unpack.py

Hi,

I have had access to more samples (including the fixed layout Children's sample) and therefore have:

- added support for image files used in CSS sheets (needed for fixed layout)

- modified the unpacker to deal with the extra metadata fields used by fixed-layout ebooks: "RegionMagnification", "fixed-layout", "book-type", "orientation-lock", "original-resolution"

- identified the BOUNDARY section number

- changed the epub build from the K8 pieces to use compression

- fixed the mobi_k8proc.py class code to be better encapsulated (added accessor methods)

- fixed support for older mobis with no ncx
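
On the compression point, here is a minimal Python sketch of zipping up an epub with the standard zipfile module. The epub container rules require the mimetype entry to come first and to be stored uncompressed, while everything else can be deflated. The function name and the way files are passed in (a dict of archive path to content) are assumptions for the example, not how mobi_unpack.py does it.

```python
import io
import zipfile


def build_epub_bytes(files):
    """Zip the given {archive_path: content} mapping into epub bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry must be first and must not be compressed.
        archive.writestr(
            "mimetype",
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        for archive_path, content in files.items():
            # All remaining entries are deflated to shrink the result.
            archive.writestr(
                archive_path,
                content,
                compress_type=zipfile.ZIP_DEFLATED,
            )
    return buffer.getvalue()
```

A valid epub also needs META-INF/container.xml and an opf file among the entries. The sketch only shows how the compression is applied, not how those files are generated from the K8 pieces.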


So attached is the very latest version of the experimental mobi_unpack.py program.

python ./mobi_unpack.py Jerome.mobi test/


PS: I have just updated the .zip attachment with all bug fixes I know about so far including some additional support for guide elements.


PPS: I have again now updated the .zip attachment to support multiple @import url statements in css.

PPPS: The attachment has been removed, since DiapDealer has posted the latest version later on in this thread.

Last edited by KevinH; 01-18-2012 at 10:28 AM. Reason: removed old zip DiapDealer has posted the latest version