MobileRead Forums - Mobigen KindleGen Conversion Process Documentation

KevinH · 03-29-2012, 07:03 PM

Hi,

Just to be very clear: is this project a school assignment or not?

If so, it is hard to believe that the instructor did not know that the mobi file format has never been officially documented, nor has the conversion process. This seems more like a work assignment than a school assignment.

Quote:

Originally Posted by sarafnikit

And at last, I realized that there cannot be an easy solution, so I stopped searching for one and looked at the source code of Calibre and Mobiperl. But I found out that Calibre is much more complex (obviously, as it is an all-in-one converter and not just a converter for one file format). It first converts the file into XHTML, then parses that XHTML and applies many algorithms to it, and then converts that XHTML to the required format.

Just what do you think an epub is? It is a set of xhtml files, css stylesheets, images, etc., all linked together by an opf file, an ncx, etc. So the calibre code does exactly what it needs to do to convert from its internal oeb format (very similar to an epub) to mobi.
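To make that structure concrete, here is a minimal Python sketch (not calibre's code) that opens an epub as the zip archive it is, follows `META-INF/container.xml` to the opf file, and reads the manifest and the spine (reading order). It assumes an EPUB 2-style package and ignores encryption, fallback items, and the contents of the ncx; the function name `read_epub_structure` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by container.xml (OCF) and the opf package document
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_NS = "http://www.idpf.org/2007/opf"


def read_epub_structure(path_or_file):
    """Return (opf_path, manifest, spine, ncx_id) for an epub.

    manifest maps item id -> (href, media-type); hrefs are relative to
    the directory containing the opf file. spine lists the hrefs of the
    content documents in reading order. ncx_id is the spine's toc id.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        # container.xml tells us where the opf (package document) lives
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        opf_path = rootfile.get("full-path")
        opf = ET.fromstring(zf.read(opf_path))

    # The manifest lists every file in the book: xhtml, css, images, ncx...
    manifest = {
        item.get("id"): (item.get("href"), item.get("media-type"))
        for item in opf.find(f"{{{OPF_NS}}}manifest").findall(f"{{{OPF_NS}}}item")
    }

    # The spine gives the reading order as references into the manifest
    spine_el = opf.find(f"{{{OPF_NS}}}spine")
    spine = [
        manifest[ref.get("idref")][0]
        for ref in spine_el.findall(f"{{{OPF_NS}}}itemref")
    ]
    return opf_path, manifest, spine, spine_el.get("toc")
```

A converter would then walk `spine` to process the xhtml in order and use the manifest to find the css, images, and ncx each document refers to.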

Quote:

But as far as I could understand from going through MobiPerl, these are the major steps in converting EPUB to MOBI:
1. Parsing the opf file to get the tree structure of the book.
2. Cleaning up the HTML files: adding the tags that MOBI needs and removing the ones it does not support.
3. Compiling the opf and HTML files, including all the necessary MOBI headers in the process.
(Many more things are involved, of course, such as handling the metadata, table of contents, guide section, etc.)

Am I correct?

Broadly, yes. But there are lots of other steps you are missing, such as converting the css to the strange html markup used by mobis, converting and scaling images, converting metadata, building the mobi header, building the palm db container file, processing the ncx, etc.
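As an illustration of one of those steps, the palm db container is the easiest part to sketch. The Python below is a hypothetical `build_palmdb` helper, not calibre's or MobiPerl's implementation. It writes the 78-byte PalmDB header with type `BOOK` and creator `MOBI`, one 8-byte entry per record (offset, attribute byte, 3-byte unique ID), two padding bytes, and then the records themselves. It assumes Unix-epoch timestamps and simple sequential record IDs, and it does not build the mobi header, which would go inside record 0.

```python
import struct
import time

# 78-byte PalmDB header: name, attributes, version, 3 dates, modification
# number, appInfo/sortInfo offsets, type, creator, unique ID seed,
# next record list ID, number of records (all big-endian)
HEADER_FMT = ">32sHHIIIIII4s4sIIH"


def build_palmdb(name, records, timestamp=None):
    """Wrap a list of byte-string records in a PalmDB container."""
    if timestamp is None:
        timestamp = int(time.time())  # assumption: Unix-epoch seconds
    # NUL-terminated name in a 32-byte field; struct pads with NUL bytes
    title = name.encode("ascii", "replace")[:31]
    header = struct.pack(
        HEADER_FMT,
        title,
        0,             # attributes
        0,             # version
        timestamp,     # creation date
        timestamp,     # modification date
        0,             # last backup date
        0,             # modification number
        0,             # appInfo offset (unused)
        0,             # sortInfo offset (unused)
        b"BOOK",       # database type
        b"MOBI",       # creator
        len(records),  # unique ID seed (assumption: next unused ID)
        0,             # next record list ID
        len(records),  # number of records
    )
    # Records start after the header, the record list, and 2 padding bytes
    offset = struct.calcsize(HEADER_FMT) + 8 * len(records) + 2
    entries = []
    for uid, rec in enumerate(records):
        # 4-byte offset, then attribute byte (0) + 3-byte unique ID
        entries.append(struct.pack(">II", offset, uid & 0xFFFFFF))
        offset += len(rec)
    return header + b"".join(entries) + b"\x00\x00" + b"".join(records)
```

For example, `build_palmdb("Test Book", [b"abc", b"de"])` returns 101 bytes, with the first record starting at offset 96.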

The devil is in the details, so reading and understanding the calibre code is the right way to go if you truly want to understand the conversion process.