MobileRead Forums - View Single Post - KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files

KevinH · 09-15-2014, 09:13 PM

Hi Doug,

In the mobi_opf.py in the part that builds the manifest for the opf, there is this media-map that determines things. The KF8 part unpacks to .xhtml file extensions while the older mobi part unpacks to .html so so gets that strange media-type.

Code:

media_map = {
                '.jpg'  : 'image/jpeg',
                '.jpeg' : 'image/jpeg',
                '.png'  : 'image/png',
                '.gif'  : 'image/gif',
                '.svg'  : 'image/svg+xml',
                '.xhtml': 'application/xhtml+xml',
                '.html' : 'text/x-oeb1-document', # for mobi7
                '.pdf'  : 'application/pdf', # for azw4(print replica textbook)
                '.ttf'  : 'application/x-font-ttf',
                '.otf'  : 'application/x-font-opentype', # replaced?
                #'.otf' : 'application/vnd.ms-opentype', # [OpenType] OpenType fonts
                #'.woff' : 'application/font-woff', # [WOFF] WOFF fonts
                #'.smil' : 'application/smil+xml', # [MediaOverlays301] EPUB Media Overlay documents
                #'.pls' : 'application/pls+xml', # [PLS] Text-to-Speech (TTS) Pronunciation lexicons
                '.otf'  : 'application/x-font-opentype', # replaced?
                #'.mp3'  : 'audio/mpeg',
                #'.mp4'  : 'audio/mp4',
                #'.js'   : 'text/javascript', # not supported in K8
                '.css'  : 'text/css'
                }

So it would be easy to change in KindleUnpack. That said, I passed a content.opf from an old mobi through kindlegen 2.9 and it generated a lot of warnings and built a KF8 part that would never pass any epub check.

So it looks like even Kindlegen is requiring a valid epub as input otherwise it generates junk for the KF8 part. I thought that unpacking an old mobi and then passing it back through kindlegen might be as easy way to convert from html 3 to true xhtml. No such luck.

I frankly think we should use the old mobiml2xhtml.py codebase (actually its newer cousin from your KindleImport) and try and create at least a basic, valid epub-like structure from the old mobi part. Kindlegen seems to be much more adept at taking valid epub xhtml and making old html 3 than doing the reverse.

If others agree, I would be happy to incorporate it into the next KindleUnpack release.

Take care,

Kevin

09-15-2014, 09:13 PM	#993
KevinH Sigil Developer Posts: 8,937 Karma: 6361444 Join Date: Nov 2009 Device: many	Hi Doug, In the mobi_opf.py in the part that builds the manifest for the opf, there is this media-map that determines things. The KF8 part unpacks to .xhtml file extensions while the older mobi part unpacks to .html so so gets that strange media-type. Code: media_map = { '.jpg' : 'image/jpeg', '.jpeg' : 'image/jpeg', '.png' : 'image/png', '.gif' : 'image/gif', '.svg' : 'image/svg+xml', '.xhtml': 'application/xhtml+xml', '.html' : 'text/x-oeb1-document', # for mobi7 '.pdf' : 'application/pdf', # for azw4(print replica textbook) '.ttf' : 'application/x-font-ttf', '.otf' : 'application/x-font-opentype', # replaced? #'.otf' : 'application/vnd.ms-opentype', # [OpenType] OpenType fonts #'.woff' : 'application/font-woff', # [WOFF] WOFF fonts #'.smil' : 'application/smil+xml', # [MediaOverlays301] EPUB Media Overlay documents #'.pls' : 'application/pls+xml', # [PLS] Text-to-Speech (TTS) Pronunciation lexicons '.otf' : 'application/x-font-opentype', # replaced? #'.mp3' : 'audio/mpeg', #'.mp4' : 'audio/mp4', #'.js' : 'text/javascript', # not supported in K8 '.css' : 'text/css' } So it would be easy to change in KindleUnpack. That said, I passed a content.opf from an old mobi through kindlegen 2.9 and it generated a lot of warnings and built a KF8 part that would never pass any epub check. So it looks like even Kindlegen is requiring a valid epub as input otherwise it generates junk for the KF8 part. I thought that unpacking an old mobi and then passing it back through kindlegen might be as easy way to convert from html 3 to true xhtml. No such luck. I frankly think we should use the old mobiml2xhtml.py codebase (actually its newer cousin from your KindleImport) and try and create at least a basic, valid epub-like structure from the old mobi part. Kindlegen seems to be much more adept at taking valid epub xhtml and making old html 3 than doing the reverse. If others agree, I would be happy to incorporate it into the next KindleUnpack release. Take care, Kevin