09-14-2014, 04:02 AM | #991 |
The Grand Mouse 高貴的老鼠
Posts: 71,506
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
I have updated the first post and the AppleScript.
|
09-15-2014, 05:12 PM | #992 |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Out of curiosity, is the media-type "text/x-oeb1-document" found in a resource record within the MOBI when generating the content.opf file for a MOBI-only (non-KF8) kindlebook, or is it hardcoded in the KindleUnpack code? If the latter, is there a compelling reason for keeping it that way and not updating to an "application/xhtml+xml" media-type? I realize the markup file being produced isn't really xhtml, but "text/x-oeb1-document" is deprecated in the latest 2.x OPF package we appear to be building. Is kindlegen even still accepting these unpacked old-style mobi-markup files as input anymore?
|
|
09-15-2014, 09:13 PM | #993 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi Doug,
In the mobi_opf.py, in the part that builds the manifest for the opf, there is this media-map that determines things. The KF8 part unpacks to .xhtml file extensions while the older mobi part unpacks to .html, so it gets that strange media-type. Code:
media_map = {
    '.jpg'  : 'image/jpeg',
    '.jpeg' : 'image/jpeg',
    '.png'  : 'image/png',
    '.gif'  : 'image/gif',
    '.svg'  : 'image/svg+xml',
    '.xhtml': 'application/xhtml+xml',
    '.html' : 'text/x-oeb1-document',         # for mobi7
    '.pdf'  : 'application/pdf',              # for azw4(print replica textbook)
    '.ttf'  : 'application/x-font-ttf',
    '.otf'  : 'application/x-font-opentype',  # replaced?
    #'.otf' : 'application/vnd.ms-opentype',  # [OpenType] OpenType fonts
    #'.woff' : 'application/font-woff',       # [WOFF] WOFF fonts
    #'.smil' : 'application/smil+xml',        # [MediaOverlays301] EPUB Media Overlay documents
    #'.pls' : 'application/pls+xml',          # [PLS] Text-to-Speech (TTS) Pronunciation lexicons
    #'.mp3' : 'audio/mpeg',
    #'.mp4' : 'audio/mp4',
    #'.js'  : 'text/javascript',              # not supported in K8
    '.css'  : 'text/css'
}
So it would be easy to change in KindleUnpack.
That said, I passed a content.opf from an old mobi through kindlegen 2.9 and it generated a lot of warnings and built a KF8 part that would never pass any epub check. So it looks like even kindlegen requires a valid epub as input; otherwise it generates junk for the KF8 part. I thought that unpacking an old mobi and then passing it back through kindlegen might be an easy way to convert from html 3 to true xhtml. No such luck.
I frankly think we should use the old mobiml2xhtml.py codebase (actually its newer cousin from your KindleImport) and try to create at least a basic, valid epub-like structure from the old mobi part. Kindlegen seems to be much more adept at taking valid epub xhtml and making old html 3 than at doing the reverse. If others agree, I would be happy to incorporate it into the next KindleUnpack release. Take care, Kevin |
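To make the change Kevin describes concrete, here is a minimal sketch of an extension-to-media-type lookup modeled on the media_map above. It is not the actual mobi_opf.py code: the function name media_type_for, the modern_html switch and the 'application/octet-stream' fallback are hypothetical assumptions for illustration.

```python
import os

# Trimmed copy of the mapping quoted above (hypothetical sketch, not mobi_opf.py).
MEDIA_MAP = {
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.png': 'image/png',
    '.gif': 'image/gif',
    '.svg': 'image/svg+xml',
    '.xhtml': 'application/xhtml+xml',
    '.html': 'text/x-oeb1-document',  # mobi7 markup, as in the current mapping
    '.css': 'text/css',
}


def media_type_for(href, modern_html=False):
    """Return a manifest media-type for href based on its file extension.

    modern_html is a hypothetical switch: when True, mobi7 '.html' files get
    'application/xhtml+xml' instead of the deprecated 'text/x-oeb1-document'.
    Unknown extensions fall back to 'application/octet-stream' (an assumption).
    """
    ext = os.path.splitext(href)[1].lower()
    if ext == '.html' and modern_html:
        return 'application/xhtml+xml'
    return MEDIA_MAP.get(ext, 'application/octet-stream')
```

With modern_html left off, the mobi7 .html files keep their current type; switching it on gives the 'application/xhtml+xml' type Doug asked about, even though the markup itself is not really xhtml.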
09-16-2014, 07:23 AM | #994 |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I'd be OK with that. I do think we need to retain the ability to produce/examine the mobiml file, though: if only for testing and for seeing what mobiml code is actually being produced by certain xhtml/epub input.
|
09-17-2014, 08:38 AM | #995 |
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
|
Hi,
I have no reason to oppose implementing a new feature. But I'd like to ask: what is mobiml? Thanks, |
|
09-17-2014, 08:54 AM | #996 | |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
There's currently some work going on in another project to upgrade a semi-retired mobiml2html script to take that mobi markup and spit out something as close to xhtml as possible (while maintaining the formatting of the original book). It's made more difficult by the sheer amount of junk that can sometimes be found in that mobi markup (inline elements that cross block-level element boundaries, improperly nested and/or mismatched tags, as well as opf and ncx markup in the headers and bodies). Not to mention tags that are invalid/deprecated in xhtml. Last edited by DiapDealer; 09-17-2014 at 09:08 AM. |
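As a rough illustration of one kind of junk described above (improperly nested or mismatched tags), here is a minimal sketch that uses Python 3's standard html.parser module to report inline elements closed across a block boundary, stray end tags and unclosed tags. It is not taken from mobiml2html or mobiml2xhtml; the class NestingChecker, the VOID_TAGS set and the message wording are hypothetical.

```python
from html.parser import HTMLParser

# Tags treated as never having an end tag (hypothetical, deliberately small set).
VOID_TAGS = {'br', 'hr', 'img', 'meta', 'link', 'mbp:pagebreak'}


class NestingChecker(HTMLParser):
    """Collect nesting problems found while parsing markup."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> do not change the nesting

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        elif tag in self.stack:
            # An outer element closes while inner ones are still open,
            # e.g. <b><p>text</b></p>: the inline <b> crosses the <p> boundary.
            while self.stack[-1] != tag:
                self.problems.append('unclosed <%s> before </%s>' % (self.stack.pop(), tag))
            self.stack.pop()
        else:
            self.problems.append('stray </%s>' % tag)

    def close(self):
        super().close()
        for tag in reversed(self.stack):
            self.problems.append('unclosed <%s> at end' % tag)
        self.stack = []


def check_nesting(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

A real converter would have to repair these problems rather than just report them, but a report like this is a cheap way to see how much of a given mobi markup file needs fixing.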
|
10-03-2014, 05:02 PM | #997 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi All,
I have an experimental version of KindleUnpack that will run on both Python 2.7 and Python 3.4 at the same time. The conversion took much longer than I expected because of massively ass-backwards decisions by the developers of Python 3. If interested, please see the following issue: http://bugs.python.org/issue22549 The problem happens even if you use an iterator to access the bytes in a bytestring. The only way to get the actual characters in a bytestring is to use a slice. This hit KindleUnpack horribly in mobi_uncompress.py, mobi_dict.py, mobi_header.py, and mobi_index.py, and the problem was hard to detect and not at all obvious. I think I now understand why Kovid is so reluctant to move to Python 3 or even to Python 2 / Python 3 joint compatibility. It literally took me two entire days and evenings to make the initial conversion. Calibre is much, much, much larger, and has to manipulate bytestrings in places for binary format files even more than KindleUnpack does. Converting Calibre would take a herculean effort! If there are any testers with both python 3.4 and python 2.7 installed who would like to play around with this experimental KindleUnpack via the command line, please let me know and I will post it. Otherwise, once I get the bugs ironed out, I will make an official release, and we will attempt to keep future versions of KindleUnpack able to run on both platforms. KevinH Last edited by KevinH; 10-03-2014 at 05:31 PM. |
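The bytes-indexing behaviour Kevin describes can be seen in a few lines. The values in the comments assume Python 3; byte_at is a hypothetical helper (not the one in KindleUnpack's compatibility code) that returns an integer on both Python 2 and Python 3 by going through a one-byte slice.

```python
data = b'BOOKMOBI'  # sample bytestring (the PalmDB type/creator of a mobi file)

# In Python 3, indexing or iterating over bytes yields ints, not the
# one-character strings Python 2 returns.
first_by_index = data[0]          # 66 on Python 3 ('B' on Python 2)
first_by_iter = next(iter(data))  # also 66 on Python 3

# A slice of length one is a bytestring on both versions.
first_by_slice = data[0:1]        # b'B'


def byte_at(buf, i):
    """Integer value of the byte at position i, on Python 2 and 3 alike (hypothetical helper)."""
    return ord(buf[i:i + 1])
```

Going through a slice plus ord() costs a little speed, but it gives the same result on both interpreters, which is the point of a joint 2/3 code base.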
10-03-2014, 08:49 PM | #998 |
Resident Curmudgeon
Posts: 73,974
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
WOW! How could the developers of Python 3 actually defend such a crappy way of doing things?
|
10-03-2014, 09:01 PM | #999 | |
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
|
Hi Kevin,
Thank you for your hard work. I have thought moving to python 3 is necessary, but hesitated because it would be hard to unpack bytes and to handle utf-8. Quote:
I have ActivePython 3.3.4.1 (now uninstalled, however). Will it work with 3.3? Thanks, Last edited by tkeo; 10-03-2014 at 09:03 PM. Reason: fixed typo. |
|
10-04-2014, 09:46 AM | #1000 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
Thanks. I still have a few more tests to run to exercise the epub3 code and dictionary code, then I will post it tonight for you. Take care, Kevin |
10-04-2014, 12:12 PM | #1001 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
experimental command-line version of kindleunpack for both Python 2 and Python 3
Hi tkeo,
Attached is my experimental conversion of kindleunpack (command line only - no gui yet) to run on both Python 2.7 and Python 3.4. It should run on Python 3.3 as well, but I only have Python 3.4.1 on my machine to test it with. Please note:
1. This has only been tested by unpacking one large kindlegen-generated mobi, and a diff -urN showed that nothing was amiss when compared to standard Kindleunpack.
2. It has not been tested with a Japanese ebook, a fixed-layout book, or anything complex, so there will still be lots of bugs to iron out. No font encryption/decryption has been tested, nor have any dictionaries. I expect there still to be many bugs in those sections of code.
3. Because even one small bytestring vs. unicode mix-up can mess things up in python 3, care must be taken when making any changes. Right now, after much trial and error, I keep the actual html files and their processing (building RawML in mobi_header.py, mobi_k8proc.py and mobi_html.py) working only in bytestrings, since byte offsets are needed for link targets and for inserting fragments. Any conversion to unicode would throw off all of the byte offsets horribly and must be avoided at all costs until all position/byte offsets have been processed. The same holds true for processing the binary index data. I convert to and use unicode for mobi_ncx.py, mobi_nav.py, mobi_opf.py, and for all metadata (in mobi_header.py). In mobi_k8resc.py and mobi_pagemap.py I start processing with bytestrings until the resc data is extracted and then convert to operating fully in unicode. Unfortunately, kindleunpack.py and mobi_utils are a mix of bytestrings and unicode, since they have to deal with all of this nonsense coming from different directions.
4. In other words ... this port is very temperamental and very fragile. More work will need to be done to stabilize it and to revisit how soon we can convert to unicode in the html processing.
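To illustrate point 3, here is a small sketch of why byte offsets must be resolved before any conversion to unicode: once multibyte UTF-8 characters precede a link target, its byte offset and its character offset no longer agree. The sample fragment and the insert_at helper are hypothetical, not KindleUnpack code.

```python
# Hypothetical raw fragment: two Japanese characters, a space, then an anchor.
raw = u'\u65e5\u672c <a id="x"/>'.encode('utf-8')

byte_pos = raw.find(b'<a')   # 7: each of the two characters takes 3 bytes in UTF-8
text = raw.decode('utf-8')
char_pos = text.find(u'<a')  # 3: the same position counted in characters


def insert_at(buf, offset, fragment):
    """Insert a bytes fragment at a byte offset (buf and fragment must both be bytes)."""
    return buf[:offset] + fragment + buf[offset:]


# Working in bytes, the byte offset lands exactly on the anchor ...
patched = insert_at(raw, byte_pos, b'<b/>').decode('utf-8')

# ... but reusing the same number on the decoded text starts mid-tag.
wrong_slice = text[byte_pos:]
```

This is why the sketch only decodes after the fragment has been inserted: every offset is used on the bytes it was computed from.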
Manipulating bytes in Python 3 is limited and fraught with inconsistencies:
- no use of % to fold ascii or utf-8 strings into binary data (there is a pep on this)
- struct.unpack will only work with bytestring formats in python all the way up to and including python 2.7.5, and possibly later
- issues with iterating bytes and extracting single bytes from bytestrings; there is a pep on this as well (pep 467), but nothing definite yet
- issues with "re" requiring byte patterns to work on bytestrings and vice versa
- lots of inconsistencies with many other things, which I have tried to take care of with the compatibility_utils.py code I have collected from all over the net
It is clear that the official python programmers have never had to work close to the metal, nor worked with packed binary data; otherwise they would never have given bytestrings such a second-rate, inferior implementation. In fact, I would never even have tried to use Python 3 before the recent 3.3 and 3.4 releases, as bytes support was just too horribly broken in python 3.0, 3.1, and 3.2 to contemplate. Instead of breaking backwards compatibility going from 2 to 3, all they had to do was start aggressively deprecating auto-conversion of bytestrings to unicode, forcing developers to slowly track down and change things before removing the support. Now they seem to be stuck defending their initial stupidity and holding firm to their "ideals of 'unicode or die' - kill all use of bytestrings for text", even though it is killing the uptake of Python 3. Oh well, please post any bugs here so that I can try to get them fixed. I would like this to eventually become the codebase of kindleunpack moving forward. Take care, KevinH Last edited by KevinH; 10-05-2014 at 11:45 AM. |
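A few of the items in the list above can be worked around with idioms that behave the same on Python 2.7 and Python 3. This is a minimal sketch on a made-up 8-byte record (the record layout is an assumption), not the contents of compatibility_utils.py: it uses a bytes format string for struct, a bytes pattern for re, and concatenation with an explicit encode instead of % on bytes.

```python
import re
import struct

# Hypothetical 8-byte record: a big-endian unsigned 32-bit offset
# followed by a 4-byte type code.
record = b'\x00\x00\x01\x00BOOK'

# A bytes format string works with struct on Python 3 and also on the
# older 2.7 releases that, as noted above, want bytestring formats.
(offset,) = struct.unpack(b'>L', record[:4])

# To search bytes, re needs a bytes pattern; the br'' prefix is valid on 2.7 and 3.x.
match = re.search(br'BOOK', record)

# Mixing a str pattern with bytes data raises TypeError on Python 3.
try:
    re.search(r'BOOK', record)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# % formatting on bytes is unavailable on Python 3.0 to 3.4, so build the
# bytes by concatenation and an explicit encode instead of b'ID:%d' % offset.
label = b'ID:' + str(offset).encode('ascii')
```

On Python 2 the mixed search succeeds because both sides are the same str type, which is exactly the kind of silent difference that makes a joint code base fragile.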
10-04-2014, 01:55 PM | #1002 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@KevinH: I see you've started discovering the joys of Python 3. Be glad you don't have to port any C extension modules. In Python 2 strings are internally always UTF-16 (except on linux), which is great because all external libraries (the windows API, ICU, etc.) use UTF-16. As of python 3.3 a python string can be any of ascii, UCS2 or UCS4, depending on its contents. So now every time you call any external API function with a python string, you have to inspect and convert it. Joy, joy, joy.
And if you thought that dealing with binary file formats was bad, think about all the network-facing code: all network protocols are binary. I really don't know what the python 3 devs were smoking. Thank heavens python is open source and I can continue using python 2 for a long, long time. Hopefully, I can retire before it becomes necessary to port calibre from python 2. |
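Kovid's point about external APIs can be sketched in Python 3: whatever internal representation a str uses (PEP 393, from Python 3.3), code that talks to a UTF-16 API has to encode explicitly, and the UTF-16 length can differ from len() for characters outside the Basic Multilingual Plane. The helper names are hypothetical, and the trailing NUL terminator is an assumption about what the target API expects.

```python
def to_utf16_buffer(s):
    """Encode s as little-endian UTF-16 plus a 2-byte NUL terminator (hypothetical helper)."""
    return s.encode('utf-16-le') + b'\x00\x00'


def utf16_length(s):
    """Number of UTF-16 code units in s; can be larger than len(s)."""
    return len(s.encode('utf-16-le')) // 2


# 'a' followed by an emoji: 2 code points in Python 3, but the emoji needs
# a surrogate pair, so a UTF-16 API sees 3 code units.
sample = u'a\U0001F600'
```

On a Python 2 narrow build, len(sample) would already report 3, which is one reason offsets passed to UTF-16 libraries need care when the same code runs on both versions.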
10-04-2014, 02:04 PM | #1003 |
Resident Curmudgeon
Posts: 73,974
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Since KindleUnpack works with Python 2, why bother to make it also work with Python 3? Most people who use KindleUnpack also use other eBook tools that are just for Python 2 and would not have a need for a version that works on Python 3.
|
10-04-2014, 02:41 PM | #1004 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
One code base will now work for both, and if Sigil does package a python in the near future, it will most likely be python 3. So making KindleUnpack work on both python 2 and 3 maximizes its future usefulness to both calibre and sigil. This will also provide an example for Sigil plugin developers who want their plugins to work on both Python 2 and Python 3. And it hedges our work just in case python 2's serious bugs never get fixed. Effectively it future-proofs our code. KevinH |
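As a minimal sketch of what "one code base for both" can look like, here is the kind of shim a plugin could start from. The names PY2, text_type, binary_type, to_unicode and to_bytes are hypothetical here; they follow the same idea as the six library and the compatibility_utils.py mentioned earlier, but are not copied from either.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def to_unicode(value, encoding='utf-8'):
    """Return text; bytestrings are decoded with the given encoding."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Return a bytestring; text is encoded with the given encoding."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value
```

Calling these helpers at the boundaries (file I/O, metadata, opf/ncx generation) keeps the decision about when to switch between bytes and unicode in one visible place instead of scattered through the code.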
10-04-2014, 03:53 PM | #1005 | |
Resident Curmudgeon
Posts: 73,974
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
To be honest, it's best to bundle Python 2 and forget Python 3 exists.
|