Old 09-14-2014, 04:02 AM   #991
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,504
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
I have updated the first post and the AppleScript.
Old 09-15-2014, 05:12 PM   #992
DiapDealer
Grand Sorcerer
Posts: 27,548
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Out of curiosity, is the media-type "text/x-oeb1-document" found in a resource record within the MOBI when generating the content.opf file for a MOBI-only (non-KF8) kindlebook, or is it hardcoded in the KindleUnpack code? If the latter, is there a compelling reason for keeping it that way and not updating to an "application/xhtml+xml" media-type? I realize the markup file being produced isn't really xhtml, but "text/x-oeb1-document" is deprecated in the latest 2.x OPF package we appear to be building. Is kindlegen even still accepting these unpacked old-style mobi-markup files as input?
Old 09-15-2014, 09:13 PM   #993
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi Doug,

In mobi_opf.py, in the part that builds the manifest for the opf, there is a media_map that determines this. The KF8 part unpacks to files with .xhtml extensions, while the older mobi part unpacks to .html and so gets that strange media-type.

Code:
media_map = {
    '.jpg'  : 'image/jpeg',
    '.jpeg' : 'image/jpeg',
    '.png'  : 'image/png',
    '.gif'  : 'image/gif',
    '.svg'  : 'image/svg+xml',
    '.xhtml': 'application/xhtml+xml',
    '.html' : 'text/x-oeb1-document',  # for mobi7
    '.pdf'  : 'application/pdf',  # for azw4 (print replica textbook)
    '.ttf'  : 'application/x-font-ttf',
    '.otf'  : 'application/x-font-opentype',  # replaced?
    #'.otf'  : 'application/vnd.ms-opentype',  # [OpenType] OpenType fonts
    #'.woff' : 'application/font-woff',  # [WOFF] WOFF fonts
    #'.smil' : 'application/smil+xml',  # [MediaOverlays301] EPUB Media Overlay documents
    #'.pls'  : 'application/pls+xml',  # [PLS] Text-to-Speech (TTS) Pronunciation lexicons
    #'.mp3'  : 'audio/mpeg',
    #'.mp4'  : 'audio/mp4',
    #'.js'   : 'text/javascript',  # not supported in K8
    '.css'  : 'text/css',
}

So it would be easy to change in KindleUnpack. That said, I passed a content.opf from an old mobi through kindlegen 2.9 and it generated a lot of warnings and built a KF8 part that would never pass any epub check.
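
To illustrate how small that change would be, here is a minimal sketch of an extension-to-media-type lookup. It is not the actual mobi_opf.py code: the helper names guess_media_type and manifest_item, the trimmed-down map, and the 'application/octet-stream' fallback are all assumptions made for the example.

```python
import os

# Trimmed-down copy of the map quoted above (assumption: only a few entries needed here).
MEDIA_MAP = {
    '.xhtml': 'application/xhtml+xml',
    '.html': 'text/x-oeb1-document',  # the mobi7 entry under discussion
    '.css': 'text/css',
    '.jpg': 'image/jpeg',
}


def guess_media_type(filename, media_map=MEDIA_MAP,
                     default='application/octet-stream'):
    """Hypothetical helper: choose a media-type from the file extension."""
    ext = os.path.splitext(filename)[1].lower()
    return media_map.get(ext, default)


def manifest_item(item_id, href, media_map=MEDIA_MAP):
    """Hypothetical helper: format a single OPF manifest <item> element."""
    return '<item id="%s" href="%s" media-type="%s" />' % (
        item_id, href, guess_media_type(href, media_map))


# Switching the mobi7 media-type is then a one-entry change to the map.
updated_map = dict(MEDIA_MAP)
updated_map['.html'] = 'application/xhtml+xml'

print(manifest_item('item1', 'book.html'))
print(manifest_item('item1', 'book.html', updated_map))
```

With a structure like this, only the '.html' entry decides which media-type the old mobi markup file receives in the manifest.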

So it looks like even kindlegen requires a valid epub as input; otherwise it generates junk for the KF8 part. I thought that unpacking an old mobi and then passing it back through kindlegen might be an easy way to convert from html 3 to true xhtml. No such luck.

I frankly think we should use the old mobiml2xhtml.py codebase (actually its newer cousin from your KindleImport) and try and create at least a basic, valid epub-like structure from the old mobi part. Kindlegen seems to be much more adept at taking valid epub xhtml and making old html 3 than doing the reverse.

If others agree, I would be happy to incorporate it into the next KindleUnpack release.

Take care,

Kevin
Old 09-16-2014, 07:23 AM   #994
DiapDealer
Grand Sorcerer
Posts: 27,548
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I'd be OK with that. I do think we need to retain the ability to produce/examine the mobiml file, though, if only for testing and for seeing what mobiml code is actually being produced by certain xhtml/epub input.
Old 09-17-2014, 08:38 AM   #995
tkeo
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
Hi,

I have no objection to implementing a new feature.
But I'd like to ask: what is mobiml?

Thanks,
Old 09-17-2014, 08:54 AM   #996
DiapDealer
Grand Sorcerer
Posts: 27,548
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by tkeo View Post
Hi,

I have no objection to implementing a new feature.
But I'd like to ask: what is mobiml?

Thanks,
Sorry. It's just shorthand for what I (and others) would call the mobi markup language. It's what's in the *.html file in the Mobi 7 folder: the (nearly) raw output of the mobi-only portion of a kindlebook (image references and the like are rebuilt). It is very similar to HTML 3, with a few additions (and plenty of garbage).

There's currently some work going on in another project to upgrade a semi-retired mobiml2html script to take that mobi markup and spit out something as close to xhtml as possible (while maintaining the formatting of the original book). It's made more difficult by the sheer amount of junk that can sometimes be found in that mobi markup (inline elements that cross block-level element boundaries, improperly nested and/or mismatched tags, as well as opf and ncx markup in the headers and bodies). Not to mention tags that are invalid/deprecated in xhtml.
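
As a rough illustration of the kind of junk described above, the following sketch flags improperly nested or mismatched tags using Python 3's standard html.parser module. It is not taken from the mobiml2html script; the NestingChecker class, the find_nesting_problems helper, and the short VOID_TAGS list are all assumptions made for the example.

```python
from html.parser import HTMLParser

# Assumption: a small set of tags that never take a closing tag.
VOID_TAGS = {'br', 'hr', 'img', 'meta', 'link'}


class NestingChecker(HTMLParser):
    """Hypothetical checker: records end tags that do not match the open tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.stack = []      # currently open tags, innermost last
        self.problems = []   # human-readable descriptions of mismatches

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            return
        self.problems.append('unexpected </%s> with open tags %r'
                             % (tag, list(self.stack)))
        # Recover by closing everything up to and including the matching tag.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def find_nesting_problems(markup):
    """Return a list of nesting problems found in a markup string."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


# An inline element crossing a block-level boundary, as described above.
print(find_nesting_problems('<p><i>text</p></i>'))
```

A real converter has to repair such markup rather than merely report it, which is where most of the difficulty lies.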

Last edited by DiapDealer; 09-17-2014 at 09:08 AM.
Old 10-03-2014, 05:02 PM   #997
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi All,

I have an experimental version of KindleUnpack that will run on both Python 2.7 and Python 3.4 at the same time. The conversion took much longer than I expected because of massively ass-backwards decisions by the developers of Python 3. If interested please see the following issue:

http://bugs.python.org/issue22549

The problem happens even if you use an iterator to access the bytes in a bytestring: in Python 3 you get integers back, not single-byte strings. The only way to get the actual characters in a bytestring is to use a slice.
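
A minimal sketch of that behaviour, assuming Python 3 (the comments note where Python 2 differs). The iter_bytes helper is a hypothetical name used only for this example, not something from KindleUnpack.

```python
data = b'MOBI'

# Indexing: Python 3 returns an int, Python 2 returns the 1-character string 'M'.
by_index = data[0]

# Slicing: both Python 2 and Python 3 return a length-1 bytestring.
by_slice = data[0:1]

# Iterating: Python 3 yields ints, Python 2 yields 1-character strings.
by_iteration = [b for b in data]


def iter_bytes(bs):
    """Hypothetical helper: yield length-1 bytestrings on Python 2 and 3 alike."""
    for i in range(len(bs)):
        yield bs[i:i + 1]


print(by_index, by_slice, by_iteration)
print(list(iter_bytes(data)))
```

Code that compares data[0] against a one-character literal keeps working on Python 2 but silently stops matching on Python 3, which is why the problem is hard to spot.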

This hit KindleUnpack horribly in the mobi_uncompress.py, mobi_dict.py, mobi_header.py, and mobi_index.py and the problem was hard to detect and not immediately obvious at all.

I think I now understand why Kovid is so reluctant to move to Python 3 or even Python 2 / Python 3 joint compatibility. It literally took me two entire days and evenings to make the initial conversion. Calibre is much much much larger, and has to manipulate bytestrings in places for binary format files even more than KindleUnpack does. Converting Calibre would take a herculean effort!

If there are any testers with both python 3.4 and python 2.7 installed who would like to play around with this experimental KindleUnpack via the command-line, please let me know and I will post it.

Otherwise, once I get the bugs ironed out, I will make an official release and we will attempt to keep future versions of KindleUnpack able to run on both platforms.

KevinH

Last edited by KevinH; 10-03-2014 at 05:31 PM.
Old 10-03-2014, 08:49 PM   #998
JSWolf
Resident Curmudgeon
Posts: 73,957
Karma: 128903250
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
WOW! How could the developers of Python 3 actually defend such a crappy way of doing things?
Old 10-03-2014, 09:01 PM   #999
tkeo
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
Hi Kevin,

Thank you for your hard work.

I have thought that moving to python 3 is necessary, but hesitated because it would be hard to unpack bytes and to handle utf-8.


Quote:
Originally Posted by KevinH View Post
If there are any testers with both python 3.4 and python 2.7 installed who would like to play around with this experimental KindleUnpack via the command-line, please let me know and I will post it.
I'm willing to test it.
I have ActivePython 3.3.4.1 (currently uninstalled, however). Does it work with 3.3?

Thanks,

Last edited by tkeo; 10-03-2014 at 09:03 PM. Reason: fixed typo.
Old 10-04-2014, 09:46 AM   #1000
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi tkeo,
Thanks. I still have a few more tests to run to exercise epub3 code and dictionary code then I will post it tonight for you.

Take care,

Kevin
Old 10-04-2014, 12:12 PM   #1001
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
experimental command-line version of kindleunpack for both Python 2 and Python 3

Hi tkeo,

Attached is my experimental conversion of kindleunpack (command line only - no gui yet) to run on both Python 2.7 and Python 3.4. It should run on Python 3.3 as well but I only have Python 3.4.1 on my machine to test it with.

Please note:

1. This has only been tested by unpacking one large kindlegen-generated mobi, and a diff -urN showed that nothing was amiss when compared to standard KindleUnpack.

2. It has not been tested with a Japanese ebook nor a fixed layout or anything complex so there still will be lots of bugs to iron out. No font encryption/decryption has been tested nor have any dictionaries. I expect there still to be many bugs in those sections of code.

3. Because even one small piece of bytestring vs unicode can mess things up in python 3, care must be taken when making any changes ...

Right now, and after much trial and error -

I keep the actual html files and the processing of them (building RawML in mobi_header.py, mobi_k8proc.py and mobi_html.py) working only in bytestrings, since byte offsets are needed for link targets and for inserting fragments. Any conversion to unicode would throw off all of the byte offsets horribly and must be avoided at all costs until all position / byte offsets have been processed.

The same holds true for processing the binary index data.

I convert to and use unicode for mobi_ncx.py, mobi_nav.py, mobi_opf.py, and for all metadata (in mobi_header.py).

In mobi_k8resc.py and mobi_pagemap.py I start processing with bytestrings until the resc data is extracted and then convert to operating in full unicode.

Unfortunately, kindleunpack.py and mobi_utils are a mix of bytestring and full unicode since they have to deal with all of this nonsense coming from different directions.

4. In other words ... this port is very temperamental and very fragile. More work will need to be done to stabilize it and revisit how soon we can convert to unicode in the html processing.
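
The byte-offset concern in point 3 can be shown with a small sketch. The sample markup is invented for the example and is not taken from a real book; the point is only that a position found in the UTF-8 bytes stops being valid once the data is decoded to unicode.

```python
# Invented sample: one non-ASCII character before a link target.
raw = u'caf\u00e9 <a id="t1">target</a>'.encode('utf-8')

# Position of the anchor measured in bytes, as stored offsets would be.
byte_pos = raw.find(b'<a ')

# Position of the same anchor measured in characters after decoding.
text = raw.decode('utf-8')
char_pos = text.find(u'<a ')

# Using the byte offset on the bytestring lands on the anchor ...
on_bytes = raw[byte_pos:byte_pos + 3]
# ... but using it on the decoded text lands one character too late,
# because the accented character takes two bytes in UTF-8.
on_text = text[byte_pos:byte_pos + 3]

print(byte_pos, char_pos, on_bytes, on_text)
```

Every multi-byte character ahead of a target shifts the two numbering schemes further apart, so the offsets have to be consumed while the data is still bytes.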


Manipulating bytes in Python3 is limited and fraught with inconsistencies ...

- no use of % to fold ascii or utf-8 strings into binary data (there is a pep on this)

- struct.unpack will only work with bytestring format strings (not unicode ones) in Python 2, all the way up to and including python 2.7.5 and possibly later

- issues with iterating bytes and extracting single bytes from bytestrings, and there is a pep on this as well (pep 467) but nothing definite yet

- issues with "re" requiring byte patterns to work on bytestrings and vice versa

- lots of inconsistencies with many other things I have tried to take care of with the compatibility_utils.py code I have collected from all over the net
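
As one concrete instance of the "re" item in the list above, here is a minimal sketch assuming Python 3. The sample bytestring is made up for illustration.

```python
import re

# Made-up sample of raw markup held as a bytestring.
raw = b'<a filepos=0000012345>link</a>'

# A bytes pattern works on a bytestring.
match = re.search(b'filepos=([0-9]+)', raw)
offset = int(match.group(1).decode('ascii'))

# A text (str) pattern on a bytestring raises TypeError on Python 3.
# On Python 2 the same call works, because a plain '' literal is already bytes.
try:
    re.search('filepos=([0-9]+)', raw)
    mixed_allowed = True
except TypeError:
    mixed_allowed = False

print(offset, mixed_allowed)
```

In a joint Python 2 / Python 3 codebase this means every pattern literal has to be written explicitly as b'...' or u'...' to match the data it is applied to.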


It is clear that the official python programmers have never had to work close to the metal, nor worked with packed binary data, otherwise they would never have given bytestrings such a second-rate, inferior implementation. In fact, it was not until the recent Python 3.3 and 3.4 releases that I would even have tried to use Python 3, as bytes support was just too horribly broken in python 3.0, 3.1, and 3.2 to contemplate.

Instead of breaking backwards compatibility going from 2 to 3, all they had to do was start aggressively deprecating auto-conversion of bytestrings to unicode, forcing the developer to slowly track down and change things before removing the support.

Now they seem to be stuck defending their initial stupidity and holding firm to their ideals of "unicode or die" (kill all bytestring use for text), even though it is killing the uptake of Python 3.

Oh well, please post any bugs here so that I can try and get them fixed. I would like this to eventually become the future codebase of kindleunpack moving forward.

Take care,

KevinH

Last edited by KevinH; 10-05-2014 at 11:45 AM.
Old 10-04-2014, 01:55 PM   #1002
kovidgoyal
creator of calibre
Posts: 43,851
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@KevinH: I see you've started discovering the joys of Python 3. Be glad you don't have to port any C extension modules. In Python 2 strings are internally always UTF-16 (except on linux), which is great because all the external libraries (the Windows API, ICU, etc.) use UTF-16. As of python 3.3 a python string can be any of ascii, UCS2 or UCS4, depending on its contents. So now every time you call any external API function with a python string, you have to inspect and convert it. Joy, joy, joy.

And if you thought that dealing with binary file formats was bad, think about all the network-facing code: all network protocols are binary. I really don't know what the python 3 devs were smoking. Thank heavens python is open source and I can continue using python 2 for a long, long time. Hopefully, I can retire before it becomes necessary to port calibre from python 2.
Old 10-04-2014, 02:04 PM   #1003
JSWolf
Resident Curmudgeon
Posts: 73,957
Karma: 128903250
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Since KindleUnpack works with Python 2, why bother to make it also work with Python 3? Most people that use KindleUnpack also use other eBooks tools that are just for Python 2 and would not have a need for a version that works on Python 3.
Old 10-04-2014, 02:41 PM   #1004
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi,

One code base will now work for both, and if Sigil does package a python in the near future, it will most likely be python 3. So making KindleUnpack work on both python 2 and 3 maximizes its future usefulness to both calibre and sigil.

This will also provide an example for Sigil plugin developers who want their plugins to work on both Python 2 and Python 3. And it hedges our work just in case python 2's serious bugs never get fixed. Effectively it future-proofs our code.

KevinH


Quote:
Originally Posted by JSWolf View Post
Since KindleUnpack works with Python 2, why bother to make it also work with Python 3? Most people that use KindleUnpack also use other eBooks tools that are just for Python 2 and would not have a need for a version that works on Python 3.
Old 10-04-2014, 03:53 PM   #1005
JSWolf
Resident Curmudgeon
Posts: 73,957
Karma: 128903250
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by KevinH View Post
Hi,

One code base will now work for both, and if Sigil does package a python in the near future, it will most likely be python 3. So making KindleUnpack work on both python 2 and 3 maximizes its future usefulness to both calibre and sigil.

This will also provide an example for Sigil plugin developers who want their plugins to work on both Python 2 and Python 3. And it hedges our work just in case python 2's serious bugs never get fixed. Effectively it future-proofs our code.

KevinH
I would think it would be wiser for Sigil to bundle Python 2, since there is a lot more code out there in Python 2 than in Python 3. Many people dislike Python 3 and are sticking to Python 2. Plus, reusing existing Python 2 code as-is is easier than porting it to run on Python 3. Add to that the fact that people who program in Python 2 would then have a learning curve moving to Python 3.

To be honest, it's best to bundle Python 2 and forget Python 3 exists.