KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files - Page 76

elmimmo · 05-19-2015, 01:42 AM

Quote:

Originally Posted by KevinH

Please post links to a completely valid fixed layout KF8 made via kindlegen (no drm) and the valid fixed layout epub3 that was used as input to kindlegen and I will work out how to properly unpack it and make the needed changes to KindleUnpack.

KevinH, thanks for the interest. I got pumped so here's the KF8 you requested, with the source embedded so KindleUnpack will extract it along with KF8's resources, and a buck load of requests (including the one already mentioned) when outputting FXL books to EPUB 3 (which, again, I feel should be the default).

Taking the source inside EPUB 3 FXL with linear cover.mobi as source:

XHTML documents in my source have the size of their canvas declared in their head block, as required by the EPUB 3 spec, with:
Code:
```
<meta name="viewport" content="width=794, height=1122"/>
```
with same values as those written by KindleUnpack in the output OPF's metadata:
Code:
```
<meta name="original-resolution" content="794x1122"/>
```
The latter is what Kindle requires but does nothing in EPUB 3 ereaders. The former is EPUB 3's way but does nothing in Kindle. Both are needed for the EPUB 3 to render properly in EPUB 3 ereaders and be safely convertible (back) to mobi.

I assume that the internal XHTML of most mobi files will lack EPUB 3's viewport's size declaration since Kindle makes no use of it (so no reason for the original author to have added it). KindleUnpack should add it if missing when outputting EPUB 3 FXL so that ereaders display them properly.
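
A minimal sketch of the request above, assuming each page's XHTML is available as a string and the OPF's original-resolution value (e.g. "794x1122") is known. The function name `add_viewport_if_missing` is hypothetical and is not part of KindleUnpack:

```python
import re

# Detects an existing <meta name="viewport" ...> declaration.
HAS_VIEWPORT = re.compile(r'<meta\s[^>]*name\s*=\s*["\']viewport["\']', re.IGNORECASE)
# Matches the opening <head> tag (but not <header>).
HEAD_OPEN = re.compile(r'(<head\b[^>]*>)', re.IGNORECASE)


def add_viewport_if_missing(xhtml, original_resolution):
    """Insert a viewport meta after <head> when the page declares none.

    original_resolution is the Kindle-style OPF value, e.g. "794x1122".
    """
    if HAS_VIEWPORT.search(xhtml):
        return xhtml  # already declared, leave the page untouched
    width, height = original_resolution.lower().split('x')
    meta = '<meta name="viewport" content="width=%s, height=%s"/>' % (
        width.strip(), height.strip())
    # Only the first <head> tag is patched; pages without one are returned as is.
    return HEAD_OPEN.sub(lambda m: m.group(1) + '\n' + meta, xhtml, count=1)
```

Pages that already carry a viewport declaration are returned unchanged, so the function is safe to run over every document in the spine.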
KindleUnpack is being unnecessarily redundant by adding to EPUB 3 FXL's metadata both <meta name="fixed-layout" content="true" /> and <meta property="rendition:layout">pre-paginated</meta>. Both accomplish the same thing in Kindle, but only the latter does so too for EPUB 3 ereaders. KindleUnpack should therefore only add the latter.

The same thing goes for <meta name="orientation-lock" content="portrait" /> and <meta property="rendition:orientation">portrait</meta>, only the latter being EPUB 3 proper syntax.
My source OPF has its ISBN specified in EPUB 3 syntax:
Code:
```
<dc:identifier id="uid">urn:isbn:9781234567890</dc:identifier>
<meta refines="#uid" property="identifier-type" scheme="onix:codelist5">15</meta>
```
but KindleUnpack's output EPUB 3 has it like:
Code:
```
<dc:identifier opf:scheme="ISBN">9781234567890</dc:identifier>
```
which is not valid EPUB 3. No opf:-prefixed attribute is, so the same goes for
Code:
```
<dc:date opf:event="publication">2011</dc:date>
```
KindleUnpack should just drop that attribute when outputting EPUB 3. Note that adding it is, however, valid and actually the right way to do it when outputting EPUB 2.
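
A rough sketch of what dropping those attributes could look like, assuming the OPF metadata is handled as a plain string. `strip_opf_attributes` is a hypothetical helper, not KindleUnpack's actual code, and it only removes the attribute; it does not try to recover the EPUB 3 `refines` metadata:

```python
import re

# An opf:-prefixed attribute, e.g. opf:scheme="ISBN" or opf:event="publication".
OPF_ATTR = re.compile(r'\s+opf:[A-Za-z-]+\s*=\s*(?:"[^"]*"|\'[^\']*\')')
# The opening tag of any dc: element (closing tags start with "</" and are skipped).
DC_OPEN_TAG = re.compile(r'<dc:[A-Za-z]+\b[^>]*>')


def strip_opf_attributes(opf_metadata):
    """Remove opf:* attributes from dc:* opening tags, leaving element content alone."""
    return DC_OPEN_TAG.sub(lambda m: OPF_ATTR.sub('', m.group(0)), opf_metadata)
```

Note that stripping opf:scheme loses the information that the identifier is an ISBN, which my source expresses with the urn:isbn: prefix instead.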
My source NCX (document which is not part of the EPUB 3 spec, but some consider good practice to add for backwards compatibility) is simpler than KindleUnpack's output. The extra stuff that KindleUnpack adds is cruft (irrespective of whether it is exporting to EPUB 2 or 3). Some parts of it:
Code:
```
<meta content="1" name="dtb:depth"/>
<meta content="mobiunpack.py" name="dtb:generator"/>
<meta content="0" name="dtb:totalPageCount"/>
<meta content="0" name="dtb:maxPageNumber"/>
```
are innocuous but still pointless since no EPUB ereader (nor Kindle) needs or supports them in any way; some other parts, particularly the DOCTYPE and each playOrder attribute, are actually harmful IMHO as they complicate post-editing by hand.

The EPUB 2 spec does not require any of that. Neither does EPUB 3's since NCX is not even a part of it. More details if you are interested, with further elaboration in the comments after the code snippets in that link.

Besides, if KindleUnpack is adding the NCX to EPUB 3 FXL for backwards compatibility purposes, then it should also generate the file com.apple.ibooks.display-options.xml that my source contains, as that file was Apple's method of tagging an EPUB 2 as a FXL book before EPUB officially embraced FXL in EPUB 3.0.1 (which uses another method to do so, but including that legacy file does not make the EPUB 3 file non-valid).

Still, I, for one, do not see value in generating EPUB 2 FXL backwards compatibility cruft in EPUB 3 FXL (the NCX, the spine's toc attribute, the OPF's guide and the file com.apple.ibooks.display-options.xml). FXL was originally a non-standard Apple extension to EPUB 2 which Apple itself now considers obsolete in favor of proper EPUB 3 FXL, and all major platforms that once supported Apple's EPUB 2 FXL now support EPUB 3 FXL.

The source inside EPUB 3 FXL without EPUB 2 markup.mobi has all that taken out so that you can compare and see what the redundant data exactly is.
The language of the source ebook is Spanish, as specified in the OPF's metadata:
Code:
```
<dc:language>es</dc:language>
```
yet KindleUnpack's output's is:
Code:
```
<dc:language>en</dc:language>
```
The wrong language is carried on to some other output XHTML documents, like the EPUB 3 Navigation Document, which KindleUnpack outputs with the name nav.xhtml and whose html element has lang and xml:lang attributes with the same wrong value, or the HTML cover cover_page.xhtml that KindleUnpack creates if the source did not have one. This one has only the xml:lang attribute, again with the wrong value.

In nav.xhtml, the namespace URI for the epub namespace is wrong. KindleUnpack's current output is:

Code:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2011/epub" …>

while it should be:

Code:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" …>

The source nav.xhtml has a meta tag declaring the text encoding:
Code:
```
<meta charset="UTF-8"/>
```
but KindleUnpack's output nav.xhtml does not. That might produce incorrect rendering of non-ASCII characters when accessing this book's TOC. The same goes for cover_page.xhtml. While the latter contains no text, it's still good practice to declare the encoding if anything to account for potential post-editing.
In spite of what the Kindle Publishing Guidelines document (4.1, 5.1, 5.6) claims, this does nothing whatsoever:
Code:
```
<meta name="RegionMagnification" content="true" />
```
AFAIK never has, and is just bloat. Kindlegen and Kindle Previewer will instead parse all HTML documents upon conversion in search for explicit region magnification markup, and will label the ebook accordingly irrespective of what that metadata value says.

The HTML doc part0000.xhtml in my source contains region magnification markup, but the OPF lacks the metadata above altogether. Kindlegen and Kindle Previewer still flag the book correctly as having Region Magnification or not, which you can verify by:
1. Converting with Kindlegen or Kindle Previewer as is.
2. Adding that metadata, setting its value to false (even though the book does have markup for it), and converting.
3. Setting its value to true, deleting the region magnification markup in part0000.xhtml, and converting.
You will see in Kindlegen's log that in all cases the presence and value of that metadata does nothing.
I am not aware that the OPF metadata <meta name="output encoding" content="utf-8" /> in KindleUnpack's output accomplishes anything whatsoever, resulting in mere bloat, but it might be that I am not well informed.
If the source has no spine item with the attributes properties="page-spread-right" or properties="page-spread-left", it might be considered appropriate that KindleUnpack adds <meta property="rendition:spread">none</meta> to the output OPF metadata of an EPUB 3 FXL, so that EPUB 3 ereaders like iBooks do not group pages in 2-page spreads just like Kindle doesn't.
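
A minimal sketch of that check, assuming the OPF is available as a well-formed XML string without an encoding declaration. `needs_spread_none` and `SPREAD_NONE` are hypothetical names used for illustration only:

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
# The metadata line that would be added when the check below returns True.
SPREAD_NONE = '<meta property="rendition:spread">none</meta>'


def needs_spread_none(opf_xml):
    """True when no spine item asks for a left/right page and no spread is declared."""
    root = ET.fromstring(opf_xml)
    # An explicit rendition:spread already present in the metadata wins.
    for meta in root.iter("{%s}meta" % OPF_NS):
        if meta.get("property") == "rendition:spread":
            return False
    # Any itemref placed on a left or right page implies 2-page spreads are wanted.
    for itemref in root.iter("{%s}itemref" % OPF_NS):
        props = (itemref.get("properties") or "").split()
        if "page-spread-left" in props or "page-spread-right" in props:
            return False
    return True
```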
My source OPF has <meta property="ibooks:binding">false</meta> in its metadata, which disables the fake spine shadow in iBooks and is innocuous in Kindle. If present, though, it is also required that the prefix attribute in the package declares the prefix ibooks: …. Kindle does not produce that fake shadow, so all the more reason to have KindleUnpack add that metadata to EPUB 3 FXL. The reasons for doing so are admittedly subjective, and the syntax is admittedly iBooks-only, i.e. non-standard, but it still does not make the EPUB 3 invalid and is not incompatible with Kindle.
I would add the explicit attribute properties="rendition:page-spread-center" to the cover in the spine. It's not like not having it is incorrect, though. Besides, the value of adding it is subjective and, at any rate, even though part of the EPUB 3 spec, no major ereader app cares about the property yet.

Taking the source inside EPUB 3 FXL with no HTML cover.mobi as source:

When the source has no HTML document identified as cover in the OPF guide or nav.xhtml's landmarks, KindleUnpack will create one with the name cover_page.xhtml and give it the attribute linear="no" in the spine.

IMHO linear=no in the cover is a misuse of the attribute's purpose, and I am against it despite this practice being admittedly widespread. An ereader honoring it would make the cover, as important as it is to a book, unreachable, effectively making it bloat. That attribute's purpose is not for making content unreachable but non-linear (you have to have some way to access it). It might be because of this widespread bad practice that iBooks, Google Play Books and Kobo ignore the linear=no attribute in FXL books (ADE 4 does not, though). So while KindleUnpack adding it actually has little to no effect, I am more in favor of doing things right instead of having ereaders trying to outsmart improper markup.

Edit (2015-05-19T06:00:30Z): The source of EPUB 3 FXL with a linear HTML cover.mobi had some weird forgotten test CSS in cover_page.xhtml. It is irrelevant, but I still corrected and updated the attachment just in case.

elmimmo · 05-19-2015, 02:14 AM

Quote:

Originally Posted by elmimmo

Taking the source inside EPUB 3 FXL with no HTML cover.mobi as source:

When the source has no HTML document identified as cover in the OPF guide or nav.xhtml's landmarks, KindleUnpack will create one with the name cover_page.xhtml

Forgot this one:

In the case above, the autogenerated cover_page.xhtml uses SVG markup, so its entry in the OPF manifest should have the attribute properties="svg" for it to be a valid EPUB 3.

KevinH · 05-19-2015, 10:24 AM

Hi,

Thanks for the detailed bug report.

Quote:

Originally Posted by elmimmo

XHTML documents in my source have the size of their canvas declared in their head block, as required by the EPUB 3 spec, with:

Code:

<meta name="viewport" content="width=794, height=1122"/>

with same values as those written by KindleUnpack in the output OPF's metadata:

Code:

<meta name="original-resolution" content="794x1122"/>

The latter is what Kindle requires but does nothing in EPUB 3 ereaders. The former is EPUB 3's way but does nothing in Kindle. Both are needed for the EPUB 3 to render properly in EPUB 3 ereaders and be safely convertible (back) to mobi.

I assume that the internal XHTML of most mobi files will lack EPUB 3's viewport's size declaration since Kindle makes no use of it (so no reason for the original author to have added it).

Understood. This was your bug report, correct? I wonder why kindlegen removes the meta values in the xhtml page head tag if they do no damage. If kindlegen does not remove them, then how we handle things is correct. Please understand, there is no guarantee that if an invalid epub3 is input into kindlegen, that you will unpack to a valid one. In fact in most cases, you will unpack to an invalid epub that will need to be fixed. If those meta viewport tags are actually removed by kindlegen, I will figure out a way to add them back, but if kindlegen leaves them untouched, the code is correct as it stands.

Quote:

KindleUnpack should add it if missing when outputting EPUB 3 FXL so that ereaders display them properly.

Please understand KindleUnpack just unpacks what is present and is recapturable from the AZW3, it does not guarantee the output is valid if the input is not valid. It is not going to try and fix things that were errors in the input. It is not a conversion program in and of itself.

Quote:

KindleUnpack is being unnecessarily redundant by adding to EPUB 3 FXL's metadata both <meta name="fixed-layout" content="true" /> and <meta property="rendition:layout">pre-paginated</meta>. Both accomplish the same thing in Kindle, but only the latter does so too for EPUB 3 ereaders. KindleUnpack should therefore only add the latter.

The same thing goes for <meta name="orientation-lock" content="portrait" /> and <meta property="rendition:orientation">portrait</meta>, only the latter being EPUB 3 proper syntax.

Actually according to the epub3 spec, old style metadata is allowed and should be ignored by an epub3 device. So these will stay as they do not hurt things and help to document exactly what was present in the source.

Quote:

My source OPF has its ISBN specified in EPUB 3 syntax:

Code:

<dc:identifier id="uid">urn:isbn:9781234567890</dc:identifier>
<meta refines="#uid" property="identifier-type" scheme="onix:codelist5">15</meta>

but KindleUnpack's output EPUB 3 has it like:

Code:

<dc:identifier opf:scheme="ISBN">9781234567890</dc:identifier>

I can't recapture the refines in all cases as they are stripped away in the conversion process, but I can try to correct it so that the opf: prefix is not used in an epub3 in dc tags.

Quote:

Code:

<dc:date opf:event="publication">2011</dc:date>

KindleUnpack should just drop that attribute when outputting EPUB 3. Note that adding it, is, however, valid and actually the right way to do it when outputting EPUB 2.

Will handle as above.

Quote:

My source NCX (document which is not part of the EPUB 3 spec, but some consider good practice to add for backwards compatibility) is simpler than KindleUnpack's output. The extra stuff that KindleUnpack adds is cruft (irrespective of whether it is exporting to EPUB 2 or 3). Some parts of it:

Code:

<meta content="1" name="dtb:depth"/>
<meta content="mobiunpack.py" name="dtb:generator"/>
<meta content="0" name="dtb:totalPageCount"/>
<meta content="0" name="dtb:maxPageNumber"/>

are innocuous but still pointless since no EPUB ereader (neither Kindle) needs nor supports them in any way; some other parts, particularly, the DOCTYPE and each playOrder attribute, are, actually, harmful IMHO as they complicate post-editing by hand.

The DOCTYPE on the ncx is correct as it stands, and epubcheck 4 has fixed this bug in epubcheck 3. I will look at the DAISY spec to see about the extra metadata. If it is needed for the spec it stays, otherwise I will remove it.

Quote:

Besides, if KindleUnpack is adding the NCX to EPUB 3 FXL for backwards compatibility purposes, then it should also generate the file com.apple.ibooks.display-options.xml that my source contains, as that file was Apple's method of tagging an EPUB 2 as a FXL book before EPUB officially embraced FXL in EPUB 3.0.1 (which uses another method to do so, but including that legacy file does not make the EPUB 3 file non-valid).

Sorry, nothing ibooks specific will be added/supported in any way. None of it is spec.

Quote:

Still, I, for one, do not see value in generating EPUB 2 FXL backwards compatibility cruft in EPUB 3 FXL(the NCX, the spine's toc attribute, the OPF's guide and the file com.apple.ibooks.display-options.xml). FXL was originally a non-standard Apple extension to EPUB 2 which Apple itself now considers obsolete in favor of proper EPUB 3 FXL, and all major platforms that once supported Apple's EPUB 2 FXL now support EPUB 3 FXL.

Again, the guide is allowed via the epub3 spec, as is all of the old style metadata. And no ibooks anything will be supported as none of it is spec.

In the future, when I have more time, I may add an option for keeping or removing all of that but right now I am more worried about correctness, not what any one person considers "cruft". Sorry about that. The real purpose of the KindleUnpack tool was to help reverse engineer current and future mobi changes. That is its primary role. It is not a "converter" per se.

Quote:

The language of the source ebook is Spanish, as specified in the OPF's metadata:

Code:

<dc:language>es</dc:language>

yet KindleUnpack's output's is:

Code:

<dc:language>en</dc:language>

That is definitely a bug. It should be exactly what the Mobi language code says as generated by kindlegen. That is what we should be outputting, and we should only default to "en" if no language code is found. I will look into this.

Quote:

In nav.xhtml, the namespace URI for the epub namespace is wrong. KindleUnpack's current output is:

Code:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2011/epub" …>

while it should be:

Code:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" …>

I will verify it against the spec and change it if needed.

Quote:

The source nav.xhtml has a meta tag declaring the text encoding:

Code:

<meta charset="UTF-8"/>

but KindleUnpack's output nav.xhtml does not. That might produce incorrect rendering of non-ASCII characters when accessing this book's TOC. The same goes for cover_page.xhtml. While the latter contains no text, it's still good practice to declare the encoding if anything to account for potential post-editing.

Again, we only are going to reproduce what we can from the actual metadata provided by the azw3. So if the nav had it correct on input to kindlegen, since kindlegen always specifies the charset in the header, this should be addable/fixable.

Quote:

In spite of what the Kindle Publishing Guidelines document (4.1, 5.1, 5.6) claims, this does nothing whatsoever:

Code:

<meta name="RegionMagnification" content="true" />

AFAIK never has, and is just bloat. Kindlegen and Kindle Previewer will instead parse all HTML documents upon conversion in search for explicit region magnification markup, and will label the ebook accordingly irrespective of what that metadata value says.

Again sorry but, if it exists in the metadata in the azw3, it will be output. That is the whole point of KindleUnpack. You may consider it "cruft", I consider it documenting the metadata that is provided by the azw3 to the extent it can.

Please understand, KindleUnpack is not an epub2 or 3 converter. It is an unpacker that takes the compiled format of the azw3 and tries to create an epub-like structure to document what it finds and for people to later edit and fix.

Quote:

I am not aware that the OPF metadata <meta name="output encoding" content="utf-8" /> in KindleUnpack's output accomplishes anything whatsoever, resulting in mere bloat, but it might be that I am not well informed.

It is documenting the charset provided in the azw3 header.

Quote:

If the source has no spine item with the attributes properties="page-spread-right" or properties="page-spread-left", it might be considered appropriate that KindleUnpack adds <meta property="rendition:spread">none</meta> to the output OPF metadata of an EPUB 3 FXL, so that EPUB 3 ereaders like iBooks do not group pages in 2-page spreads just like Kindle doesn't.

None is the default is it not?

I'll pick back up commenting on the remainder when I get more time.

Take care,

KevinH

elmimmo · 05-19-2015, 12:50 PM

Let me first add that I just mentioned what I perceive as issues for obtaining a valid EPUB 3 out of a mobi. I understand not everything asked might align with the purpose of KindleUnpack, and I am fine with you tackling just some or none of the requests based on that. I appreciate that you consider them.

Quote:

Originally Posted by KevinH

I wonder why kindlegen removes the [viewport] meta values in the xhtml page head tag if they do no damage.

It doesn't. If KindleUnpack's output does not have them, the source did not either. KindleUnpack's output of my mobi has them, as my source has them too. I just mentioned that they are necessary for the output EPUB 3 FXL to display correctly in ereaders, so I asked that KindleUnpack add them when the source lacks them. By your answer, I understand that going that extra mile is not the purpose of KindleUnpack.

Quote:

Originally Posted by KevinH

Actually according to the epub3 spec, old style metadata is allowed and should be ignored by an epub3 device. So these [redundant metadata] will stay as they do not hurt things and help to document exactly what was present in the source.

But the Kindle-only metadata was actually not present in my source:

Code:

<meta name="fixed-layout" content="true" />
<meta name="orientation-lock" content="portrait" />

KindleUnpack is generating the above, even though the corresponding EPUB 3 metadata that my source did have:

Code:

<meta property="rendition:layout">pre-paginated</meta>
<meta property="rendition:orientation">portrait</meta>

produces identical effect in both Kindlegen and EPUB 3 ereaders.

Quote:

Originally Posted by KevinH

epubcheck 4 has fixed this bug in epubcheck 3

What bug was that?

Quote:

Originally Posted by KevinH

I will look at the DAISY spec to see about the extra meta data. If needed for the spec it stays, otherwise I will remove it.

Please have a look first at the EPUB 2 spec, which states that while "this specification uses the NCX defined in the DAISY/NISO Standard […] some optional elements and metadata items are not needed to implement the NCX for this specification", and goes on to note that it is valid in EPUB 2 to not include the NCX DOCTYPE, in which case the playOrder attribute, mandatory in the NCX spec, becomes optional in EPUB 2. The sample NCX in the very EPUB 2 OPF spec follows this practice, and lacks the superfluous metadata (that my source lacks but KindleUnpack is outputting) too.

The point of not outputting the playOrder attribute (which my source lacks too), which in turn requires not outputting the DOCTYPE as per the EPUB 2 spec, is to make post-editing of that file easier by avoiding the nasty requirement that its values be consecutive.
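
To illustrate what such a simplified NCX would involve, here is a small sketch that removes the DOCTYPE and every playOrder attribute from an NCX held as a string. `simplify_ncx` is a hypothetical helper; it only shows the output I have in mind and is not a claim about how KindleUnpack works or should work:

```python
import re

# The NCX DOCTYPE declaration, including any whitespace that follows it.
NCX_DOCTYPE = re.compile(r'<!DOCTYPE\s+ncx\b[^>]*>\s*', re.IGNORECASE)
# A playOrder attribute with its leading whitespace, e.g. playOrder="3".
PLAY_ORDER = re.compile(r'\s+playOrder\s*=\s*"[^"]*"')


def simplify_ncx(ncx):
    """Drop the DOCTYPE and all playOrder attributes, leaving everything else as is."""
    ncx = NCX_DOCTYPE.sub('', ncx, count=1)
    return PLAY_ORDER.sub('', ncx)
```

With playOrder gone, navPoints can be reordered, inserted or deleted by hand without renumbering the rest of the file.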

Quote:

Originally Posted by KevinH

Again sorry but, if [<meta name="RegionMagnification" content="true" />] exists in the metadata in the azw3, it will be output. That is the whole point of KindleUnpack. You may consider it "cruft", I consider it documenting the metadata that is provided by the azw3 to the extent it can.

Such metadata does not exist in my source. Kindlegen is adding it upon detecting that my source contains region magnification HTML markup. I'm fine if you still want to output that metadata, but please do consider that it might give the false impression to those using KindleUnpack to learn how to author KF8 files that Kindlegen uses that metadata in order to produce such KF8 and/or that it existed in the source.

Again thanks for taking the time to consider my ideas.

KevinH · 05-19-2015, 01:19 PM

Further response:

Quote:

My source OPF has <meta property="ibooks:binding">false</meta> in its metadata, which disables the fake spine shadow in iBooks and is innocuous in Kindle. If present, though, it is also required that the prefix attribute in the package declares the prefix ibooks: …. Kindle does not produce that fake shadow, so all the more reason to have KindleUnpack add that metadata to EPUB 3 FXL. The reasons for doing so are admittedly subjective, and the syntax is admittedly iBooks-only, i.e. non-standard, but it still does not make the EPUB 3 invalid and is not incompatible with Kindle.

Again, sorry but nothing ibooks specific that doesn't appear in the AZW3 raw source or metadata will be created during the unpack phase.

Quote:

I would add the explicit attribute properties="rendition:page-spread-center" to the cover in the spine. It's not like not having it is incorrect, though. Besides, the value of adding it is subjective and, at any rate, even though part of the EPUB 3 spec, no major ereader app cares about the property yet.

Again, only if there is either info in the azw3 header, or RESC section, or metadata will anything be added to the unpacked epub structure.

Quote:

When the source has no HTML document identified as cover in the OPF guide or nav.xhtml's landmarks, KindleUnpack will create one with the name cover_page.xhtml and give it the attribute linear="no" in the spine.

linear="no" is the default so this is not needed but linear="yes" may not be correct as many ebooks are not meant to open at the cover or force its inclusion.

That said, KindleUnpack should be adding the "svg" as a manifest property to correctly identify the use of svg in the cover image for an epub3. If it doesn't do that properly, it is a bug and something I will look into fixing.

All in all I see 8 potential issues here:

1) if viewports meta exists in azw3, force conversion as epub3

2) add viewport metadata to each xhtml when viewport is found in azw3 metadata (if and only if kindlegen removes it when present in the input source)

3) properly setting the language metadata as specified in the azw3 header

4) adding metadata charset for xhtml docs (including nav) that matches the charset as specified in the azw3 header for epub3

5) adding the proper svg manifest property under epub3 for created cover images with svg

6) check / verify the proper prefixes for use with epub3 (i.e. for epub:)

7) make sure opf: prefixes are removed from all dc metadata tags under epub3

8) make sure ncx meets valid daisy spec if present.

KevinH

elmimmo · 05-19-2015, 01:37 PM

Forgot to add (just for context; again I am fine if you think this is not in line with KindleUnpack's purpose) that while omitting the NCX DOCTYPE is optional in EPUB2, doing so is actually required in EPUB 3. Again, a DOCTYPE that my source lacks but KindleUnpack is adding when outputting to EPUB 3.

KevinH · 05-19-2015, 02:14 PM

Hi,

Actually no. epubcheck 3.X has a bug. An ncx under epub3 is allowed to have a DOCTYPE as it is required for fallback.

This bug is fixed in epubcheck 4. So a proper daisy ncx with full doctype is allowed in epub3. Check the official epubcheck 3.1 bug reports for details.

KevinH

Quote:

Originally Posted by elmimmo

Forgot to add (just for context; again I am fine if you think this is not in line with KindleUnpack's purpose) that while omitting the NCX DOCTYPE is optional in EPUB2, doing so is actually required in EPUB 3. Again, a DOCTYPE that my source lacks but KindleUnpack is adding when outputting to EPUB 3.

elmimmo · 05-19-2015, 03:55 PM

Quote:

Originally Posted by KevinH

epubcheck 3.X has a bug. An ncx under epub3 is allowed to have a DOCTYPE as it is required for fallback.

This bug is fixed in epubcheck 4. So a proper daisy ncx with full doctype is allowed in epub3. Check the official epubcheck 3.1 bug reports for details

Well, what I linked to is precisely an issue in the official epubcheck issue tracker that references the part of the EPUB 3 spec where such limitation is detailed, and where Matt Garrish, one of the editors of the EPUB 3 spec, weighs in confirming that EPUB 3 must omit the NCX DOCTYPE, with nobody counter-arguing, barely a month ago. That issue is still open but the only other one that I could find on the subject is this other issue, already closed, where the same prohibition is stated. What specific bug report do you refer to?

None of the EPUB 3 samples by the IDPF that have an NCX have its DOCTYPE in it, and there is no backwards compatibility issue as no ereader depends on its presence or its absence.

KevinH · 05-19-2015, 05:39 PM

Hi,
I was referring to this link:

https://github.com/IDPF/epubcheck/issues/305

This issue is under review and, as of the last release of epubcheck 4 alpha, having a DOCTYPE on a fallback ncx was considered acceptable. It seems someone disagreed with that on Apr 7, but I see no commit message or review to change epubcheck 4.

So this one may get reversed yet again. In epubcheck 3 it was okay, in epubcheck 3.1 it was an error, and in epubcheck 4 alpha it was no longer an error. So I guess this is a wait and see.

Either way KindleUnpack will keep the DOCTYPE on the ncx until epubcheck 4 final comes out and declares it one way or the other.

elmimmo · 05-20-2015, 11:02 AM

With KindleUnpack v0.75, I could run KindleUnpack.py from the command line from any path (I used to create a symlink of kindleunpack.py and place it in my path).

With v0.80, I need to first have cd'd into KindleUnpack's lib folder and run it from there, like ./kindleunpack.py. Trying to run it from another path by calling kindleunpack.py with its full path (relative or absolute makes no difference) or by calling a symlink to kindleunpack.py that resides in my path, I get:

Code:

Traceback (most recent call last):
  File "/Users/XXXX/bin/KindleUnpack/lib/kindleunpack.py", line 13, in <module>
    from .compatibility_utils import PY2, binary_type, utf8_str, unicode_str
ImportError: No module named compatibility_utils

Is this not the intended way to run it?

The python version in my path is 2.7.9, installed with Homebrew on Mac OS 10.10.3.
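
For context, the traceback above is typical of a file that uses a package-relative import (`from .module import ...`) being executed directly as a script. The sketch below reproduces that general behaviour with a throwaway package. `demo_pkg`, `helper.py` and `main.py` are hypothetical stand-ins, not KindleUnpack's actual files, and the exact error text differs between Python 2 and Python 3:

```python
import os
import subprocess
import sys
import tempfile


def run_both_ways():
    """Return (exit code when run as a script, exit code when run with -m)."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "demo_pkg")
        os.mkdir(pkg)
        # Hypothetical package layout: main.py imports a sibling module relatively.
        for name, text in (("__init__.py", ""),
                           ("helper.py", "VALUE = 1\n"),
                           ("main.py", "from .helper import VALUE\n")):
            with open(os.path.join(pkg, name), "w") as handle:
                handle.write(text)
        quiet = {"stdout": subprocess.DEVNULL, "stderr": subprocess.DEVNULL}
        # Run directly: the file has no parent package, so the relative import fails.
        as_script = subprocess.run(
            [sys.executable, os.path.join(pkg, "main.py")], cwd=root, **quiet).returncode
        # Run as a module from the parent directory: the relative import resolves.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.main"], cwd=root, **quiet).returncode
    return as_script, as_module
```

Running with `-m` from the parent directory gives the file a parent package, which is why the second invocation succeeds. Whether that is the intended way to launch KindleUnpack v0.80 is not something this sketch can answer.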

eschwartz · 05-20-2015, 11:08 AM

I was playing with a setuptools-powered installer that might help on general principle...

When I get back to my computer I can post the patch.

KevinH · 05-20-2015, 02:39 PM

Hi,

Here is a status update for those interested (using the very latest KindleUnpack v0.80).

Quote:

1) if viewports meta exists in azw3, force conversion as epub3

Actually the "fixed-layout" metadata is already detected and will set autodetection to generate an epub3. Simply use the switch --epub_version=A

Quote:

2) add viewport metadata to each xhtml when viewport is found in azw3 metadata (if and only if kindlegen removes it when present in the input source)

As previously stated, kindlegen does not remove viewport meta in xhtml files so nothing needs to be done here for valid epub3 input -> valid epub3 output

Quote:

3) properly setting the language metadata as specified in the azw3

I just fixed this bug in mobi_header.py and mobi_utils.py and will push these to master for the next release.

Quote:

7) make sure opf: prefixes are removed from all dc metadata tags under epub3

A fix for this for epub3 has been made and will be pushed to master

I will work on the remaining issues and let you know when I have something workable.

Take care,

KevinH

eschwartz · 05-20-2015, 03:36 PM

Patch to add a setup.py

Because I prefer using entry points in /usr/bin/

KevinH · 05-21-2015, 09:43 AM

Hi,

Here is a status update for those interested (using the very latest KindleUnpack v0.80).

1) if viewports meta exists in azw3, force conversion as epub3

. Actually the "fixed-layout" metadata is already detected and will set autodetection to generate an epub3. Simply use the switch --epub_version=A

2) add viewport metadata to each xhtml when viewport is found in azw3 metadata (if and only if kindlegen removes it when present in the input source)

. As previously stated, kindlegen does not remove viewport meta in xhtml files so nothing needs to be done here for valid epub3 input -> valid epub3 output

3) properly setting the language metadata as specified in the azw3

. The fix for this was just committed to KindleUnpack master

4) adding metadata charset for xhtml docs (including nav) that matches the charset as specified in the azw3 header for epub3

. The fix for adding meta charset="UTF-8" to created nav under epub3 has been committed to KindleUnpack master

5) adding the proper svg manifest property under epub3 for created cover images with svg

. TODO: This still needs to be fixed

6) check / verify the proper prefixes for use with epub3 (i.e. for epub:)

. The fix to add the proper epub3 xmlns declaration for nav has now been committed to KindleUnpack master.

7) make sure opf: prefixes are removed from all dc metadata tags under epub3

. The fix for this has now been committed to KindleUnpack Master

8) make sure ncx meets valid daisy spec if present.

. TODO - I still need to figure out what is the correct set of information to show here under epub2 and epub3

Hope this helps,

KevinH

KevinH · 05-22-2015, 01:36 PM

Hi elmimmo,

It seems that kindlegen strips out all manifest properties completely from valid epub3 input. At least kindlegen does put spine properties in the k8 RESC section so that they are not lost.

Therefore the only way to properly set any of the manifest properties for outputting an epub3 will be to literally parse every xhtml page looking for use of svg, math, and switch tags.

This will require adding an xhtml parser to KindleUnpack, because trying this with just "re" (regular expressions), while doable, may lead to mistakes when pre tags, scripts, and the like use any of these special terms.

So, if I have to parse and walk every xhtml file anyway, I should be able to detect if the original-resolution is present and if so add the meta viewport if not present.
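A minimal sketch of that viewport step, assuming the `original-resolution` value has already been read from the metadata as a string such as `794x1122`. The function name `ensure_viewport` is hypothetical and not part of KindleUnpack, and the string-level regular expressions are used only to keep the example short; a parser-based walk as described above would be more robust.

```python
import re


def ensure_viewport(xhtml, original_resolution):
    """Return xhtml with a viewport meta added to <head> if none is present.

    Hypothetical helper: original_resolution is expected as "WIDTHxHEIGHT".
    """
    size = re.match(r"^\s*(\d+)\s*[xX]\s*(\d+)\s*$", original_resolution)
    if size is None:
        # Unusable resolution value: leave the page untouched.
        return xhtml
    if re.search(r"<meta\b[^>]*\bname\s*=\s*[\"']viewport[\"']", xhtml, re.I):
        # The source already declares a viewport, so keep it as is.
        return xhtml
    meta = '<meta name="viewport" content="width=%s, height=%s"/>' % size.groups()
    # Insert the new meta just before the first closing head tag.
    return re.sub(
        r"</head\s*>",
        lambda closing: meta + "\n" + closing.group(0),
        xhtml,
        count=1,
        flags=re.I,
    )
```

Calling it a second time on its own output leaves the page unchanged, since the viewport meta is then detected as present.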

Basically, I am thinking of using the Sigil python plugin ePub3-itizer code and incorporating parts of it into KindleUnpack to properly set the DOCTYPE in each file while harvesting the use of svg, mathml, and epub:switch tags for use in the opf manifest creation.

This will take some work but it should take care of your initial issue while fixing missing manifest properties in general for epub3. It is a shame that kindlegen does not keep them.

KevinH
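The harvesting of svg, mathml, and epub:switch usage described in the post above could be sketched as follows. This is only an illustration using the standard library's `xml.etree.ElementTree`, not the ePub3-itizer code; the function name `manifest_properties` is hypothetical. It assumes each page is well-formed XHTML without undefined named entities (for example `&nbsp;` without a DTD would make the parser fail).

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
EPUB_NS = "http://www.idpf.org/2007/ops"


def manifest_properties(xhtml_text):
    """Return the space-separated manifest properties implied by one page.

    Hypothetical helper: only svg, mathml and switch are detected.
    """
    root = ET.fromstring(xhtml_text)
    found = set()
    for elem in root.iter():
        tag = elem.tag
        if not isinstance(tag, str):
            continue  # skip anything that is not a regular element
        # ElementTree spells namespaced tags as "{namespace}localname".
        if tag.startswith("{" + SVG_NS + "}"):
            found.add("svg")
        elif tag.startswith("{" + MATHML_NS + "}"):
            found.add("mathml")
        elif tag == "{" + EPUB_NS + "}switch":
            found.add("switch")
    return " ".join(sorted(found))
```

Because the check works on parsed elements, the text `&lt;svg&gt;` inside a pre tag is never mistaken for real svg markup, which is the kind of mistake a plain regular expression search risks.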

elmimmo · 05-19-2015, 01:37 PM

Forgot to add (just for context; again, I am fine if you think this is not in line with KindleUnpack's purpose) that while omitting the NCX DOCTYPE is optional in EPUB 2, doing so is actually required in EPUB 3. Again, it is a DOCTYPE that my source lacks but that KindleUnpack is adding when outputting to EPUB 3.

Last edited by elmimmo; 05-19-2015 at 01:39 PM.

elmimmo · 05-20-2015, 11:02 AM

With KindleUnpack v0.75, I could run KindleUnpack.py from the command line from any path (I used to create a symlink of kindleunpack.py and place it in my path). With v0.80, I need to first have cd'd into KindleUnpack's lib folder and run it from there, like ./kindleunpack.py.

Trying to run it from another path by calling kindleunpack.py with its full path (relative or absolute makes no difference) or by calling a symlink to kindleunpack.py that resides in my path, I get:

Code:
    Traceback (most recent call last):
      File "/Users/XXXX/bin/KindleUnpack/lib/kindleunpack.py", line 13, in <module>
        from .compatibility_utils import PY2, binary_type, utf8_str, unicode_str
    ImportError: No module named compatibility_utils

Is this not the intended way to run it? The python version in my path is 2.7.9, installed with Homebrew on Mac OS 10.10.3.

Last edited by elmimmo; 05-20-2015 at 11:07 AM.
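The import on line 13 of that traceback is a relative one (`from .compatibility_utils import ...`), and relative imports only resolve when the file is loaded as part of a package. As a hedged illustration of one possible workaround, and not something the thread confirms KindleUnpack does, a small launcher can resolve the symlink and run the script as a module of its containing package. The function name `run_packaged_script` is hypothetical, and the sketch assumes the script's folder is an importable package (that is, it contains an `__init__.py`).

```python
import os
import runpy
import sys


def run_packaged_script(script_path):
    """Run a script as package.module so its relative imports resolve.

    Hypothetical launcher: assumes the script's real directory is a package.
    """
    # Follow symlinks so the package folder is found, not the bin folder.
    real_path = os.path.realpath(script_path)
    package_dir, filename = os.path.split(real_path)
    parent_dir, package_name = os.path.split(package_dir)
    module_name = os.path.splitext(filename)[0]
    # The package's parent directory must be importable.
    if parent_dir not in sys.path:
        sys.path.insert(0, parent_dir)
    # Equivalent in spirit to "python -m package.module".
    return runpy.run_module(package_name + "." + module_name, run_name="__main__")
```

The same effect is what an installed entry point provides, which is the direction the setup.py patch mentioned elsewhere in this thread takes.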

KevinH · 05-19-2015, 05:39 PM

Hi,

I was referring to this link: https://github.com/IDPF/epubcheck/issues/305

This issue is under review, and as of the last release of epubcheck 4 alpha, having a DOCTYPE on a fallback ncx was considered acceptable. It seems someone disagreed with that on Apr 7, but I see no commit message or review to change epubcheck 4, so this one may get reversed yet again. In epubcheck 3 it was okay, in epubcheck 3.1 it was an error, and in epubcheck 4 alpha it was no longer an error.

So I guess this is a wait and see. Either way, KindleUnpack will keep the DOCTYPE on the ncx until epubcheck 4 final comes out and declares it one way or the other.

eschwartz · 05-20-2015, 11:08 AM

I was playing with a setuptools-powered installer that might help on general principle... When I get back to my computer I can post the patch.
