KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files - Page 56

NiLuJe · 06-26-2014, 04:37 PM

I'll try to test this tonight, but if you don't hear back from me, I'll be back on Monday.

Doitsu · 06-27-2014, 04:02 AM

Quote:

Originally Posted by KevinH

I am glad to hear that a sideloaded apnx file can be consumed and that the .azw3r and .azw3f funny pieces are created during reading.

The PW2 only creates an .azw3r file if an .apnx was copied to the .sdr folder prior to opening the ebook for the first time. (It also creates an additional .mbp1 file in .sdr folders of mobi7 files.) However, it seems that it cannot create an .azw3r/.mbp1 that contains valid page number references from the reverse-engineered .apnx files.

I've created another set of test files with page numbers wrapped in <div>...</div> tags with ids. I've also added page target links to the HTML TOC for testing purposes.
When accessing pages via that TOC, my K3 always displayed the correct page number. It also displayed the correct page numbers at the beginning of a new chapter when using the page forward/backward buttons.
However, when I selected the next chapter with the four-way button, my K3 would display the previous page number for azw3 files and the correct one for mobi7 files. IMHO, this is as good as it gets and I won't do any more research about this.

Please find attached the updated files generated with KindleUnpack_v072f_test. (This time I've also included the mobi7 sidecar files generated by my PW2 and the .mbp and .han files generated by my K3.)

KevinH · 06-27-2014, 10:07 AM

Hi Doitsu,

Quote:

Originally Posted by Doitsu

The PW2 only creates an .azw3r file if an .apnx was copied to the .sdr folder prior to opening the ebook for the first time. (It also creates an additional .mbp1 file in .sdr folders of mobi7 files.) However, it seems that it cannot create an .azw3r/.mbp1 that contains valid page number references from the reverse-engineered .apnx files.

That is very strange. When I compare the data stored inside the .azw3r file it is obviously created directly from the apnx file. It has every single offset value and it has the exact same pagemap itself.

To make this more specific, the apnx file that was created had the following pagemap and page offsets (in hex).

pagemap string: (2,r,1),(5,a,4)

page starting offsets: 0x13f, 0x783, 0x1591, 0x22d2, 0x3239, 0x4051, 0x4d9b, 0x5d0b, 0x6b23, 0x786d

The azw3r file has the following data in it (in hex and strings)

Code:

00000000: 0000 0000 001a b126 0200 0000 0000 0000  .......&........
00000010: 0101 0000 0002 fe00 0017 616e 6e6f 7461  ..........annota
00000020: 7469 6f6e 2e63 6163 6865 2e6f 626a 6563  tion.cache.objec
00000030: 7401 0000 0000 fffe 0000 0861 706e 782e  t..........apnx.
00000040: 6b65 7903 0000 0131 0300 0004 4542 4f4b  key....1....EBOK
00000050: 0001 0100 0000 0b01 0000 0000 0100 0001  ................
00000060: 3f01 0000 0783 0100 0015 9101 0000 22d2  ?.............".
00000070: 0100 0032 3901 0000 4051 0100 004d 9b01  ...29...@Q...M..
00000080: 0000 5d0b 0100 006b 2301 0000 786d 0100  ..]....k#...xm..
00000090: 0000 0201 0000 000a 0100 0000 0a03 0000  ................
000000a0: 0f28 322c 722c 3129 2c28 352c 612c 3429  .(2,r,1),(5,a,4)
000000b0: ff

You can easily see the page map identically stored near the end of the azw3r file as (2,r,1),(5,a,4).

You can also see the exact same set of page offsets in the azw3r file, with the first offset value beginning at location 0x5d, reproduced here:

0000013f 01
00000783 01
00001591 01
000022d2 01
00003239 01
00004051 01
00004d9b 01
00005d0b 01
00006b23 01
0000786d 01

So each offset is properly found and stored along with a single byte type identifier of some sort.

In a similar way, I can find the number of page names (10 = 0xa) and the length of the pagemap string all properly stored as well.

So your PW2 does seem to be properly consuming that apnx file and parsing and storing that information in the azw3r file.
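For anyone who wants to repeat this comparison, here is a minimal Python 3 sketch that pulls the same values back out of the dump above. It assumes, based only on this one dump, that the page offsets are stored as 4-byte big-endian integers spaced five bytes apart (a value plus one type byte) and that the pagemap is the only parenthesised ASCII text in the file. The names HEX_DUMP, find_offsets and find_pagemap are made up for illustration and are not part of KindleUnpack.

```python
import struct

# Bytes copied from the azw3r hex dump above (address and ASCII columns dropped).
HEX_DUMP = """
0000 0000 001a b126 0200 0000 0000 0000
0101 0000 0002 fe00 0017 616e 6e6f 7461
7469 6f6e 2e63 6163 6865 2e6f 626a 6563
7401 0000 0000 fffe 0000 0861 706e 782e
6b65 7903 0000 0131 0300 0004 4542 4f4b
0001 0100 0000 0b01 0000 0000 0100 0001
3f01 0000 0783 0100 0015 9101 0000 22d2
0100 0032 3901 0000 4051 0100 004d 9b01
0000 5d0b 0100 006b 2301 0000 786d 0100
0000 0201 0000 000a 0100 0000 0a03 0000
0f28 322c 722c 3129 2c28 352c 612c 3429
ff
"""
data = bytes.fromhex("".join(HEX_DUMP.split()))

# Page starting offsets as listed for the apnx file.
APNX_OFFSETS = [0x13f, 0x783, 0x1591, 0x22d2, 0x3239,
                0x4051, 0x4d9b, 0x5d0b, 0x6b23, 0x786d]


def find_offsets(blob, first_value, count, stride=5):
    """Find first_value as a 4-byte big-endian int, then read `count`
    values spaced `stride` bytes apart (assumed layout, see lead-in)."""
    start = blob.find(struct.pack(">I", first_value))
    if start < 0:
        return None, []
    values = [struct.unpack_from(">I", blob, start + i * stride)[0]
              for i in range(count)]
    return start, values


def find_pagemap(blob):
    """Return the ASCII text from the first '(' to the last ')'."""
    begin = blob.find(b"(")
    end = blob.rfind(b")") + 1
    return blob[begin:end].decode("ascii")


start, offsets = find_offsets(data, APNX_OFFSETS[0], len(APNX_OFFSETS))
pagemap = find_pagemap(data)

print("first offset found at 0x%x" % start)
print("offsets:", ", ".join("0x%x" % v for v in offsets))
print("matches apnx offsets:", offsets == APNX_OFFSETS)
print("pagemap:", pagemap)
```

Searching for the first offset instead of hard-coding its position keeps the sketch usable on another azw3r file, as long as that file follows the same assumed layout.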

So we are very, very close. It must be something else in the apnx.key, before the page offsets and pagemap, that is confusing the hell out of your PW2.

It would be nice to see a working azw3r file to compare it against, to see if it is something simple we can fix in the apnx generation routine to make it work.

Thanks for testing and reporting back.

Take care,

KevinH

Doitsu · 06-27-2014, 12:35 PM

I did another test with a commercial book with real page numbers and was surprised that it wouldn't display page numbers when I activated the menu bar.
Out of curiosity, I selected Go To > Page or Location and in the Go to Page or Location dialog box both "real page numbers" and locations could be selected.
When I did the same with my test book, both pages and locations were displayed and could also be selected (see screen capture).
It looks like there's either a bug in the latest PW2 firmware that disables the automatic display of page numbers, or something is wrong with my particular PW2 or my PW2 settings.
However, there doesn't seem to be any user setting to display or hide page numbers.

If other PW2 owners read this thread: does your PW display page numbers when you select the menu bar? If so, did you have to enable/disable a particular setting to display it?

@KevinH: I'm sorry if I caused you extra work. After all, your code does work fine even with PW2 Kindles. It's just that PW2 Kindles with the latest firmware apparently don't display page numbers. However, page numbers can be selected via the Go To menu.

KevinH · 06-27-2014, 06:43 PM

Hi Doitsu,

Glad to hear it works! I think we can call that feature ready for prime time. I will wait another week or so, and if no bug reports come up, I'll use v072f to create the main v073 release.

Take care,

KevinH

tkeo · 06-28-2014, 07:32 AM

Hi,

Thank you for many improvements.

I've fixed a minor bug regarding cover image renaming.
In addition, a warning message has been added and the comments about k8resc have been modified.

tkeo · 06-28-2014, 08:06 AM

Hi Kevin,

You might have already made these changes, so I am posting these patches separately from the bugfix.

These patches make no functional changes; they only remove some needless comments and code.

Take care,

KevinH · 06-28-2014, 10:09 AM

Hi tkeo,

Thanks for the minor bug fixes and opf and k8resc changes. I will add them to my tree. Also, do you see any way to reduce the degree of redundancy in the opf generation code?

Thanks,

KevinH

JSWolf · 06-28-2014, 12:18 PM

Quote:

Originally Posted by KevinH

Hi tkeo,

Thanks for the minor bug fixes and opf and k8resc changes. I will add them to my tree. Also, do you see any way to reduce the degree of redundancy in the opf generation code?

Thanks,

KevinH

Is there any way, when generating an ePub, to have the OPF cleaned of all that "from this point down, all this can be ignored" garbage in the metadata section?

KevinH · 06-28-2014, 01:10 PM

Hi,

In short, no. I need that info to help debug new metadata and features being implemented by Amazon. It will not hurt or impact anything in the epub.

Why do you want it removed? It just shows what the metadata was inside the original mobi. Kindlegen will ignore it if you pass that epub back through kindlegen.

KevinH

Quote:

Originally Posted by JSWolf

Is there any way, when generating an ePub, to have the OPF cleaned of all that "from this point down, all this can be ignored" garbage in the metadata section?

DiapDealer · 06-28-2014, 01:49 PM

Quote:

Originally Posted by KevinH

Why do you want it removed? It just shows what the metadata was inside the original mobi. Kindlegen will ignore it if you pass that epub back through kindlegen.

I think the mistake JS (and others) tend to make is in thinking of KindleUnpack in terms of a DRM-Free KF8 to ePub converter, instead of a tool to study how Kindlebooks are constructed (I blame it on that guy who made a plugin out of it). It can be used as such, of course ... but it's never going to give him the lean, mean, no-extraneous-code ePub he desires. That's just not its main purpose.

KevinH · 06-28-2014, 02:37 PM

Hi tkeo,

I am confused. You changed the following in your patch and said it fixed a bug with cover images.

Code:

--- lib_org/kindleunpack.py     Thu Jun 26 14:43:20 2014
+++ lib/kindleunpack.py Sat Jun 28 20:18:14 2014
@@ -393,7 +393,7 @@
         return imgnames, image_ptr

     imgname = "image%05d.%s" % (i, imgtype)
-    if cover_offset and i == beg + cover_offset:
+    if cover_offset != None and i == beg + cover_offset:
         imgname = "cover%05d.%s" % (i, imgtype)
     print "Extracting image: {0:s} from section {1:d}".format(imgname,i)
     outimg = os.path.join(files.imgdir, imgname)

but the following works for my 2.7.7 implementation ...

Code:

kkk = 2
jjj = None
# test for jjj not equal to None
if jjj and kkk == jjj + 2:
    print "I failed"
else:
    print "I passed"

# now test to make sure 0 is considered to be different 
# from None in this regard
jjj= 0
if jjj and kkk == jjj + 2:
    print "I failed"
else:
    print "I passed"

# now test the normal mode of operation
jjj = 1
if jjj and kkk == jjj + 1:
    print "I passed"
else:
    print "I failed"

If I call that snippet junk.py and run it I get:

Code:

Kevins-iMac:~ kbhend$ python junk.py
I passed
I passed
I passed
Kevins-iMac:~ kbhend$

So is your original change needed? What bug is it fixing? Does it fail on some version of python 2.X that I haven't tested with? Or is this a change for readability's sake?

Thanks,

Kevin

Quote:

Originally Posted by tkeo

Hi,

Thank you for many improvements.

I've fixed a minor bug regarding cover image renaming.
In addition, a warning message has been added and the comments about k8resc have been modified.

tkeo · 06-29-2014, 03:17 AM

Hi Kevin,

Quote:

Originally Posted by KevinH

I am confused. You changed the following in your patch and said it fixed a bug with cover images.
So is your original change needed? What bug is it fixing?

The posted patch fixes a bug where the cover image is not renamed when cover_offset = 0, because bool(0) == False.

This happens with the mobi file krzyzacy-tom-pierwszy.mobi, posted:
https://www.mobileread.com/forums/sho...&postcount=693
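
A stripped-down sketch of the difference, written so that it runs on Python 3 as well as 2.7. The helper names, the starting section number and the default 'jpeg' extension are only for illustration and do not exist in kindleunpack.py; only the two `if` conditions follow the patch (which spells the test `!= None`; for integers and None that behaves the same as the `is not None` used here).

```python
def name_with_truthiness(i, beg, cover_offset, imgtype="jpeg"):
    # Old test: a cover_offset of 0 is falsy, so it is skipped just like None.
    if cover_offset and i == beg + cover_offset:
        return "cover%05d.%s" % (i, imgtype)
    return "image%05d.%s" % (i, imgtype)


def name_with_none_check(i, beg, cover_offset, imgtype="jpeg"):
    # Patched test: only a missing offset (None) disables the renaming.
    if cover_offset is not None and i == beg + cover_offset:
        return "cover%05d.%s" % (i, imgtype)
    return "image%05d.%s" % (i, imgtype)


beg = 3  # hypothetical first image section
for cover_offset in (None, 0, 1):
    # Look at the section where the cover would sit, if there is one.
    i = beg + (cover_offset or 0)
    old = name_with_truthiness(i, beg, cover_offset)
    new = name_with_none_check(i, beg, cover_offset)
    print("cover_offset=%r  old=%s  new=%s" % (cover_offset, old, new))
```

Only the cover_offset = 0 row differs between the two columns: the old test leaves the first image named image00003.jpeg, while the patched test renames it to cover00003.jpeg.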

EDIT: I removed the attached code because I had made a mistake in it.
Please try this:

Code:

Thanks,

tkeo · 06-29-2014, 04:48 AM

Hi Kevin,

Quote:

Originally Posted by KevinH

Thanks for the minor bug fixes and opf and k8resc changes. I will add them to my tree. Also, do you see any way to reduce the degree of redundancy in the opf generation code?

I think changes to 'mobi_opf.py' need some discussion first, so I have not touched it.
Here is what I have in mind:

1. The most redundant part is manifest generation. Would it be better to merge buildOPFManifest() and buildEPUB2OPFManifest() into buildEPUB3OPFManifest(), and then rename buildEPUB3OPFManifest()? Personally, I prefer to merge them.

2. I think it is better to merge writeK8OPF() into writeOPF().

3. buildOPF() should stay separate, because merging it would need many 'if' statements and reduce readability. It may be better to merge buildEPUB2OPF() and buildEPUB3OPF() into buildK8OPF().

4. Remove the constants OPF_NAME, TOC_NCX and NAVIGATION_DOCUMENT and hard-code the values instead, since the constants are hardly needed.

Besides, I have modified 'mobi_split.py' to make processing faster. I was planning to post it after the v0.73 release in order to avoid more testing, but I will post it now if someone wants it.

Thanks,

KevinH · 06-29-2014, 10:52 AM

Quote:

Originally Posted by tkeo

Hi Kevin,

The posted patch fixes a bug where the cover image is not renamed when cover_offset = 0, because bool(0) == False.

This happens with the mobi file krzyzacy-tom-pierwszy.mobi, posted:
https://www.mobileread.com/forums/sho...&postcount=693

EDIT: I removed the attached code because I had made a mistake in it.
Please try this:

Code:

Thanks,

Thanks for catching that. I did not think the cover offset could ever be 0, but I guess it could be if there is no thumbnail and no other images.

In an effort to help track down other mistakes like that, I have changed the entire code base to use the python.org PEP 8 recommendation when testing against None, as follows:

I have changed all:
== None
to:
is None

And changed all:
!= None
to:
is not None

Now any testing against 0 vs non-zero using "if variable" will stand out as different and I can track down any such occurrences to make sure they were not using it to test against None but were instead looking at the non-zero case.
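
A small illustration of why the two spellings are not interchangeable in general. The class AlwaysEqual and the function describe are contrived examples, not something from the KindleUnpack code base: an object can define __eq__ so that `== None` is true, while `is None` only ever matches the None singleton itself. The second half shows the 0-versus-None distinction that a bare "if variable" hides.

```python
class AlwaysEqual(object):
    """Contrived object whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return False


obj = AlwaysEqual()
eq_result = (obj == None)  # goes through AlwaysEqual.__eq__
is_result = (obj is None)  # identity test, cannot be overridden


def describe(value):
    """Keep 'missing' (None) apart from 'present but zero or empty'."""
    if value is None:
        return "missing"
    if not value:
        return "present but zero/empty"
    return "present"


print("obj == None -> %r" % eq_result)
print("obj is None -> %r" % is_result)
for sample in (None, 0, 5):
    print("%r -> %s" % (sample, describe(sample)))
```

With explicit "is None" tests everywhere, any remaining bare "if variable" can be read as a deliberate zero/non-zero check rather than a disguised None check.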

I have attached a patch to move us to follow the python PEP recommendation.
See pep_use_patch.txt

Nicely done!

KevinH

