Old 09-10-2011, 04:04 PM   #151
siebert
Developer
 
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
Quote:
Originally Posted by pdurrant View Post
I think that unknown Metadata should only be showing as a warning. There is almost always some unknown metadata, as the Mobipocket/Print Replica file format is undocumented.
Hi,

in my version I've already introduced a list of "known unknown" metadata (values we know exist but whose meaning we don't know), and mobiunpack complains only if an unknown value isn't in this list.
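
A minimal sketch of the "known unknown" idea, assuming metadata records arrive as (id, value) pairs. The ID numbers, the names, and the functions `classify_metadata` and `report` are hypothetical placeholders for illustration, not values or code taken from mobiunpack or from the file format.

```python
# Hypothetical record IDs, chosen only for illustration.
KNOWN_IDS = {1: "field-a", 2: "field-b"}
KNOWN_UNKNOWN_IDS = {900, 901}  # seen in real files, meaning not yet understood


def classify_metadata(records):
    """Split (id, value) pairs into known, known-unknown, and new records."""
    known, known_unknown, new = [], [], []
    for rec_id, value in records:
        if rec_id in KNOWN_IDS:
            known.append((KNOWN_IDS[rec_id], value))
        elif rec_id in KNOWN_UNKNOWN_IDS:
            known_unknown.append((rec_id, value))
        else:
            new.append((rec_id, value))
    return known, known_unknown, new


def report(records):
    """Return warning lines only for IDs that appear in neither list."""
    _, _, new = classify_metadata(records)
    return ["Warning: unknown metadata id %d: %r" % (i, v) for i, v in new]
```

With this split, a file that contains only documented or previously seen IDs produces no warnings, and anything genuinely new still stands out.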

I hope I'll find time to release my version soon

Ciao,
Steffen
Old 09-10-2011, 04:11 PM   #152
Anjelous
Junior Member
Anjelous began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Sep 2011
Device: iPad
Edited! Ok to delete this post as I found another thread that better answers my question

Last edited by Anjelous; 09-10-2011 at 05:02 PM.
Old 09-12-2011, 09:05 AM   #153
fandrieu
Member
fandrieu began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Sep 2011
Device: kindle 3
mobiunpack modifications

Hello everybody.

First I'd like to thank the community for all the good work; without the homebrew tools, my experience with the mobi file format and the Kindle as a whole wouldn't have been nearly as nice!

Back on topic: I first came here to ask whether somebody is maintaining mobiunpack.py and accepting patches, but reading the last few posts it seems that both pdurrant and siebert are working on a branch. Am I right?
If so could I contribute ?

...

Also, in the last few posts there was talk about extracting the NCX from mobi files. It just so happens that's the very feature I've been toying with this weekend, and it's what made me come here today.
At this time I have some (pretty awful) proof-of-concept code that can extract a flat, "chapter only" NCX. I got the necessary clues from the "writer" part of the calibre mobi module, and I could elaborate on that if somebody's interested...

Apart from that, I made some corrections (like the encoding header in the html, which appears to be in siebert's branch) and also have alternate "Adding anchors..." code that reconstructs all anchors, even when they're not referenced, and should avoid adding anchors in the <head> (a bug I encountered with some files).

I was also interested in re-factoring the code to be more readable / workable (this also appears to be in siebert's plans ).

I started with the (pdurrant's ?) version @ http://code.google.com/, but wouldn't mind switching...
Old 09-12-2011, 09:16 AM   #154
KevinH
Sigil Developer
 
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
Hi,

Great! The more the merrier. If you look through this topic you will find links to later versions than what we (pdurrant and I) hosted on code.google.com; we have not bothered to update that site lately. Yes, you are right: siebert has added support for dictionaries and made some major speed improvements. I have added code to spit out more of the metadata so that the tool can be used to investigate what each metadata value means (for example, we recently found what we think is the expiration date), and pdurrant has added support for non-DRM versions of the .azw4 format.

Simply walk through this thread and grab the very latest version of mobiunpack.py that you see and use that as your starting point. I believe you want mobiunpack.py version 0.31 posted by pdurrant a few days ago to this thread. siebert may have an even newer version but I don't think he has posted it yet. Let me know if you can't find it and I will post it again for you.

KevinH
Old 09-12-2011, 09:16 AM   #155
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,406
Karma: 305065800
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
I'm afraid the Google Code version is very much out of date. The current version is 0.31, which can be found in this thread here.

While we really ought to be using version control software, in a clever shared manner, at present we seem to be posting updates here, which I copy back to the fifth post in this thread.

Some ncx generation code would be welcome. I posted a sample of the binary data representing an ncx, along with the source ncx file, here.

Any other changes would be interesting to see too.
Old 09-12-2011, 09:17 AM   #156
DiapDealer
Grand Sorcerer
Posts: 27,463
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I'd be very interested in seeing the NCX extraction code you've come up with. I use calibre to convert epubs to mobi, and then feed the output of mobiunpack to kindlegen... so not having to rebuild the NCX by hand each time would be very welcome indeed.
Old 09-12-2011, 11:54 AM   #157
siebert
Developer
 
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
Quote:
Originally Posted by DiapDealer View Post
I use calibre to convert epubs to mobi, and then feed the output of mobiunpack to kindlegen...
Calibre had a debug option --kindlegen which used the kindlegen binary to build the mobi file.

Kovid removed that option a few days ago because he doesn't like me and my request to make that option selectable via the gui (though I'm obviously not the only person liking that feature), but if you are willing to use either an older or a modified version of calibre you don't need the mobiunpack step.

Ciao,
Steffen
Old 09-12-2011, 12:14 PM   #158
fandrieu
Member
fandrieu began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Sep 2011
Device: kindle 3
here it is

Thank you very much for all your quick replies !

I just downloaded v31 from the link you provided and finished retro-fitting the modifications I had made to v23.

I'll try to explain why / how I did those changes later but first, as code speaks louder than words, here's the file.

....

Just a few words:
* I just finished merging, it's not tested

* The NCX part is really a proof of concept, it does however produce an acceptable output on my test files with flat NCX.
It consists of:
- a code block with 3 methods just before unpackBook
- a main code block in unpackBook, enclosed by "#TEST NCX"
- a small mod to the OPF code, to add a ref to the NCX

* Other than that, there are some "empirical" changes I made while testing some files:
- FILEPOS_ON_ALL_ANCHORS: an option to use an alternate code that processes all empty anchors instead of focusing on existing links...
- replaced a " " by "\s+" in the "Insert hrefs into html" rx...
- alternate way to set the html file encoding
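
To illustrate the `\s+` change: a pattern with a literal space only matches when exactly one space separates the tag name from the attribute, while `\s+` also tolerates tabs, line breaks, or several spaces. The two patterns below are simplified stand-ins written for this example, not the actual expressions from mobiunpack.py.

```python
import re

# Simplified stand-ins for the "Insert hrefs into html" expression.
strict = re.compile(r'<a filepos=(\d+)', re.IGNORECASE)
relaxed = re.compile(r'<a\s+filepos=(\d+)', re.IGNORECASE)

samples = [
    '<a filepos=0000006414 >',    # single space
    '<a  filepos=0000006414 >',   # two spaces
    '<a\nfilepos=0000006414 >',   # line break inside the tag
]

# Record which samples each pattern recognises.
strict_hits = [bool(strict.search(s)) for s in samples]
relaxed_hits = [bool(relaxed.search(s)) for s in samples]
```

Only the relaxed pattern recognises the second and third samples, which is the kind of input that can show up in files produced by different generators.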

EDIT: sorry, the file I uploaded contained several fatal errors I failed to spot.
EDIT: the new file should work at least with calibre-generated mobis...

EDIT2: added a text file describing what I gathered of the NCX equivalent in MOBI

EDIT3: basic fixes to the code...
Attached Files
File Type: txt mobiunpack_ncx.txt (2.3 KB, 311 views)
File Type: zip mobiunpack_31_fand_EDIT3.zip (15.8 KB, 286 views)

Last edited by fandrieu; 09-12-2011 at 06:03 PM.
Old 09-12-2011, 12:48 PM   #159
DiapDealer
Grand Sorcerer
Posts: 27,463
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by siebert View Post
Calibre had a debug option --kindlegen which used the kindlegen binary to build the mobi file.

Kovid removed that option a few days ago because he doesn't like me and my request to make that option selectable via the gui (though I'm obviously not the only person liking that feature), but if you are willing to use either an older or a modified version of calibre you don't need the mobiunpack step.
How would that be any different than feeding the epub directly to kindlegen? My reason for using calibre as an intermediate step is because calibre does a much better job of translating/flattening an ePub's CSS into a mobi that more accurately reflects the original (visibly) than kindlegen currently does. Kindlegen can then take the mobiunpack output and create the final mobi (with the approved tools). Or am I missing something?
Old 09-12-2011, 01:58 PM   #160
KevinH
Sigil Developer
 
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
Hi,

Thanks for posting! I grabbed it and tried it on a bunch of mobis I had and, unfortunately, many of the internal links in the document no longer work. I tested it with mobiunpack version 31 without your changes and all internal links worked.

So somehow your changes have broken some of the internal links.
I will try to track this down.

I did get some form of NCX file but it was incomplete and there were error messages:

Write html
ERROR: last byte not 0x80
ERROR: text not found 1354424
Wite ncx
Write opf

I will keep playing with it to see if I can get the internal links working again.

Thanks for getting this ncx stuff going!

KevinH



Old 09-12-2011, 02:03 PM   #161
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,406
Karma: 305065800
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
Quote:
Originally Posted by DiapDealer View Post
How would that be any different than feeding the epub directly to kindlegen? My reason for using calibre as an intermediate step is because calibre does a much better job of translating/flattening an ePub's CSS into a mobi that more accurately reflects the original (visibly) than kindlegen currently does. Kindlegen can then take the mobiunpack output and create the final mobi (with the approved tools). Or am I missing something?
I believe the --kindlegen option did the usual conversion to mobipocket-specific HTML, but then used kindlegen to compile it into an actual mobipocket file, rather than using calibre's own mobipocket file generation code.
Old 09-12-2011, 02:29 PM   #162
DiapDealer
Grand Sorcerer
Posts: 27,463
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by pdurrant View Post
I believe the --kindlegen option did the usual conversion to mobipocket-specific HTML, but then used kindlegen to compile it into an actual mobipocket file, rather than using calibre's own mobipocket file generation code.
Ahhh, ok, that makes sense. Thanks.
Old 09-12-2011, 02:32 PM   #163
KevinH
Sigil Developer
 
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
Hi,

Your link_pattern used if FILEPOS_ON_ALL_ANCHORS is True seems to be a bit broken:

For example: here is what the rawml says for one link:

<a filepos=0000006414 >M<span><font size="2">APS</font></span></a>

but this link is never properly detected or processed by your link pattern:

link_pattern = re.compile(r'''<a\s*(></a>|/>)''', re.IGNORECASE)

So you might want to take another look at your link patterns to make sure rawml of this type gets processed properly.
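
The mismatch is easy to reproduce. The quoted pattern only matches `<a` followed by optional whitespace and then `></a>` or `/>`, so an anchor that carries a `filepos` attribute and has content never matches. The second pattern, `filepos_pattern`, is one possible alternative offered purely as an assumption; it is not the expression used in any posted version of mobiunpack.py.

```python
import re

rawml = '<a filepos=0000006414 >M<span><font size="2">APS</font></span></a>'

# Pattern quoted above: matches only empty anchors such as <a></a> or <a/>.
link_pattern = re.compile(r'''<a\s*(></a>|/>)''', re.IGNORECASE)

# Hypothetical alternative: capture the numeric filepos attribute of any <a> tag.
filepos_pattern = re.compile(r'''<a\s[^>]*?filepos=['"]?(\d+)[^>]*>''', re.IGNORECASE)

empty_match = link_pattern.search(rawml)      # no match for this rawml
filepos_match = filepos_pattern.search(rawml)
target = int(filepos_match.group(1)) if filepos_match else None
```

Here `target` holds the byte offset the link points to, which is the value the anchor-insertion step would need.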

Hope this helps,

KevinH




Old 09-12-2011, 04:50 PM   #164
KevinH
Sigil Developer
 
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
index support

Hi,

Okay I looked more at this index material. It appears the "type" information is key to understanding how to read in the indx information.

For example:

To correctly parse the indx entries, I had to do something like the following:

if type == 0x1f:
    # handle the next two variable-width unknowns
    pos, unk1 = getVariableWidthValue(navdata, offset)
    offset += pos
    print "unknown 1 is ", unk1
    pos, unk2 = getVariableWidthValue(navdata, offset)
    offset += pos
    print "unknown 2 is ", unk2
if type == 0xdf:
    # handle the next four variable-width unknowns
    pos, unk1 = getVariableWidthValue(navdata, offset)
    offset += pos
    print "unknown 1 is ", unk1
    pos, unk2 = getVariableWidthValue(navdata, offset)
    offset += pos
    print "unknown 2 is ", unk2
    pos, unk3 = getVariableWidthValue(navdata, offset)
    offset += pos
    print "unknown 3 is ", unk3
    pos, unk4 = getVariableWidthValue(navdata, offset)
    offset += pos
    print "unknown 4 is ", unk4
if type == 0x3f:
    # handle the next three variable-width unknowns
    pos, unk1 = getVariableWidthValue(navdata, offset)
    offset += pos
    print "unknown 1 is ", unk1
    pos, unk2 = getVariableWidthValue(navdata, offset)
    offset += pos
    print "unknown 2 is ", unk2
    pos, unk3 = getVariableWidthValue(navdata, offset)
    offset += pos
    print "unknown 3 is ", unk3

and then there is no need to look for or skip 0x80 values.
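
The three branches differ only in how many variable-width values follow each type byte, so the same parsing can be written as a table lookup. The sketch below assumes, based on the snippet above, that `getVariableWidthValue` returns a `(bytes_consumed, value)` pair. It further assumes an encoding with 7 payload bits per byte, most significant group first, in which the high bit marks the final byte; that encoding is an assumption about the format, not something confirmed in this thread. The names `UNKNOWN_COUNTS` and `read_unknowns` are made up for illustration, and the code expects a Python 3 `bytes` object.

```python
def getVariableWidthValue(data, offset):
    """Decode one variable-width value; return (bytes_consumed, value).

    Assumed encoding: 7 payload bits per byte, most significant group
    first, with the high bit (0x80) set on the last byte only.
    """
    value = 0
    consumed = 0
    while True:
        byte = data[offset + consumed]
        consumed += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            return consumed, value


# Number of variable-width unknowns observed after each type byte.
UNKNOWN_COUNTS = {0x1F: 2, 0x3F: 3, 0xDF: 4}


def read_unknowns(entry_type, navdata, offset):
    """Read the unknowns for one entry; return (values, new_offset)."""
    values = []
    for _ in range(UNKNOWN_COUNTS[entry_type]):
        consumed, value = getVariableWidthValue(navdata, offset)
        offset += consumed
        values.append(value)
    return values, offset
```

A type byte that is not in the table raises `KeyError`, which makes it obvious when a file contains an entry type that has not been seen before.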

Also the count is not the same as the number of entries in the CTOC.

From my set of ebooks, the CTOC data always ends with '\0\0' double null bytes and it has variable length.
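
Given that observation, one way to locate the end of the CTOC block is to search for the first pair of null bytes. This is a sketch under that assumption only: `find_ctoc_end` is a hypothetical helper, and a file whose CTOC entries can legitimately contain two consecutive null bytes would need a different approach.

```python
def find_ctoc_end(data, start=0):
    """Return the index just past the first double null at or after start.

    Returns -1 when no double-null terminator is found.
    """
    pos = data.find(b"\x00\x00", start)
    return -1 if pos == -1 else pos + 2
```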

So I have attached a mobiunpack_test.py program that modifies things to work with a real amazon mobi ebook (as opposed to calibre generated ones).

Perhaps this might help others trying to track things down.

I am going to try and figure out what each of these unknowns actually means.

Hope this helps,

KevinH

Last edited by KevinH; 09-15-2011 at 06:55 PM.
Old 09-12-2011, 05:36 PM   #165
siebert
Developer
 
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
Quote:
Originally Posted by KevinH View Post
Okay I looked more at this index material. It appears the "type" information is key to understanding how to read in the indx information.
The various indexes seem to be very similar in mobi, so the ncx handling code should be able to reuse a lot of my code for the inflection index.

INDX0 is the meta index and the TAGX section can be parsed with readTagSection(). INDX1 is the actual index data, and the CTOC data is like the inflNameData.

Ciao,
Steffen