06-28-2011, 01:58 AM   #73
naisren
Quote:
Originally Posted by siebert
It took me several weeks of reverse engineering and a few days of coding, but here is finally the first mobiunpack version supporting dictionaries!

I've taken some shortcuts and omitted features not necessary for the dictionaries I'm interested in (e.g. Unicode support, the old deprecated inflection format), so the script might not work for the dictionary you want to unpack, but feel free to improve the code.

A couple of other fixes and enhancements are also included, most notably some speed optimizations. A huge dictionary is now unpacked within minutes instead of hours by using temporary files.

Have fun!

Attachment 73437
My god, I can hardly believe where I am now. You made my dream come true.
Unpacking speed, inflection, Unicode: all of these were a pain, and now they are no longer a hassle!

Thanks a lot!

My test:
mobiunpack.py PocketOxford.mobi
MobiUnpack 0.26
Copyright (c) 2009 Charles M. Hannum <root@ihack.net>
With Images Support and Other Additions by P. Durrant and K. Hendricks
With Dictionary Support and Other Additions by S. Siebert
Unpacking Book...
Mobipocket version 4
Huffdic compression
Unpack raw html
Document contains orthographic index, handle as dictionary
Info: Index doesn't contain entry length tags
Read dictionary index data
Warning: There are unprocessed index bytes left: 08
Warning: There are unprocessed index bytes left: af
Warning: There are unprocessed index bytes left: 01 a1
Warning: There are unprocessed index bytes left: 00 18 ff
Warning: There are unprocessed index bytes left: 75 70
Warning: There are unprocessed index bytes left: 76 04 8e
Warning: There are unprocessed index bytes left: aa 01 c0
Warning: There are unprocessed index bytes left: 28
Warning: There are unprocessed index bytes left: 67 02 77
Warning: There are unprocessed index bytes left: c4 00 d0
Warning: There are unprocessed index bytes left: 6f 75
Warning: There are unprocessed index bytes left: 0a
Decode images
Find link anchors
Insert data into html
Insert hrefs into html
Remove empty anchors from html
Insert image references into html
Write html
Write opf
Completed
The Mobi HTML Markup Language File can be found at: PocketOxford\PocketOxford.html
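
For anyone who wants to compare runs, the "unprocessed index bytes" warnings in the log above can be tallied with a few lines of Python. This is only a rough sketch: the function name summarize_leftover_bytes is made up for illustration, it is not part of mobiunpack, and it assumes every warning line has exactly the format shown in my log.

```python
import re
from collections import Counter

# Assumption: warnings look exactly like the lines in the log above,
# i.e. the fixed message followed by space-separated lowercase hex bytes.
WARNING_RE = re.compile(
    r"^Warning: There are unprocessed index bytes left: ((?:[0-9a-f]{2} ?)+)$"
)

def summarize_leftover_bytes(log_text):
    """Return (warning count, Counter mapping leftover length -> occurrences).

    Hypothetical helper for reading a saved mobiunpack console log.
    """
    lengths = Counter()
    count = 0
    for line in log_text.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            count += 1
            # Number of leftover bytes reported on this warning line.
            lengths[len(match.group(1).split())] += 1
    return count, lengths
```

Feeding it the saved console output gives the number of warnings and how many bytes were left over each time, which might help whoever looks into why the index is not fully consumed.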

Code:
<mbp:pagebreak></mbp:pagebreak> <a></a><idx:entry>
<idx:orth value="ley [1]">
</idx:entry>
<div bgcolor="#FFFFDD" border="1" bordercolor="#000066"><span color="#000066"> <b>ley</b> [1] </span></div> <div align="left"> <span color="red"><i>noun</i></span> <br/><img src="images/00005.jpg" /> a piece of land temporarily put down to grass, clover, etc. </div> <br/><span color="#000066">ORIGIN</span>: Old English, «fallow»; related to <a href="" filepos="0011367107" ><b><small>LAY</small></b></a> and <a href="" filepos="0011568603" ><b><small>LIE</small></b></a>. <hr color="#000066" width="70%"/> <div align="center"><a onclick="history.back()"><img src="images/00006.jpg"  border="0" align="middle"/> Back</a>***<a onclick="index_search()"><img src="images/00004.jpg"  align="middle" border="0"/> New Search</a></div> <mbp:pagebreak></mbp:pagebreak> <a></a><idx:entry>
<idx:orth value="ley [2]">
</idx:entry>
<div bgcolor="#FFFFDD" border="1" bordercolor="#000066"><span color="#000066"> <b>ley</b> [2] </span> (also <b>ley line</b>) </div> <div align="left"> <span color="red"><i>noun</i></span> <br/><img src="images/00005.jpg" /> a supposed straight line connecting three or more ancient sites, associated by some with lines of energy and other paranormal phenomena. </div> <br/><span color="#000066">ORIGIN</span>: variant of <a href="" filepos="0011383653" ><b><small>LEA</small></b></a>. <hr color="#000066" width="70%"/> <div align="center"><a onclick="history.back()"><img src="images/00006.jpg"  border="0" align="middle"/> Back</a>***<a onclick="index_search()"><img src="images/00004.jpg"  align="middle" border="0"/> New Search</a></div>
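
To check the result quickly, the headwords and cross-reference targets can be pulled out of the unpacked HTML with regular expressions. This is a minimal sketch based only on the markup shown above: the function names are hypothetical, and it assumes every headword appears as an idx:orth value attribute and every internal link carries a numeric filepos attribute, as in my sample.

```python
import re

# Assumption: attribute layout matches the sample output above.
ORTH_RE = re.compile(r'<idx:orth value="([^"]*)"')
FILEPOS_RE = re.compile(r'filepos="(\d+)"')

def list_headwords(html):
    """Return the headwords found in idx:orth tags, in document order."""
    return ORTH_RE.findall(html)

def list_filepos_targets(html):
    """Return the filepos link targets as integers, in document order."""
    return [int(pos) for pos in FILEPOS_RE.findall(html)]
```

Pass in the text of the generated HTML file. For the two entries above, list_headwords should return the two "ley" headwords, and list_filepos_targets the three offsets that the LAY, LIE and LEA links point to.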