#226 | |||
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Quote:
I know that the Kindle app doesn't allow users to select user dictionaries anyway, but it is possible to patch the ASIN of a user dictionary so that it matches the ASIN of one of the 5 official dictionaries. IMHO, it doesn't make much sense to convert a dictionary to a Mobipocket ebook because the user loses the dictionary functionality.
Quote:
Please have a look at the original .html source file and the one that the script re-creates and you'll see that they differ significantly and I'm not talking about whitespace characters and line-breaks. |
|||
#227 | |||||
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
|
You still didn't reveal which application you use to view the dictionary...
Quote:
Quote:
Quote:
By the way, I was very surprised to see that the unmodified dictionary works great on my new Kindle 3 (Keyboard). It seems that the Kindle firmware removes unnecessary formatting when displaying a dictionary entry in the popup window, while the Kindle app doesn't.
Quote:
Quote:
Ciao, Steffen |
|||||
#228 | ||
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Quote:
The reverse engineered source files. I believe it would be much easier and faster if you simply had a look at the source files. Since my very simple proof-of-concept .html source file only contains 7 dictionary definitions, it shouldn't be too complicated.
Keep up the good work!
Last edited by Doitsu; 10-29-2011 at 10:24 AM. |
||
#229 |
Groupie
Posts: 155
Karma: 200000
Join Date: Dec 2009
Location: Britania
Device: Android
|
Round-trip failure with mobiunpack & kindlegen v1.2 on linux
[If this should be a new thread, please do ask mods to move it]
This is not a support request. Just to let you know I noticed a round-trip failure using mobiunpack, kindlegen 1.2 for linux, and a Mobipocket edition of one of the Young Wizards books. I'm curious whether this is a known bug.
I unpacked it, edited the "HTML", and invoked Kindlegen on the OPF file. (That's generally expected to work, right?) No problem so far; FBReader seemed happy with the new MOBI file. But then I tried to verify it by unpacking the new MOBI and checking for differences. This happened -
Code:
<p height="0pt" width="0pt" align="justify"><a filepos=0000008568 ><font color="blue"><u>Consultations</u></font></a></p>
Code:
<p height="0pt" width="0pt" align="justify"><a href="#filepos8519"><font color="blue"><u>Consultations</u></font></a></p>
Code:
<mbp:pagebreak/></div><div><a id="filepos8568" /><a id="filepos8568" /> <p height="1em" width="0pt" align="center"><font size="5"><b><font color="red"> Consultations</font></b></font></p>
FULL DISCLOSURE. The original MOBI also includes some "dead links" (href="../Text/#filepos6634"). After the round-trip, these appear as filepos=XXXXXXXX. So, it's possible these dead links are confusing mobiunpack, although I'm not sure how. [KindleGen warns "Warning(prcgen): Hyperlink not resolved", but continued anyway. I don't see any other warnings. Ideally mobiunpack would provide a similar warning during unpacking, so you can tell something odd has happened.]
Second disclosure. From the above evidence, I believe that the "original" MOBI has already gone through at least one MOBI->EPUB->MOBI conversion. (Presumably edited in Sigil in between). I have a copy of what I assume is the EPUB version. The EPUB also has "calibre" written all over it (class="calibre"). So it's quite possible the MOBI I started with was generated by Calibre's reverse-engineered code, as opposed to the official MobiPocket/Kindle conversion code.
Last edited by sourcejedi; 12-11-2011 at 11:42 AM. Reason: CODE tags preserve significant whitespace |
#230 | |
The Grand Mouse 高貴的老鼠
Posts: 73,649
Karma: 315126578
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Oasis
|
Quote:
The first thing to do would be to enable the raw output in MobiUnpack, and see if the duplicate destination markers are present in that. Looking at the raw output will also help to check whether the problem happens in Mobiunpack (in the conversion to HTML links) or in KindleGen. When I have some spare time, I might take a look at this, but I can't at the moment. It sounds like you're a pretty good hand at this - why not continue the investigative work yourself? Oh - and one thing to do would be to continue the Mobiunpack/KindleGen/Mobiunpack sequence a few times, and see if things keep on changing and getting worse. |
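One way to check unpacked HTML for the duplicate destination markers mentioned above is a quick script along these lines. This is only a sketch: the function name `duplicate_anchor_ids` and the regex are assumptions made for illustration, not part of mobiunpack, and it only recognises `<a id="fileposNNNN" />` anchors of the form quoted earlier in the thread.

```python
import re
from collections import Counter

# Matches destination anchors such as <a id="filepos8568" /> (assumed form,
# based on the unpacked HTML quoted in this thread).
ANCHOR_ID = re.compile(r'''<a\s+id=["'](filepos\d+)["']''', re.IGNORECASE)


def duplicate_anchor_ids(html):
    """Return {anchor_id: count} for every filepos anchor seen more than once."""
    counts = Counter(ANCHOR_ID.findall(html))
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    sample = ('<mbp:pagebreak/></div><div><a id="filepos8568" />'
              '<a id="filepos8568" /> <p>Consultations</p>'
              '<a id="filepos9000" />')
    print(duplicate_anchor_ids(sample))
```

Running it on the output of each pass of the Mobiunpack/KindleGen/Mobiunpack sequence would show whether the number of duplicates grows from one round trip to the next.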
|
#231 |
Groupie
Posts: 155
Karma: 200000
Join Date: Dec 2009
Location: Britania
Device: Android
|
One-liner fix for the above
Done. [Attached zip: mobiunpack.py for testers; patch for developers].
You probably couldn't see the problem in the html I posted even if you tried, because I foolishly neglected to use CODE tags. The real problem was an extra space character between "<a" and "filepos=". mobiunpack doesn't say anything about "filepos=XXXXXXXX", so that must have come from KindleGen. (Although it could still be useful to warn about non-numeric filepos values). |
#232 |
Grand Sorcerer
Posts: 28,358
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I see you've patched the 0.29 version of mobiunpack.py. Is that the version you were using when you discovered the issue?
I only ask because v0.32 of mobiunpack.py (the latest can always be found in post #5 of this thread) has an updated regex pattern that appears to achieve the same result as the regex in your patch:
From v0.32
Code:
link_pattern = re.compile(r'''<[^<>]+filepos=['"]{0,1}(\d+)[^<>]*>''', re.IGNORECASE)
Code:
link_pattern = re.compile(r'''<a[ ]+filepos=['"]{0,1}0*(\d+)['"]{0,1} *>''', re.IGNORECASE)
Last edited by DiapDealer; 12-11-2011 at 01:42 PM. |
#233 |
Groupie
Posts: 155
Karma: 200000
Join Date: Dec 2009
Location: Britania
Device: Android
|
Sorry, yes. 0.32 from this thread works correctly. I was using the version from Siebert's git repo which describes itself as 0.29.
Thanks for pointing it out. I'm probably used to assuming 'git' means 'the latest version'. But that's not true in general, and I should have said where I got the program from.
[Nitpick: I think you quoted the wrong link_pattern - there's two of them, and the first appears unchanged. The relevant one has your name next to it in 0.32.]
Code:
# Two different regex search and replace routines.
# Best results are with the second so far IMO (DiapDealer).
#link_pattern = re.compile(r'''<a filepos=['"]{0,1}0*(\d+)['"]{0,1} *>''', re.IGNORECASE)
link_pattern = re.compile(r'''<a\s+filepos=['"]{0,1}0*(\d+)['"]{0,1}(.*?)>''', re.IGNORECASE)
#srctext = link_pattern.sub(r'''<a href="#filepos\1">''', srctext)
srctext = link_pattern.sub(r'''<a href="#filepos\1"\2>''', srctext)
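To see why the extra space mattered, the two patterns quoted above can be run side by side. This is a small sketch, not the mobiunpack code itself: the function names `convert_strict` and `convert_loose` are made up for the comparison, and the sample input assumes two spaces between "<a" and "filepos=", which is one reading of the "extra space character" described earlier in the thread.

```python
import re

# The commented-out pattern and the active pattern quoted above.
strict_pattern = re.compile(
    r'''<a filepos=['"]{0,1}0*(\d+)['"]{0,1} *>''', re.IGNORECASE)
loose_pattern = re.compile(
    r'''<a\s+filepos=['"]{0,1}0*(\d+)['"]{0,1}(.*?)>''', re.IGNORECASE)


def convert_strict(srctext):
    """Rewrite filepos links, requiring exactly one space after "<a"."""
    return strict_pattern.sub(r'''<a href="#filepos\1">''', srctext)


def convert_loose(srctext):
    """Rewrite filepos links, allowing any whitespace run after "<a"."""
    return loose_pattern.sub(r'''<a href="#filepos\1"\2>''', srctext)


if __name__ == "__main__":
    # Assumed input: two spaces between "<a" and "filepos=".
    sample = '<a  filepos=0000008568 ><u>Consultations</u></a>'
    print(convert_strict(sample))  # left unchanged, the pattern does not match
    print(convert_loose(sample))   # rewritten to an href="#filepos8568" link
```

The `\s+` in the second pattern is what tolerates the extra whitespace, and the `(.*?)` group carries any trailing attributes or spaces through to the rewritten tag.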
#234 |
Grand Sorcerer
Posts: 28,358
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
|
#235 |
Sigil Developer
Posts: 8,478
Karma: 5703586
Join Date: Nov 2009
Device: many
|
mobiunpack and the new K8 format
Hi All,
You should check out the following link to get copies of the new Amazon K8 format files to play around and test with: http://www.the-digital-reader.com/20...now-available/
I grabbed the Jerome.mobi and tried unpacking it via mobiunpack.py with all DEBUG turned on. It seems that Amazon has simply combined two different mobi ebooks into one palm doc container. The one at the top is simply the normal mobi, and mobiunpack works well on it but generates extra raw pieces. You can find all of these extra raw pieces hidden away as image*.raw files inside the images folder. These include FONT and RESC files plus copies of each section in its own file until the end of the palm doc. So by examining these extra image*.raw files in a text editor we can see what each section of the palmdoc contains.
Immediately after the normal mobi ebook (in the very next section) you can find a whole section that appears to be nothing but the word "BOUNDARY", which seems to be the divider between the older .mobi file format and the new format. It is followed by what looks like a new section 0 mobi header, and that is followed by all of the raw .xhtml in each section until the end (but unlike true image sections these have been compressed, so we will need to uncompress them to see what the new xhtml looks like).
So the old format mobi is at the top of the palmdoc container, and immediately after the images and FLIS, FCIS (the images appear to be shared by both versions of the ebook) you can see the pieces that make up the new format.
So it appears we can look for things in the first mobi header that indicate that KF8 style data is included, and then parse those records using the new section 0 very much like we process the original mobi.
So does anyone want to take a shot at modifying the latest mobiunpack to unpack both versions of the files for these new K8s? Volunteers welcome!
Last edited by KevinH; 12-16-2011 at 10:22 AM. |
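As a rough illustration of the layout described above, here is a sketch that splits a Palm database image into its sections and looks for the "BOUNDARY" divider. It relies only on the standard Palm database layout (a 78-byte header with the record count at offset 76, followed by 8-byte record entries whose first 4 bytes are the record offset). The helper names `palm_sections`, `find_boundary` and `build_palm_db` are hypothetical, the sample file is synthetic rather than a real K8 book, and none of this is the actual mobiunpack code.

```python
import struct


def palm_sections(data):
    """Split a Palm database image into its records (sections)."""
    (count,) = struct.unpack_from('>H', data, 76)
    offsets = [struct.unpack_from('>L', data, 78 + 8 * i)[0]
               for i in range(count)]
    offsets.append(len(data))  # the last record runs to the end of the file
    return [data[offsets[i]:offsets[i + 1]] for i in range(count)]


def find_boundary(sections):
    """Return the index of the section holding the BOUNDARY marker, or None."""
    for index, section in enumerate(sections):
        if section[:8] == b'BOUNDARY':
            return index
    return None


def build_palm_db(records):
    """Build a tiny synthetic Palm database, for demonstration only."""
    header = bytearray(78)
    struct.pack_into('>H', header, 76, len(records))
    offset = 78 + 8 * len(records) + 2  # header, record list, 2 pad bytes
    table = b''
    for uid, record in enumerate(records):
        # 4-byte offset, then 1 attribute byte and a 3-byte unique id
        table += struct.pack('>LL', offset, uid)
        offset += len(record)
    return bytes(header) + table + b'\x00\x00' + b''.join(records)


if __name__ == "__main__":
    data = build_palm_db([b'old mobi header', b'old text',
                          b'BOUNDARY', b'new section 0 header'])
    sections = palm_sections(data)
    marker = find_boundary(sections)
    print('boundary section:', marker)
    print('K8 header candidate:', sections[marker + 1])
```

If the description above holds, the section right after the marker would be the new section 0 header, and everything after it would belong to the K8 part of the container.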
#236 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
The second entry is the source file I believe, generally an ePub exactly duplicated. Or are you talking about some other data?
|
#237 |
Sigil Developer
Posts: 8,478
Karma: 5703586
Join Date: Nov 2009
Device: many
|
Hi,
No, there is a separate section for the source zip file as well. We are talking about the K8 version of the ebook packed immediately after the normal mobi one in one palmdoc container. Grab version 0.32 of mobiunpack, edit it with a text editor to set DEBUG = True, run it on that K8 ebook, and examine the extra .raw sections stored under debug mode inside of the images folder to see what I am referring to. |
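For anyone who would rather not open each .raw file in a text editor, a short script can list the first few bytes of every dumped section. This is a sketch under assumptions: the function name `summarize_raw_sections` is made up, the folder path is whatever your unpack run produced, and the only thing taken from the posts above is that the debug dumps are named image*.raw.

```python
import glob
import os


def summarize_raw_sections(images_dir, length=8):
    """Return (filename, leading bytes) for each image*.raw file in images_dir."""
    summary = []
    pattern = os.path.join(images_dir, 'image*.raw')
    for path in sorted(glob.glob(pattern)):
        with open(path, 'rb') as handle:
            magic = handle.read(length)
        summary.append((os.path.basename(path), magic))
    return summary


if __name__ == "__main__":
    # "test/images" is a hypothetical output folder; adjust to your own run.
    for name, magic in summarize_raw_sections(os.path.join('test', 'images')):
        print(name, magic)
```

Sections that begin with FONT, RESC or BOUNDARY should stand out immediately in the listing.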
#238 |
Sigil Developer
Posts: 8,478
Karma: 5703586
Join Date: Nov 2009
Device: many
|
very experimental mobi unpack for K8 mobis
Hi,
Just in case anyone wants to play around with the latest K8 .mobi files, I have attached a newest_mobi_unpack.zip. I made massive changes, reorganized everything, split it into many different files, and then renamed it to mobi_unpack to prevent confusion.
This is very experimental and probably will not work for you. But if you want to play around, download and unzip it. Copy the test Jerome.mobi (see earlier link) into that directory. Change to that directory and then run:
python ./mobi_unpack.py Jerome.mobi test/
(or whatever the windows equivalent is if you are on windows)
If it works, inside of test you should see the original mobi info, a K8 folder that has the new K8 xhtml files, and a Jerome.epub which is the epub created from the new K8 files. You should also see a kindlegensrc.zip file which represents the original epub that was used to generate the Jerome.mobi, which you can unzip and compare against the files in the K8 folder or the Jerome.epub.
Please report any difficulties so we can fix any bugs.
Happy Holidays!
KevinH
Last edited by KevinH; 01-14-2012 at 04:49 PM. Reason: remove attachment, updated version in later posts |
#239 |
Sigil Developer
Posts: 8,478
Karma: 5703586
Join Date: Nov 2009
Device: many
|
FYI:
DiapDealer found and fixed a number of bugs in the new mobi_unpack program for K8 files. Thanks to DiapDealer! So if anyone wants the updated version, check out my later posts in this thread to find the very latest version. KevinH Last edited by KevinH; 01-14-2012 at 04:50 PM. Reason: removed older version attachment, directing people to newer version |
#240 |
Member
Posts: 16
Karma: 148
Join Date: Apr 2010
Device: iPad, NOOK, Kindle, Kobo
|
Thanks, Kevin! This is so helpful.
Can you confirm that the only thing mobi_unpack does is show what was in the mobi file? It doesn't generate anything, right? When I convert an EPUB file to mobi with KindleGen2, and then unpack it with your latest version of mobi_unpack, I get a folder that contains a smaller version of the EPUB file than the original, an HTML file with what looks like the contents of the entire book, along with an ncx and opf file, and a folder with reduced size images. Then, there's a K8 folder that contains a completely re-engineered set of files, all renamed, resized images, etc. of what was originally in my EPUB file. And then there's a kindlegensrc.zip file, that when unzipped, contains my original unaltered files. It all seems so excessive. thanks, Liz |