11-03-2012, 03:43 PM | #436 |
BLAM!
Posts: 13,477
Karma: 26012492
Join Date: Jun 2010
Location: Paris, France
Device: Kindle 2i, 3g, 4, 5w, PW, PW2, PW5; Kobo H2O, Forma, Elipsa, Sage, C2E
|
Just a quick heads up: I'm using a trimmed-down version of MobiUnpack in the latest K5 ScreenSavers hack. (I say trimmed down because I only needed to extract the cover, so I chopped off everything I didn't need.)
It works surprisingly well so far (after a painful cross-compile of Python 2.7.3 >_<"). The only thing of note I ran into was a MemoryError in loadSection() on the last section. I looked at how Calibre was doing it, and saw that it wraps the read in a try/except block to catch OverflowError exceptions (and, indeed, a bit of good old printf debugging seems to show that the value of after looks like an overflow on the last section). I tweaked that a bit: Code:
@@ -267,7 +67,12 @@ def loadSection(self, section):
         before, after = self.sections[section:section+2]
         self.stream.seek(before)
-        return self.stream.read(after - before)
+        try:
+            return self.stream.read(after - before)
+        # This bombs out with a MemoryError on Kindle on the last section (where after overflows)
+        except (OverflowError, MemoryError):
+            self.stream.seek(before)
+            return self.stream.read()
Last edited by NiLuJe; 11-03-2012 at 03:58 PM. |
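For anyone who would rather not rely on catching the exception, here is a minimal sketch of another way to guard the same read: clamp the end offset to the stream length before calling read(). This is only an illustration, not what Calibre or MobiUnpack actually does. The SectionReader class, its size attribute and the sample offsets are all made up for the example; only the loadSection/sections/stream names come from the diff above.

```python
import io


class SectionReader(object):
    """Hypothetical stand-in for the class that owns loadSection()."""

    def __init__(self, stream, sections):
        self.stream = stream
        # Section start offsets, followed by one end-of-file sentinel.
        self.sections = sections
        # Measure the stream once so bogus end offsets can be clamped.
        self.stream.seek(0, io.SEEK_END)
        self.size = self.stream.tell()

    def loadSection(self, section):
        before, after = self.sections[section:section + 2]
        # Clamp rather than asking read() for an impossibly large count.
        after = min(after, self.size)
        self.stream.seek(before)
        return self.stream.read(max(after - before, 0))


# Three sections; the final sentinel is deliberately far past the end of the data.
data = b"AAAABBBBCC"
reader = SectionReader(io.BytesIO(data), [0, 4, 8, 0xFFFFFFFFFF])
first = reader.loadSection(0)
last = reader.loadSection(2)
```

The try/except in the diff keeps the change small, which matters when patching a trimmed-down copy; clamping costs one extra seek at construction time but never asks read() for a huge byte count in the first place.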
11-03-2012, 04:54 PM | #437 | |
The Grand Mouse 高貴的老鼠
Posts: 71,463
Karma: 305784726
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Quote:
|
|
11-03-2012, 05:06 PM | #438 |
BLAM!
Posts: 13,477
Karma: 26012492
Join Date: Jun 2010
Location: Paris, France
Device: Kindle 2i, 3g, 4, 5w, PW, PW2, PW5; Kobo H2O, Forma, Elipsa, Sage, C2E
|
@pdurrant: Thanks for the explanation (and good luck )!
|
11-06-2012, 12:26 PM | #439 | |
Sigil Developer
Posts: 7,569
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Quote:
That stream interface was only added to allow MobiUnpack to work inside Calibre before Calibre fully supported KF8-style ebooks. It needed to handle both interfaces because what Calibre handed to MobiUnpack might be a stream or a file, depending on things. So feel free to take it out, as Calibre no longer supports internal use of MobiUnpack, but you might then need to add something to the plugin interface code so that DiapDealer's MobiUnpack plugin continues to work if you run into any problems. Hope this helps explain why it was there. Take care, KevinH |
|
11-06-2012, 12:48 PM | #440 |
The Grand Mouse 高貴的老鼠
Posts: 71,463
Karma: 305784726
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
|
11-07-2012, 01:55 PM | #441 |
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2012
Location: New Jersey, USA
Device: Kindle Keyboard 3G
|
Hi, a DeDRM'd mobi7 that I unpacked has corrupted index entries in the HTML. The values in the idx:orth tags are byte sausage, with illegal control characters and everything:
<idx:orth value="^Ch^H\^H_">
(This is how emacs shows control characters.) The same thing happens whether I use the latest or older versions of mobiunpack. All the other data in the file seems fine. The charset is utf-8. Any idea what could be causing this? Thanks. |
11-07-2012, 10:38 PM | #442 |
Grand Sorcerer
Posts: 27,508
Karma: 193125762
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Support for dictionary type MOBIs (or anything with extensive use of idx:orth) has always been quite limited and very unreliable. While the text is fine (as you've discovered), the actual dictionary functionality is often broken when trying to rebuild the source.
|
11-08-2012, 06:54 AM | #443 |
The Grand Mouse 高貴的老鼠
Posts: 71,463
Karma: 305784726
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
I have uploaded MobiUnpack 0.59.
The main changes have been in the debug/dump code, which now identifies and dumps (in one form or another) every section in the file, as well as providing much more info on the Mobi headers and EXTH, hopefully replicating the functionality of DumpMobiHeader in a reasonably nicely formatted way. There's still a lot to be done to make it really neat and tidy, but that will have to wait for another day. |
11-08-2012, 06:51 PM | #444 |
BLAM!
Posts: 13,477
Karma: 26012492
Join Date: Jun 2010
Location: Paris, France
Device: Kindle 2i, 3g, 4, 5w, PW, PW2, PW5; Kobo H2O, Forma, Elipsa, Sage, C2E
|
@pdurrant: And indeed, it now works properly without an ugly hack on the Kindle, thanks!
|
11-15-2012, 04:01 PM | #445 |
Grand Sorcerer
Posts: 27,508
Karma: 193125762
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I'm starting to get the idea that I'm chasing my own tail with regard to ensuring compliant OPF files.
I thought the escape method from the standard xml.sax library was working quite well on metadata items, and it is, in fact, converting all instances of '&', '<' and '>' to XML-compliant entities as intended. But I'm discovering that a lot of metadata out there (especially KF8 subjects/descriptions) seems to contain HTML entities. This, by itself, wouldn't pose a problem. The problem is that my XML escape method is dutifully whacking all the ampersands in those poor defenseless entities and turning them into gibberish, basically. So in one more attempt to overthink a process... enter the criminally underutilized (not to mention unsung) "unescape" method of Python's HTMLParser module. The unescape method first converts all entities that may be present in the data to their unicode character representations (OPF files are utf-8/16 by spec, after all). Only then does the XML escape method fix up any stray ampersands and/or left/right angle brackets. All this rambling means that I have an updated mobi_opf.py script for you to consider, pdurrant. |
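The unescape-then-escape order described above can be sketched roughly as follows. This is not the code from the updated mobi_opf.py: clean_metadata is a hypothetical helper name, and the sketch assumes Python 3, where html.unescape is the standard-library replacement for the HTMLParser unescape method mentioned in the post.

```python
from html import unescape
from xml.sax.saxutils import escape


def clean_metadata(value):
    """Hypothetical helper: make a metadata string safe for an OPF file."""
    # Step 1: turn any HTML entities into the characters they stand for.
    text = unescape(value)
    # Step 2: XML-escape whatever '&', '<' and '>' are left.
    return escape(text)


# Escaping alone mangles an entity that was already present.
naive = escape("Tom &amp; Jerry")
# Unescaping first leaves it well-formed.
fixed = clean_metadata("Tom &amp; Jerry")
mixed = clean_metadata("caf&eacute; <b> & co")
```

Here naive comes out as "Tom &amp;amp; Jerry", which is the gibberish in question, while fixed stays "Tom &amp; Jerry".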
11-15-2012, 05:16 PM | #446 |
The Grand Mouse 高貴的老鼠
Posts: 71,463
Karma: 305784726
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Thanks! I'll take a look as soon as I can.
|
11-25-2012, 11:53 AM | #447 |
Connoisseur
Posts: 75
Karma: 498122
Join Date: May 2010
Location: Europe
Device: Bookeen Cybook Gen3, Kindle 3, Kindle PW, Kindle Voyage
|
Hello adamselene,
I'm trying to change the incorrectly set language code of a dictionary in order to make it work on the Paperwhite. I followed your script, which runs. However, I get the following error: "Error: Dictionary contains multiple inflection index sections, which is not yet supported". I assume this breaks the process? Do you know of another way to change the in/out language of a .mobi file? Thanks a lot! Last edited by miquele; 11-25-2012 at 12:10 PM. Reason: attachment |
11-25-2012, 05:23 PM | #448 | |
The Grand Mouse 高貴的老鼠
Posts: 71,463
Karma: 305784726
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Quote:
|
|
12-10-2012, 06:08 PM | #449 |
Junior Member
Posts: 4
Karma: 10
Join Date: Jul 2012
Device: None
|
I'd like to contribute v060 if I could. What this version fixes:
-- Encoding chapter names in UTF-8. This prevents NCX and OPF files from being written in non-UTF-8 encodings.
-- In my tests, chapter names with non-ASCII characters were not being written properly to the resulting .NCX file. This caused the file charset to be reported as "unknown-8bit", and trying to parse these files would result in errors. This patch fixes the issue. I've attached the source.
-- I'd also like to bring up the idea of setting up a git repository for this project (bitbucket.com or github.com). I'd love to keep contributing to this project, and I think this would not only make it easier for me and others to do so, but also help the author keep track of all versions. I'd be willing to set this up if anybody would like. |
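To illustrate the kind of fix described in this post, here is a rough sketch of writing chapter titles with an explicit UTF-8 encoding. It is not the attached v060 patch: build_nav_labels is a made-up helper, the output is only a fragment rather than a complete NCX document, and the sketch assumes Python 3 string handling.

```python
from xml.sax.saxutils import escape


def build_nav_labels(titles):
    """Hypothetical helper: return UTF-8 bytes holding one navLabel per title."""
    lines = ['<?xml version="1.0" encoding="utf-8"?>']
    for title in titles:
        # Escape XML special characters, but keep non-ASCII characters as they are.
        lines.append("<navLabel><text>%s</text></navLabel>" % escape(title))
    # Encode explicitly so the bytes always match the declared encoding.
    return ("\n".join(lines) + "\n").encode("utf-8")


ncx_bytes = build_nav_labels(["Chapitre 1 : L'\u00e9t\u00e9", "Salt & Pepper"])
```

Encoding in one place, right before the bytes are written, avoids depending on the platform default encoding, which is one way a file can end up reported as "unknown-8bit".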
12-11-2012, 09:16 AM | #450 |
Sigil Developer
Posts: 7,569
Karma: 5433388
Join Date: Nov 2009
Device: many
|
your changes
Hi,
Could you post a diff of your proposed changes? I have modified my tree with a number of other fixes (Amazon Page Break, fixes for div tables that have broken insert positions, fixes for ncx with broken insert positions, fixes for non-existent links to css files, fixes for hangs in debug mode, fixes for not properly describing CTOC sections in the section description output, addition of DiapDealer's opf output fixes for metadata that incorporates html tags, etc.). As for hosting this project, we already have a Google Code project, but it pretty well went unused after a short while and development seemed to continue only here. I am planning a major clean-up of the code over the holiday break to hopefully simplify things and clean up little nits. Once we have all of the patches and clean-up done, perhaps trying once again to create a shared repository might be a good idea. Thanks, KevinH |