06-29-2014, 10:56 AM | #841 | |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
Yes, please go ahead and post just the new mobi_split.py code if there is a significant speed improvement. That way people can drop it into whatever version of KindleUnpack they currently use for testing purposes. I will take a look at the opf code and all of your new routines and let you know what I think. I'll be traveling for the next week and out of reach of the internet, so please take a shot at whatever approach you feel is best; I will do the same, and we can compare approaches upon my return.

Thanks,

KevinH
Last edited by KevinH; 06-29-2014 at 11:00 AM. |
|
06-29-2014, 10:59 AM | #842 | |
Resident Curmudgeon
Posts: 73,974
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
06-29-2014, 11:09 AM | #843 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
As I said, it is not going to happen. The extra metadata will not hurt anything, and it shows what was in the original azw3, which helps when diagnosing new Kindlegen features. This tool is not really an azw3-to-epub converter, because it is not guaranteed to generate an epub that even meets spec. It is meant to unpack the AZW3/Mobi file so that modifications can be made, html/css code differences can be detected, etc., and the result passed back through kindlegen to create a new azw3/mobi. Any epub-like structure generated by KindleUnpack should be tested, edited in Sigil (or any text editor), and fixed. During that process, feel free to hack any unwanted metadata out. KevinH |
06-29-2014, 11:15 AM | #844 |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
As mentioned in my post, KindleUnpack's main goal is not (directly) handy-dandy format shifting (nor is it creating the sleekest ePubs). The metadata is staying; that's pretty much all there is to say. You'll just have to delete it if it bugs you that badly.
|
06-29-2014, 03:19 PM | #845 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Quick Questions?
Hi tkeo,
I have started studying the RESC, new OPF, and taglist code, and I have some questions:

1. It looks like a lot of code simply grabs the original idrefs from the RESC section and then tries to make sure that none of them duplicate anything we use. Why is it important, or useful at all, to keep the idrefs from the RESC and re-use them in the new opf? Given that we do not know the original file names, the original idrefs seem more than a bit meaningless. The code would be much easier to follow and support if we simply ignored all of these original idrefs from the RESC and numbered ours sequentially, as KindleUnpack did originally. That should simplify or eliminate the need for the code that gets all of the skeleton/partinfo from k8proc, shouldn't it?

2. Can we please move parseK8RESC() out of mobi_opf and into k8resc, so that the k8resc object encapsulates all RESC decoding? The opf routine can call into k8resc to get and add the extra metadata information as needed. We do not need to keep track of the source (it will be either the mobi exth header or the RESC, so it shouldn't matter).

3. Do we really need a full-blown HTMLParser() and all of the additional regular-expression code just to parse the RESC section? This seems like overkill at best. The problem with HTML parsers in general is that they are not robust and can easily freak out over improper bytes (e.g. leave an embedded null lying around in the html/xml file and watch them barf all over it). And it almost looks like your regular-expression metadata parsing code is trying to act like a full-blown HTML parser of some sort instead of just extracting what you want. Perhaps this is the only way, but I am not convinced yet. Isn't there some way to simply walk the RESC data and extract what we want with much less code? Edit: please see my attached simple proof-of-concept code.

4. There really is far too much overlap among the various use cases in mobi_opf. There is no reason to split opf writing for mobi7 vs mobi8 for epub 2; our original routine handled that case just fine. So I envision one routine for both mobi7 and mobi8 epub 2, with calls out to shared support routines which we will pull out and identify much as you have already done, and a second, separate routine for mobi8 epub 3 with calls out to many of the same shared support routines. How does that sound? We want to keep KindleUnpack as simple and straightforward as possible, with as little fluff as possible, so the code is easy to support and learn from. Please let me know what you think.

ps: As a ***very rough*** proof of concept, I threw together a very simple program to parse the RESCXXXXX.dat files generated by KindleUnpack.py. It does nothing other than parse the RESCXXXXX.dat file, but it returns the prefix, the path, and a dictionary with all of the attributes and any related content. Once we throw out the main routine and the utf-8 command-line parsing nonsense, you will see it is quite small and easy to adapt as we see fit. We simply walk the RESC file tag by tag, checking for the tags and things we want to process further, and build the tables needed inside k8resc. For example, a list of skelids (which are kf8 skeleton numbers) and their page-properties could easily be generated on the fly, as could extracting the metadata into whatever form we want (epub 2 or epub 3) and storing it in the correct form in the k8resc object, to be used either in the main routine or in the opf as needed. Please run it on one of the RESCXXXXX.dat files from one of your more complicated epub3-based ebooks converted to mobi, and let me know if you think this type of very simple approach will work for us.

Thanks,

KevinH

Last edited by KevinH; 06-29-2014 at 08:14 PM. |
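For readers following along: the tag-by-tag walk described above can be sketched in very little Python. This is not the proof-of-concept attached to the post (which is not reproduced in the thread); it is a hypothetical illustration of the approach, and the function name, the attribute regex, and the sample input are all my own assumptions.

```python
import re

# A tiny scanner that yields (tagname, attrs) pairs from RESC-style xml/html
# without a full HTMLParser. Comments (including multi-line ones) are skipped
# by jumping straight to the matching "-->". Illustrative only.
_attr_pat = re.compile(r'''([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')''')

def walk_tags(data):
    """Yield (tagname, attrs) for each tag; closing tags keep their '/'."""
    pos = 0
    while True:
        start = data.find('<', pos)
        if start < 0:
            return
        # Special-case comments: consume up to "-->" however far ahead it is.
        if data.startswith('<!--', start):
            end = data.find('-->', start + 4)
            if end < 0:
                return  # unterminated comment: stop parsing
            pos = end + 3
            continue
        end = data.find('>', start)
        if end < 0:
            return
        inner = data[start + 1:end].strip().rstrip('/')
        pos = end + 1
        if not inner or inner.startswith(('!', '?')):
            continue  # skip doctypes and processing instructions
        name = inner.split(None, 1)[0]
        attrs = {m.group(1): m.group(2) or m.group(3) or ''
                 for m in _attr_pat.finditer(inner)}
        yield name, attrs

sample = '<spine><!-- note\nspans lines --><itemref idref="x1" skelid="0"/></spine>'
print(list(walk_tags(sample)))
```

A scanner like this never builds a tree, so a stray null byte or unbalanced tag only affects the tag it sits in, not the whole parse.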
06-29-2014, 09:42 PM | #846 |
BLAM!
Posts: 13,477
Karma: 26012494
Join Date: Jun 2010
Location: Paris, France
Device: Kindle 2i, 3g, 4, 5w, PW, PW2, PW5; Kobo H2O, Forma, Elipsa, Sage, C2E
|
@Doitsu: Yup, it's a byproduct of the new 'PiP' chapter browsing since FW 5.4. Tap the lower left corner of the screen to toggle between Locations / Pages / Time Left in Chapter / Time Left in Book / Nothing.
Last edited by NiLuJe; 06-30-2014 at 12:33 PM. |
06-30-2014, 05:10 AM | #847 |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
|
06-30-2014, 07:44 AM | #848 | ||||
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
|
Quote:
In addition, determining the condition for cover-page creation needs both the skelid from the RESC and the part from k8proc. Quote:
Quote:
I have run your code. It's fine! Please switch to your code, although some further improvement is required to parse the RESC; be careful when parsing comments, especially multi-line ones. At the start I had no idea what would need to be retrieved from the RESC, so I wrote the parser the way I did, but now I know there is no need for it. I had also once considered switching to the functions in mobi_taglist.py, which are a little simpler than the Metadata class. Quote:
But I think the main reason for the difference of opinion between us is that I am a newcomer and not familiar with the older versions of KindleUnpack. So please choose whichever approach you like. Thanks,
||||
06-30-2014, 08:17 AM | #849 |
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
|
faster mobi_split (preview)
Hi,
This is the faster version of mobi_split.py. It can be switched between the original code and the modified one via the FAST_MODE constant. Below is an example of the improvement.

test file: HDimage_test.mobi, 16MB including src. Attached in https://www.mobileread.com/forums/showpost.php?p=2851879&postcount=779

original:
mobi_split: mobi7 processing time 0.08s
mobi_split: mobi8 processing time 0.24s

modified:
mobi_split: mobi7 processing time 0.05s
mobi_split: mobi8 processing time 0.06s

Thanks,
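A note on where this kind of speedup usually comes from in Python. The actual FAST_MODE diff is in the attachment, not quoted in the thread, so this is an assumption about the likely technique, not a description of tkeo's code: repeated `+=` on an immutable bytes object re-copies the accumulated buffer on every append (quadratic overall), while collecting pieces in a list and joining once is linear.

```python
# Illustrative comparison only: both functions produce identical output,
# but the second avoids re-copying the growing buffer on each iteration.

def build_slow(sections):
    data = b''
    for s in sections:          # O(n^2): each += copies everything so far
        data += s
    return data

def build_fast(sections):
    return b''.join(sections)   # O(n): one pass, one allocation

sections = [bytes([i % 256]) * 4096 for i in range(100)]
assert build_slow(sections) == build_fast(sections)
```

On a 16MB file the difference is small; on a 50MB file with hundreds of image sections it can be dramatic, which would match the mobi8 timings reported later in the thread.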
06-30-2014, 10:01 AM | #850 | |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
Quote:
That means the code that tries to guarantee uniqueness of the idrefs is just a source of potential problems, and adds complication and code size for no real added benefit. Please remove it unless you can demonstrate that using the idrefs actually improves functionality in some way. For the cover, you can parse the idref and then let the opf assign its own unique idref with no extra code.

Thanks,

KevinH

Last edited by KevinH; 06-30-2014 at 10:20 AM. |
|
06-30-2014, 10:19 AM | #851 | ||
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
Quote:
Edit: Ah, I see tricks can be played using a multi-line comment to hide or invalidate a block of tags! So I will have to special-case comments even more when parsing, grabbing everything up to the "-->" no matter how far ahead it is. If you have an RESCxxxxx.dat file that is very complicated, I would love to have it as a testcase. I won't need the entire book, just the RESCxxxxx.dat file. Thanks.

My idea is to change the parseRESC code to use a loop with yield, so we can create our own RESC iterator. Then, using one loop in k8resc, we check each tag name that exists in the RESC sequentially and process the metadata we need: building up an epub 2 or 3 version on the fly, finding the cover info, getting the spine attributes, and grabbing the skelids and any properties associated with them, storing all of this in the k8resc object for later retrieval. That should handle everything we need, correct? Quote:
I am willing to do whatever you feel is best in the opf code as long as it reduces redundancies and creates simpler code overall. Thanks! KevinH Last edited by KevinH; 06-30-2014 at 11:08 AM. |
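The single processing loop sketched above could look roughly like this. Everything here is illustrative: `process_resc`, the tag tuples it consumes (as a stand-in for the proposed yield-based iterator), and the attribute names (`skelid`, `properties`) are assumptions based on the discussion, not actual KindleUnpack code.

```python
# Hypothetical sketch: one pass over the RESC tags, routing each tag
# to the table it belongs in inside the k8resc object.

def process_resc(tags):
    metadata = []      # (name, attrs) pairs destined for the opf metadata block
    spine_attrs = {}   # attributes found on the spine tag itself
    skel_props = {}    # skelid -> properties (e.g. page-spread hints)
    for name, attrs in tags:
        if name == 'meta':
            metadata.append(('meta', attrs))
        elif name == 'spine':
            spine_attrs.update(attrs)
        elif name == 'itemref' and 'skelid' in attrs:
            skel_props[int(attrs['skelid'])] = attrs.get('properties', '')
    return metadata, spine_attrs, skel_props

# Example input, as the iterator might yield it:
tags = [('spine', {'toc': 'ncx'}),
        ('itemref', {'idref': 'a1', 'skelid': '0',
                     'properties': 'page-spread-left'}),
        ('meta', {'name': 'cover', 'content': 'cover-image'})]
meta, spine, skels = process_resc(tags)
```

The point of the single-loop shape is that adding support for a new Kindlegen tag means adding one `elif` branch, not touching the parser.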
||
07-01-2014, 08:10 AM | #852 | |
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
|
Hi Kevin,
Quote:
Indeed, they do not improve functionality, but isn't that reason enough not to keep them? Thanks,
|
07-01-2014, 09:16 AM | #853 | |||
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
|
Hi Kevin,
Quote:
Quote:
Could you use OrderedDict in order to keep the attribute order? It would make it easier to check for bugs by comparing the reconstructed opfs between KindleUnpack versions using diff. Quote:
I will reconsider this. But I had completely forgotten about auto-detection of the epub version. We might end up with one opf-generation function that calls sub-functions in the end. Thanks, Last edited by tkeo; 07-01-2014 at 09:29 AM. Reason: Failed to attach a file. |
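The OrderedDict suggestion amounts to the following (a minimal sketch; `attrs_to_str` is a hypothetical helper, not KindleUnpack code): if tag attributes are stored in insertion order, the serialized opf comes out byte-stable across runs, so two KindleUnpack versions can be compared with plain diff.

```python
from collections import OrderedDict

# With a plain dict on the Python versions of the era, attribute order could
# vary between runs; OrderedDict pins it to insertion order. (On CPython 3.7+
# plain dicts are insertion-ordered too, but OrderedDict was the portable
# choice at the time of this thread.)

def attrs_to_str(attrs):
    return ''.join(' %s="%s"' % (k, v) for k, v in attrs.items())

attrs = OrderedDict([('id', 'item1'),
                     ('href', 'part0000.xhtml'),
                     ('media-type', 'application/xhtml+xml')])
tag = '<item%s/>' % attrs_to_str(attrs)
```

Here `tag` serializes the attributes in exactly the order they were inserted, every time.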
|||
07-02-2014, 06:34 AM | #854 | |
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
|
Hi Kevin,
Quote:
I am sorry for bothering you. Have a good trip! tkeo |
|
07-02-2014, 08:00 AM | #855 |
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
|
I tested a larger mobi. Here are the results.

test file: about 50MB, 300 images, no source

original:
mobi_split: mobi7 processing time 0.70s
mobi_split: mobi8 processing time 22.80s

modified:
mobi_split: mobi7 processing time 0.56s
mobi_split: mobi8 processing time 0.28s |
|