06-15-2014, 10:27 PM | #781 | |
Evangelist
Posts: 456
Karma: 1044878
Join Date: Apr 2009
Device: Kindle Paperwhite 4
|
Quote:
|
|
06-17-2014, 12:52 PM | #782 |
Sigil Developer
Posts: 7,654
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Announcing KindleUnpack_v071
Hi All,
Attached is a new version of KindleUnpack, v071.
New features include:
- HDImages are now parsed and extracted. Ebook authors can choose to use them to manually replace non-HD images if they so desire (see the new HDImages folder).
- kindlegen-generated PAGE sections are now used to create a proper page-map.xml in the Mobi 8 section, if present in the .mobi.
- Experimental support for page-maps contained in associated APNX files, for AZW3 (Mobi 8) ebooks only. NOTE: Many APNX files are just arbitrary page start offsets and will therefore just confuse KindleUnpack. If the APNX was generated based on actual page start positions (with the proper id_tags), KindleUnpack stands a good chance of dealing with them (compare them against the printed book to see if they are real).
- CONT headers are now recognized and their associated EXTH metadata can be dumped (using the dump option).
- KindleUnpack.pyw (the Tk GUI for KindleUnpack) has been updated to allow passing in optional APNX files.
- KindleUnpack_ReadMe.htm has also been updated with the new options.
- Improved Palm Section Maps/Descriptions in DUMP mode to reduce the number of unknown data dumps generated and hopefully allow new section types to be more easily detected in the future.
Thanks to DiapDealer and Tkeo for testing and help debugging the new features. Please report any bugs or issues here and I will try to deal with them. There may be inadvertent breakage of older features due to the refactoring, but hopefully all will be well.
Co-Developers: I have completely refactored the code because the kindleunpack.py file was simply getting too cumbersome to deal with and could not be easily followed or read. There are now more associated mobi_*.py library files. Long routines have been split into more easily followed and understood pieces, etc. Given the large number of resulting changes, I will not be posting a full diff. That said, the code in kindleunpack.py is now hopefully much more readable and supportable.
Tkeo: I have changed very little in the mobi_opf.py file, so most of your epub3 changes will hopefully still apply with few or only minor fixes. If not, let me know and I will help apply them by hand.
Tkeo: Also, I would like your help moving much of the RESC support code from kindleunpack.py back into mobi_k8resc.py by simply passing in k8proc (to prevent the needless copying over of structures stored in k8proc). Thanks!
Last edited by KevinH; 06-17-2014 at 01:07 PM. Reason: fix typos |
06-17-2014, 01:18 PM | #783 |
Sigil Developer
Posts: 7,654
Karma: 5433388
Join Date: Nov 2009
Device: many
|
tkeo,
Since the bulk of the refactoring is now complete, I am ready to merge in your final epub 3 changes. Please let me know if my refactoring caused any trouble with your changes. If not, feel free to incorporate your latest epub3 support changes into v071 to create v071a and post it for testing. Hopefully, that will bring KindleUnpack up to date with everything we know about Kindle file formats. Thanks, KevinH Last edited by KevinH; 06-17-2014 at 01:23 PM. |
06-17-2014, 01:23 PM | #784 |
Sigil Developer
Posts: 7,654
Karma: 5433388
Join Date: Nov 2009
Device: many
|
KindleUnpack Co-Developers and Interested Parties,
There is still a lot we do not know, in case anyone wants to jump in ...
1. The kindlegen-generated CONT section (HD_CONTAINER) is actually a full header of some sort with lots of unknown fields and its own EXTH section. I have added the code to dump the new EXTH section, but the fields and what they mean are unknown.
2. How to unpack an azk file generated for iPhone. It appears to be a zip archive with a set of gzipped json objects (skeleton and fragments) and other pieces (similar to an azw3 skeleton and fragments?).
3. What an azw6 file is and how to unpack it. I am hoping, since they are paired with azw3 pieces, that azw6 files represent a set of HDImages stored inside some kind of container (see the CONT section info above). But this is just a wild guess until we get our hands on one.
So if anyone likes to reverse-engineer things, please take a shot at any of the above. Thanks, KevinH |
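For anyone who wants a starting point on the azk question, here is a minimal Python sketch. It assumes only what the post guesses: that the file is a zip archive whose members may be gzipped JSON. The function name `list_azk_members` is hypothetical and not part of KindleUnpack, and the real azk layout may well differ.

```python
import gzip
import json
import zipfile
import zlib


def list_azk_members(path_or_file):
    """Return {member_name: parsed JSON or None} for a zip archive.

    Assumption: members may be gzip-compressed JSON, as guessed in the post.
    Any member that is not gzipped JSON is reported as None.
    """
    results = {}
    with zipfile.ZipFile(path_or_file) as zf:
        for name in zf.namelist():
            raw = zf.read(name)
            try:
                text = gzip.decompress(raw).decode("utf-8")
                results[name] = json.loads(text)
            except (OSError, EOFError, ValueError, zlib.error):
                # Not gzip, truncated, not UTF-8, or not JSON: leave it unparsed.
                results[name] = None
    return results
```

Running this over a sample file and looking at which members come back as None would show quickly whether the "gzipped json" guess holds for every piece or only for some of them.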
06-17-2014, 04:07 PM | #785 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
As to azk: I have placed some information in the wiki that should help in unpacking this file.
Dale |
06-18-2014, 08:11 AM | #786 |
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
|
Bug fixes for v0.71
Hi Kevin,
Firstly, I would like to thank you for updating KindleUnpack with new features. I will modify my epub3-supporting version to fit the newer version. I have found and fixed some bugs. (I have not changed the version number, since it is still in the experimental stage.) Thanks, tkeo |
06-18-2014, 08:43 AM | #787 |
Connoisseur
Posts: 94
Karma: 10
Join Date: Feb 2014
Location: Japan
Device: Kindle PaperWhite, Kobo Aura HD
|
About refactoring
Kevin,
I would like to ask you for a change related to the refactoring. Could you allow me to remove the imgnames parameter from the functions listed below and change their return value from imgnames to imginfo, whose structure is [dir, imgname, type, secno, dataoffset, data(=None)], appending it in process_all_mobi_headers()? Alternatively, I could change the parameter from imgnames to imglist (a list of imginfo).
The functions to modify: processSRCS(), processPAGE(), processCMET(), processFONT(), processCRES(), processCONT(), processkind(), processRESC(), processImage().
The reason is that I am considering moving all calls to write(), except those for DUMP, into process_all_mobi_headers(). This would make the code easier to understand, let us write files to the mobi8 folder directly instead of copying them from the mobi7 folder, and let us create epub files from the imglist. I think it would also make HD images easier to support, since making an epub that uses the HD images requires either recreating the XHTML files or renaming the HD image files. I attach the newest preview version I have, as a reference. Thanks, |
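To make the proposed record concrete, here is a small Python sketch of the imginfo structure. The field names and their order come from the post; the `ImgInfo` type, the two helper functions, and the sample values are hypothetical illustrations, not code from KindleUnpack or from the attached preview.

```python
from collections import namedtuple

# Field order follows the structure proposed in the post:
# [dir, imgname, type, secno, dataoffset, data(=None)]
ImgInfo = namedtuple(
    "ImgInfo", ["dir", "imgname", "type", "secno", "dataoffset", "data"]
)


def make_imginfo(dir, imgname, type, secno, dataoffset, data=None):
    """Build one record; data defaults to None so no raw bytes are carried."""
    return ImgInfo(dir, imgname, type, secno, dataoffset, data)


def names_by_dir(imglist):
    """Group file names by output directory without touching any raw data."""
    grouped = {}
    for info in imglist:
        grouped.setdefault(info.dir, []).append(info.imgname)
    return grouped


# Illustrative values only; real section numbers and offsets come from the file.
imglist = [
    make_imginfo("Images", "image00001.jpeg", "jpeg", 5, 0),
    make_imginfo("HDImages", "HDimage00001.jpeg", "jpeg", 40, 12),
]
print(names_by_dir(imglist))
```

With data left as None, the list stays small even for image-heavy books, and a writer can still find the bytes later from secno and dataoffset.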
06-18-2014, 09:48 AM | #788 | |
Sigil Developer
Posts: 7,654
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
Great work! It seems my refactoring broke a number of things. I will apply your patch and release v072 asap. Thanks, Kevin
|
|
06-18-2014, 10:06 AM | #789 | |
Sigil Developer
Posts: 7,654
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
If we did, we would probably have to rename it to resource_info, since it would need to store fonts, images, HDImages, possibly your RESC, and the pageMap info as well. Also, I don't think we should be passing around lists with the actual data in them. Image and font data can be quite big, especially when all we need is the name, the type, and where it is stored. If you don't like storing them in the mobi7 folder first, then I guess I don't understand why we can't simply write the files to a neutral location as we read them. Perhaps a base Images/ and HDImages/, and then in processMobiX put them in the proper location? Also, for the mobi 8 we want to both create an epub file and leave it unpacked in place so that users can see what is there more easily. I will take a look at your v067 code to get a better idea of how you are using it. Thanks, Kevin
|
|
06-18-2014, 11:20 AM | #790 |
Sigil Developer
Posts: 7,654
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Bug Fix Release: KindleUnpack_v072
Hi All,
Yes, my refactoring had broken a number of things, which tkeo has caught and fixed! So here is a bug fix release, KindleUnpack_v072a.
Bugs fixed by tkeo and DiapDealer:
- Print Replicas should now work again
- RESC section processing should now work again
- Bug fix for page-map processing encodings
- Obfuscating/mangling of previously obfuscated fonts should now work again
Attached is KindleUnpack_v072a.zip.
Last edited by KevinH; 06-18-2014 at 03:12 PM. Reason: Updated to version 0.72a (with fix by DiapDealer) |
06-18-2014, 12:22 PM | #791 | |
Sigil Developer
Posts: 7,654
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi tkeo,
I looked at your patch from v067 to add epub3 support and your comments. There are a few things that I don't understand and would therefore like to discuss:
- Why do you want to move writing of files back to process_all_mobi_headers? In general, an object should know how and where it should write itself. Therefore NAV, OPF, NCX, etc. should all know where and how to create themselves from the passed-in data and the "files" object. In fact, my inclination is to move the writing of the text/html files out of even processMobiX and into header-specific routines.
- Your partslist[] simply duplicates much of what is in k8proc, so I don't understand why it is needed. We can simply pass the k8proc object along if you need access to that information.
- You seem to duplicate information from k8proc to pull into k8resc. It would be easier to simply pass in the k8proc object if and where you need that information.
- Your datalist[] simply duplicates much of what imgnames is used for, and you never store any raw data in that list anyway. Why is the data offset information there if the data itself is never stored? Why do you need the section number? The directory and type information can easily be deduced from the file name extension, and metadata for (offsets).
- Why do we need to know the width and height of the images? Is this needed to create svg-based cover pages?
In general, I think it is easy in the opf to know where something is when we know its name and extension, because we pass in our unpack structure files object and the imgnames list. Now, if you really think you must have additional information, let's focus on reusing the data structures in k8proc for the text, css, and svg pieces and not add or use partslist[]. We could also simply rename imgnames to resource_names or something similar, since it can deal with fonts and the like, although a simple directory listing of the output file structure can tell you everything you need to know as well.
At minimum, resource_names would need the filename with extension (which is what it has now), and from the extension the opf would know where it should be located and what it is. But if you need more, we should go for something as minimal as possible and not reinvent yet another data structure to pass around. From what I could see, I simply do not think we need the section number, data offset, or data itself, ever. So let's figure out the very minimum needed and use that.
- Also, do we really need to rename the cover image as "cover"?
- Also, can't we simply use the EmptyImagePlaceholders, the kindleembed string from the "kind" section, and info from the CONT metadata to overwrite the corresponding non-HD image file with the correct HDImage, but keep its original name so nothing in the OPF needs to care? The CONT and following sections up to the container boundary are a simple one-to-one mapping from the first image to the last HDImage, with empty placeholders being used to indicate images that do not have HD replacements (so they only need to keep track of things up to the last HDImage).
Please let me know what you think. We can discuss this further via personal messaging on this site so as not to spam the list, if you so desire. Take care, and thanks for all of your hard work! KevinH
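Two of the points in this post can be sketched in a few lines of Python: deducing where a resource belongs from its extension alone, and pairing HD sections one-to-one with the existing image names. This is only an illustration under stated assumptions. The extension table, the folder names, and both function names are hypothetical, and a None entry stands in for an empty image placeholder.

```python
import os

# Hypothetical extension-to-folder table; the folder names are illustrative only.
FOLDER_BY_EXT = {
    ".jpeg": "Images",
    ".jpg": "Images",
    ".png": "Images",
    ".gif": "Images",
    ".ttf": "Fonts",
    ".otf": "Fonts",
}


def folder_for(filename):
    """Deduce the output folder from the file name extension alone."""
    ext = os.path.splitext(filename)[1].lower()
    return FOLDER_BY_EXT.get(ext)


def hd_replacements(imgnames, hd_sections):
    """Pair the i-th HD section with the i-th image name.

    hd_sections may be shorter than imgnames, since nothing needs tracking
    past the last HD image. None marks an empty placeholder, meaning that
    image has no HD replacement. Returns {original_name: hd_data}.
    """
    return {
        name: data
        for name, data in zip(imgnames, hd_sections)
        if data is not None
    }
```

Because the result is keyed by the original file name, a caller could overwrite each listed file in place, and nothing that refers to images by name would have to change.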
|
|
06-18-2014, 12:42 PM | #792 |
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
But do we really need or even want ePub 3 support, given that most devices we use don't support ePub 3? Will it still generate ePub 2 as well as ePub 3, or would we be stuck with ePub 3 only?
|
06-18-2014, 12:44 PM | #793 |
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Could we have it so fonts are just left unobfuscated when all is done?
|
06-18-2014, 12:53 PM | #794 |
The Grand Mouse 高貴的老鼠
Posts: 71,511
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
|
06-18-2014, 02:11 PM | #795 |
Sigil Developer
Posts: 7,654
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
Yes, tkeo's epub3 changes allow the user to select epub2 or epub3, or even let it auto-select based on features. So no worries there. |
|