05-25-2014, 08:04 AM | #46 |
KCC Co-Author
Posts: 845
Karma: 765434
Join Date: Mar 2013
Location: Poland
Device: Kindle Oasis 2
|
Well, @KevinH, there is a small problem.
Your code works correctly, but only on 64-bit Python. On a 32-bit release, when the input file is larger than ~300 MB, I get a MemoryError exception here. I am not even sure why :-S It should not hit the 32-bit memory limits. |
05-25-2014, 08:25 PM | #47 | |
Sigil Developer
Posts: 7,645
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
I am not sure either. Please verify that this problem exists with the original v003 script as well. It builds the datalst using append rather than allocating all 3 pieces at once.
If it exists with my original script, please post a link where I can download such a huge comic book, so I can try it myself and see whether we can delete objects more aggressively to free up memory.
Kevin
|
|
05-26-2014, 12:45 AM | #48 |
KCC Co-Author
Posts: 845
Karma: 765434
Join Date: Mar 2013
Location: Poland
Device: Kindle Oasis 2
|
Well, I ran some additional tests and the results are even more puzzling.
If I extract my Python 3 version of your code and run it standalone, it runs correctly.
If I extract my Python 3 version of your code and run it standalone as a QRunnable thread (like my program does), it runs correctly.
If I run it as a QRunnable thread from my program: MemoryError.
If I run it from my program's main worker QThread: MemoryError.
So apparently this is not directly connected to your code. Either way, debugging it will be a pain :-) Thank you. Last edited by AcidWeb; 05-26-2014 at 12:50 AM. |
05-26-2014, 11:24 AM | #49 | |
Sigil Developer
Posts: 7,645
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi AcidWeb,
Threads are typically allocated with their own maximum stack size. I am not sure whether Python objects are allocated on the stack or the heap at run time, or whether that changes when objects are "returned". Also, for "returned" objects, does it matter whether they are named objects or automatically allocated and deallocated temporaries?
So as a workaround, why not spawn a full subprocess (fork) from a thread and just wait for it to finish to collect the output? That should allocate an entirely new process (not just a thread) but still give you full concurrency.
My 2 cents ... Anyway, glad it is not me tracking it down! ;-)
Take care,
KevinH
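A minimal sketch of the subprocess-from-a-thread workaround, assuming the heavy work can be launched as a separate Python command. The helper name `run_in_subprocess` and the `results` dictionary are hypothetical and not part of the script discussed in this thread; the child process gets its own address space, and the calling thread simply blocks until it exits.

```python
import subprocess
import sys
import threading


def run_in_subprocess(code, results, key):
    """Run a snippet of Python in a separate interpreter process.

    Hypothetical helper: the caller's thread waits for the child to
    finish and stores (return code, captured stdout) under `key`.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode the output to str
    )
    results[key] = (proc.returncode, proc.stdout)


if __name__ == "__main__":
    results = {}
    worker = threading.Thread(
        target=run_in_subprocess,
        args=("print('done')", results, "job"),
    )
    worker.start()
    worker.join()  # the GUI or main thread could do other work before joining
    print(results["job"])
```

In a real program the `-c` snippet would be replaced by the path to the script and its arguments; that part is left out here because the thread does not show the actual command line.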
|
|
05-26-2014, 11:38 AM | #50 |
KCC Co-Author
Posts: 845
Karma: 765434
Join Date: Mar 2013
Location: Poland
Device: Kindle Oasis 2
|
It is not Qt's fault. Code run through a generic threading.Thread crashes too.
|
05-26-2014, 12:16 PM | #51 | |
KCC Co-Author
Posts: 845
Karma: 765434
Join Date: Mar 2013
Location: Poland
Device: Kindle Oasis 2
|
Heh. Threads are not the cause of the problem. It is even stranger.
Look at this snippet. X.mobi is 400 MB. Quote:
Importing any larger third-party library starts to crash the program (still only on 32-bit Python). I would say that something is wrong with my Python environment, but I replicated it on two machines. Last edited by AcidWeb; 05-26-2014 at 12:24 PM. |
|
05-26-2014, 02:25 PM | #52 |
Sigil Developer
Posts: 7,645
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
Be careful that you are not mixing threading systems. Tk uses Tcl, which on many platforms has its own threading library. I know that on Mac OS X there was a horrible conflict between Tcl threads, true Mac OS X (Mach-kernel) threads, and normal POSIX threads. Spawning Tk main loops from threads that are not the main thread can also cause problems.
Also, make sure you are using a version of Tk/Tcl that is specifically compiled for your version of Python (you seem to be using version 3.X and not 2.7). A good place to get the latest Tcl is ActiveState (free community edition).
One thing to note: be careful when tracking down "errors" or "bugs" after running out of memory or after memory corruption, as it will often give you false positives.
Have fun!
KevinH |
05-26-2014, 03:03 PM | #53 |
KCC Co-Author
Posts: 845
Karma: 765434
Join Date: Mar 2013
Location: Poland
Device: Kindle Oasis 2
|
I highly doubt that it is Tk's fault. If I import Paramiko or Pillow, the MemoryError crash also appears.
Also, I stopped using threads in that code entirely. As long as I don't import any larger library, DualMetaFix works correctly. Last edited by AcidWeb; 05-26-2014 at 03:05 PM. |
05-26-2014, 03:14 PM | #54 |
Sigil Developer
Posts: 7,645
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
Can you run it in its own process and watch how memory is allocated, to see just how large it gets? Perhaps I have done something stupid and unused memory is not being collected/freed properly.
Wow, this is a tough one.
KevinH |
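One way to watch the allocations from inside Python is the standard `tracemalloc` module. The sketch below is only an illustration, not the script from this thread: `peak_bytes_for_splice` is a made-up helper that splices a patch into an immutable `bytes` object by slicing and concatenating, and reports the peak traced memory. Note that `tracemalloc` only counts allocations made through Python's allocators, so the number will not match what the operating system reports for the process.

```python
import tracemalloc


def peak_bytes_for_splice(size, cut, patch):
    """Splice `patch` into a bytes object and report peak traced memory.

    Hypothetical helper for illustration. The input buffer is created
    before tracing starts, so only the copies made by slicing and
    concatenation are counted.
    """
    data = bytes(size)  # stands in for the contents of the input file
    tracemalloc.start()
    try:
        # Each slice and each '+' builds a new bytes object.
        spliced = data[:cut] + patch + data[cut:]
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return len(spliced), peak


if __name__ == "__main__":
    length, peak = peak_bytes_for_splice(10_000_000, 1_000, b"PATCH")
    print("result length:", length)
    print("peak traced bytes during splice:", peak)
```

Because the slices and the final result are alive at the same time, the peak is expected to be well above the size of the input, which is the kind of growth worth checking for on a large file.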
05-27-2014, 04:20 AM | #55 | ||
KCC Co-Author
Posts: 845
Karma: 765434
Join Date: Mar 2013
Location: Poland
Device: Kindle Oasis 2
|
Input file = 409 MB
Memory usage before: Quote:
After that line: 819.8828125 And it crashes a line later on: Quote:
Using append doesn't change memory usage.
EDIT: Well, after spending another 6 hours on tests, I'm now quite sure that this is not a bug. It just uses too much memory. All these anomalies were caused by the fact that the standalone program was running very close to the memory limit, and success depended on the number of imports (lol!).
Are both headers at the beginning of the file? Why are we loading the entire file? Last edited by AcidWeb; 05-27-2014 at 08:45 AM. |
||
05-27-2014, 01:54 PM | #56 | |
Sigil Developer
Posts: 7,645
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
The two headers are not both at the beginning of the file. Typically the mobi6 header comes right after the palm section table. Then there are lots of additional sections that hold all of the text of the file and all of the resources (fonts, images, resc section), then an ncx index, the flis, fcis, srcs and datp sections, etc., and then a boundary section. Only then comes the mobi8 header, followed by its own text sections and its indexes, and then a new boundary section containing a CONT section, which is an HD container with lots of HD images.
So to edit both headers you need to split the file at the headers and then recreate the entire file twice. There really is no other way to deal with this, unless you want to use file io to build the new version from smaller chunks and pieces, which will be much slower than doing it in memory.
I will take a look at it when I get a free moment.
Kevin
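To make the layout above more concrete, here is a rough sketch of walking the palm section table. It rests on two assumptions that are not stated in this thread: the usual Palm database layout (a 2-byte big-endian section count at byte 76, then 8-byte table entries starting at byte 78, each beginning with a 4-byte section offset), and that a boundary section consists of the literal bytes `BOUNDARY`. The helper names and the tiny synthetic file are made up for illustration.

```python
import struct

NUM_SECTIONS_OFFSET = 76   # assumed: 2-byte big-endian section count
SECTION_TABLE_OFFSET = 78  # assumed: 8 bytes per entry, offset first


def section_ranges(data):
    """Return (start, end) byte ranges for every section in the file."""
    (count,) = struct.unpack_from(">H", data, NUM_SECTIONS_OFFSET)
    starts = [
        struct.unpack_from(">L", data, SECTION_TABLE_OFFSET + 8 * i)[0]
        for i in range(count)
    ]
    ends = starts[1:] + [len(data)]  # a section ends where the next begins
    return list(zip(starts, ends))


def find_boundaries(data):
    """Indexes of sections whose content is exactly b'BOUNDARY' (assumption)."""
    return [
        i for i, (start, end) in enumerate(section_ranges(data))
        if data[start:end] == b"BOUNDARY"
    ]


def build_demo_file(sections):
    """Build a tiny synthetic file with the assumed layout, for testing only."""
    header = bytearray(SECTION_TABLE_OFFSET)
    struct.pack_into(">H", header, NUM_SECTIONS_OFFSET, len(sections))
    table = bytearray()
    pos = SECTION_TABLE_OFFSET + 8 * len(sections)
    for uid, sec in enumerate(sections):
        # 4-byte offset, then 1 attribute byte and a 3-byte id packed as one int
        table += struct.pack(">LL", pos, uid)
        pos += len(sec)
    return bytes(header) + bytes(table) + b"".join(sections)


if __name__ == "__main__":
    demo = build_demo_file([b"mobi6-header", b"text", b"BOUNDARY", b"mobi8-header"])
    print(section_ranges(demo))
    print(find_boundaries(demo))
```

Under these assumptions the second header would be the section right after the first boundary, which is why it cannot be reached by reading only the start of the file.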
|
|
05-27-2014, 03:07 PM | #57 |
Sigil Developer
Posts: 7,645
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
Looking more closely, we could mmap the file and in that way create the equivalent of a mutable string, so we would not have the issue of holding multiple copies of the data at the same time. Using mmap should keep memory usage quite close to the original 400 MB.
Alternatively, we can use direct-access file io operations (seek, read and write) to build the output file on the fly, reading the input in small chunks and writing it out as we go.
Either approach would eliminate the need to deal with the memory allocation and deallocation of Python's immutable strings. Do a Google search on Python and mmap, or on Python and random/direct access files using seek.
I personally think that mmap would be fastest and easiest, with 1X memory usage (i.e. kept around 400 MB for this file), while the file io approach would have the smallest memory footprint but would be slower.
Let me know what you think.
KevinH |
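A rough sketch of the mmap idea, written for illustration only and not taken from any script in this thread. The assumption is that an edit boils down to inserting some bytes at a known offset; `insert_bytes_mmap` is a hypothetical helper. It grows the file first, maps it, and shifts the tail in place with `mmap.move`, so no second full copy of the data is built as a Python object.

```python
import mmap
import os


def insert_bytes_mmap(path, offset, payload):
    """Insert `payload` at `offset` in the file at `path`, in place.

    Hypothetical helper: the file is modified directly, so work on a copy.
    """
    old_size = os.path.getsize(path)
    if not 0 <= offset <= old_size:
        raise ValueError("offset is outside the file")
    if not payload:
        return
    tail = old_size - offset
    with open(path, "r+b") as f:
        # Grow the file before mapping it, so the mapping covers the new size.
        f.truncate(old_size + len(payload))
        with mmap.mmap(f.fileno(), 0) as mm:
            if tail:
                # Shift everything after `offset` towards the end of the file.
                mm.move(offset + len(payload), offset, tail)
            # Slice assignment needs equal lengths, which holds here.
            mm[offset:offset + len(payload)] = payload
            mm.flush()


if __name__ == "__main__":
    import tempfile

    fd, demo_path = tempfile.mkstemp()
    os.write(fd, b"HEADtail")
    os.close(fd)
    insert_bytes_mmap(demo_path, 4, b"XX")
    with open(demo_path, "rb") as f:
        print(f.read())
    os.remove(demo_path)
```

The file is grown with `truncate` rather than `mmap.resize`, since resizing a mapping is not available on every platform.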
05-27-2014, 04:08 PM | #58 |
Sigil Developer
Posts: 7,645
Karma: 5433388
Join Date: Nov 2009
Device: many
|
dualmetafix_mmap.py
Hi AcidWeb,
Attached is a quick and dirty revision of my original dualmetafix.py that uses mmap. I called it dualmetafix_mmap.py.
It passes my check with a small sample file, but please check its memory usage against your 400 MB file and see how bad it gets. It should stay just a few MB over the file size. If not, we should probably move to complete file io operations, with seek and read/write in chunks.
Please let me know what you see.
KevinH |
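For comparison, the file io fallback mentioned above could look roughly like the sketch below. It is an illustration under the same simplifying assumption (one insertion at a known offset), and `copy_with_insert` and its default chunk size are made up. The input is streamed to a new output file in fixed-size chunks, so only one chunk is held in memory at a time; because the copy is sequential, no explicit seek is needed here.

```python
def copy_with_insert(src_path, dst_path, offset, payload, chunk_size=1024 * 1024):
    """Copy src to dst in chunks, writing `payload` at byte `offset`.

    Hypothetical helper: memory use is bounded by `chunk_size`, at the
    cost of writing a complete second file to disk.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = offset
        # Copy the part of the file that comes before the insertion point.
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                raise ValueError("offset is past the end of the file")
            dst.write(chunk)
            remaining -= len(chunk)
        dst.write(payload)
        # Copy the rest of the file unchanged.
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)


if __name__ == "__main__":
    import os
    import tempfile

    workdir = tempfile.mkdtemp()
    src_file = os.path.join(workdir, "in.bin")
    dst_file = os.path.join(workdir, "out.bin")
    with open(src_file, "wb") as f:
        f.write(b"HEADtail")
    copy_with_insert(src_file, dst_file, 4, b"XX", chunk_size=3)
    with open(dst_file, "rb") as f:
        print(f.read())
```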
05-27-2014, 04:28 PM | #59 |
KCC Co-Author
Posts: 845
Karma: 765434
Join Date: Mar 2013
Location: Poland
Device: Kindle Oasis 2
|
You work fast :-)
I will check it out tomorrow.
EDIT: It is working great. 32-bit Python can now process even 650 MB (KindleGen limit) MOBI files. Memory usage matches your estimate.
Thank you very much. Really impressive work. Last edited by AcidWeb; 05-28-2014 at 12:48 AM. |
06-10-2014, 02:45 PM | #60 |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
@KevinH: Thanks for creating this very useful tool!
I might have found either a Mobi Meta Editor (MME) bug or a bug in your script. The script works great with files converted straight from the source with KindleGen, but it doesn't seem to like Mobi files whose metadata section was regenerated by Mobi Meta Editor and/or that already contain an ASIN or EBOK value. In these cases it fails with the following error message:
Code:
Error: add_exth: trimmed non-null bytes at end of section
To reproduce:
1. Generate a .mobi file (KindleGen -dont_append_source Parrot.epub)
2. Open the generated .mobi file with MME, add an EXTH 113 ASIN value and save the new file.
3. Process the new file with your script.
Please find attached the original file (Parrot_orig.mobi) and the file processed by MME (Parrot_MME.mobi). |
|