Old 05-25-2014, 08:04 AM   #46
AcidWeb
KCC Co-Author
Posts: 845
Karma: 765434
Join Date: Mar 2013
Location: Poland
Device: Kindle Oasis 2
Well, @KevinH, there is a small problem.

Your code works correctly, but only on 64-bit Python. On a 32-bit release, when the input file is larger than ~300MB, I get a MemoryError exception here.

I'm not even sure why :-S It should not hit the 32-bit memory limits.
Old 05-25-2014, 08:25 PM   #47
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi,
I am not sure either. Please verify that this problem exists with the original v003 script as well. It builds datalst using append rather than allocating all 3 pieces at once.

If it exists with my original script, please post a link where I can download such a huge comic book and try myself to see if we can delete objects more aggressively to free up memory.

Kevin

Old 05-26-2014, 12:45 AM   #48
AcidWeb
KCC Co-Author
Well, I ran some additional tests, and the results are even more puzzling.

If I extract my Python 3 version of your code and run it standalone, it runs correctly.
If I extract my Python 3 version of your code and run it standalone as a QRunnable thread (like in my program), it runs correctly.
If I run it as a QRunnable thread from my program: MemoryError.
If I run it from my program's main worker QThread: MemoryError.

So apparently this is not directly connected to your code. Either way, debugging it will be a pain :-)

Thank you.

Last edited by AcidWeb; 05-26-2014 at 12:50 AM.
Old 05-26-2014, 11:24 AM   #49
KevinH
Sigil Developer
Hi AcidWeb,

Threads are typically allocated with their own max stack allocation. I am not sure whether python objects are allocated on the stack or the heap at run time and if that changes when objects are "returned". Also for "returned" objects does it matter if they are named objects or auto temp allocated/deallocated objects?

So as a workaround, why not spawn a full subprocess (fork) from a thread and just wait for it to finish to collect the output? That should allocate an entire new process (not just a thread) but still allow you to have full concurrency.
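
A minimal sketch of that workaround, assuming the heavy work can be launched as a separate command. The command below is only a placeholder, and `run_in_subprocess` is a hypothetical helper name, not part of any script in this thread. Each child process gets its own address space, so on 32-bit Python it does not share the parent's memory budget.

```python
import subprocess
import sys
import threading


def run_in_subprocess(args, results, key):
    """Run a command in a fresh process and store its exit code and output."""
    proc = subprocess.run(args, capture_output=True, text=True)
    results[key] = (proc.returncode, proc.stdout)


# Placeholder command: a real caller would pass the path of the fixer
# script and its arguments instead of this inline "-c" program.
cmd = [sys.executable, "-c", "print('done')"]

results = {}
worker = threading.Thread(target=run_in_subprocess, args=(cmd, results, "job"))
worker.start()
worker.join()  # a GUI would signal completion instead of blocking here
returncode, output = results["job"]
```

The thread only waits on the child, so the GUI stays responsive while the memory-hungry work happens in the other process.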

My 2 cents ...

Anyway, Glad it is not me tracking it down! ;-)

Take care,

KevinH




Old 05-26-2014, 11:38 AM   #50
AcidWeb
KCC Co-Author
It is not Qt's fault. Code run through a generic threading.Thread crashes too.
Old 05-26-2014, 12:16 PM   #51
AcidWeb
KCC Co-Author
Heh. Threads are not the cause of the problem. It is even stranger.

Look at this snippet. X.mobi is 400MB.
Quote:
import os
import sys
import argparse
import configparser
# Uncommenting the next line triggers the MemoryError on 32-bit Python.
#from tkinter import Tk, ttk, filedialog
from threading import Thread
from KindleButler import DualMetaFix

# Raw string so the backslash in the Windows path is not read as an escape.
ready_file = DualMetaFix.DualMobiMetaFix(r"D:\X.mobi", bytes('12345', 'UTF-8'))
sys.exit(0)
It works. But if I uncomment the tkinter import, it starts to crash with MemoryError.
Importing any larger third-party library makes the program crash (still only on 32-bit Python).
I would say that something is wrong with my Python environment, but I replicated this on two machines.

Last edited by AcidWeb; 05-26-2014 at 12:24 PM.
Old 05-26-2014, 02:25 PM   #52
KevinH
Sigil Developer
Hi,

Be careful that you are not mixing threading systems. Tk uses Tcl, which on many platforms has its own threading library. I know that on Mac OS X there was a horrible conflict between Tcl threads, true Mac OS X (Mach-kernel) threads, and normal POSIX threads. Spawning Tk main loops from threads that are not themselves the main thread can also cause problems.

Also, make sure you are using a version of Tcl/Tk that is specifically compiled for your version of Python (you seem to be using version 3.x and not 2.7). A good place to get the latest Tcl is ActiveState (free community edition).

One thing to note: be careful when tracking down "errors" or "bugs" once the program is running out of memory or memory corruption has occurred, as this will often give you false positives.

Have fun!

KevinH
Old 05-26-2014, 03:03 PM   #53
AcidWeb
KCC Co-Author
I highly doubt that it is Tk's fault. If I import Paramiko or Pillow, it also starts crashing with MemoryError.
Also, I stopped using threads in that code altogether.

As long as I don't import any larger library, DualMetaFix works correctly.

Last edited by AcidWeb; 05-26-2014 at 03:05 PM.
Old 05-26-2014, 03:14 PM   #54
KevinH
Sigil Developer
Hi,

Can you run it in its own process and watch how memory is allocated, to see just how large it gets? Perhaps I have done something stupid and unused memory is not being collected/freed properly.

Wow, this is a tough one.

KevinH
Old 05-27-2014, 04:20 AM   #55
AcidWeb
KCC Co-Author
Input file = 409MB

Memory usage before this line:
Quote:
datalst = [datain[0:secstart], secdata, datain[secend:]]
420.35546875

After that line:
819.8828125

And it crashes one line later, on:
Quote:
datalst = b''.join(datalst)
On 64-bit Python, memory usage after that line is around 1225.

Using append doesn't affect memory usage.
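
Those numbers are consistent with each step making a full copy: slicing a bytes object copies the slice, and join builds yet another buffer while the list of pieces is still alive. A small sketch that shows the effect with tracemalloc on a scaled-down input; `SIZE`, the section boundaries, and the replacement data are made-up stand-ins, not values from the real script.

```python
import tracemalloc

SIZE = 8 * 1024 * 1024           # stand-in for the 409MB input
secstart, secend = 1024, 2048    # hypothetical section boundaries
secdata = b"\x01" * 1500         # hypothetical replacement section

datain = bytes(SIZE)             # allocated before tracing, so not counted

tracemalloc.start()
# Each slice of an immutable bytes object is a new copy of that range.
datalst = [datain[0:secstart], secdata, datain[secend:]]
after_slices, _ = tracemalloc.get_traced_memory()
# join allocates the final buffer while the pieces in the list still exist.
datalst = b"".join(datalst)
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"input size:             {SIZE / 2**20:.1f} MiB")
print(f"new memory after slice: {after_slices / 2**20:.1f} MiB")
print(f"peak new memory (join): {peak / 2**20:.1f} MiB")
```

Counting the input itself, the peak is roughly three times the file size, which leaves little room in a 32-bit address space.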

EDIT:
Well, after spending another 6h on tests, I'm now quite sure that this is not an error. It just uses too much memory.
All these anomalies were caused by the fact that the standalone program was running very close to the memory limit, and success depended on the number of imports (lol!).

Are both headers at the beginning of the file? Why are we loading the entire file?

Last edited by AcidWeb; 05-27-2014 at 08:45 AM.
Old 05-27-2014, 01:54 PM   #56
KevinH
Sigil Developer
Hi,

No, the two headers are not both at the beginning of the file. Typically the mobi6 header comes right after the Palm section table. It is followed by lots of additional sections that hold all of the text of the file and all of the resources (fonts, images, resc section), then an ncx index, the flis, fcis, srcs, and datp sections, etc., and then a boundary section. Only then comes the mobi8 header, followed by its own text sections and its indexes, and then a new boundary section containing a CONT section, which is an HD container with lots of HD images.
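
To see where the headers sit in a given file, the section table can be walked directly. This is a sketch that assumes the standard Palm database layout (a 2-byte big-endian section count at byte 76, then 8-byte entries starting at byte 78, each beginning with a 4-byte offset) and assumes that a MOBI header section carries the marker `MOBI` at byte 16 of the section. Both function names are hypothetical.

```python
import struct


def section_offsets(data):
    """Return the (start, end) byte range of every Palm database section."""
    (count,) = struct.unpack_from(">H", data, 76)
    starts = [struct.unpack_from(">L", data, 78 + 8 * i)[0] for i in range(count)]
    ends = starts[1:] + [len(data)]  # each section runs to the next one's start
    return list(zip(starts, ends))


def mobi_header_sections(data):
    """Indices of sections that look like MOBI headers ('MOBI' at byte 16)."""
    return [
        i
        for i, (start, _end) in enumerate(section_offsets(data))
        if data[start + 16:start + 20] == b"MOBI"
    ]
```

On a dual-format file this would be expected to report two sections, far apart in the file, which is why a fix to both headers touches data on either side of almost everything else.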

So to edit both headers you need to split the file at the headers and then recreate the entire file twice.

There really is no other way to deal with this, unless you want to use file I/O to build the new version from smaller chunks and pieces, which will be much slower than doing it in memory.

I will take a look at it when I get a free moment.

Kevin


Old 05-27-2014, 03:07 PM   #57
KevinH
Sigil Developer
Hi,

Looking more closely, we could mmap the file and in that way create the equivalent of a mutable string so we would not have the issues with having multiple copies of the data at the same time.

Using mmap should keep memory usage quite close to the original 400 meg.
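
A rough sketch of the mmap idea, not the attached script: assuming the section to replace occupies bytes [start, end) of the file, the tail can be shifted inside the mapping and the new bytes written over the gap, so no second full copy is built in Python memory. `replace_span` is a hypothetical helper name, and this version edits the file in place, so it should be run on a copy.

```python
import mmap
import os


def replace_span(path, start, end, new_bytes):
    """Replace bytes [start, end) of the file at path with new_bytes, in place."""
    old_size = os.path.getsize(path)
    delta = len(new_bytes) - (end - start)
    new_size = old_size + delta
    with open(path, "r+b") as f:
        if delta > 0:
            f.truncate(new_size)  # grow first so the mapping covers the new tail
        with mmap.mmap(f.fileno(), 0) as mm:
            if old_size > end:
                # Shift everything after the old span to its new position.
                mm.move(end + delta, end, old_size - end)
            mm[start:start + len(new_bytes)] = new_bytes
            mm.flush()
        if delta < 0:
            f.truncate(new_size)  # shrink only after the mapping is closed
```

Growing the file before mapping and shrinking it after unmapping avoids relying on `mmap.resize`, which is not available on every platform.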

Alternatively, we can use direct-access file I/O operations (seek, read, and write) to build the output file on the fly, reading the input in small chunks and writing it out as we go.
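
A sketch of that file I/O variant, again assuming the section to replace spans bytes [start, end) of the input. `rewrite_in_chunks` and its parameters are hypothetical names; only one chunk is held in memory at a time.

```python
import shutil


def rewrite_in_chunks(src_path, dst_path, start, end, new_bytes, chunk_size=1 << 20):
    """Write src[:start] + new_bytes + src[end:] to dst_path, one chunk at a time."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = start
        while remaining:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                raise ValueError("start lies beyond the end of the input file")
            dst.write(chunk)
            remaining -= len(chunk)
        dst.write(new_bytes)
        src.seek(end)  # skip the section that is being replaced
        shutil.copyfileobj(src, dst, chunk_size)
```

Because both headers change, this would run once per header (or be extended to take a list of spans), which is the extra disk traffic that makes it slower than the in-memory approach.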

Either approach would eliminate the need to deal with the memory allocation and deallocation of Python's immutable strings.

Do a Google search on Python and mmap,
or on Python and random/direct-access files using seek.

I personally think that using mmap would be fastest and easiest, with 1X-type memory usage (i.e. kept around 400 meg for this file), while the file I/O approach would have the smallest memory footprint but would be slower.

Let me know what you think.

KevinH
Old 05-27-2014, 04:08 PM   #58
KevinH
Sigil Developer
dualmetafix_mmap.py

Hi AcidWeb,

Attached is a quick and dirty revision of my original dualmetafix.py that uses mmap. I called it dualmetafix_mmap.py. It passes my check with a small sample file.

But please check its memory usage against your 400 meg file and see how bad it gets. It should stay just a few meg over the file size. If not, we should probably move to pure file I/O operations, with seek and read/write in chunks.

Please let me know what you see.

KevinH
Attached Files
File Type: zip dualmetafix_mmap.py.zip (3.8 KB, 376 views)
Old 05-27-2014, 04:28 PM   #59
AcidWeb
KCC Co-Author
You work fast :-)

I will check it out tomorrow.

EDIT:
It is working great. 32-bit Python can now process even 650MB (the KindleGen limit) MOBI files.
Memory usage matches your assumptions. Thank you very much. Really impressive work.

Last edited by AcidWeb; 05-28-2014 at 12:48 AM.
Old 06-10-2014, 02:45 PM   #60
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
@KevinH: Thanks for creating this very useful tool!

I might have found either a Mobi Meta Editor (MME) bug or a bug with your script. The script works great with files converted straight from the source with KindleGen, but somehow it doesn't seem to like Mobi files whose metadata section was regenerated by Mobi Meta Editor and/or already contains an ASIN or EBOK value. In these cases it fails with the following error message:
Code:
Error: add_exth: trimmed non-null bytes at end of section
Steps to reproduce this error:

1. Generate a .mobi file (KindleGen -dont_append_source Parrot.epub)
2. Open the generated .mobi file with MME, add an EXTH 113 ASIN value and save the new file.
3. Process the new file with your script.

Please find attached the original file (Parrot_orig.mobi) and the file processed by MME (Parrot_MME.mobi).
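
For comparing the two files, a small sketch that lists the EXTH records of a header section may help. It assumes the usual EXTH layout (the `EXTH` marker, a 4-byte block length, a 4-byte record count, then records made of a 4-byte type, a 4-byte length that includes those 8 bytes, and the payload). `exth_records` is a hypothetical helper, and locating the block with a plain `find` is a shortcut that could mis-fire if the marker text appears earlier in the section.

```python
import struct


def exth_records(section0):
    """Yield (type, payload) pairs from the EXTH block of a MOBI header section."""
    pos = section0.find(b"EXTH")
    if pos < 0:
        return
    _block_len, count = struct.unpack_from(">LL", section0, pos + 4)
    pos += 12  # skip marker, block length, and record count
    for _ in range(count):
        rtype, rlen = struct.unpack_from(">LL", section0, pos)
        yield rtype, section0[pos + 8:pos + rlen]
        pos += rlen  # rlen covers the 8-byte record header plus the payload
```

Running it over the header section of both attachments and comparing the output would show whether the ASIN record (type 113) is already present after the MME edit.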
Attached Files
File Type: mobi Parrot_orig.mobi (36.2 KB, 195 views)
File Type: mobi Parrot_MME.mobi (28.2 KB, 207 views)
All times are GMT -4. The time now is 10:12 AM.