#16
US Navy, Retired
Posts: 9,896
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Quote:
If you send me a link to your file in a PM, I'll be glad to look at it.
#17
Addict
Posts: 303
Karma: 1033852
Join Date: Jun 2011
Device: Sony PRS-350,Sony PRS-950,Pocketbook 360+,B&N Nook Simple Touch Reader
Kovid, can you please chime in here? I'm at a loss.
#18
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
Hi,
I'm not sure whether calibre can convert the result to EPUB, since the unpacked HTML might contain Mobipocket-specific tags, but you can try unpacking the MOBI file with my version of mobiunpack, found here: https://www.mobileread.com/forums/sho...9&postcount=72
It can handle dictionaries whose source is hundreds of megabytes in size, so I'm pretty optimistic that it will also work for this Mobipocket file.
Ciao, Steffen
#19
Addict
Posts: 303
Karma: 1033852
Join Date: Jun 2011
Device: Sony PRS-350,Sony PRS-950,Pocketbook 360+,B&N Nook Simple Touch Reader
Thanks for this. I downloaded it. It extracts to a .py file. How on earth do I use it?
#20
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
You need to install Python 2.x (not 3.x!) from python.org (I'm using version 2.5.4, but a newer 2.x version might work, too).
Then copy the MOBI file into the same directory as the mobiunpack.py script and run this from a shell/cmd window:
python mobiunpack.py file.mobi
Ciao, Steffen
#21
Evangelist
Posts: 485
Karma: 270594
Join Date: Aug 2010
Device: palm tx, Windows7, Galaxy A5
Somewhere there is a nice thingy named mobiunpack.pyw; it gives you a friendlier GUI to use. (I only shudder at the cmd line.)
#22
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
Quote:
Quote:
Ciao, Steffen
#23
Sigil Developer
Posts: 8,761
Karma: 5706256
Join Date: Nov 2009
Device: many
Hi Steffen,
Quote:
I diffed your speedup changes against the original and it all looks great except for one thing: why did you remove the imghdr code that detects the proper image type, so that each file is created with the proper extension? I, for one, want all of my file extensions to match the actual contents of the file, because not every program ignores the extension when working with files.
Are you using fake "image" files to store extra sections (non-HTML, non-image) from the original MOBI file? Perhaps index information from the dictionaries?
Also, it would be nice to pull all of the string concats and file writes into one function that takes the "big-file" flag and the new data and handles it, just to make the code look cleaner.
That said, I find it hard to believe that even a 26 MB MOBI file fills up memory on today's multi-GB machines. It might simply be that the string concatenation needs to be replaced with adding string pieces to a list and then doing a "".join(list) at the end. This should prevent the creation of multiple copies of the 26 MB string, which is what must be filling up memory, perhaps because the garbage collection is not aggressive enough to reclaim and reuse it in a timely fashion. At least, older versions of Python used to recommend that for heavy string concats.
Thanks again for all of your work on it.
Take care, KevinH
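
For readers following along, here is a minimal sketch of the list-and-join idea suggested above. It is not taken from mobiunpack; the function names and the sample data are made up for illustration, and it is written so that it should run on both Python 2.6+ and Python 3.

```python
def concat_naive(blocks):
    # Repeated += on a growing string; on some interpreters each step
    # may copy everything accumulated so far.
    result = ""
    for block in blocks:
        result += block
    return result


def concat_with_join(blocks):
    # Keep references to the pieces in a list and build the final
    # string once at the end.
    parts = []
    for block in blocks:
        parts.append(block)
    return "".join(parts)


# Hypothetical stand-in for decompressed text blocks.
blocks = ["<p>entry %d</p>" % i for i in range(1000)]
html = concat_with_join(blocks)
```

Both functions return the same string; only the amount of intermediate copying differs, and how much that matters depends on the interpreter and the size of the data.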
#24
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
Quote:
Quote:
Quote:
I have an improved version, not yet published, that detects and ignores these non-images, but that is only for cosmetic reasons.
Quote:
Quote:
First of all, the 20MB dictionary MOBI file uncompresses into 100MB of HTML text. And into this 100MB, thousands of strings have to be inserted all over the HTML text, which means that for each insert all 100MB of data must be copied at least once. Even the decompression of the compressed texts is much faster if I append each block to a temporary disk file instead of handling everything in memory.
Quote:
Ciao, Steffen
#25
Sigil Developer
Posts: 8,761
Karma: 5706256
Join Date: Nov 2009
Device: many
Hi Steffen,
> Feel free to provide an improved version
Thanks. I already helped write the original version you adapted, and my interest is in the additional code that converts the old MOBI raw HTML into normal HTML for archival purposes. So having the extensions on the images is useful. I will add that back in.
> First of all the 20MB dictionary mobi file uncompresses into 100MB html text.
> And into this 100MB thousands of strings have to be inserted all over the html text, which means for each insert all the 100MB of data must be copied at least once.
Or, as I said, we could try using lists of string segments, inserting segments into position via list insertion, and then doing a join to put it all together. If that works, I will rewrite it that way; if not, I will pull all of the pieces that choose between writing to a file and concatenating strings into a separate function, to clean the code up and make it more readable.
> Even the decompression of the compressed texts is much faster if I append each block to a temporary disk file instead of handling everything in memory.
Good point.
> But feel free to do a test...
Will do.
Thanks, Kevin
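
As a rough illustration of the segment-list idea, the sketch below inserts many snippets into a large text in a single pass. It is a variant of what is proposed above rather than literal list.insert calls: the insert positions are sorted, the original text is sliced between them, and everything is joined once. The name insert_all and the (position, snippet) format are assumptions made for this example, not anything from mobiunpack.

```python
def insert_all(text, inserts):
    """Insert every (position, snippet) pair into text in one pass.

    Positions refer to offsets in the ORIGINAL text. Snippets that share
    a position keep the order in which they were given.
    """
    parts = []
    last = 0
    for pos, snippet in sorted(inserts, key=lambda item: item[0]):
        parts.append(text[last:pos])  # untouched text up to the insert point
        parts.append(snippet)         # the inserted piece
        last = pos
    parts.append(text[last:])         # remainder after the final insert
    return "".join(parts)


# Hypothetical example: add two anchors to a tiny document.
page = insert_all("<p>one</p><p>two</p>",
                  [(10, '<a id="b"/>'), (0, '<a id="a"/>')])
```

Each character of the original text is copied a small, fixed number of times regardless of how many inserts there are, instead of once per insert.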
#26
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
Quote:
See https://www.mobileread.com/forums/sho...7&postcount=75 for details.
Ciao, Steffen
#27
Sigil Developer
Posts: 8,761
Karma: 5706256
Join Date: Nov 2009
Device: many
Hi Steffen,
Thanks! I will grab it and start from there.
I'm not much of a git user (the Linux kernel started using git long after I stopped contributing to the Linux PPC port). I have used just about everything else at one point or another: rcs, cvs, svn, hg, etc. I guess I should at least play around with git.
Paul and I had already set up a Google Code page for the earlier versions, http://code.google.com/p/ebook-conversion-tools/, but only Paul has been adding much to it lately.
Take care, Kevin
Quote:
#28
Sigil Developer
Posts: 8,761
Karma: 5706256
Join Date: Nov 2009
Device: many
Hi Steffen,
I made all of the code changes and created a FastConcat class that hides all of the hugeFile temp file creation and string list appending. It is simple to use, and it uses the Python tempfile module:
fc = FastConcat(hugeFile)
...
fc.concat(data)
...
fc.getresult()
That all seemed to work fine.
Then I reverted your image file name extension changes, and now I can see why you decided to ignore the file extensions on images! ;-) Your approach allows you to update all image links with one regular expression substitution, which is much faster than doing one for each image.
I had one old dictionary to play and test with. Unfortunately it uses the older unsupported inflection rules, but it did let me play with things, and it used over 9000 GIFs and JPEGs. It would indeed take a very long time to process all of those image links one by one, so I will have to try something else to speed it up. When I get a workable solution, I will post it.
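
The FastConcat code itself is not shown in this thread, so the following is only a guess at what such a class might look like. The names FastConcat, concat, getresult and the hugeFile flag come from the post above; the internals (an anonymous tempfile.TemporaryFile for huge inputs, a list plus join otherwise, and byte strings as the data type) are assumptions made for this sketch.

```python
import tempfile


class FastConcat(object):
    """Hypothetical reconstruction, not the class from the real script."""

    def __init__(self, huge_file=False):
        self.huge_file = huge_file
        if huge_file:
            # Assumed behaviour: spill pieces to a temporary file on disk.
            self._tmp = tempfile.TemporaryFile()
        else:
            # Assumed behaviour: collect pieces in memory.
            self._parts = []

    def concat(self, data):
        # data is expected to be a byte string in both modes.
        if self.huge_file:
            self._tmp.write(data)
        else:
            self._parts.append(data)

    def getresult(self):
        # Single use in huge-file mode: the temporary file is closed here.
        if self.huge_file:
            self._tmp.seek(0)
            result = self._tmp.read()
            self._tmp.close()
            return result
        return b"".join(self._parts)


fc = FastConcat(True)
fc.concat(b"<html>")
fc.concat(b"</html>")
result = fc.getresult()
```

Callers use the same three calls in both modes, which is what lets the rest of the script stay free of the big-file branching.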