Old 07-16-2011, 10:27 AM   #16
DoctorOhh
US Navy, Retired
Posts: 9,896
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by Japes View Post
I have the De-DRM plugin installed, so, if there IS DRM, it's automatically removed, so, I don't know for sure if it has DRM or not. Is there a way for me to tell?
If you try to open the original file in calibre's viewer before adding it to calibre, the viewer will tell you.

Quote:
Originally Posted by Japes View Post
And, yes, I can view the Mobi file in Calibre by double clicking it (it takes forever to open but it does open and, from a quick scan, it appears to look fine).
Just so I'm clear: are you able to view the book in calibre's viewer by opening the file from within calibre, or are you double-clicking the original downloaded file before you add it to calibre?

If you send me a link to your file in a PM I'll be glad to look at it.
Old 07-16-2011, 01:22 PM   #17
Japes
Addict
Posts: 303
Karma: 1033852
Join Date: Jun 2011
Device: Sony PRS-350,Sony PRS-950,Pocketbook 360+,B&N Nook Simple Touch Reader
Kovid, can you please chime in here? I'm at a loss.
Old 07-17-2011, 12:48 PM   #18
siebert
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
Hi,

I'm not sure if calibre can convert the result to epub as the unpacked html might contain mobi specific tags, but you can try to unpack the mobi file with my version of mobiunpack found here: https://www.mobileread.com/forums/sho...9&postcount=72

As it can handle dictionaries whose source is hundreds of megabytes big, I'm pretty optimistic that it also works for this mobipocket file.

Ciao,
Steffen
Old 07-17-2011, 05:30 PM   #19
Japes
Addict
Thanks for this. Downloaded it. It extracts to a py file. How on earth do I use it?
Old 07-17-2011, 05:34 PM   #20
siebert
Developer
You need to install python 2.x (not 3.x!) from python.org (I'm using version 2.5.4, but a newer 2.x version might work, too).

Then copy the mobi file into the same directory as the mobiunpack.py script and run this from a shell/cmd window:

python mobiunpack.py file.mobi

Ciao,
Steffen
Old 07-18-2011, 01:57 AM   #21
travger
Evangelist
Posts: 485
Karma: 270594
Join Date: Aug 2010
Device: palm tx, Windows7, Galaxy A5
Somewhere there is a nice thingy named mobiunpack.pyw that gives you a friendlier GUI to use. (I only shudder at the cmd line.)
Old 07-18-2011, 03:57 AM   #22
siebert
Developer
Quote:
Originally Posted by travger View Post
Somewhere is a nice thingy named mobiunpack.pyw, it gives you friendlier GUI to use.
I never tried that, but since it is just a GUI that calls the actual mobiunpack.py to do the unpacking, it should work as long as you make sure it uses my mobiunpack.py instead of the bundled one. Otherwise you will get neither dictionary support nor the speed optimization for huge files.

Quote:
(I only shudder at cmd line).
No pain, no gain

Ciao,
Steffen
Old 07-18-2011, 12:04 PM   #23
KevinH
Sigil Developer
Posts: 8,761
Karma: 5706256
Join Date: Nov 2009
Device: many
Hi Steffen,

Quote:
Originally Posted by siebert View Post
I never tried that, but as it is just some GUI calling the actual mobiunpack.py for the unpacking, it should work if you make sure that it uses my mobiunpack.py instead of the delivered one, otherwise you won't get dictionary support nor the speed optimization for huge files.
I just wanted to say very nice job with your new mobiunpack.py version!

I diffed your speedup changes against the original and everything looks great except for one thing: why did you remove the imghdr code that detects the proper image type, so that each file is created with the proper extension? I, for one, want all of my file extensions to match the actual contents of the file, because not every program ignores the extension when working with files. Are you using fake "image" files to store extra sections (non-html, non-image) from the original mobi file? Perhaps index information from the dictionaries?
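To illustrate the extension detection being discussed, here is a minimal sketch that picks an extension from the first bytes of an image record. It is not the code from either version of mobiunpack.py: the original reportedly used the standard imghdr module, and this stand-in checks a few well-known file signatures by hand instead. The function names and the "image%05d" naming scheme are assumptions made for the example.

```python
# Hypothetical stand-in for the imghdr-based detection mentioned above.
# Each entry maps the leading bytes of a file format to an extension.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
]


def guess_image_extension(data):
    """Return an extension for a recognised image signature, else None."""
    for magic, ext in SIGNATURES:
        if data.startswith(magic):
            return ext
    return None


def image_filename(section_index, data):
    """Build a name such as 'image00012.gif' (naming scheme is assumed)."""
    ext = guess_image_extension(data)
    if ext is None:
        return None  # not a recognised image; the caller may skip it
    return "image%05d.%s" % (section_index, ext)
```

Returning None for unrecognised data also gives a simple way to skip non-image sections instead of writing them out with an image extension.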

Also, it would be nice to move all of the string concats and file writes into one function that takes the "big-file" flag and the new data and handles both cases, just to make the code look cleaner.

That said, I find it hard to believe that even a 26 meg mobi file fills up memory on today's multi-GB machines. It might simply be that the string concatenation needs to be replaced with adding string pieces to a list and then doing a "".join(list) at the end. This should prevent the creation of multiple copies of the 26 meg long string, which is what must be filling up memory, perhaps because the garbage collection is not aggressive enough to reclaim and reuse it in a timely fashion. At least, older versions of Python used to recommend that for heavy string concats.
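The two approaches being compared can be sketched as follows. This is only an illustration with made-up function names, written for current Python 3 rather than the 2.x interpreter the script targeted. How much the join variant helps depends on the interpreter and the workload, so it is worth measuring rather than assuming.

```python
def build_with_concat(pieces):
    """Repeated concatenation: each += may copy everything built so far."""
    result = ""
    for piece in pieces:
        result += piece
    return result


def build_with_join(pieces):
    """Collect the pieces in a list and copy them once at the end."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)


if __name__ == "__main__":
    import timeit

    pieces = ["x" * 100] * 20000
    for func in (build_with_concat, build_with_join):
        seconds = timeit.timeit(lambda: func(pieces), number=5)
        print(func.__name__, seconds)
```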

Thanks again for all of your work on it.

Take care,

KevinH
Old 07-18-2011, 02:15 PM   #24
siebert
Developer
Quote:
Originally Posted by KevinH View Post
Hi Steffen,
I just wanted to say very nice job with your new mobiunpack.py version!
Thanks

Quote:
I diffed your speedup changes against the original and all looks great except for one thing, why did you remove the imghdr code that detects the proper image type so that it creates a file with the proper extension?
It's one of the speed optimizations. My image handling is generic: if the html references the image stored in section x, I don't have to look up the file, because its name is always 0000y.jpg, where y is calculated from x.

Quote:
Are you using fake "image" files to store extra sections (non-html, non-image) from the original mobi file? Perhaps index information from the dictionaries?
No, only images are needed. In the mobiunpack version I've published some non-image sections will be written as image files, but they won't be referenced by the html source, so it doesn't matter.

I have an improved version which is not yet published which detects and ignores these non-images, but that is only for cosmetic reasons.

Quote:
Also, it would be nice to grab all of the string concats and file writes into one function that passes in the "big-file" flag, and new data and handles it, just to make the code look cleaner.
Feel free to provide an improved version

Quote:
That said, I find it hard to think that even a 26 meg mobi file fills up memory in todays multi gb machines.
First of all, the 20MB dictionary mobi file uncompresses into 100MB of html text.

And thousands of strings have to be inserted all over this 100MB of html text, which means that for each insert all 100MB of data must be copied at least once.

Even the decompression of the compressed texts is much faster if I append each block to a temporary disk file instead of handling everything in memory.
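As a rough illustration of appending each decompressed block to a temporary disk file, here is a minimal sketch. It is not taken from mobiunpack.py. zlib stands in for the real Mobipocket decompressor purely so the example runs, and the function name and its parameters are assumptions.

```python
import tempfile
import zlib


def decompress_to_tempfile(blocks, decompress=zlib.decompress):
    """Write each decompressed block straight to a temporary file.

    Only one block is held in memory at a time. The caller gets back an
    open file positioned at the start and should close it when done.
    """
    out = tempfile.TemporaryFile()
    for block in blocks:
        out.write(decompress(block))
    out.seek(0)
    return out


if __name__ == "__main__":
    compressed = [zlib.compress(b"<p>first</p>"), zlib.compress(b"<p>second</p>")]
    with decompress_to_tempfile(compressed) as handle:
        print(handle.read())
```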

Quote:
It might simply be that the string concatenation needs to be replaced with simply adding string pieces to a list and then doing a "".join(list) at the end.
Maybe, I didn't test that, but I doubt it would be as fast. But feel free to do a test...

Ciao,
Steffen
Old 07-18-2011, 02:33 PM   #25
KevinH
Sigil Developer
Hi Steffen,

> Feel free to provide an improved version

Thanks, I already helped write the original version you adapted and my interest is in the additional code that converts the old mobi raw html into normal html for archival purposes. So having the extensions on the images is useful. I will add that back in.

> First of all the 20MB dictionary mobi file uncompresses into 100MB html text.

> And into this 100MB thousands of strings have to be inserted all over the html text, which means for each insert all the 100MB of data must be copied at least once.

Or as I said, we could try using lists of string segments and inserting segments into position via list insertion and then doing a join to put it all together.
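One way the segment idea could look is sketched below. It is a single-pass variant rather than literal list insertion: the text is cut at each insert position, the pieces and the new strings are appended to a list in order, and everything is joined once at the end. The function name and the (position, string) input format are assumptions for the example.

```python
def insert_all(text, insertions):
    """Insert many strings into text with a single final copy.

    insertions is an iterable of (position, string) pairs, where each
    position refers to an offset in the original, unmodified text.
    """
    parts = []
    last = 0
    # Sorting by position only keeps the input order for equal positions.
    for pos, extra in sorted(insertions, key=lambda item: item[0]):
        parts.append(text[last:pos])
        parts.append(extra)
        last = pos
    parts.append(text[last:])
    return "".join(parts)
```

Because every position refers to the original text, no offsets need to be adjusted after earlier inserts, and the large text is copied once instead of once per insert.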

If that works, then I will rewrite it that way. If not, I will pull all of the pieces that choose between writing to a file and concatenating strings into a separate function, to clean the code up and make it more readable.

> Even the decompression of the compressed texts is much faster if I append each block to a temporary disk file instead of handling everything in memory.

Good point.

> But feel free to do a test...

Will do.

Thanks,

Kevin
Old 07-18-2011, 05:18 PM   #26
siebert
Developer
Quote:
Originally Posted by KevinH View Post
Thanks, I already helped write the original version you adapted and my interest is in the additional code that converts the old mobi raw html into normal html for archival purposes. So having the extensions on the images is useful. I will add that back in.
Ok. I've just published my latest changes so you can start from there. To ease development I've also pushed my git repository to github. If you're used to git, just fork my repository and start coding.

See https://www.mobileread.com/forums/sho...7&postcount=75 for details.

Ciao,
Steffen
Old 07-18-2011, 05:51 PM   #27
KevinH
Sigil Developer
Hi Steffen,

Thanks! I will grab it and start from there. I'm not much of a git user (the Linux kernel started using git long after I stopped contributing to the Linux PPC port). I've used just about everything else at one point or another: rcs, cvs, svn, hg, etc. I guess I should at least play around with git.

Paul and I had already set up a google code page for the earlier versions

http://code.google.com/p/ebook-conversion-tools/

but only Paul was adding much to it lately.

Take care,

Kevin

Quote:
Originally Posted by siebert View Post
Ok. I've just published my latest changes so you can start from there. To ease development I've also pushed my git repository to github. If you're used to git, just fork my repository and start coding.

See https://www.mobileread.com/forums/sho...7&postcount=75 for details.

Ciao,
Steffen
Old 07-19-2011, 12:14 AM   #28
KevinH
Sigil Developer
Hi Steffen,

I made all of the code changes and created a FastConcat class that hides all of the hugeFile temp-file creation and string-list appending. It is simple to use and relies on the Python tempfile module.

fc = FastConcat(hugeFile)
...
fc.concat(data)
...
fc.getresult()
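The class itself is not shown in the thread, so the following is only a guess at what such a wrapper might look like, based on the three calls above and the mention of the tempfile module. The attribute names and the choice of bytes are assumptions, and it is written for current Python 3.

```python
import tempfile


class FastConcat(object):
    """Hypothetical sketch: append data to a temp file or an in-memory list."""

    def __init__(self, huge_file):
        self.huge_file = huge_file
        # Huge inputs go to disk; everything else is collected in a list.
        self._file = tempfile.TemporaryFile() if huge_file else None
        self._parts = [] if not huge_file else None

    def concat(self, data):
        """Append a chunk of bytes to the result."""
        if self.huge_file:
            self._file.write(data)
        else:
            self._parts.append(data)

    def getresult(self):
        """Return everything appended so far as one bytes object."""
        if self.huge_file:
            self._file.seek(0)
            result = self._file.read()
            self._file.seek(0, 2)  # back to the end so later concats append
            return result
        return b"".join(self._parts)
```

Note that getresult() in this sketch still loads the whole result into memory. A real implementation might hand back the file object instead.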

That all seemed to work fine. Then I reverted your image file name extension changes and now I can see why you decided to ignore the file extensions on images! ;-)

Your approach allows you to update all image links with a single regular-expression substitution, which is much faster than doing one substitution for each image.
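A single-substitution rewrite of this kind might look like the sketch below. It assumes the raw Mobipocket html marks images with a recindex attribute on img tags, and that the file name is simply the zero-padded index with a fixed extension. The thread only says the number is "calculated" from the section, so the mapping, the "images/" directory, and the function name are all assumptions.

```python
import re

# Matches an <img ...> tag up to and including its recindex attribute.
IMG_RECINDEX = re.compile(
    r'''(<img\b[^>]*?)\brecindex=["']?(\d+)["']?''', re.IGNORECASE
)


def link_images(html, ext="jpg"):
    """Rewrite every recindex attribute to a src attribute in one pass."""

    def repl(match):
        index = int(match.group(2))
        return '%ssrc="images/%05d.%s"' % (match.group(1), index, ext)

    return IMG_RECINDEX.sub(repl, html)
```

Because the target name depends only on the number captured by the pattern, one pass over the text handles every image, no matter how many there are.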

I had one old dictionary to test with. Unfortunately it uses the older, unsupported inflection rules, but it did let me play with things, and it uses over 9000 gifs and jpegs.

It would indeed take a very long run time to process all of those image links one by one.

So I will have to try something else to speed it up. When I get a workable solution, I will post it.