View Full Version : mobi2oeb


kovidgoyal
02-15-2008, 10:34 AM
After an 8-hour hackathon I'm happy to announce mobi2oeb. Converts (non DRMed) .mobi/.prc files to an exploded OEBPS ebook. Supports all three levels of compression in .mobi files. Part of libprs500 v0.4.37 (http://libprs500.kovidgoyal.net). To use


mobi2oeb book.mobi


This is an initial release, so expect bugs. Thanks to darkninja for the HUFF/CDIC decompression code.

HarryT
02-15-2008, 10:55 AM
Great! Thanks, Kovid; this will make it very easy to edit a MobiPocket book. Use your tool to explode to OEB, make the edits, then use Mobi Creator to rebuild it.

Ortep
02-16-2008, 10:25 AM
After an 8-hour hackathon I'm happy to announce mobi2oeb. Converts (non DRMed) .mobi/.prc files to an exploded OEBPS ebook. Supports all three levels of compression in .mobi files. Part of libprs500 v0.4.37 (http://libprs500.kovidgoyal.net).



Sounds great, but look what I got when I clicked on the link:


There is a problem with this website's security certificate.


The security certificate presented by this website was not issued by a trusted certificate authority.

Security certificate problems may indicate an attempt to fool you or intercept any data you send to the server.
We recommend that you close this webpage and do not continue to this website.
Click here to close this webpage.
Continue to this website (not recommended).
More information


If you arrived at this page by clicking a link, check the website address in the address bar to be sure that it is the address you were expecting.
When going to a website with an address such as https://example.com, try adding the 'www' to the address, https://www.example.com.
If you choose to ignore this error and continue, do not enter private information into the website.

For more information, see "Certificate Errors" in Internet Explorer Help.

Nate the great
02-16-2008, 10:40 AM
That's because he doesn't have the money to pay for a security certificate. His site has always been that way. I just checked again; his site is still there. I can vouch for him.

Ortep
02-16-2008, 11:13 AM
Ok, thanks

We can't be too careful these days :2thumbsup

FixB
02-16-2008, 11:30 AM
Thanks kovidgoyal !
Once again, your work helps us all so much !!

nrapallo
02-17-2008, 02:02 AM
After an 8-hour hackathon I'm happy to announce mobi2oeb. Converts (non DRMed) .mobi/.prc files to an exploded OEBPS ebook. Supports all three levels of compression in .mobi files. Part of libprs500 v0.4.37 (http://libprs500.kovidgoyal.net). To use


mobi2oeb book.mobi


This is an initial release, so expect bugs. Thanks to darkninja for the HUFF/CDIC decompression code.

I used my 'mobi2imp' (version 5) to output 'SpaceEncyclopedia.mobi' into OEBFF (.oeb) output. I use:mobi2imp --oeb 'SpaceEncyclopedia.mobi' Space

Can you check it against the output of mobi2oeb? Can you load my .oeb in mobicreator or use it with libprs500 utils?

Our (ebook format) worlds are crossing...

-Nick

kovidgoyal
02-17-2008, 12:03 PM
There are really only a handful of things to do in mobi->oeb conversion, so I see no reason why your OEB output should be wrong. They are:
1) Read metadata from the EXTH header to create the .opf file
2) Decompress the text using the three possible compression schemes
3) Replace the filepos attributes
4) Replace the mobi-specific tags like <mbp:pagebreak>
5) Extract the images and replace the <img recindex> tags

At the moment, the only step not fully implemented is 4). The only mobi-specific markup that mobi2oeb replaces is <mbp:pagebreak>.
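
As an illustration of step 4), here is a minimal Python sketch of replacing <mbp:pagebreak> tags with plain HTML. This is not mobi2oeb's actual code: the function name replace_pagebreaks and the replacement markup (an empty div with a CSS page break) are assumptions chosen for the example.

```python
import re

# Hypothetical helper, not taken from mobi2oeb itself.
# Matches <mbp:pagebreak>, <mbp:pagebreak/> and variants with attributes.
PAGEBREAK_OPEN = re.compile(r"<\s*mbp:pagebreak\b[^>]*>", re.IGNORECASE)
# Matches a closing </mbp:pagebreak>, which is simply dropped.
PAGEBREAK_CLOSE = re.compile(r"<\s*/\s*mbp:pagebreak\s*>", re.IGNORECASE)

# Assumed replacement markup; any standard page-break construct would do.
REPLACEMENT = '<div style="page-break-after: always"></div>'


def replace_pagebreaks(html: str) -> str:
    """Swap Mobipocket page-break tags for a standard HTML/CSS equivalent."""
    html = PAGEBREAK_CLOSE.sub("", html)  # remove closing tags first
    return PAGEBREAK_OPEN.sub(REPLACEMENT, html)


if __name__ == "__main__":
    print(replace_pagebreaks("<p>one</p><mbp:pagebreak/><p>two</p>"))
```

Closing tags are removed before the opening tags are replaced, so a <mbp:pagebreak></mbp:pagebreak> pair yields a single page break rather than two.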

nrapallo
02-17-2008, 12:37 PM
There are really only a handful of things to do in mobi->oeb conversion, so I see no reason why your OEB output should be wrong. They are:
1) Read metadata from the EXTH header to create the .opf file
2) Decompress the text using the three possible compression schemes
3) Replace the filepos attributes
4) Replace the mobi-specific tags like <mbp:pagebreak>
5) Extract the images and replace the <img recindex> tags

At the moment, the only step not fully implemented is 4). The only mobi-specific markup that mobi2oeb replaces is <mbp:pagebreak>.

Hey, this reads like pseudo-code and would be a great guide to 'rolling-your-own' program.

However, for me, this was all accomplished by using tompe's 'mobi2html' and making my .IMP specific changes to get 'mobi2imp'.

So, in the end, it appears we get the same result.

Cool!

-Nick

brecklundin
03-04-2008, 08:05 PM
kovid....kovid...kovid....awesome...thanks!!

here is the best I can offer in return:

http://brecklundin.com/tmp/fmaid.jpg

She is yours... ;)

kovidgoyal
03-04-2008, 08:24 PM
I appreciate the gesture, but I have to say I like 'em with a leetle more meat on the bones ;)

brecklundin
03-07-2008, 02:12 AM
your wish is our command oh great code breaker...

http://www.brecklundin.com/tmp/bouncygirl.gif

IceHand
03-07-2008, 10:41 AM
Nice work, thanks! One question though: is it normal that the exploded html file has only three lines? Line one is always "<html><head>", line two is "<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />" and line three is the rest. It's no problem to make some breaks with par, but the resulting html code is not very clearly arranged for manual editing.

llasram
03-07-2008, 11:56 AM
Nice work, thanks! One question though: is it normal that the exploded html file has only three lines? Line one is always "<html><head>", line two is "<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />" and line three is the rest. It's no problem to make some breaks with par, but the resulting html code is not very clearly arranged for manual editing.

All of the pre-.epub HTML-based e-book formats seem to do this – strip out all “unnecessary” whitespace to save space. ConvertLIT tries to fix this for LIT files by adding whitespace to the generated HTML, but it gets it wrong often enough to be troublesome. For adding whitespace and otherwise cleaning up grody HTML, check out HTML Tidy (http://tidy.sourceforge.net/).

IceHand
03-07-2008, 12:33 PM
Thanks for the tip, but I already knew of HTML Tidy and it won't generate a cleaned up version if the source file has errors – which includes most exploded Mobipocket html files.

Anyway, I had a closer look at the html code and it seems that running a search and replace for "> <" with ">\n<" does the trick. Maybe an idea for the next mobi2oeb version?

kovidgoyal
03-07-2008, 04:20 PM
Thanks for the tip, but I already knew of HTML Tidy and it won't generate a cleaned up version if the source file has errors – which includes most exploded Mobipocket html files.

Anyway, I had a closer look at the html code and it seems that running a search and replace for "> <" with ">\n<" does the trick. Maybe an idea for the next mobi2oeb version?

That's not quite safe, what if you have something like

<font size=4>W</font><font size=2>ord</font>

IceHand
03-07-2008, 05:36 PM
That's not quite safe, what if you have something like

<font size=4>W</font><font size=2>ord</font>

Then nothing will happen for that line. It's >space< that would be replaced with >line break< which gives the same output.

>< with no space between should of course not be separated by a line break.

kovidgoyal
03-07-2008, 08:26 PM
Are there spaces in the output HTML? Seems odd there would be, if the creation tools are stripping unneeded whitespace characters.

IceHand
03-08-2008, 07:59 AM
Yes, there are. To me it doesn't look like the creation tools are stripping unneeded whitespace characters, but rather that either they are converting line breaks to spaces (which would seem odd to me) or the script used for exploding to html misinterprets line breaks as spaces (that's only a guess, of course).

Here's a small sample output from mobi2oeb from a self-made mobi file. Notice that wherever there is "> <" there should have been a line break in between:

<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<guide></guide></head><body><br/><br/> <h1 align="center"><b>Book Title</b></h1> <br/> <h2 align="center">Author Name</h2> </body></html>

kovidgoyal
03-08-2008, 03:39 PM
OK will be in next release.
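
For anyone who wants to apply the same fix by hand in the meantime, the search and replace discussed above can be sketched in Python as follows. The function name break_between_tags is made up for this example, and this is not necessarily how mobi2oeb implements it.

```python
def break_between_tags(html: str) -> str:
    """Turn a single space between two tags ("> <") into a line break.

    Tags that touch with no space between them ("><") are left alone,
    so inline runs such as <font size=4>W</font><font size=2>ord</font>
    stay on one line.
    """
    return html.replace("> <", ">\n<")


if __name__ == "__main__":
    sample = (
        '<guide></guide></head><body><br/><br/> '
        '<h1 align="center"><b>Book Title</b></h1> <br/> '
        '<h2 align="center">Author Name</h2> </body></html>'
    )
    print(break_between_tags(sample))
```

Outside of <pre> blocks, a space and a line break between tags are both ordinary HTML whitespace, so the rendered result should be unchanged.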

IceHand
03-09-2008, 07:50 AM
Great, thanks!

nrapallo
03-12-2008, 11:00 AM
As .oeb (an OEBFF container produced by eBook Publisher) is a 'generic' self-contained format, are there any tools you may have that convert from it to mobi format? to any format?

I'm adding .oeb as an output format to PDFRead and will soon release same as version 1.8 (with Ashish Kulkarni's permission). I wanted to easily allow mobipocket users to benefit from this addition as PDFRead does not presently support .prc output formats natively.

By the way, I also added .html as an output format to PDFRead which produces a .opf file along with this, so I guess opf2mobi or Mobipocket creator can do the trick for mobipocket users. However, I'm having issues with this as the .opf file produced is not mobipocket specific and breaks sometimes.

I prefer however a direct .oeb to .mobi tool, if there is one already.

kovidgoyal
03-12-2008, 01:19 PM
mobi2oeb doesn't actually produce an oeb file, which is really just a zipped up set of HTML + OPF files. You should be talking to tompe, the creator of opf2mobi.

IceHand
03-17-2008, 11:12 AM
Hm, that's strange – the resulting HTMLs from huffdic compressed e-books from HarperCollins are twice as big as they should be. I looked at the generated HTMLs and found out that after they should end, the books start again with the table of contents like:
Title page(s)
Table of Contents
Book content
Book ending
Table of Contents (this and the following content shouldn't be there)
Book content
Book ending

You can test this with the free e-book Flight of the Nighthawks (http://www.harpercollinsebooks.com/C0104965-F362-40EE-8053-ED2360F220F1/10/125/en/eos10?WT.mc_id=REFL_EOSBL_SHAMX10_011508) by Raymond E. Feist.

nrapallo
03-17-2008, 11:24 AM
Hm, that's strange – the resulting HTMLs from huffdic compressed e-books from HarperCollins are twice as big as they should be. I looked at the generated HTMLs and found out that after they should end, the books start again with the table of contents like:
Title page(s)
Table of Contents
Book content
Book ending
Table of Contents (this and the following content shouldn't be there)
Book content
Book ending

You can test this with the free e-book Flight of the Nighthawks (http://www.harpercollinsebooks.com/C0104965-F362-40EE-8053-ED2360F220F1/10/125/en/eos10?WT.mc_id=REFL_EOSBL_SHAMX10_011508) by Raymond E. Feist.


Actually, I got similar results (duplicated HTML code) using the Mobipocket sample file, SpaceEncyclopedia.mobi, which I believe uses standard compression.

llasram
03-17-2008, 01:33 PM
Actually, I got similar results (duplicated HTML code) using the Mobipocket sample file, SpaceEncyclopedia.mobi, which I believe uses standard compression.

kovidgoyal's got it fixed in svn (http://libprs500.kovidgoyal.net/ticket/565).

nrapallo
03-17-2008, 01:50 PM
kovidgoyal's got it fixed in svn (http://libprs500.kovidgoyal.net/ticket/565).

Thanks, good to know!

FizzyWater
11-15-2008, 12:22 AM
Did something change in mobi2oeb recently? It looks like it's now converting any HTML code using <i></i> to <span="italic"></span>.

Is there a way to turn that off? I looked at the User Guide and didn't find anything in the arguments.

The reason I ask is, I can't get my copy of Word (2002) to recognize "span" tags in HTML. So I lose all my italic formatting when I open the HTML output in Word.

Thanks!

kovidgoyal
11-15-2008, 02:21 AM
Yeah, it's changed. The reason is that mobipocket html nests <i> and <b> levels at arbitrary depths, which causes problems with some HTML parsers. It can't be turned off.

FizzyWater
11-15-2008, 02:41 AM
Sigh.

Thanks, Kovid. I appreciate the answer. Not happy news (for me), but at least I know for sure.

There's still tompe's "mobi2html", for the time being, anyway.

blue223
03-16-2009, 06:10 PM
Sorry to bump this but can someone show me step by step how to use mobi2oeb? My ebook is now DRM free I just need to convert it to another format without all the gibberish...Any help would be appreciated

wallcraft
03-16-2009, 06:47 PM
If you want ePub or LRF, you can now use the Calibre GUI to do the conversion from MOBI.

If you want to "explode" the MOBI to HTML for later conversion to some other format, then Calibre's command line tool mobi2oeb (described here) may be your best option. It is very easy to use. From the command shell, issue the command:

mobi2oeb -o ebookname ebookname.mobi

where "ebookname" is the filename and the "-o" option tells mobi2oeb to put the result in a subdirectory called ebookname. If you already have the MOBI in its own subdirectory you can leave off the "-o ebookname". Your ebook might also have the extension .prc; this is perfectly ok (just use .prc in place of .mobi).

For information about how to get a command prompt, see Command Prompt Vista/XP/Mac (http://wiki.mobileread.com/wiki/Command_Prompt_Vista/XP/Mac). You would then navigate to the directory containing the MOBI ebook using the cd (change directory) command. I generally copy the full directory path from a window open on the directory and paste it into the command window after the cd.

kevindorsey
03-16-2009, 08:38 PM
Thanks for the walkthrough.

blue223
03-17-2009, 12:26 PM
Thanks! Only thing is, I think my ebook is too long, it's several thousand pages. I left mobi2oeb running for a while, and came back and I got this:

File "reader.py", line 598, in <module>
File "reader.py", line 598, in main
File "reader.py", line 196, in extract_content
File "re.pyo", line 150, in sub

MemoryError

Is my ebook just too long? I'm sooo close to getting this out of mobi format this is the last step left I think!

wallcraft
03-17-2009, 12:38 PM
Another option is mobi2html from MobiPerl (https://dev.mobileread.com/trac/mobiperl/wiki). There are binaries for Windows (you don't need to install Perl to run MobiPerl on Windows). The command line is similar to mobi2oeb. First copy mobi2html into your current directory (not necessary if it is already in your command path) and then issue the command:

mobi2html ebookname.mobi ebookname

This won't work if the ebook uses MobiPocket's high compression option though.

blue223
03-17-2009, 01:27 PM
I did that now and got this:

probably HUFFDIC_COMPRESSED - CANNOT BE DECOMPRESSED!!!

soo close, do I have any other options?

JSWolf
03-17-2009, 01:28 PM
Try using Calibre to convert your eBook to LIT, ePub, or LRF.

tompe
03-17-2009, 01:43 PM
I did that now and got this:

probably HUFFDIC_COMPRESSED - CANNOT BE DECOMPRESSED!!!

soo close, do I have any other options?

I think that maybe the Perl module on cpan (was the name ebook or) might support it. There should be a thread about this module here.

nrapallo
03-17-2009, 02:41 PM
I think that maybe the Perl module on cpan (was the name ebook or) might support it. There should be a thread about this module here.

No, the CPAN 'Ebook-tools' does not yet handle HuffDic compressed files, only standard compressed ones like MobiPerl. (Tommy, wouldn't it have been nice if the Perl code existed for that! :cool: )

However, MobiHuff.py may do the trick. I have converted a ~ 10 MB .prc ebook this way and it worked well. Google for it and read this post (http://www.mobileread.com/forums/showthread.php?p=306985#post306985) afterwards.

PM if you are interested and still need some assistance.

blue223
03-17-2009, 08:31 PM
Finally got it with MobiHuff, thanks to all who helped

llasram
03-19-2009, 02:32 PM
Thanks! Only thing is, I think my ebook is too long, it's several thousand pages. I left mobi2oeb running for a while, and came back and I got this:

File "reader.py", line 598, in <module>
File "reader.py", line 598, in main
File "reader.py", line 196, in extract_content
File "re.pyo", line 150, in sub

MemoryError

Is my ebook just too long? I'm sooo close to getting this out of mobi format this is the last step left I think!

Could you submit an issue on the calibre trac and attach the book in question? Calibre should be able to extract even very large books, and it's definitely a "bug" if it can't.

ectoplasm
08-25-2009, 11:02 PM
I've finally figured out that mobi2oeb only exists in Calibre 0.59 and earlier. Is there any way to extract mobi's with high compression with version 0.60+?

wallcraft
08-25-2009, 11:33 PM
I've finally figured out that mobi2oeb only exists in Calibre 0.59 and earlier. Is there any way to extract mobi's with high compression with version 0.60+?

You should be able to use the GUI to do the conversion now. If you want to use the command line, try

ebook-convert title.mobi title_std.mobi

I think this will get you a standard compression version of the MOBI. To get an ePub, use:

ebook-convert title.mobi title.epub

An .epub is just a ZIP of all the ebook's contents. See the manual (http://calibre.kovidgoyal.net/user_manual/cli/ebook-convert.html) for other options.

ectoplasm
08-25-2009, 11:42 PM
I was looking for the extracted html and images so I could make some changes, such as turn off full justification, and use Mobipocket Creator to build it again. I didn't test yet, but I assumed epub output would be different from the extracted mobi html?

wallcraft
08-25-2009, 11:52 PM
I was looking for the extracted html and images so I could make some changes, such as turn off full justification, and use Mobipocket Creator to build it again.

There is an OEB option:

Finally, if output_file has no extension, then it is treated as a directory and an “open ebook” (OEB) consisting of HTML files is written to that directory. These files are the files that would normally have been passed to the output plugin.

The command would be:

ebook-convert title.mobi title_oeb

where title_oeb is a directory (that need not yet exist).

ectoplasm
08-26-2009, 09:27 AM
Thanks! That is exactly what I was looking for. I'll give this a try.

sapient
10-28-2009, 05:05 PM
There is an OEB option: The command would be:
ebook-convert title.mobi title_oeb where title_oeb is a directory (that need not yet exist).

This is not working for me / any more. I get an error:
No module named site (Error Code: 1)

JSWolf
10-28-2009, 05:08 PM
Try the following....

ebook-convert ebook.mobi .oeb

that should do it.

sapient
10-30-2009, 04:27 PM
thanks, I sorted it out

mor0o0o
12-02-2009, 09:09 AM
i want mobi2oeb download link to use it

kevindorsey
01-14-2010, 06:15 PM
This is not working for me / any more. I get an error:
No module named site (Error Code: 1)

Hopefully you got it fixed...

ATimson
02-19-2010, 10:22 PM
This may be my lack of knowledge of the Mobipocket format showing, but is there a tool that will just unpack the Mobipocket file? Calibre does some additional processing on the HTML, but I'd rather have it in as raw a format as possible.

(I'm trying to determine which will be better for converting to ePub - Mobipocket or MS Reader. I'm thinking MS Reader, but it's hard to tell for sure when the Mobipocket tools are transformative...)

Pardoz
02-20-2010, 08:23 AM
This may be my lack of knowledge of the Mobipocket format showing, but is there a tool that will just unpack the Mobipocket file?

Mobiunpack. You'll need to install Python to use the script, which you can find here (http://www.mobileread.com/forums/showthread.php?t=61986), attached to post 5 (version 0.17 at last count).

(I'm trying to determine which will be better for converting to ePub - Mobipocket or MS Reader. I'm thinking MS Reader, but it's hard to tell for sure when the Mobipocket tools are transformative...)

I haven't done much conversion work from unpacked .mobi, but converting unpacked MS Reader to .epub is pretty trivial.

ATimson
02-20-2010, 11:53 AM
Thanks! That's exactly what I was looking for. And answers my question, to boot (stick to MS Reader - between higher-resolution images and non-proprietary tags, it'll work better for my purposes). :)

angelad
02-23-2010, 12:05 PM
I think there is still a need for a comprehensive tool that can do a few conversions at a time.

queequeg
06-28-2010, 02:13 AM
Hi!

I know this will be a silly question... but I'm new to Kindle and Mobi, having just switched from a Kobo, and I need to fix up the table of contents in my Mobi files. How exactly do I use mobi2oeb? I tried using the command after navigating to the sub dir the Mobi is in, but it tells me that the command is not recognized as an internal or external command... I'm doing something wrong, aren't I? :o