Tools for Editing Kindle .mobi Files?


GJN
03-13-2009, 03:07 PM
I have a number of unencrypted .mobi files that I've downloaded or converted from other formats and I'd like to do some simple editing on the files, such as correcting spelling errors or reformatting some sections with page skips or paragraph markers.

Are there any affordable tools (or even better free tools) I could use to clean up and display .mobi files? I'd prefer Mac tools but would be content with Windows. I'm sorry I don't do Linux (yet!).

AnemicOak
03-13-2009, 03:45 PM
There isn't something as simple as an editor unfortunately. You can use mobi2oeb, edit the HTML and then use oeb2mobi to get it into a Mobi file again. These are command line tools that install with Calibre.


We discussed it a bit here...
http://www.mobileread.com/forums/showthread.php?t=41298

NicolasR
10-07-2011, 08:47 AM
Hi!
Do you know where I can find mobi2oeb and oeb2mobi? I cannot find them in my Calibre installation…
Thank you for the help.

cybmole
10-07-2011, 08:51 AM
If you already have calibre, simply use it to convert from .mobi to whatever format you wish to edit in: epub (with Sigil), rtf (with MS Word)... then convert back to mobi. You can do this for any .mobi file in your calibre library.
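
A rough sketch of that round trip from a script, for anyone who prefers the command line. It assumes calibre's `ebook-convert` tool is on your PATH; the helper names and the `_edited` file-naming scheme are made up for illustration.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_convert_command(src: str, dst: str) -> list[str]:
    """Return the argument list for calibre's ebook-convert tool.

    ebook-convert picks the input and output formats from the file extensions.
    """
    return ["ebook-convert", src, dst]


def convert(src: str, dst: str) -> None:
    """Run one conversion; raises if ebook-convert is missing or fails."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert not found on PATH")
    subprocess.run(build_convert_command(src, dst), check=True)


def round_trip_names(mobi_path: str) -> tuple[str, str]:
    """Hypothetical naming scheme: book.mobi -> book.epub -> book_edited.mobi."""
    p = Path(mobi_path)
    return str(p.with_suffix(".epub")), str(p.with_name(p.stem + "_edited.mobi"))
```

Usage would be something like `epub, new_mobi = round_trip_names("book.mobi")`, then `convert("book.mobi", epub)`, edit the epub in Sigil, and finally `convert(epub, new_mobi)`.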

DiapDealer
10-07-2011, 08:58 AM
For simple fixes, I use mobiunpack (http://www.mobileread.com/forums/showpost.php?p=774836&postcount=5) (it has a drag 'n' drop Applescript wrapper)... tweak the html... then rebuild with kindlegen.

jswinden
10-07-2011, 11:33 AM
For simple fixes, I use mobiunpack (http://www.mobileread.com/forums/showpost.php?p=774836&postcount=5) (it has a drag 'n' drop Applescript wrapper)... tweak the html... then rebuild with kindlegen.

Is that a Mac-only program? I see it uses Applescript.

DiapDealer
10-07-2011, 11:36 AM
Is that a MAC only program? I see it uses Applescript.
No. It's a cross-platform Python script. It just has an Applescript wrapper to make life drag 'n' drop for Apple users. The script itself still functions normally from the command line on other OSes.

Blossom
10-07-2011, 02:45 PM
I use Calibre. I convert to htmlz then open the zip and inside is the html file. I open that with Word 2003 for editing. If the coding needs fixing up I just use my line editor for that.
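
If you would rather not unzip by hand, here is a minimal sketch for pulling the HTML out of the htmlz archive. It only assumes the archive is an ordinary zip containing at least one .html file encoded as UTF-8; the function name is made up for illustration.

```python
from __future__ import annotations

import zipfile


def read_html_from_htmlz(htmlz_path: str) -> tuple[str, str]:
    """Return (entry name, text) of the first HTML file in the archive."""
    with zipfile.ZipFile(htmlz_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".html", ".htm")):
                # Assumption: the HTML is UTF-8 encoded.
                return name, archive.read(name).decode("utf-8")
    raise ValueError("no HTML file found in archive")
```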

jswinden
10-07-2011, 03:32 PM
It would be terrific if there was a Sigil-like program for mobi. Unfortunately sigil only works for ePub. But I usually deDRM my Kindle books anyway, so if I need to modify one I convert it to ePub via calibre then edit it in Sigil then convert it back into mobi via calibre. Sigil is great in that it doesn't create all that convoluted CSS crap that calibre does, so it is my favorite ePub modifier.

DiapDealer
10-07-2011, 03:42 PM
It would be terrific if there was a Sigil-like program for mobi.
The problem is that a mobi book is a binary database file that must be built from its html source. Epub is source that's zipped up. A binary mobi file editor would be a fairly large undertaking (especially since there are still a few mysteries being slowly revealed about the proprietary format). It's much easier to convert... fix... and reconvert than it would be to code a wysiwyg binary mobi editor. And frankly... the end result wouldn't be any (OK... much) different anyway.

Blossom
10-07-2011, 03:49 PM
The problem is that a mobi book is a binary database file that must be built from its html source. Epub is source that's zipped up. A binary mobi file editor would be a fairly large undertaking (especially since there are still a few mysteries being slowly revealed about the proprietary format). It's much easier to convert... fix... and reconvert than it would be to code a wysiwyg binary mobi editor. And frankly... the end result wouldn't be any (OK... much) different anyway.

It's just easier to convert to another format and edit. I don't mind the Calibre code. It's how it looks and reads on the reader that counts. ;) I don't care for Sigil; it's a little slow, but I have used it once or twice. I like using Word 2003 because I can quickly edit what I need with the macros I made. :D

DiapDealer
10-07-2011, 03:52 PM
It's just easier to convert to another format and edit.
I completely agree.
And it appears most developers feel the same way, too. ;)

JSWolf
10-07-2011, 03:56 PM
Convert to ePub, edit the ePub then convert back to Mobi. That way, you'll also have an ePub version. Best of both worlds.

Blossom
10-07-2011, 04:11 PM
Convert to ePub, edit the ePub then convert back to Mobi. That way, you'll also have an ePub version. Best of both worlds.

I just use html and make both epub and mobi in Calibre. I am picky on how it should look. I have to have all chapter headings marked, a working TOC, indentation and a blank line under every paragraph. :)

jswinden
10-07-2011, 04:25 PM
I just use html and make both epub and mobi in Calibre. I am picky on how it should look. I have to have all chapter headings marked, a working TOC, indentation and a blank line under every paragraph. :)

I hate that so many publishers/authors build their eBooks without using the basic standard HTML tags. Instead of using H1, H2, H3 many use a P tag with a gosh-awful CSS class call out. HELLO, that makes autocreation of TOCs a no go!! I personally like a minimalist approach to CSS within eBooks. If I have to modify a book, the first thing I do is strip out all the garbage from the CSS file. No need to remove the CSS classes and what not from each HTML file, as they will be ignored if they cannot be found in the CSS file. Next I make sure each chapter/section heading has an H1 or H2 tag. If it is not too much work I also make sure all subheadings have a heading tag as well. It usually isn't too time consuming, but some publishers seem to use a different class for every subheading (within the same level), making a search and replace impossible. Idiots!
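
For the consistent cases, swapping the styled P tags for real headings can be scripted. This is only a sketch: the class name you pass in is whatever the publisher happened to use, and the pattern assumes `class` is the only attribute on the tag.

```python
import re


def promote_headings(html: str, css_class: str, tag: str = "h2") -> str:
    """Rewrite <p class="css_class">...</p> as <tag>...</tag>."""
    pattern = re.compile(
        r'<p\s+class="' + re.escape(css_class) + r'"\s*>(.*?)</p>',
        re.IGNORECASE | re.DOTALL,
    )
    # Keep the inner markup untouched; only the wrapper tag changes.
    return pattern.sub(lambda m: f"<{tag}>{m.group(1)}</{tag}>", html)
```

For example, `promote_headings(html, "chap")` would turn every `<p class="chap">` paragraph into an `<h2>`, where `chap` is a hypothetical class name.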

Blossom
10-07-2011, 04:35 PM
I hate that so many publishers/authors build their eBooks without using the basic standard HTML tags. Instead of using H1, H2, H3 many use a P tag with a gosh-awful CSS class call out. HELLO, that makes autocreation of TOCs a no go!! I personally like a minimalist approach to CSS within eBooks. If I have to modify a book, the first thing I do is strip out all the garbage from the CSS file. No need to remove the CSS classes and what not from each HTML file, as they will be ignored if they cannot be found in the CSS file. Next I make sure each chapter/section heading has an H1 or H2 tag. If it is not too much work I also make sure all subheadings have a heading tag as well. It usually isn't too time consuming, but some publishers seem to use a different class for every subheading (within the same level), making a search and replace impossible. Idiots!


Oh, the coding on some ebooks is horrible! :eek: I agree, they are not consistent in their coding either. It's like each person did a chapter and used their own CSS. :rolleyes: Calibre can clean most of that up, making it easy to get a TOC. It removes all those stupid font tags which specify what font to use by converting them into the style sheet. I then edit the CSS sheet, removing all font-family references. Much easier than removing each font reference tag on its own. I use Word first, though, to quickly fix the inconsistencies I find.
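
Removing the font-family references from the style sheet is also easy to script. A minimal sketch, assuming plain `font-family: ...;` declarations; it leaves the `font:` shorthand alone and would break any `@font-face` block, so check for those first.

```python
import re

# Matches one "font-family: ...;" declaration plus trailing whitespace.
FONT_FAMILY = re.compile(r"font-family\s*:[^;}]*;?\s*", re.IGNORECASE)


def strip_font_family(css: str) -> str:
    """Delete every font-family declaration from a CSS string."""
    return FONT_FAMILY.sub("", css)
```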

DiapDealer
10-07-2011, 04:55 PM
Instead of using H1, H2, H3 many use a P tag with a gosh-awful CSS class call out.
H* tags always trigger a page-break-before on the Kindle. That's not always desirable. Like chapter numbers with a (same-sized) chapter name beneath it. Nobody wants a pagebreak in between those two items, so they use a P tag with CSS to simulate a header.

I wish chapter headers were more consistently tagged, but I also understand why they're oftentimes not.

Serpentine
10-07-2011, 05:59 PM
When doing conversions I find myself spending more and more time just doing everything in regex. Getting a bit too good at it really.

I wish chapter headers were more consistently tagged, but I also understand why they're oftentimes not.

Yeah, it doesn't help that a lot of tools that use automatic tidying break completely valid markup. For chapter markers and part/book pages I generally just use a single h2 tag, throw in a line break, a horizontal rule and the rest. It looks perfect on everything and even converts to mobi without trouble. However, every now and then I'll tidy and prettyprint... forgetting. Next thing I know there are 3 h2's, empty paragraphs, styled hr's and a whole load of spans and inline css - urgh :(

But anyway, I'd suggest anyone that does conversion from sites/pdf/poor formats in general should get hold of the terribly-badly-named RegexBuddy - makes life a whole lot easier; I guess there might be a free/OSS tool similar, but last I looked they were pretty lacking.

jswinden
10-07-2011, 06:30 PM
H* tags always trigger a page-break-before on the Kindle. That's not always desirable. Like chapter numbers with a (same-sized) chapter name beneath it. Nobody wants a pagebreak in between those two items, so they use a P tag with CSS to simulate a header.

I wish chapter headers were more consistently tagged, but I also understand why they're oftentimes not.

That's another pet peeve of mine: separating the chapter number from the chapter title. If you place both of those in separate paragraphs the TOC looks bad. I prefer placing both in the same paragraph and on the same line, but if aesthetically speaking separate lines are more appealing then I use <br /> between them.

DiapDealer
10-07-2011, 06:40 PM
That's another pet peeve of mine: separating the chapter number from the chapter title. If you place both of those in separate paragraphs the TOC looks bad. I prefer placing both in the same paragraph and on the same line, but if aesthetically speaking separate lines are more appealing then I use <br /> between them.
I don't disagree... it's just that I've given up on relying on any sort of successful auto-generated TOC. I just get my hands dirty and manually make it what I want it to be. :)

jswinden
10-07-2011, 06:50 PM
I hear you, but my philosophy is to spend less time editing than reading the book. ;) Most books I don't edit much, if any. But I've been unfortunate enough to buy a string of Topaz books lately, and those definitely need editing. For one, I must convert them to mobi from Topaz. Once that is done you can count on a multitude of OCR errors. Search and replace becomes my good friend in those cases! I understand using Topaz for ebooks that were last printed 25 years ago, but I'm getting some new releases in Topaz. :(

kamanza
02-12-2012, 11:39 AM
Originally posted by DiapDealer

For simple fixes, I use mobiunpack (it has a drag 'n' drop Applescript wrapper)... tweak the html... then rebuild with kindlegen.

Could you please explain to a newbie how to use mobiunpack?
Thanks.

thomass
02-12-2012, 03:28 PM
this post might help: http://www.mobileread.com/forums/showthread.php?p=149761#post149761

KevinH
02-12-2012, 04:59 PM
Could you please explain to a newbie how to use mobiunpack?
Thanks.

Yes, if you want you can grab the latest version from here:

http://www.mobileread.com/forums/showpost.php?p=1962039&postcount=297

If you are on Windows and would prefer to use a GUI interface and not the command line, you need to fully install the free community edition of ActiveState ActivePython 2.7.X.


1. Download the attached Mobi_Unpack_v0.39.zip

2. Unzip it (right-click and "Extract All" in Windows)

3. Inside the newly extracted Mobi_Unpack_v0.39 folder, double-click Mobi_Unpack.pyw

4. In the window that pops up:

- Hit the first Browse... button and select your input mobi ebook file

- Hit the second Browse... button and select a destination folder for the unpacked files

- If you want to split combination mobis, examine the raw markup language, or turn on verbose debugging, check the appropriate boxes

- Hit the "Start" button

The unpacking will start and progress messages and any errors will be indicated in the scrollable Log window. If you run into problems, this Log output may be useful in finding and fixing the issue.

Then look in your destination folder for a mobi7 folder. Inside it you will find the html file, images directory, toc.ncx and content.opf that were processed and stored inside your mobi. You can edit the html any way you like and then run kindlegen on the content.opf file to recreate your modified mobi ebook.
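
The rebuild step can be sketched as a small script. It assumes `kindlegen` is on your PATH and that the unpacked files sit in a mobi7 folder as described above; the helper names and the default output name are made up for illustration. As far as I know, `-o` takes a bare file name and kindlegen writes the result next to the .opf.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_kindlegen_command(opf_path: str, output_name: str) -> list[str]:
    """Argument list for kindlegen; -o expects a file name, not a path."""
    return ["kindlegen", opf_path, "-o", output_name]


def rebuild(unpack_dir: str, output_name: str = "edited.mobi") -> Path:
    """Run kindlegen on <unpack_dir>/mobi7/content.opf."""
    opf = Path(unpack_dir) / "mobi7" / "content.opf"
    if not opf.is_file():
        raise FileNotFoundError(opf)
    if shutil.which("kindlegen") is None:
        raise FileNotFoundError("kindlegen not found on PATH")
    # No check=True: kindlegen may exit non-zero for mere warnings,
    # so read its console output rather than trusting the exit code.
    subprocess.run(build_kindlegen_command(str(opf), output_name))
    return opf.parent / output_name
```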

barncat
03-09-2012, 12:04 PM
mokay, Diap guy, I'm beginning to see what you're saying. The answer on my own sad thread was a little abrupt, and I'm obviously not that savvy - so I'm still trying to figure all this out. But still - I opened that other .mobi with Springy, and now that puzzles me more than ever. I'm going to have to go back to the beginning and try to figure out how these utilities work. Starting from the beginning offers a HUGE learning curve. I mean, I can read the HTML and I know something about CSS, but I don't work with these things every day, so what I've learned I have to revisit every time I do a project. You guys all know so much, maybe you forget to have mercy on the grasshoppers among you. The CS5.5 inDesign plug in generates all that repetitive code you are talking about. The p tag before every flipping paragraph, even though there's no reason to reestablish the identity of the text. Like putting quotation marks around every word in a line of dialogue. But in looking at the generated HTML, I begin to see what I could do by hand, if I wanted to.

So do any of you have any insight now that inDesign's plug-in is up and running? I hate Calibre. The interface intimidates me - it doesn't feel intuitive at all. And are you guys PC or MAC, because it's hard for me to read the posts and understand them if I don't know which platform you're talking about.

DiapDealer
03-09-2012, 12:25 PM
mokay, Diap guy, I'm beginning to see what you're saying. The answer on my own sad thread was a little abrupt, and I'm obviously not that savvy - so I'm still trying to figure all this out. But still - I opened that other .mobi with Springy, and now that puzzles me more than ever. I'm going to have to go back to the beginning and try to figure out how these utilities work. Starting from the beginning offers a HUGE learning curve. I mean, I can read the HTML and I know something about CSS, but I don't work with these things every day, so what I've learned I have to revisit every time I do a project. You guys all know so much, maybe you forget to have mercy on the grasshoppers among you. The CS5.5 inDesign plug in generates all that repetitive code you are talking about. The p tag before every flipping paragraph, even though there's no reason to reestablish the identity of the text. Like putting quotation marks around every word in a line of dialogue. But in looking at the generated HTML, I begin to see what I could do by hand, if I wanted to.
Unfortunately, I have no idea what you may be referring to. A quick glance shows that my last contribution to this particular thread was almost half-a-year ago. Sorry. :o

DaleDe
03-09-2012, 01:01 PM
Barncat: you reopened a 3-year-old thread. Today, check the unpack mobi thread for how to turn a mobi file back into source that can be edited and rebuilt.

Dale

barncat
03-09-2012, 03:33 PM
Sorry guys. I'm lousy at forums. I just trolled through looking for any help at all, but DD helped me on my own little thread. I'm an idiot.

Justin Nemo
03-12-2012, 05:27 AM
Here's a little experiment I tried.

I converted an Open Office ODT file to epub and mobi in Calibre. The formatting was all over the place. I then edited the epub file in Sigil, which worked really well and is a great little program. Then I converted my newly Sigled epub to mobi in Calibre and the formatting was still all over the place in the mobi. So what is the best way to format your book?

GettaGirl72
05-19-2012, 08:37 PM
Here's a little experiment I tried.

I converted an Open Office ODT file to epub and mobi in Calibre. The formatting was all over the place. I then edited the epub file in Sigil, which worked really well and is a great little program. Then I converted my newly Sigled epub to mobi in Calibre and the formatting was still all over the place in the mobi. So what is the best way to format your book?

I was trying to correct the spacing, punctuation and typos in some of my .mobi files (and I am a complete amateur at all this... no python and coding in my repertoire...) and this path of conversion seemed to work pretty well for me. It does use quite a few tools :) but all the ones I used were free and relatively intuitive.

I converted the .mobi to .txt in Calibre, opened the .txt in OpenOffice Writer, did a select all and deleted any formatting, so I had a plain .odt file to edit. Once I had it edited as close to my physical copy of the book (I even scanned the maps and pictures and inserted them into the document, used character map for my Em Dashes, etc), I used the Export as PDF function to get a .pdf from the .odt. I used Calibre to convert the .pdf to .epub and used Sigil to edit the .epub with the headers for a generated TOC, add some italicizing, change font size, etc. It took some trial and error with Sigil as it didn't always convert back in Calibre from .epub to .mobi exactly as I expected. But, in the end, I had a practically perfect Kindle version.

I may try playing with the CSS classes as Blossom does, just to see if that helps in the conversions.

Anyway, that's how I got the best results when editing.

Jesse Chisholm
09-26-2012, 02:47 PM
H* tags always trigger a page-break-before on the Kindle. That's not always desirable. Like chapter numbers with a (same-sized) chapter name beneath it. Nobody wants a pagebreak in between those two items, so they use a P tag with CSS to simulate a header.

I wish chapter headers were more consistently tagged, but I also understand why they're oftentimes not.

The workaround for this is, naturally:
<h1>ChapterNumber<br>Chapter Name</h1>

But there will be cases (like the TOC) where that doesn't look quite right either.

UPDATE: I see I'm responding to a much older post than I originally thought. Never mind! :(

-Jesse

Small Elephant
11-18-2012, 05:54 AM
Hi
I was frustrated not having a working Table Of Contents for my e-ink Kindle books and after reading your posts I now know how to do it.

In Calibre I convert them to Epub.
In Sigil I create TOC.
In Calibre I convert to MOBI.

Thank you so much for the information.

I have a question though. Is there any way to automatically remove the paragraph following all the page numbers? As it is now, the page number shows up in the middle of the text followed by a new paragraph with the name of the author or chapter and then the book continues on yet another new paragraph. It would be nice to only see the page number followed by a space and then continuing text. I know how to manually find and replace the paragraphs in Sigil. It takes a long time and in some books the text following the page numbers is different from chapter to chapter therefore I have to go into all chapters and manually do it.

example:
thought he 79</p>

<p class="calibre2">MILTON ERICKSON</p>

<p class="calibre2">would try something different.
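
One possible approach for the example above is a regular expression, run either in Sigil's regex search or in a small script. This sketch assumes the running header is always an all-caps paragraph with class `calibre2` sitting right after a paragraph that ends in the page number; it will also merge any genuine paragraph that happens to fit that shape, so review the matches before replacing everything.

```python
import re

# page number, end of paragraph, an all-caps header paragraph,
# then the opening tag of the paragraph where the text resumes
RUNNING_HEADER = re.compile(
    r'(\d+)</p>\s*'
    r'<p class="calibre2">[^a-z<]+</p>\s*'
    r'<p class="calibre2">'
)


def join_across_page_break(html: str) -> str:
    """Keep the page number, drop the header, and rejoin the split text."""
    return RUNNING_HEADER.sub(r"\1 ", html)
```

On the sample above this gives `thought he 79 would try something different.`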

Klip
11-22-2012, 01:29 AM
I hate that so many publishers/authors build their eBooks without using the basic standard HTML tags. Instead of using H1, H2, H3 many use a P tag with a gosh-awful CSS class call out. HELLO, that makes autocreation of TOCs a no go!! I personally like a minimalist approach to CSS within eBooks. If I have to modify a book, the first thing I do is strip out all the garbage from the CSS file. No need to remove the CSS classes and what not from each HTML file, as they will be ignored if they cannot be found in the CSS file. Next I make sure each chapter/section heading has an H1 or H2 tag. If it is not too much work I also make sure all subheadings have a heading tag as well. It usually isn't too time consuming, but some publishers seem to use a different class for every subheading (within the same level), making a search and replace impossible. Idiots!

InDesign spits out HTML like that - with classes instead of simple h1, h2 etc. Not sure if other software does too. Very irritating.

Tagbert
12-26-2013, 03:05 PM
Actually, the best way to format that chapter heading would be like this:

<h1><span class="chapternum">ChapterNumber </span>Chapter Name</h1>

then in the css you would have
.chapternum{display:block;}

That way, it should just display normally in a TOC with no line breaks. The css "display" setting forces a line break after that element.
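
To see why the span version reads well as a TOC entry, here is a naive sketch that collects the text of each h1 the way a simple TOC generator might. It is not how calibre or Sigil actually build their TOCs; the class and function names are made up for illustration.

```python
from __future__ import annotations

from html.parser import HTMLParser


class HeadingText(HTMLParser):
    """Collects the text content of every <h1> element."""

    def __init__(self) -> None:
        super().__init__()
        self.in_h1 = False
        self.labels: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.labels.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.labels[-1] += data


def toc_labels(html: str) -> list[str]:
    """Return one whitespace-normalised label per <h1>."""
    parser = HeadingText()
    parser.feed(html)
    parser.close()
    return [" ".join(label.split()) for label in parser.labels]
```

With the heading above, `toc_labels` returns a single entry, `ChapterNumber Chapter Name`, because the space inside the span survives even though the CSS puts the two parts on separate lines in the book itself.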