Mobi2html troubles


whitearrow
07-16-2009, 04:11 PM
I liberated an Amazon .azw file -- just to fix the formatting, which had double spacing between paragraphs and quadruple spacing between sections.

The dedrm went fine, and that file looks good when I open it in Mobi reader. But when I ran the file through mobi2html to get the html, the output had problems. Random characters were thrown all over the place, and in at least one place a paragraph simply cuts off at least 4 sentences before it's supposed to.

Any ideas, or alternative methods for digging out the html? I was hoping to just do a quick search and replace on whatever html markup was causing the double-spacing.
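For anyone attempting the same search and replace, here is a minimal Python sketch. It assumes the extra spacing comes from empty paragraphs or runs of <br> tags, which is only a guess since the thread never shows the actual markup. The function name collapse_blank_markup is made up for illustration.

```python
import re

def collapse_blank_markup(html: str) -> str:
    """Remove empty paragraphs and collapse runs of <br> tags (assumed cause of the spacing)."""
    # Drop <p> elements that contain only whitespace, &nbsp; or <br> tags.
    html = re.sub(
        r"<p\b[^>]*>(?:\s|&nbsp;|<br\s*/?>)*</p>",
        "",
        html,
        flags=re.IGNORECASE,
    )
    # Collapse two or more consecutive <br> tags into a single one.
    html = re.sub(r"(?:<br\s*/?>\s*){2,}", "<br/>", html, flags=re.IGNORECASE)
    return html

sample = "<p>First.</p><p>&nbsp;</p><p>Second.</p><br><br/><br />"
cleaned = collapse_blank_markup(sample)
print(cleaned)
```

If the spacing turns out to come from something else, such as margin attributes or CSS, the patterns would need to be adjusted to match.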

JSWolf
07-16-2009, 04:44 PM
Try using Calibre to convert to HTML and then back to Mobipocket.

whitearrow
07-16-2009, 05:23 PM
Thanks, will give that a try.

tompe
07-17-2009, 08:37 AM
That might be a UTF-8 problem. If you can give me the file, I can see whether this can be fixed in mobi2html.
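To illustrate how a UTF-8 problem can produce "random characters", here is a small Python sketch. It shows what happens when UTF-8 bytes are decoded with a single-byte codec (Latin-1 is used here as an example). Whether mobi2html does anything like this internally is an assumption; the thread does not confirm it.

```python
# Typographic punctuation of the kind found in many e-books.
text = "\u201cWait\u201d \u2014 she said\u2026"

# The bytes as they would be stored in a UTF-8 encoded book.
raw = text.encode("utf-8")

# Decoding with the wrong codec turns every multi-byte character
# into several unrelated single characters.
garbled = raw.decode("latin-1")

# Decoding with the right codec recovers the original text.
correct = raw.decode("utf-8")

print(repr(garbled))
print(repr(correct))
```

Each curly quote, dash, or ellipsis is three bytes in UTF-8, so one character becomes three stray ones when decoded as Latin-1.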

wayrad
07-18-2009, 05:41 PM
I'm having the same problem with a Mobi (.prc) book I just bought from Diesel. I don't see the issue when viewing the unDRM'd .prc file in Calibre, so it seems to be mobi2html that is having a problem with the encoding.

tompe
07-18-2009, 06:19 PM
I really dislike programming for UTF-8 and solving UTF-8 problems, so Calibre is much better at handling UTF-8 encoded files.

wayrad
07-18-2009, 10:05 PM
Unfortunately Calibre refused to convert my deDRM'd file. :(
Wait, I just tried one last time, and didn't get an error message this time...hmmm...

OK, I got a file, but all the hyphens and quote marks (single and double) are replaced by question marks. I think the mobi2html output will be easier to clean up than that.

tompe
07-18-2009, 10:31 PM
The question marks might actually be correct UTF-8 characters that are not being displayed properly. Make sure that the font or program displaying the file can handle UTF-8. If it cannot, it is common to get question marks.
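One way to tell a display problem from real data loss is to look at the bytes rather than at the rendered text. The Python sketch below is a rough check under stated assumptions: the helper name punctuation_survived and the list of punctuation characters are made up for illustration, and the sample byte strings stand in for the contents of a converted file.

```python
# Curly quotes, en dash, em dash and ellipsis.
TYPOGRAPHIC = "\u2018\u2019\u201c\u201d\u2013\u2014\u2026"

def punctuation_survived(data: bytes) -> bool:
    """True if the bytes decode as UTF-8 and still contain typographic punctuation."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return any(ch in text for ch in TYPOGRAPHIC)

original = "\u201cHi\u201d \u2014 ok"

# A file that kept its UTF-8 punctuation; a viewer without UTF-8
# support may still show it as question marks.
good = original.encode("utf-8")

# A lossy conversion: the characters really were replaced by "?".
lossy = original.encode("ascii", errors="replace")

print(punctuation_survived(good))
print(punctuation_survived(lossy))
print(lossy)
```

If the check comes back true for the real file, the characters are intact and only the viewer or font needs changing.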

wayrad
07-19-2009, 08:41 PM
Thanks, that helped! I still haven't been able to get an error-free .rtf (my original goal), but after a bit of googling I was able to tweak Word to convert a plain text file to an error-free version. :D Which isn't all that bad since .rtf and .txt look about the same in PalmFiction anyway.

Edited to add: Got the .rtf. I converted the PRC to Epub and back to .mobi with Calibre, as suggested by JSWolf earlier, then ran mobi2html, converted the result to .rtf, and did a final cleanup of the "hellip"s and "mdash"s in Word.
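The leftover "hellip"s and "mdash"s could probably also be cleaned up before the .rtf step. The Python sketch below assumes they were unconverted HTML entities such as &hellip; and &mdash; in the mobi2html output, which the thread does not confirm. The sample string is made up.

```python
import html

# Sample text with named HTML entities left in place (assumed form).
sample = "Wait&hellip; no&mdash;not yet."

# html.unescape converts named and numeric entities to real characters.
fixed = html.unescape(sample)
print(fixed)
```

This uses only the standard library and leaves ordinary text untouched.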