Mobigen Mass Batch conversion of HTML-Single-File ebooks to .mobi ebooks


cklammer
03-03-2009, 05:25 AM
Hi all,

First a warning: Looooooonnnnnggggg post ahead !!!!!!!!!!!!!

As part of my current migration from Plucker to Mobipocket I had to mass convert approx. 500 ebooks from single-file HTML format to Mobipocket .mobi/.prc format. Actually, a lot of the ebooks were originally in text, lit and pdf format and had been converted to text format some time back for reading on a Nokia smartphone, using tools such as ABC Amber Lit converter where appropriate.

I did not want to simply drag and drop all the text files into the Windows Mobipocket Reader because I want at least the title and author tags set properly. Dragging and dropping a bunch of files will not do that - quite the opposite: the file name becomes the title of the resulting mobi ebook, and the author is either left empty (if you are lucky) or set to some random value (if you are unlucky - depending on your circumstances).

Next I tried to mass convert the text files with mobiperl and mobigen instead, but the text files proved unsuitable for direct conversion with either of those two tools.

So I downloaded "Easy Text to HTML Converter" and batched my ~500 text files for conversion into HTML using said tool's default template. That was slow but steady - the job was finished ~24 hours later (with some other unrelated stuff like DVD burning going on on the conversion machine). (see also below)

That netted me my ~500 html ebooks now - so far, so good. At this point let me remark not to ever delete your original lit/pdf/any format source ebook files like I did in the past - you never know when you might need them again !!! And don't be cocksure about what can be deleted: I was ....

For the HTML mass conversion I decided to write a script. I started out with mobiperl and ended up using the win32 executable of html2mobi in a single-line "Windows Command Processor" (cmd) command, which converted my HTML files to .mobi files just fine. The only problem was that almost every one of the generated ebooks showed up in the list of the Mobipocket Windows Reader but could not be opened: the reader reported a file corruption error. The ebooks concerned were the files generated from the text to HTML conversion using "Easy Text to HTML Converter"'s default template. No twiddling would change this result - so I abandoned mobiperl because it has obvious problems with the shitty/complicated/whatever-it-is HTML generated by "Easy Text to HTML Converter"'s default template.

Based on my experience, I recommend that everybody stay away from "Easy Text to HTML Converter".

My next approach for mass conversion was to use mobigen. But an .opf project file is needed for every ebook to be generated if one wants the author and title properly set .... I fired up Mobipocket Creator, converted a single HTML file to Mobipocket and looked at the resulting .opf file: to my surprise it was simply XML serialized into a single-line text file ... tadaaa. Now I knew that I was almost home free if mobigen could handle the "Easy Text to HTML Converter" output.

I ran mobigen on the opf file generated by Mobipocket Creator and the result was to my delight a "rather usable" Mobipocket ebook which worked in the Mobipocket Windows Reader.

I then wrote a Visual Basic Script for generating appropriate opf files and running mobigen for the conversion.

So this is what I did in the directory where my ebook html files are stored:

(0) Change all file extensions .htm to .html. You can use LUPAS Rename 2000 for this task.


(1) Preparation of the HTML files' file names: (This is an optional step) I used "LUPAS Rename 2000" to clean up the file names of my HTML files. For me this step included replacing "_" with white space, replacing sequences of two or more white spaces with a single white space and removing angle brackets from the file names. The result is a bunch of files with file names of the form
<Author's last name>, <Author's first names>[, <Author's titles>] - <Title>.html

Caveat: If your file names contain the string sequences %1, %2 or %3, you have to remove them now, before you proceed with the next step! The script uses these sequences as placeholders in the opf template.


(2) Manual creation of a list of ebooks to be converted having the name 00-booklist.txt:


dir /B /O:GNE *.html > 00-booklist.txt
notepad 00-booklist.txt

In notepad replace all occurrences of the string ".html" with nothing, save and quit.

This will result in a file 00-booklist.txt where each line contains one ebook entry of the form
<Author's last name>, <Author's first names>[, <Author's titles>] - <Title>


(3) Make sure mobigen.exe is either in your %PATH% or in the ebook directory. Make sure that Microsoft Windows Script Host is installed and current. This is definitely an issue for Win9x/Me users, possibly an issue for Win2k users, most likely not an issue for WinXP (even unpatched) users and no issue at all for Vista or Win7 users. Windows Script Host can be obtained from Microsoft downloads (get at least version 5.6 or 5.7).

(4) Make sure the files 00-template.opf, 00-2mobi.vbs are in the ebook directory. Put your own cover for the mobipocket e-books to be generated with the name 00-cover.jpg into the ebook directory.

(5) In your ebook directory run:

cscript 00-2mobi.vbs

That's it if you have done everything according to the above procedure. Now you should find an .opf Mobipocket project file and a .mobi Mobipocket ebook file for every html file unless mobigen has a problem with one file or the other.
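
For anyone who would rather script steps (0) to (2) than use LUPAS Rename and notepad, here is a rough Python sketch. It is not part of the procedure above: the function names clean_name and build_booklist are made up for illustration, the cleanup rules are only the ones listed in step (1), and writing the list as UTF-8 is an assumption that only matters for non-ASCII file names.

```python
import re
from pathlib import Path


def clean_name(stem: str) -> str:
    """Apply the step (1) cleanup rules to a file name without extension."""
    stem = stem.replace("_", " ")          # underscores become spaces
    stem = re.sub(r"[<>]", "", stem)       # drop angle brackets
    stem = re.sub(r"\s{2,}", " ", stem)    # collapse runs of white space
    return stem.strip()


def build_booklist(directory: str, listname: str = "00-booklist.txt") -> list[str]:
    """Rename *.htm/*.html to cleaned *.html names and write the book list."""
    folder = Path(directory)
    books = []
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() not in (".htm", ".html"):
            continue
        stem = clean_name(path.stem)
        # The template placeholders must not occur in a file name (see caveat).
        if any(token in stem for token in ("%1", "%2", "%3")):
            raise ValueError(f"placeholder sequence in file name: {path.name}")
        target = folder / (stem + ".html")
        if target != path:
            path.rename(target)
        books.append(stem)
    books.sort()
    # One entry per line, without the .html extension, as in step (2).
    (folder / listname).write_text("\n".join(books) + "\n", encoding="utf-8")
    return books
```

Calling build_booklist(".") in the ebook directory should leave the same 00-booklist.txt that step (2) produces by hand.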

Here is the script 00-2mobi.vbs:

REM 00-2mobi.vbs: Mass conversion of HTML Pages to Mobipocket
REM Version 0.1/03-FEB-2009
REM Released under the respective current version of the GPL by cklammer

Main()
WScript.Quit 0

Sub Main()
    Const ForReading = 1
    Const ForWriting = 2
    Const ForAppending = 8

    Dim booklistfile
    Dim book
    Dim bindestrich
    Dim author
    Dim title
    Dim opffile
    Dim opftemplate
    Dim opfcontent
    Dim opftemplatefile
    Dim opffilename

    Dim FSO
    Set FSO = CreateObject("Scripting.FileSystemObject")

    Dim oShell
    Set oShell = WScript.CreateObject("WScript.Shell")

    ' Read the single-line opf template with the placeholders %1, %2 and %3
    Set opftemplatefile = FSO.OpenTextFile("00-template.opf", ForReading)
    opftemplate = opftemplatefile.ReadLine
    opftemplatefile.Close

    Set booklistfile = FSO.OpenTextFile("00-booklist.txt", ForReading)
    Do While Not booklistfile.AtEndOfStream
        book = booklistfile.ReadLine

        ' Split "<author> - <title>" at the first " - "
        ' ("Bindestrich" is German for hyphen)
        bindestrich = InStr(book, " - ")
        If bindestrich = 0 Then
            ' No separator found: use the whole line as the title
            author = "Unknown"
            title = book
        Else
            author = Trim(Left(book, bindestrich - 1))
            title = Trim(Mid(book, bindestrich + Len(" - ")))
        End If

        ' Fill in the template: %1 = title, %2 = author, %3 = HTML file name
        opfcontent = Replace(opftemplate, "%1", title)
        opfcontent = Replace(opfcontent, "%2", author)
        opfcontent = Replace(opfcontent, "%3", book & ".html")

        ' Write the opf project file next to the HTML file
        opffilename = book & ".opf"
        Set opffile = FSO.CreateTextFile(opffilename, True)
        opffile.WriteLine opfcontent
        opffile.Close

        ' Run mobigen on the opf file and wait until it has finished
        oShell.Run "mobigen " & """" & opffilename & """", 1, True
    Loop

    booklistfile.Close
    Set FSO = Nothing
    Set oShell = Nothing
End Sub

You have to cut and paste the above code into notepad and save the resulting file under the name 00-2mobi.vbs in your ebook directory.

Here is the opf template file 00-template.opf:

<?xml version="1.0" encoding="utf-8"?><package unique-identifier="uid"><metadata><dc-metadata xmlns:dc="http://purl.org/metadata/dublin_core" xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/"><dc:Title>%1</dc:Title><dc:Language>en</dc:Language><dc:Identifier id="uid">0FC99EFF4B</dc:Identifier><dc:Creator>%2</dc:Creator></dc-metadata><x-metadata><output encoding="Windows-1252"></output><EmbeddedCover>00-cover.jpg</EmbeddedCover></x-metadata></metadata><manifest><item id="item1" media-type="text/x-oeb1-document" href="%3"></item></manifest><spine><itemref idref="item1"/></spine><tours></tours><guide></guide></package>

This file is attached.
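
To make the placeholder logic easier to follow, here is a minimal Python sketch of how one book list entry is split into author and title and merged into the template. The function names split_entry and fill_template are hypothetical. One deliberate difference from 00-2mobi.vbs, which inserts the strings unchanged: the sketch escapes XML special characters, on the assumption that a "&" in a title would otherwise produce an invalid opf file.

```python
from xml.sax.saxutils import escape


def split_entry(book: str) -> tuple[str, str]:
    """Split '<author> - <title>' at the first ' - ' into (author, title)."""
    author, sep, title = book.partition(" - ")
    if not sep:                      # no separator: whole entry is the title
        return "Unknown", book
    return author.strip(), title.strip()


def fill_template(template: str, book: str) -> str:
    """Replace %1 (title), %2 (author) and %3 (HTML file name) in the template."""
    author, title = split_entry(book)
    # Sequential replacement in the same order as the VBScript; this is why
    # file names must not contain %1, %2 or %3 themselves.
    content = template.replace("%1", escape(title))
    content = content.replace("%2", escape(author))
    # %3 lands inside an attribute value, so double quotes are escaped as well.
    content = content.replace("%3", escape(book + ".html", {'"': "&quot;"}))
    return content
```

For example, split_entry("Obama, Barack Hussein - Inaugural Presidential Address") gives the author "Obama, Barack Hussein" and the title "Inaugural Presidential Address".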

The source file for the example is Obama, Barack Hussein - Inaugural Presidential Address (http://www.gutenberg.org/files/28001/28001-h.zip). Unpack the html file inside into your ebook document directory and rename it Obama, Barack Hussein - Inaugural Presidential Address.html.

Have fun and good luck,
cklammer

cklammer
03-03-2009, 05:59 AM
Hi all,

I wrote my OP on a locked down machine without zip archive creation capability. Please find all files referred to in the OP attached now as 00-2mobi.zip

Sorry for any inconvenience,
cklammer

mtravellerh
03-26-2009, 10:21 AM
Hey, good work. I am thinking about migrating my AportisDoc files to Mobi. I think your approach might work there, too (although I would have to mass convert the pdbs to txt).

nrapallo
03-26-2009, 04:23 PM
Mobi2IMP will convert PalmDoc (Text/Read) .pdb ebooks and leaves behind the .HTML and .opf!!! ;)

mtravellerh
03-26-2009, 04:37 PM
Oh goody. problem solved.

nrapallo
03-27-2009, 02:35 PM
BTW, this post (http://www.mobileread.com/forums/showthread.php?p=163046#post163046) explains how to get Mobi2IMP to convert many PalmDoc .pdb files in a directory, recursively.

You can use the supplied prc2imp.bat and edit it to include the /r at the beginning of the for statement, or just use this line at the DOS prompt:

for /r %i in (*.pdb) do mobi2imp.exe --verbose "%i" "%~ni"
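
If the for /r syntax looks cryptic, the same recursive walk can be sketched in Python. This is only an illustration: it assumes mobi2imp.exe is on the PATH, it uses nothing beyond the --verbose flag and the two arguments from the line above (the input path, then the bare file name), and the function names are made up.

```python
import subprocess
from pathlib import Path


def mobi2imp_commands(root: str) -> list[list[str]]:
    """Build one mobi2imp command per .pdb file found below root."""
    commands = []
    for pdb in sorted(Path(root).rglob("*.pdb")):
        # Same arguments as the batch line: "%i" is the path of the input
        # file, "%~ni" is its name without the extension.
        commands.append(["mobi2imp.exe", "--verbose", str(pdb), pdb.stem])
    return commands


def convert_all(root: str) -> None:
    """Run the commands one after another, like the for /r loop does."""
    for command in mobi2imp_commands(root):
        subprocess.run(command, check=False)   # keep going if one file fails
```

Printing the result of mobi2imp_commands(".") first is a cheap way to check which files would be picked up before converting anything.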


Hope this helps. :)

kevindorsey
03-27-2009, 07:56 PM
I fell asleep, I'm sorry :0

quocsan
08-08-2009, 11:09 PM
Thank you for your helpful tips, Masters!
But, does anyone know how to make MobiGen run faster?
I think if MobiGen uses RAM for storing temporary files, it will be much faster.

velusamys
11-18-2009, 01:10 AM
Hi

I have auto-generated HTML files (nearly 200) and I want to convert them into a single MOBI file.

I tried using Mobipocket Creator, but only part of the content is displayed. How do I generate the table of contents?

Thanks

Velu

cklammer
11-20-2009, 04:00 AM
I had this come up, too, some time back in one way or another ... I used the freeware "dir2html" (Google it, or try Softpedia maybe?) to generate an HTML document containing a bare-bones listing of all the HTML files in the directory (you need to deselect some "dir2html" options to disable some "fluff") and then edited it manually to "beautify" it. I then used that document as the TOC for the further conversion process with Mobipocket Creator.

This worked pretty well: find your document in the TOC, jump to it, read it until you are done and then use the "Back" function in the Mobipocket Reader until you are back in the TOC.
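
If you cannot find dir2html, a bare-bones listing of this kind is easy to generate yourself. The following Python sketch is not what dir2html does internally; the function name build_toc and the file name 00-toc.html are made up, and the percent-encoding of spaces in the links is an assumption that I have not tested with Mobipocket Creator.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_toc(directory: str, tocname: str = "00-toc.html") -> str:
    """Write a bare-bones HTML index linking every HTML file in directory."""
    folder = Path(directory)
    items = []
    for path in sorted(folder.glob("*.html")):
        if path.name == tocname:
            continue                      # do not list the TOC itself
        link = quote(path.name)           # percent-encode spaces etc.
        label = html.escape(path.stem)    # file name without .html as link text
        items.append(f'<li><a href="{link}">{label}</a></li>')
    page = (
        "<html><head><title>Table of Contents</title></head><body>\n"
        "<h1>Table of Contents</h1>\n<ul>\n"
        + "\n".join(items)
        + "\n</ul>\n</body></html>\n"
    )
    (folder / tocname).write_text(page, encoding="utf-8")
    return page
```

The resulting file still needs the same manual "beautifying" before it goes into Mobipocket Creator.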

Good Luck,
cklammer

P.S.: Don't hesitate to ask, but keep in mind that I am at GMT+4.