First a warning: Looooooonnnnnggggg post ahead !!!!!!!!!!!!!
As part of my current migration from Plucker to Mobipocket I was faced with mass converting approx. 500 ebooks from single-file HTML format to Mobipocket .mobi/.prc format. Actually, a lot of the ebooks were originally in text, lit and pdf format and had been converted into text format some time back for reading on a Nokia smartphone, using tools such as ABC Amber LIT Converter where appropriate.
I did not simply want to drag-and-drop all the text files into the Windows Mobipocket reader as I want to have at least the title and author tags properly set. Dragging and dropping a bunch of files will not do that - quite the opposite: the file name will be the title of the resulting mobi ebook and the author will either be left empty (if you are lucky) or set to some random value (if you are unlucky - depending on your circumstances).
So I tried to mass convert the text files with mobiperl or mobigen instead, but the text files proved unsuitable for direct conversion with either of those two tools.
So I downloaded "Easy Text to HTML Converter" and batched my ~500 text files for conversion into HTML using said tool's default template. That was slow but steady - the job was finished ~24 hours later (with some other unrelated stuff like DVD burning going on on the conversion machine). (see also below)
That netted me my ~500 html ebooks now - so far, so good. At this point let me remark not to ever delete your original lit/pdf/any format source ebook files like I did in the past - you never know when you might need them again !!! And don't be cocksure about what can be deleted: I was ....
For the html mass conversion I decided to write a script. I started out with mobiperl and ended up using the win32 executable of html2mobi in a single-line "Windows Command Processor" (cmd) command, which converted my html to .mobi files just fine. The only problem was that almost every one of the generated ebooks showed up in the list of the Mobipocket Windows reader just fine but could not be opened, resulting in a file corruption error message. The ebooks concerned were the files generated from the text to HTML conversion using "Easy Text to HTML Converter"'s default template. No twiddling would change this result - so I abandoned mobiperl because it has obvious problems with the shitty/complicated/whatever-it-is HTML generated by "Easy Text to HTML Converter"'s default template.
Based on my experience, I recommend that anybody stay away from "Easy Text to HTML Converter".
My next approach for mass conversion was to use mobigen. But an .opf project file is needed for every ebook to be generated if one wants the author and title properly set .... I fired up Mobipocket Creator, converted a single HTML file to Mobipocket and looked at the resulting .opf file: To my surprise it was simply XML serialized into a single-line text file ... tadaaa. Now I knew that I was almost home free if mobigen could handle the "Easy Text to HTML Converter" output.
I ran mobigen on the opf file generated by Mobipocket Creator and the result was to my delight a "rather usable" Mobipocket ebook which worked in the Mobipocket Windows Reader.
I then wrote a Visual Basic Script for generating appropriate opf files and running mobigen for the conversion.
So this is what I did in the directory where my ebook html files are stored:
(0) Change all file extensions .htm to .html. You can use
for this task.
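If you have no rename tool at hand, step (0) can also be sketched in a few lines of Python. This is only an illustration and not the tool referred to above; the function name rename_htm_to_html is made up for this example, and I assume that a file should be skipped when a .html file of the same name already exists.

```python
from pathlib import Path

def rename_htm_to_html(directory="."):
    """Rename every *.htm file in `directory` to *.html and return the new names."""
    renamed = []
    # sorted() takes a snapshot first, so renaming does not disturb the iteration
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() == ".htm":
            target = path.with_suffix(".html")
            if target.exists():
                continue  # assumption: never overwrite an existing .html file
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

Calling rename_htm_to_html(".") from the ebook directory renames the files in place and returns the list of new file names, so you can check what was touched.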
(1) Preparation of the HTML files' file names: (This is an optional step) I used "LUPAS Rename 2000" to clean up the file names of my HTML files. For me this step included replacing "_" with white space, replacing sequences of two or more white spaces with a single white space and removing angular brackets in the file names. The result of this is a bunch of files with file names of the form
<Author's last name>, <Author's first names>[, <Author's titles>] - <Title>.html
Caveat: If your file names contain the string sequences , or you have to remove them at this point before you can proceed with the next step!
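The clean-up rules from step (1) can be written down as a small Python function, just to make them precise. This is a sketch and not what LUPAS Rename 2000 does internally; the name clean_file_name is hypothetical, and I read "angular brackets" as the characters < and >, so adjust the character class if you mean other brackets.

```python
import re

def clean_file_name(name):
    """Apply the three clean-up rules from step (1) to a single file name."""
    name = name.replace("_", " ")        # "_" becomes white space
    name = re.sub(r"[<>]", "", name)     # assumption: "angular brackets" are < and >
    name = re.sub(r"\s{2,}", " ", name)  # two or more white spaces become one
    return name.strip()
```

The brackets are removed before the white space is collapsed, so a bracket sitting between two spaces does not leave a double space behind.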
(2) Manual creation of a list of ebooks to be converted having the name
dir /B /O:GNE *.html > 00-booklist.txt
In Notepad replace all occurrences of the string ".html" with nothing, save and quit.
This will result in a file 00-booklist.txt where each line contains one ebook entry of the form
<Author's last name>, <Author's first names>[, <Author's titles>] - <Title>
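For those who would rather not do the dir-plus-Notepad routine by hand, here is a rough Python sketch of step (2). It is an alternative, not the procedure above: the function name write_booklist is made up, the sort order only approximates dir /O:GNE, and writing the list as Windows-1252 is my assumption because the script later reads it as a plain ANSI text file.

```python
from pathlib import Path

def write_booklist(directory=".", list_name="00-booklist.txt"):
    """Write one line per *.html file (without the extension) and return the entries."""
    directory = Path(directory)
    entries = sorted(
        (p.stem for p in directory.iterdir()
         if p.is_file() and p.suffix.lower() == ".html"),
        key=str.lower,  # assumption: case-insensitive name order, similar to dir /O:N
    )
    text = "".join(entry + "\n" for entry in entries)
    # assumption: cp1252 (ANSI) so that the VBScript can read the file as is
    (directory / list_name).write_text(text, encoding="cp1252")
    return entries
```

Unlike the search-and-replace in Notepad, this only strips the extension at the very end of a name, so a title that happens to contain ".html" somewhere in the middle is left alone.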
(3) Make sure
is either in your %PATH% or the ebook directory.
Make sure that the Microsoft Windows Scripting Host is installed and current. This is definitely an issue for Win9x/Me users, possibly an issue for Win2k users, most likely not an issue for WinXP (even unpatched) users and no issue at all for Vista or Win7 users.
Microsoft Windows Scripting Host can be obtained from Microsoft downloads (get at least version
(4) Make sure the files
are in the ebook directory. Put your own cover for the mobipocket e-books to be generated with the name
into the ebook directory.
(5) In your ebook directory run:
That's it if you have done everything according to the above procedure. Now you should find an .opf Mobipocket project file and a .mobi Mobipocket ebook file for every html file unless mobigen has a problem with one file or the other.
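To find out quickly which files mobigen had a problem with, you can list every html file that has no .mobi next to it. A minimal Python sketch, assuming (as described above) that the .mobi file carries the same base name as the html file; the function name missing_mobi_files is made up for this example.

```python
from pathlib import Path

def missing_mobi_files(directory="."):
    """Return the names of all *.html files that have no matching *.mobi file."""
    directory = Path(directory)
    return sorted(
        p.name
        for p in directory.glob("*.html")
        # assumption: the output is "<same base name>.mobi" in the same directory
        if not p.with_suffix(".mobi").exists()
    )
```

An empty list means every ebook made it through; anything else is a candidate for a closer look at the mobigen output.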
Here is the script
REM 00-2mobi.vbs: Mass conversion of HTML Pages to Mobipocket
REM Version 0.1/03-FEB-2009
REM Released under the respective current version of the GPL by cklammer
Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 8
Set FSO = CreateObject("Scripting.FileSystemObject")
Set oShell = WScript.CreateObject("WScript.Shell")
REM The template is a single line of XML, so one ReadLine gets all of it
Set opftemplatefile = FSO.OpenTextFile("00-template.opf", ForReading)
opftemplate = opftemplatefile.ReadLine
opftemplatefile.Close
Set booklistfile = FSO.OpenTextFile("00-booklist.txt", ForReading)
Do While (booklistfile.AtEndOfStream = False)
    book = booklistfile.ReadLine
    REM Split "<author> - <title>" at the first " - "
    bindestrich = InStr(book, " - ")
    If bindestrich = 0 Then
        author = "Unknown"
        title = book
    Else
        author = Trim(Left(book, bindestrich - 1))
        title = Trim(Right(book, Len(book) - bindestrich - Len(" - ") + 1))
    End If
    REM Fill in the placeholders: %1 = title, %2 = author, %3 = html file name
    opfcontent = Replace(opftemplate, "%1", title)
    opfcontent = Replace(opfcontent, "%2", author)
    opfcontent = Replace(opfcontent, "%3", book & ".html")
    opffilename = book & ".opf"
    Set opffile = FSO.CreateTextFile(opffilename, True)
    opffile.Write opfcontent
    opffile.Close
    REM Run mobigen on the new project file and wait until it has finished
    oShell.Run "mobigen " & """" & opffilename & """", 1, True
Loop
booklistfile.Close
Set FSO = Nothing
Set oShell = Nothing
You have to cut and paste the above code into Notepad and save the resulting file under the name
in your document directory.
Here is the opf template file
<?xml version="1.0" encoding="utf-8"?><package unique-identifier="uid"><metadata><dc-metadata xmlns:dc="http://purl.org/metadata/dublin_core" xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/"><dc:Title>%1</dc:Title><dc:Language>en</dc:Language><dc:Identifier id="uid">0FC99EFF4B</dc:Identifier><dc:Creator>%2</dc:Creator></dc-metadata><x-metadata><output encoding="Windows-1252"></output><EmbeddedCover>00-cover.jpg</EmbeddedCover></x-metadata></metadata><manifest><item id="item1" media-type="text/x-oeb1-document" href="%3"></item></manifest><spine><itemref idref="item1"/></spine><tours></tours><guide></guide></package>
This file is attached.
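To make it clearer what happens to the three placeholders, here is the template filling as a small Python sketch. It roughly mirrors the VBScript but is not a replacement for it: the function names split_entry and fill_template are made up, and the XML escaping (so that a "&" in a title does not break the project file) and the single-pass replacement are my additions, not something the script does.

```python
import re
from xml.sax.saxutils import escape

def split_entry(entry):
    """Split '<author> - <title>' at the first ' - '; no separator means author 'Unknown'."""
    author, sep, title = entry.partition(" - ")
    if not sep:
        return "Unknown", entry.strip()
    return author.strip(), title.strip()

def fill_template(template, entry):
    """Replace %1 (title), %2 (author) and %3 (html file name) in an OPF template."""
    author, title = split_entry(entry)
    values = {
        "%1": escape(title),
        "%2": escape(author),
        # %3 sits inside an attribute, so double quotes are escaped as well
        "%3": escape(entry + ".html", {'"': "&quot;"}),
    }
    # One pass over the template: text inserted for one placeholder is never
    # scanned again for the other placeholders.
    return re.sub(r"%[123]", lambda match: values[match.group(0)], template)
```

For the example entry, split_entry("Obama, Barack Hussein - Inaugural Presidential Address") gives the author "Obama, Barack Hussein" and the title "Inaugural Presidential Address", and fill_template puts both into the template together with the html file name.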
The source file for the example is Obama, Barack Hussein - Inaugural Presidential Address. Unpack the html file inside into your ebook document directory and rename it to
Obama, Barack Hussein - Inaugural Presidential Address.html
Have fun and good luck,