Old 03-03-2009, 04:25 AM   #1
cklammer
Zealot
Posts: 105
Karma: 450
Join Date: Feb 2009
Location: Abu Dhabi, United Arab Emirates
Device: Palm Centro, Acer Aspire One
Mass Batch conversion of HTML-Single-File ebooks to .mobi ebooks

Hi all,

First a warning: Looooooonnnnnggggg post ahead !!!!!!!!!!!!!

As part of my current migration from Plucker to Mobipocket I was faced with mass converting approx. 500 ebooks from single-file HTML format to Mobipocket .mobi/.prc format. Actually, a lot of the ebooks were originally in text, lit and pdf format and had been converted to text format some time back for reading on a Nokia smartphone, using tools such as ABC Amber LIT Converter where appropriate.

I did not simply want to drag-and-drop all the text files into the Windows Mobipocket Reader as I want to have at least the title and author tags properly set. Dragging and dropping a bunch of files will not do that - quite the opposite: the file name will be the title of the resulting mobi ebook and the author will either be left empty (if you are lucky) or set to some random value (if you are unlucky - depending on your circumstances).

So I tried to mass convert the text files with mobiperl or mobigen instead, but the files proved unsuitable for direct conversion with either of those two tools.

So I downloaded "Easy Text to HTML Converter" and batched my ~500 text files for conversion into HTML using said tool's default template. That was slow but steady - the job was finished ~24 hours later (with some other unrelated stuff like DVD burning going on on the conversion machine). (see also below)

That netted me my ~500 html ebooks now - so far, so good. At this point let me remark not to ever delete your original lit/pdf/any format source ebook files like I did in the past - you never know when you might need them again !!! And don't be cocksure about what can be deleted: I was ....

For the html mass conversion I decided to write a script. I started out with mobiperl and ended up using the win32 executable of html2mobi in a single-line "Windows Command Processor" (cmd) loop, which converted my html to .mobi files just fine. The only problem was that almost every one of the generated ebooks showed up in the list of the Mobipocket Windows Reader but could not be opened: the reader reported a file corruption error. The ebooks concerned were the files generated from the text to HTML conversion using "Easy Text to HTML Converter"'s default template. No twiddling would change this result - so I abandoned mobiperl because it has obvious problems with the shitty/complicated/whatever-it-is HTML generated by "Easy Text to HTML Converter"'s default template.

Based on my experience I recommend that everybody stay away from "Easy Text to HTML Converter".

My next approach for mass conversion was to use mobigen. But an opf project file is needed for every ebook to be generated if one wants the author and title properly set .... I fired up Mobipocket Creator, converted a single HTML file to Mobipocket and looked at the resulting .opf file: To my surprise it was simply XML serialized in a single-line text file ... tadaaa. Now I knew that I was almost home free if mobigen could handle the "Easy Text to HTML Converter" output.

I ran mobigen on the opf file generated by Mobipocket Creator and the result was to my delight a "rather usable" Mobipocket ebook which worked in the Mobipocket Windows Reader.

I then wrote a Visual Basic Script for generating appropriate opf files and running mobigen for the conversion.

So this is what I did in the directory where my ebook html files are stored:

(0) Change all file extensions from .htm to .html. You can use LUPAS Rename 2000 for this task.


(1) Preparation of the HTML files' file names (this is an optional step): I used "LUPAS Rename 2000" to clean up the file names of my HTML files. For me this step included replacing "_" with white space, replacing sequences of two or more white spaces with a single white space and removing angle brackets from the file names. The result is a bunch of files with file names of the form
Code:
<Author's last name>, <Author's first names>[, <Author's titles>] - <Title>.html
Caveat: If your file names contain the string sequences %1, %2 or %3 at this point, you have to remove them before you can proceed with the next step!



(2) Manual creation of a list of the ebooks to be converted, named 00-booklist.txt:

Code:
dir /B /O:GNE *.html > 00-booklist.txt
notepad 00-booklist.txt
In notepad replace all occurrences of the string ".html" with nothing, save and quit.

This will result in a file 00-booklist.txt where each line contains one ebook entry of the form
Code:
<Author's last name>, <Author's first names>[, <Author's titles>] - <Title>

(3) Make sure mobigen.exe is either in your %PATH% or in the ebook directory. Make sure that the Microsoft Windows Script Host is installed and current. This is definitely an issue for Win9x/Me users, possibly an issue for Win2k users, most likely not an issue for WinXP (even unpatched) users and no issue at all for Vista or Win7 users. The Microsoft Windows Script Host can be obtained from Microsoft downloads (get at least version 5.6 or 5.7).

(4) Make sure the files 00-template.opf and 00-2mobi.vbs are in the ebook directory. Put your own cover for the Mobipocket e-books to be generated into the ebook directory under the name 00-cover.jpg.

(5) In your ebook directory run:

Code:
cscript 00-2mobi.vbs
That's it if you have done everything according to the above procedure. Now you should find an .opf Mobipocket project file and a .mobi Mobipocket ebook file for every html file unless mobigen has a problem with one file or the other.
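
Steps (0) to (2) can also be scripted instead of being done by hand with dir and notepad. The following is only a minimal Python sketch and not part of the original toolchain; the function names build_booklist and write_booklist are made up for this example, and writing the list as cp1252 is an assumption about what the VBScript reads by default. It lists the .html files, strips the extension and sets aside any name that contains %1, %2 or %3 (see the caveat in step 1).

```python
from pathlib import Path

# Placeholders used by 00-template.opf; names containing them must be fixed first.
PLACEHOLDERS = ("%1", "%2", "%3")


def build_booklist(directory):
    """Return (entries, rejected) for the .html files in `directory`.

    entries  -- file names without the .html extension, sorted
    rejected -- names that contain a template placeholder
    """
    entries, rejected = [], []
    for path in sorted(Path(directory).glob("*.html")):
        name = path.stem  # "Doe, John - Book.html" -> "Doe, John - Book"
        if any(token in name for token in PLACEHOLDERS):
            rejected.append(name)
        else:
            entries.append(name)
    return entries, rejected


def write_booklist(directory, filename="00-booklist.txt"):
    """Write the accepted entries, one per line, and return both lists."""
    entries, rejected = build_booklist(directory)
    text = "".join(entry + "\n" for entry in entries)
    # Assumption: cp1252 matches the ANSI encoding the script expects.
    Path(directory, filename).write_text(text, encoding="cp1252")
    return entries, rejected
```

Calling write_booklist(".") in the ebook directory would produce 00-booklist.txt; anything returned in the second list needs renaming before running the conversion.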

Here is the script 00-2mobi.vbs:

Code:
REM 00-2mobi.vbs: Mass conversion of HTML Pages to Mobipocket
REM Version 0.1/03-FEB-2009
REM Released under the respective current version of the GPL by cklammer

Main()
WScript.Quit 0

Sub Main()
	Const ForReading = 1
	Const ForWriting = 2
	Const ForAppending = 8

	DIM booklistfile
	Dim book
	Dim bindestrich
	Dim author
	Dim title
	Dim opffile
	Dim opftemplate
	Dim opfcontent
	Dim opftemplatefile
	Dim opffilename

	Dim FSO
	Set FSO = CreateObject("Scripting.FileSystemObject")

	Dim oShell
	Set oShell = WScript.CreateObject ("WSCript.shell")

	Set opftemplatefile = FSO.OpenTextFile("00-template.opf", ForReading)
	opftemplate = opftemplatefile.Readline
	opftemplatefile.Close

	Set booklistfile = FSO.OpenTextFile("00-booklist.txt", ForReading)
	Do While (booklistfile.AtEndOfStream = False)
		book = booklistfile.Readline
		bindestrich = InStr(book, " - ")
		' InStr returns 0 when " - " is not found, so no Null check is needed
		If bindestrich = 0 Then
			author = "Unknown"
			title = book
		Else
			author = Trim(Left(book, bindestrich - 1))
			title = Trim(Mid(book, bindestrich + Len(" - ")))
		End If

		opfcontent = replace(opftemplate, "%1", title)
		opfcontent = replace(opfcontent,  "%2", author)
		opfcontent = replace(opfcontent,  "%3", book & ".html")

		opffilename = book & ".opf"
		Set opffile = FSO.CreateTextFile(opffilename, True)
		opffile.WriteLine(opfcontent)
		opffile.Close()

		oShell.run "mobigen " & """" & opffilename & """", 1, True
	Loop

	booklistfile.Close()
	Set FSO = Nothing
	Set oShell = Nothing
End Sub
You have to cut and paste the above code into notepad and save the resulting file under the name 00-2mobi.vbs in your ebook directory.
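
To make clear how the script derives author and title from a booklist entry, here is the same rule as a small Python sketch (the function name split_author_title is made up for this example). The entry is split at the first " - "; if there is none, the author becomes "Unknown" and the whole entry becomes the title.

```python
def split_author_title(book, separator=" - "):
    """Split a booklist entry at the first separator, as the script does."""
    index = book.find(separator)
    if index == -1:
        # No separator: same fallback as in the VBScript.
        return "Unknown", book
    author = book[:index].strip()
    title = book[index + len(separator):].strip()
    return author, title
```

Note that a title which itself contains " - " stays intact, because only the first separator is used.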

Here is the opf template file 00-template.opf (keep it on a single line: the script reads only the first line of the template):

Code:
<?xml version="1.0" encoding="utf-8"?><package unique-identifier="uid"><metadata><dc-metadata xmlns:dc="http://purl.org/metadata/dublin_core" xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/"><dc:Title>%1</dc:Title><dc:Language>en</dc:Language><dc:Identifier id="uid">0FC99EFF4B</dc:Identifier><dc:Creator>%2</dc:Creator></dc-metadata><x-metadata><output encoding="Windows-1252"></output><EmbeddedCover>00-cover.jpg</EmbeddedCover></x-metadata></metadata><manifest><item id="item1" media-type="text/x-oeb1-document" href="%3"></item></manifest><spine><itemref idref="item1"/></spine><tours></tours><guide></guide></package>
This file is attached.
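
One thing the plain text replacement does not handle: a title or author containing "&" or "<" ends up unescaped in the .opf, which is then no longer well-formed XML. I have not checked how mobigen reacts to that. The following Python sketch shows one way to fill the template with escaped values; fill_template is a made-up name, and the substitution is done in a single pass, so a value that happens to contain %2 is not substituted again.

```python
import re
from xml.sax.saxutils import escape


def fill_template(template, title, author, html_name):
    """Fill %1 (title), %2 (author) and %3 (html file) with XML-escaped values."""
    values = {
        "%1": escape(title),
        "%2": escape(author),
        # %3 sits inside a double-quoted attribute, so quotes are escaped too.
        "%3": escape(html_name, {'"': "&quot;"}),
    }
    # One pass over the template: inserted values are never scanned again.
    return re.sub(r"%[123]", lambda match: values[match.group(0)], template)
```

The result would then be written to "<book>.opf" and handed to mobigen exactly as in the script above.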

The source file for the example is Obama, Barack Hussein - Inaugural Presidential Address. Unpack the html file inside into your ebook directory and rename it to
Code:
Obama, Barack Hussein - Inaugural Presidential Address.html

Have fun and good luck,
cklammer
Attached Files
File Type: opf 00-template.opf (642 Bytes, 391 views)
File Type: txt 00-booklist.txt (56 Bytes, 280 views)
File Type: opf Obama, Barack Hussein - Inaugural Presidential Address.opf (747 Bytes, 303 views)
File Type: mobi Obama, Barack Hussein - Inaugural Presidential Address.mobi (89.1 KB, 413 views)

Last edited by cklammer; 03-03-2009 at 04:27 AM. Reason: I fucked up. not enough code tags
Old 03-03-2009, 04:59 AM   #2
cklammer
00-2mobi.zip contains all files.

Hi all,

I wrote my OP on a locked-down machine without zip archive creation capability. Please find all files referred to in the OP attached as
Code:
00-2mobi.zip
Sorry for any inconvenience,
cklammer
Attached Files
File Type: zip 00-2mobi.zip (1,010.6 KB, 549 views)
Old 03-26-2009, 09:21 AM   #3
mtravellerh
book creator
Posts: 9,615
Karma: 1620342
Join Date: Oct 2008
Location: Luxembourg
Device: PB360°
Hey, good work. I am thinking about migrating my AportisDoc files to Mobi. I think your approach might work there, too (although I would have to mass convert the pdbs to txt first).
Old 03-26-2009, 03:23 PM   #4
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530531
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Mobi2IMP will convert PalmDoc (Text/Read) .pdb ebooks and leave behind the .HTML and .opf!!!
Old 03-26-2009, 03:37 PM   #5
mtravellerh
Oh goody. problem solved.
Old 03-27-2009, 01:35 PM   #6
nrapallo
BTW, this post explains how to get Mobi2IMP to convert many PalmDoc .pdb files in a directory, recursively.

You can use the supplied prc2imp.bat and edit it to include the /r at the beginning of the for statement or just use this line at the dos prompt:
Code:
for /r %i in (*.pdb)  do mobi2imp.exe --verbose "%i" "%~ni"
Hope this helps.
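
For anyone not on a dos prompt, the loop above can be approximated in Python. This is only a sketch: it builds the same command lines (mobi2imp.exe, --verbose, the .pdb path and the file name without extension, as in the for statement) but does not run them, and the function name mobi2imp_commands is made up for this example.

```python
from pathlib import Path


def mobi2imp_commands(root):
    """Build one mobi2imp command per .pdb file found below `root`."""
    commands = []
    for path in sorted(Path(root).rglob("*.pdb")):
        # "%i" is the full path, "%~ni" the file name without extension.
        commands.append(["mobi2imp.exe", "--verbose", str(path), path.stem])
    return commands
```

Each list could be passed to subprocess.run once you are happy with what it would do.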
Old 03-27-2009, 06:56 PM   #7
kevindorsey
Evangelist
Posts: 488
Karma: 258
Join Date: Mar 2009
Device: kindle
I fell asleep, I'm sorry :0
Old 08-08-2009, 10:09 PM   #8
quocsan
Member
Posts: 19
Karma: 10
Join Date: Jul 2009
Device: none
Thank you for your helpful tips, Masters!
But, does anyone know how to make MobiGen run faster?
I think if MobiGen uses RAM for storing temporary files, it will be much faster.
Old 11-18-2009, 12:10 AM   #9
velusamys
Junior Member
Posts: 6
Karma: 10
Join Date: Nov 2009
Device: kindle
MORE HTML FILES TO SINGLE MOBI FILE

Hi

I have nearly 200 auto-generated HTML files and I want to convert them into a single MOBI file.

I tried using Mobipocket Creator, but only part of the content is displayed. How do I generate the table of contents?

Thanks

Velu
Old 11-20-2009, 03:00 AM   #10
cklammer
You need a separate HTML TOC

I had this come up, too, some time back ... I used a freeware tool, "dir2html" (Google it or try Softpedia), to generate an HTML document containing a bare-bones listing of all the HTML files in the directory (you need to deselect some "dir2html" options to disable the "fluff") and then edited it manually to "beautify" it. I then used that document as a TOC for the further conversion process with Mobipocket Creator.

This worked pretty well: Find your document in the TOC, jump to it, read it until you are done and then use the "Back" function in the Mobipocket Reader until you are back in the TOC.
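
If you cannot find "dir2html", a listing like that is easy to generate yourself. Below is a minimal Python sketch, not what I used back then: the function name build_toc and the file name 00-toc.html are made up, and I am assuming that Mobipocket Creator accepts plain relative links to the other HTML files.

```python
from html import escape
from pathlib import Path


def build_toc(directory, toc_name="00-toc.html"):
    """Return a bare-bones HTML page linking to every .html file in `directory`."""
    links = []
    for path in sorted(Path(directory).glob("*.html")):
        if path.name == toc_name:
            continue  # do not list the TOC itself
        links.append(
            '<li><a href="{}">{}</a></li>'.format(
                escape(path.name, quote=True), escape(path.stem)
            )
        )
    return (
        "<html><head><title>Table of Contents</title></head><body>\n"
        "<h1>Table of Contents</h1>\n<ul>\n"
        + "\n".join(links)
        + "\n</ul>\n</body></html>\n"
    )
```

Save the returned text as 00-toc.html in the same directory and add it as the first file of the publication.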

Good Luck,
cklammer

P.S.: Don't hesitate to ask, but keep in mind that I am at GMT+4.
All times are GMT -4.