Posted 02-13-2011, 12:12 PM by webwizard (Member, Posts: 20, Karma: 10, Join Date: Feb 2011, Device: Kindle DX)
Batch doc conversion

Here I am again.

I had to convert a bunch of .doc files to .mobi. The problem is that Calibre doesn't handle .doc conversion, so one workaround is to go through HTML: save each document as HTML from Word and let Calibre convert that. This is fine for three or four documents, but when you have fifty it doesn't look so good anymore.

What I did was write a Word macro that opens a group of files, converts them and saves them as HTML. That means, of course, that you must have MS Word installed. I used Word 2007, but 2003 should work as well.

I'm not going to explain in detail how to write a VBA macro; try Google first. If you're still stuck, I'll try to help.

The macro is this:
Code:
Sub BatchConvertToHTML()

Dim dlgInputFolder As New CommonDialog
Dim strFileName As String
Dim strNames() As String

dlgInputFolder.MaxFileSize = 32000
dlgInputFolder.Flags = cdlOFNAllowMultiselect + cdlOFNExplorer + cdlOFNLongNames
dlgInputFolder.Filter = "Word document (*.doc)|*.doc"

dlgInputFolder.ShowOpen
'With several files selected, FileName holds the folder and then each file name, separated by null characters (Chr(0)): count them
For x = 1 To Len(dlgInputFolder.FileName)
If Asc(Mid(dlgInputFolder.FileName, x, 1)) = 0 Then int_zeroes = int_zeroes + 1
Next

ReDim strNames(int_zeroes) As String
int_index = 0

'put each file name in a string array
For x = 1 To Len(dlgInputFolder.FileName)
If Asc(Mid(dlgInputFolder.FileName, x, 1)) = 0 Then
    int_index = int_index + 1
Else
    strNames(int_index) = strNames(int_index) + (Mid(dlgInputFolder.FileName, x, 1))
End If
Next
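'Conversion starting
If int_zeroes = 0 Then
    'Only one file selected: FileName is already its full path (empty if the user clicked Cancel)
    If strNames(0) <> "" Then Wordconvert strNames(0)
Else
    'strNames(0) is the folder; add a backslash unless it is a drive root such as "C:\"
    Dim strFolder As String
    strFolder = strNames(0)
    If Right(strFolder, 1) <> "\" Then strFolder = strFolder & "\"
    For x = 1 To int_zeroes
        Wordconvert strFolder & strNames(x)
    Next
End If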
    
End Sub


Sub Wordconvert(strDoc As String)

   Dim doc As Document
   Dim strAuthor As String, strTitle As String
   Dim strDocName As String, strFileName As String
   Dim posSep As Long
'extract author and title from the file name (strip the 4-character ".doc" extension first)
   strDocName = Left(strDoc, Len(strDoc) - 4)
   strFileName = extractfilename(strDocName)
   posSep = InStr(1, strFileName, " - ")
   If posSep > 0 Then
      strAuthor = Trim(Left(strFileName, posSep - 1))
      strTitle = Trim(Mid(strFileName, posSep + 3))
   End If
   strDocName = strDocName & ".html"
'open the document and keep a reference to it
   Set doc = Documents.Open(strDoc)
'if your Word document already has title and author correctly set, just comment out or delete the following If block
   If posSep > 0 Then
      doc.BuiltInDocumentProperties(wdPropertyAuthor).Value = strAuthor
      doc.BuiltInDocumentProperties(wdPropertyTitle).Value = strTitle
   End If
'save as filtered HTML next to the original, then close it
   doc.SaveAs FileName:=strDocName, FileFormat:=wdFormatFilteredHTML
   doc.Close
End Sub

Function extractfilename(strfile As String) As String

'Simply put, the file string passed by the common dialog includes the full path.
'Here I strip out the path and keep only the file name, to extract author and title.

Dim pos As Long, pos1 As Long
pos = 1

'move pos to the character just after the last backslash
Do

    pos1 = InStr(pos, strfile, "\")
    If pos1 > 0 Then pos = pos1 + 1

Loop Until pos1 = 0

'the file name runs from pos to the end of the string
extractfilename = Mid(strfile, pos)

End Function
I know it's not perfect code, but it was written in half an hour, so take it as it is.
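If you want something tidier than the two character-by-character loops, VBA's Split function can break the dialog's FileName apart at the null characters in one call. Below is a sketch of a hypothetical helper, SelectedPaths (not part of the macro above), that returns a Collection of full paths and also handles the single-file and drive-root cases. It assumes the multi-select format described in the code comments: folder, then file names, separated by Chr(0).

```vba
' Hypothetical helper: turns the common dialog's FileName into full paths.
' Multi-selection gives "folder<NUL>file1<NUL>file2..."; a single
' selection gives just the full path, with no NUL characters at all.
Function SelectedPaths(ByVal strFileName As String) As Collection
    Dim parts() As String
    Dim strFolder As String
    Dim i As Long
    Dim result As Collection

    Set result = New Collection
    If strFileName <> "" Then                 ' "" means the user clicked Cancel
        parts = Split(strFileName, vbNullChar)
        If UBound(parts) = 0 Then
            ' Single file: already a full path
            result.Add parts(0)
        Else
            ' First element is the folder; add "\" unless it is a root like "C:\"
            strFolder = parts(0)
            If Right$(strFolder, 1) <> "\" Then strFolder = strFolder & "\"
            For i = 1 To UBound(parts)
                ' Skip an empty entry left by a trailing NUL, if any
                If parts(i) <> "" Then result.Add strFolder & parts(i)
            Next i
        End If
    End If
    Set SelectedPaths = result
End Function
```

Inside BatchConvertToHTML you could then replace everything after ShowOpen with a loop such as `For Each p In SelectedPaths(dlgInputFolder.FileName): Wordconvert CStr(p): Next` (with `Dim p As Variant`).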

Another important step is to add a reference to the common dialog control. In the Visual Basic editor, click Tools, then References, find "Microsoft Common Dialog Control 6.0" and check it. If it isn't in the list, click Browse and look for the file "COMDLG32.OCX". This is needed for the "Open File" dialog to work.
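If the control can't be found or won't load on your machine, a possible alternative, assuming Word 2002 or later, is Office's built-in Application.FileDialog, which needs no extra reference and returns each selected file as a full path. This is a sketch, not the original macro: BatchConvertWithFileDialog is a hypothetical name, and it simply reuses the Wordconvert sub from the code above.

```vba
' Alternative entry point (hypothetical name) using Office's own file picker
' instead of the Common Dialog control; no COMDLG32.OCX reference needed.
Sub BatchConvertWithFileDialog()
    Dim fd As FileDialog
    Dim i As Long

    Set fd = Application.FileDialog(msoFileDialogFilePicker)
    With fd
        .Title = "Select the Word documents to convert"
        .AllowMultiSelect = True
        .Filters.Clear
        .Filters.Add "Word document", "*.doc"
        ' Show returns -1 when the user clicks the action button, 0 on Cancel
        If .Show = -1 Then
            ' SelectedItems holds full paths, so there is nothing to split
            For i = 1 To .SelectedItems.Count
                Wordconvert CStr(.SelectedItems(i))
            Next i
        End If
    End With
End Sub
```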

Create a new module in the Visual Basic editor of an empty document and paste the code there. Then put the cursor inside BatchConvertToHTML and click the "Run" (play) button on the toolbar. An "Open File" window will appear: select all the .doc files you need to convert and click "Open". Word will then quickly open each file and save it as HTML in the same folder.
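The HTML files still have to go through Calibre to become .mobi. Calibre also ships a command-line converter, ebook-convert, which picks the output format from the output file's extension, so that step could be scripted too. A rough sketch, assuming Calibre's folder is on your PATH (otherwise put the full path to ebook-convert.exe in the command); MobiCommand and HtmlToMobi are hypothetical names:

```vba
' Hypothetical helper: builds the Calibre command line for one HTML file,
' writing the .mobi next to it with the same base name.
Function MobiCommand(ByVal strHtml As String) As String
    Dim strMobi As String
    ' Keep everything up to and including the last dot, then add "mobi"
    strMobi = Left$(strHtml, InStrRev(strHtml, ".")) & "mobi"
    ' Quote both paths, since "<author> - <title>" names contain spaces
    MobiCommand = "ebook-convert """ & strHtml & """ """ & strMobi & """"
End Function

' Runs the conversion hidden (window style 0) and waits for it to finish (True)
Sub HtmlToMobi(ByVal strHtml As String)
    CreateObject("WScript.Shell").Run MobiCommand(strHtml), 0, True
End Sub
```

Calling `HtmlToMobi strDocName` right after the SaveAs line in Wordconvert would convert each file as soon as its HTML is written. Waiting for each run keeps fifty Calibre processes from starting at once.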

One important thing to remember: if you want the macro to take the title and author from the file name, the file name MUST be in the following format:
<author> - <title>.doc
and no dashes are allowed in the author name or title (the macro splits the name at the first " - ").
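Here is the naming rule as a standalone sketch with a worked example. ParseAuthorTitle is a hypothetical helper, not part of the macro: it splits at the first " - ", so strictly only the author's name has to be free of " - ", and it returns False when the separator is missing so the caller can leave the document properties alone.

```vba
' Hypothetical helper: "C:\Books\Isaac Asimov - Foundation.doc"
' gives strAuthor = "Isaac Asimov" and strTitle = "Foundation".
' Returns False (author empty, title = whole name) if " - " is missing.
Function ParseAuthorTitle(ByVal strPath As String, _
                          ByRef strAuthor As String, _
                          ByRef strTitle As String) As Boolean
    Dim strName As String
    Dim posSep As Long

    ' Keep only the file name: everything after the last backslash
    strName = Mid$(strPath, InStrRev(strPath, "\") + 1)
    ' Drop the extension: everything from the last dot on
    If InStrRev(strName, ".") > 0 Then
        strName = Left$(strName, InStrRev(strName, ".") - 1)
    End If

    ' The first " - " separates author from title
    posSep = InStr(1, strName, " - ")
    If posSep = 0 Then
        strAuthor = ""
        strTitle = Trim$(strName)
        ParseAuthorTitle = False
    Else
        strAuthor = Trim$(Left$(strName, posSep - 1))
        strTitle = Trim$(Mid$(strName, posSep + 3))
        ParseAuthorTitle = True
    End If
End Function
```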

That's the best I could do in half an hour, but I hope that helps.

Bye

Paul

PS: That's the last one, I swear...