Word macro for clean HTML code


Toxaris
07-12-2011, 08:50 AM
If anybody is interested, I created a macro to transform a Word document into clean HTML. The following things are supported:
- simple formatting (italic, bold, underline, superscript, etc.)
- single-level lists
- tables
- images
- foot- and endnotes
- hyperlinks (internal and external)
- header tags
- custom named styles in Word
- bookmarks

It will use UTF-8.

Just unpack the .bas file. Press Alt+F11 in Word and import the .bas file into Normal.dot.

Comments have been removed or translated into English.

Ralph Sir Edward
07-12-2011, 08:58 AM
What version of Word/what level of Word documents are needed?

Toxaris
07-12-2011, 09:30 AM
It should work for version 2007 and up. I would not be surprised if it would also work for 2003. It works for both doc and docx.

mvo
07-13-2011, 01:45 AM
Thank you! Luckily, I can read a little bit of Dutch - just enough to get by :)

Toxaris
07-13-2011, 04:18 PM
New version... A better way to change the headers (independent of the Word language) and prevention of some possible loops.

Toxaris
03-07-2012, 01:51 PM
A new version again. Installation procedure is the same.

Please be aware: the macro will not create a stylesheet, but it will create a reference to a stylesheet (if you choose to).

mmat1
03-07-2012, 05:00 PM
It should work for version 2007 and up. I would not be surprised if it would also work for 2003. It works for both doc and docx.

This is not an in-depth analysis, but a quick check shows that it works well in W2003. So W97 shouldn't cause many problems.

It's great !!!!!!:thanks:

Toxaris
03-17-2012, 03:51 PM
Small bugfixes. It should also work on a Mac now. I also converted the comments and texts into English.

Updated the start post with the latest version.

Keroberos
03-17-2012, 07:02 PM
Many thanks :thanks: (now I don't have to finish translating it myself :iloveyou:).

Hitch
03-28-2012, 04:24 AM
Toxaris:

I downloaded the 2.5 version from your post on 3-7-12, but nothing is in English--did I get the wrong zip?

Tx,
Hitch

Toxaris
03-28-2012, 02:07 PM
Hitch, use the attachment from the top post.

Hitch
03-28-2012, 04:19 PM
Thanks, Tox:

Thought I'd take it out for a spin; we're always adding new people and new tools, so anything that makes our process faster or can be used by some of our new folks is good.

(ETA: D'oh! "Eng." :facepalm:)

Hitch

Hitch
04-06-2012, 06:02 PM
Hi, Toxaris:

I'll try this again, but every time I attempt to run it, it crashes Win7, Word2010, on a PC Quad.

ETA: Crashes it EVERY TIME.

Hitch

Toxaris
04-07-2012, 02:29 PM
A crash? I never had a crash by the macro before nor have heard of others. I also use Win7, but Word 2007. Does the crash occur at a certain step? Perhaps there are some differences between 2007 and 2010, but a crash should not happen anyway of course.
Does it contain a table?

Hitch
04-07-2012, 03:48 PM
Hi, Tox:

No, no tables. The first one I tried contained some Word x-refs (I think to a now non-existent character list, perhaps), which I didn't realize, and I think that just tanked it completely. It simply stopped responding, across all open instances of Word, and I had to restart it.

The second one I tried (and, as far as it goes, it worked a treat, btw) completed, but then it seemed to hang Word. I had to close Word from the task bar; the (now-empty) Word program just sat there like a dead dodo, I couldn't close it by the usual methods, either by File->exit, File->close, or x'ing out, I was only able to close it from the taskbar.

The resulting html file, though, was findable and usable, so whatever happened, it produced the file, placed the finished html file in the dir, but then Word just wanked. (I realize that these are highly descriptive technical terms: wanked). My only other question, other than the Word program weirdness, is the inquiry into the other paragraph styles. I infer that what the macro is asking is for the Word-given style name, is that right? Does this iterate if there are, for example, 5 styles that you wish to retain? (This is simply a curiosity question; I'm comparing this macro's capabilities against the BookCreator macro for our newer bookmakers who haven't developed their own macros/clips/PERL programs to do it.)

Thanks--hope this is remotely useful,

Hitch

Toxaris
04-07-2012, 05:25 PM
Hi Hitch,

I will check, but I don't remember anything strange after saving.

Anyway, my way of working is that I have a more or less default stylesheet available with all kinds of classes. If I check a text in Word and see some text that will need a certain class in the ePUB, I create a style in Word and make sure that text has that style. I don't care about the style itself; it is just for identification. The macro does not create a stylesheet for you.

mmat1
04-07-2012, 05:44 PM
(I realize that these are highly descriptive technical terms: wanked).
Sounds as if it's German with an English "past tense" ending ... :)


Does this iterate if there are, for example, 5 styles that you wish to retain?
It will keep asking you to change styles until you say "NO" (It's a simple loop)

About your crashes: Word has a wide range of options that do things automatically, and some of them could cause a crash since they keep working in the background even while a macro is running. Autoformatting is the worst of all. The macro sets the style to standard, and autoformatting guesses that it should be something else ...

I wonder if the problems are the same if you save the document you want to convert in another format (e.g. Word 97).
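
One way to test the autoformatting theory is to switch the relevant "AutoFormat As You Type" options off while the export runs and restore them afterwards. The following is only a sketch: RunWithAutoFormatOff is a made-up wrapper name, the choice of these two options is an assumption, and whether this cures the hang described above is untested.

```vba
' Hypothetical wrapper; not part of the macro from the first post.
Sub RunWithAutoFormatOff()
    Dim oldDefineStyles As Boolean
    Dim oldApplyHeadings As Boolean

    ' Remember the user's current "AutoFormat As You Type" settings.
    oldDefineStyles = Options.AutoFormatAsYouTypeDefineStyles
    oldApplyHeadings = Options.AutoFormatAsYouTypeApplyHeadings

    On Error GoTo Restore
    Options.AutoFormatAsYouTypeDefineStyles = False
    Options.AutoFormatAsYouTypeApplyHeadings = False

    Transform_HTML      ' the export macro, assumed to be imported into the same project

Restore:
    ' Put the settings back, whether or not the export succeeded.
    Options.AutoFormatAsYouTypeDefineStyles = oldDefineStyles
    Options.AutoFormatAsYouTypeApplyHeadings = oldApplyHeadings
    If Err.Number <> 0 Then MsgBox "Export stopped: " & Err.Description
End Sub
```

If the hang still occurs with these options off, autoformatting is probably not the cause.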

Oldpilot
05-17-2012, 09:31 AM
Just unpack the .bas file. Press Alt+F11 in word and import the .bas file into the Normal.dot.


Using Word 2000, I press alt+f11 and get a window Microsoft Visual Basic. I can find the Normal file, which I assume is Normal.dot. I click on that.

Across the top I am offered File / Edit / View / Insert / Format / Debug / Run / Tools / Add-Ins / Window / Help

None of these seem to offer an "import" option. I thought Insert might do it but all I see there are Userform / Module / Class Module

I have never messed with Visual Basic before!

Toxaris
05-17-2012, 05:21 PM
You can try to right-click on the Normal file and see if there is an import there. I cannot check on Word 2000. I can also not vouch that it will work on Word 2000. There were quite some changes in 2003.

mmat1
05-25-2012, 04:52 PM
None of these seem to offer an "import" option. I thought Insert might do it but all I see there are Userform / Module / Class Module
The Import-Option is here (see attachment). All pretty much the same in WW98 through WW2003

I have never messed with Visual Basic before!
Then you probably missed some fun :)

steppe
08-08-2012, 05:16 AM
Thank you Toxaris. The macro works great for my Kindle books!

Toxaris
08-08-2012, 07:13 AM
Will post a newer version soon. I check for centering now.

mmat1
08-08-2012, 01:08 PM
Will post a newer version soon. I check for centering now.

Btw, there is an endless loop in "replace_headers". You check for empty headers but do not reset the style to -1 if an empty header is found.

Toxaris
08-08-2012, 01:16 PM
That is strange. That code is not changed. Will look at it though.

Edit: the clearformatting removes the header style. So, no issue.

mmat1
08-08-2012, 01:35 PM
That is strange. That code is not changed. Will look at it though.

Edit: the clearformatting removes the header style. So, no issue.

Give it a try

Toxaris
08-08-2012, 03:14 PM
Will try this weekend. Can't before that.

Start
12-05-2012, 02:42 PM
First of all, thank you very much for that Word macro.

This is my first post in this forum, and I'd like to know if there is any further news. Reading your messages, it looks to me like development has come to an unwanted stop, or has it? Maybe it continues in another topic?

Toxaris
12-05-2012, 03:25 PM
No, not really. There is a small updated version I think, but I haven't translated that yet.

I still use it for each book.

Start
12-06-2012, 06:19 PM
Hi,
It's good news that there's new development!

I'm new to the macro world (Visual Basic) and I don't know if it is possible to save the HTML output already split into chapters. Is it possible to code that? Maybe as an option.

I have other ideas but this one is a little strange... :o

Toxaris
12-07-2012, 03:35 AM
I could, but I won't. I import the whole book into Sigil and I find it easier to work with as a whole. Splitting in chapters is one of my last steps actually.

Toxaris
12-07-2012, 03:35 AM
Give it a try

Sorry, still haven't tried. Will do so tonight. Promise.

Start
12-07-2012, 08:48 AM
I could, but I won't. I import the whole book into Sigil and I find it easier to work with as a whole. Splitting in chapters is one of my last steps actually.

:) no problem... It's the same.

And what about your last improvements?

Toxaris
12-07-2012, 09:52 AM
They are mainly small ones. I actually have to see which version is here to be certain. I made some changes so it should also work on a Mac (not tested!) and with other regional settings. I know the last one sounds strange, but believe me, it matters.

Toxaris
12-07-2012, 01:44 PM
New version is in the first post. Some minor changes and fixes. Centered text will be converted correctly.

@mmat1, your issue has been resolved. Thank you!

Start
12-13-2012, 09:49 AM
Thank you very much! Just downloaded your last version.

Only one question. Are you interested in cleaning up the text further before it becomes a good HTML file? I'm thinking of the "double space" problem and similar ones. What do you think? May I post some suggestions here, if you are interested?

Toxaris
12-13-2012, 10:22 AM
Suggestions are always welcome. You can also mention certain HTML entities you want to be transformed to their entity name. Sometimes that works better.
I am not aware of a double space problem. Double spaces are ignored by HTML and will be cleaned automatically upon importing into Sigil. Unless you mean something else of course.

Start
12-13-2012, 05:23 PM
Thank you. I have some difficulty with English, so I may use "strange" words. If so, please forgive me. My "two spaces problem" is surely a known case, but the way I describe it may make it seem like an unknown one. :(

I'll try to tell you more.

When I'm preparing one of my books, I'd like to save it in two formats: one in PDF and the other in EPUB. That is 2 files from one docx file.

So I'd like to have clean text even in my Word file. At the same time, if I convert clean text to HTML, there is less work to do afterwards...

The "2 spaces" problem is simply an unneeded space before or after a punctuation mark. The first example is when you close a sentence and you have a space before the point and a second space after it. The first space is to be deleted.

I write here some examples about the space to be deleted:

Space before a point .
Space before a comma ,
Space before a semicolon ;
Space before the colon :
Space before the ellipsis (not always superfluous, mind you) ...
Space before the parenthesis (also applies to the square or curly) )
Space after the opening parenthesis (also applies to the square or curly) (
Space before a question mark ?
Space before an exclamation mark !
Space before the percent sign %
Space before the closing guillemet »
Space after the opening guillemet «
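
Most of the "space before" cases in this list can be sketched as a plain find-and-replace loop in Word. This is only an illustration under assumptions: RemoveSpaceBeforePunctuation is a made-up name, the set of marks is a guess at what is wanted, and it is not part of the export macro. The ellipsis and the guillemets are left out on purpose, because a space there is not always wrong.

```vba
' Hypothetical clean-up macro; run it on a copy of the document first.
Sub RemoveSpaceBeforePunctuation()
    Dim marks As Variant
    Dim i As Long

    ' Marks that should not be preceded by a space (assumed list).
    ' Note: "." will also remove a space in front of "..." typed as three dots.
    marks = Array(".", ",", ";", ":", "?", "!", "%", ")", "]", "}")

    For i = LBound(marks) To UBound(marks)
        With ActiveDocument.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .MatchWildcards = False         ' plain search, so "?" is a literal question mark
            .Text = " " & marks(i)          ' one space followed by the mark
            .Replacement.Text = marks(i)    ' the mark alone
            .Forward = True
            .Wrap = wdFindStop
            .Execute Replace:=wdReplaceAll
        End With
    Next i
End Sub
```

A single pass removes one space per mark, so text with several spaces in front of a mark needs the macro to be run again.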

But I've seen another problem that is present in a docx file. An example is when you write something in italics and the italics don't end at the end of the word but after a space or after punctuation. In the HTML you then see the following:

<i>dfsfsf </i>dsdsad
<i>dfsfsf;</i> dsdsad
(<i>dfsfsf)</i> dsdsad

and so on. Not good, I think. These are all mistakes: the first has a space in italics, the second has punctuation in italics that should not be. And so on.

I'd like to know your opinion. I have other ideas, but first I'd like to know your thoughts about what I've already said. I hope it is interesting. :blink:

Toxaris
12-14-2012, 04:32 PM
Ok, I can see that you have issues with your punctuation. Resolving that is not always easy, by the way, since in some cases one search-and-replace string does not fit all. I actually don't have these issues, since the HTML export macro is my last step in Word. Before this I run several other macros: first a pre-macro to solve a lot of standard issues with ABBYY exports (it fixes line breaks and so on), then a large S&R macro to fix a lot of formatting and OCR issues, which can be changed rather dynamically, then a macro to fix broken dialogues, and finally one to check all accented words (typical OCR errors).
So, for me most of the errors you describe are already solved. I haven't shared these macros, since I am only co-author of them or just an advisor.

Now, your issue with italics is a little different. Your first issue is solved by importing the HTML into Sigil. The second and third ones are quite complex, as there can be cases where this is correct. Also, I feel that this should not be in this macro. This macro is meant to create clean HTML, not to solve formatting issues.

That being said, a macro could be written to solve these kinds of issues. Whether I would have the time is another matter.
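
For the first italics case only (a space caught inside the closing tag), a small string helper gives an idea of what such a macro could do. This is a sketch under assumptions: MoveSpaceOutOfTag is a made-up name, it works on an HTML string rather than on the Word document, and it deliberately leaves the punctuation cases alone because those can be correct.

```vba
' Hypothetical helper: moves spaces from just inside a closing tag to just after it.
Function MoveSpaceOutOfTag(ByVal html As String, ByVal tagName As String) As String
    Dim closeTag As String
    closeTag = "</" & tagName & ">"

    ' Repeat until no closing tag is preceded by a space any more.
    Do While InStr(html, " " & closeTag) > 0
        html = Replace(html, " " & closeTag, closeTag & " ")
    Loop

    MoveSpaceOutOfTag = html
End Function

Sub DemoMoveSpaceOutOfTag()
    ' Prints: <i>dfsfsf</i> dsdsad
    Debug.Print MoveSpaceOutOfTag("<i>dfsfsf </i>dsdsad", "i")
End Sub
```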

Start
12-16-2012, 06:14 AM
Another problem I have is mistakes made by the editors of the book. One is when they forget to close a « or ", so I always have to check a book for this problem, even though I may do all that work only to find that everything was right... :-O That can happen too.

And yes, my posts describe several different problems; they are not only about the resulting HTML file. Thank you for your reply, much appreciated!

Toxaris
12-16-2012, 08:34 AM
That is a classic example of a missing dialogue quote. I have a macro for that, but I haven't made it available. It catches all instances, but in some books it doesn't work because of the style. For example, an extended dialogue that continues on a new line starts with an opening quote without a closing quote on the line above. The macro does help, but it cannot auto-correct that.

This particular macro made me see just how much physical books mess up the quotes.

Start
12-16-2012, 04:50 PM
The real problem is that writing a macro requires Visual Basic knowledge... :(

Toxaris
12-16-2012, 06:07 PM
That is not a problem, if you have that knowledge... Basically a macro automates a lot of steps, but some are much more sophisticated than others. The dialogue macro needs an overhaul in my mind and perhaps I will publish it when finished.

Start
12-17-2012, 06:30 PM
When ready I'll be here ;)

Start
12-19-2012, 10:09 AM
Help! I don't know why, but using your macro I've found a problem: every « or » becomes a plain "

:(

I can't accept a conversion like this, because in my old books «» mark direct speech and "" mark thoughts.

How can this automatic conversion be avoided in your macro? Is it possible to choose what gets converted and what does not?

If you need it, I can attach a single-page doc here as an example (if I have permission to do so in this forum).

Toxaris
12-19-2012, 01:23 PM
Can you send me an example file? It should not touch those characters at all.

Start
12-19-2012, 03:16 PM
Can you send me an example file? It should not touch those characters at all.

That's right! Attached here you can see the Word file before (doc) and the file after (saved as html). You can see that the conversion changed those characters.

Toxaris
12-19-2012, 05:35 PM
I did a quick test, but when I run the macro the characters remain. I am curious, something has happened to the export. The macro creates HTML, not MHT.

Start
12-20-2012, 07:49 AM
I used "Save as" MHT to avoid getting the html file plus its folder, but the result is the same. In the html everything is changed, as you can see. If you don't have the problem, it's logical to suspect an automatic option in my Office that I don't know how to switch off...
Unfortunately I don't know Word well, as I'm a simple user and not a professional like you.

I'll try to reinstall all of Office from the original DVD. That is all I can do...

PS. Why does my scan have its text pages in a frame or table? In the conversion all is good, though, as no table/frame appears. A very good macro.

Toxaris
12-20-2012, 09:30 AM
At the end a question is asked to save the file. Only use that option, since it is saved in a special way. The frames are a result of ABBYY 11 and I convert them earlier, so I had no idea the macro would do that. It does make sense though.

So, use the save option of the macro and see if that works. It will not create a folder.

Hitch
12-20-2012, 03:41 PM
At the end a question is asked to save the file. Only use that option, since it is saved in a special way. The frames are a result of ABBYY 11 and I convert them earlier, so I had no idea the macro would do that. It does make sense though.

So, use the save option of the macro and see if that works. It will not create a folder.

Tox:

If you use your export HTML macro on a badly-scanned Abbyy 11 scan, one that was done with the text frames, it removes them? Is that what I'm hearing? We get those funky text frames all the time, and although we have a macro that will nuke 'em, I'd much rather just use your (really divine, although there are still a few things I'd like added, myself) macro if that does it as well?

Hitch

Toxaris
12-21-2012, 02:19 AM
Apparently it does. I never tested it myself; I remove the frames in one of my other macros.

What kind of things would you like to be added? If I think it is useful, there is a big chance I will incorporate it as you know.

Start
12-21-2012, 11:19 AM
I reinstalled Office entirely from the DVD to rule out any mistakes. I use Word 2010.

Then I ran your latest macro (downloaded from your post #1, with the center fix).

Unfortunately it doesn't work!

In Word, after starting the macro and answering the 2 questions "Remove bold layout" (yes) and "Remove underlines" (yes), the Visual Basic screen appears immediately with a warning window in the center of the display.

It says "errors of programming - FOR WITHOUT NEXT", and everything stops there. I can't continue and I have to abort the macro.

sellew
12-21-2012, 03:47 PM
"errors of programming - FOR WITHOUT NEXT"

I found the same error: there is a Next missing in the Function replace_formating().

In the Visual Basic screen, go to the end of that function's code and insert the Next as shown here:

        End If
    Next
    If Not (oRg Is Nothing) Then Set oRg = Nothing

End Function

Toxaris
12-21-2012, 04:20 PM
That is correct. I have updated the version to solve this small error. The mistake happened because I maintain the macro in Dutch originally and keep a translated version alongside. I forgot the Next statement.
Like I said, the version in the startpost is correct again.

Start
12-22-2012, 08:20 AM
Apparently it does. I never tested it myself; I remove the frames in one of my other macros.

What kind of things would you like to be added? If I think it is useful, there is a big chance I will incorporate it as you know.

Would you like to integrate that Macro in this Macro? :bulb2:

Toxaris
12-22-2012, 09:09 AM
No, it is part of my pre-macro, which prepares ABBYY output for further work. It has no place in the HTML export macro, since it has no real use here.

I am willing to incorporate a lot, but it must fit the macro's purpose (being HTML export). Sometime in the new year I will see if I can publish some of the other macros.

Start
12-27-2012, 10:49 AM
Hi everybody! Best Wishes to everybody!

I'm happy to tell you that the problem I spent so much time on, namely why Toxaris' macro changes my guillemets into plain double quotation marks, has been solved!

I'm NOT used to macros or VBA, but when I saw that everything was still wrong with a fresh install of my original Office, I decided to either solve this problem or stop using the macro... I had no other choice: my books are 50 years old and use a lot of guillemets.

Moreover, I tested Toxaris' macro at a friend's company, and with their corporate Office the result is the same: the characters come out wrong. So I had no choice. To use or not to use Toxaris' macro?

As I don't know how a macro is written, I simply deleted a paragraph of the macro in the editing window and tested what happened... Then I saw that without that piece of code everything remained unchanged, and so I discovered some strange numeric codes in those lines. On the internet I found this marvellous page: http://www.elizabethcastro.com/html/extras/entities.html


After that I did a lot of research with Google and discovered (thanks to Toxaris' English comments, which gave me some words to search for) that some characters have numbers that identify them in HTML.

After a headache I tried the guillemet numbers instead of the original numbers in the macro, and I worked it all out! All is good!!!!

I inserted these new lines in the macro and all is OK! Now EVERYTHING is the same as in the original scanned book.


Before, the macro code lines were:

Function replace_quotes()
    'Change smart quotes to HTML entity
    '(numeric entity spelling assumed; the archived page shows only the rendered characters)

    Dim oRg As Range

    Set oRg = ActiveDocument.Range
    StatusBar = "Convert special characters to HTML code..."

    'Left single quote (ANSI 145)
    With oRg.Find
        .ClearFormatting
        .Text = "^0145"
        .Replacement.Text = "&#8216;"
        .Execute Replace:=wdReplaceAll, Wrap:=wdFindContinue
    End With

    'Right single quote (ANSI 146)
    With oRg.Find
        .ClearFormatting
        .Text = "^0146"
        .Replacement.Text = "&#8217;"
        .Execute Replace:=wdReplaceAll, Wrap:=wdFindContinue
    End With

    'Left double quote (ANSI 147)
    With oRg.Find
        .ClearFormatting
        .Text = "^0147"
        .Replacement.Text = "&#8220;"
        .Execute Replace:=wdReplaceAll, Wrap:=wdFindContinue
    End With

    'Right double quote (ANSI 148)
    With oRg.Find
        .ClearFormatting
        .Text = "^0148"
        .Replacement.Text = "&#8221;"
        .Execute Replace:=wdReplaceAll, Wrap:=wdFindContinue
    End With

    If Not (oRg Is Nothing) Then Set oRg = Nothing

End Function


and now I have inserted new lines and extra comments, since all this is difficult for me; I hope you appreciate it (I write the comments in my own language and then in English too, so mistakes in translating the macro are no problem).
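
The inserted lines themselves are not shown in this post. Purely as an illustration of what such an addition could look like, the sketch below applies the same Find pattern to the two guillemets, using their ANSI codes 171 and 187. The names ReplaceCharWithEntity and ReplaceGuillemets are made up, and the numeric entity spelling is an assumption; the named forms &laquo; and &raquo; would be equally valid HTML.

```vba
' Hypothetical helper: replace one character (given as a Word find code) by an HTML entity.
Sub ReplaceCharWithEntity(ByVal findCode As String, ByVal entity As String)
    With ActiveDocument.Range.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .MatchWildcards = False
        .Text = findCode                ' e.g. "^0171" = ANSI character 171
        .Replacement.Text = entity
        .Execute Replace:=wdReplaceAll, Wrap:=wdFindContinue
    End With
End Sub

Sub ReplaceGuillemets()
    ReplaceCharWithEntity "^0171", "&#171;"   ' «  opening guillemet
    ReplaceCharWithEntity "^0187", "&#187;"   ' »  closing guillemet
End Sub
```

Writing the characters as entities keeps them out of reach of any later step that treats plain quote characters specially.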

Then I inserted new lines in Toxaris' macro.
Before:

Sub Transform_HTML()
' This macro converts the active document into a document with HTML tags for layout.
' Version 2.52 - Toxaris
' - fixed center
' - fixed handling empty headers

I've also put in Toxaris' correct version number, 2.53 instead of 2.52, and now it all reads as follows:

Sub Transform_HTML()
' This macro converts the active document into a document with HTML tags for layout.
' Version 2.54 - Toxaris
' - fixed 2 lost smart quotes to HTML entity causing bugs
' - more comments in smart quotes to HTML entity
' - translation in english of change center
' Version 2.53 - Toxaris
' - fixed center
' - fixed handling empty headers

I have translated this from Dutch into English. Before:

'centreer omzetten
For Each para In ActiveDocument.Paragraphs
    If para.Alignment = wdAlignParagraphCenter Then
        ' ParaText = Left(para.Range.Text, Len(para.Range.Text) - 1)
        para.Alignment = wdAlignParagraphLeft
        Set oRg = para.Range
        oRg.MoveEndWhile cset:=vbCr, count:=wdBackward 'enter niet meenemen
        oRg.InsertBefore "<center>"
        oRg.InsertAfter "</center>"
    End If
Next para

Using Google Translate I tried to understand what Toxaris wrote, but I don't understand the meaning; I'm not skilled in coding. However, I translated it as follows:

'change center
For Each para In ActiveDocument.Paragraphs
    If para.Alignment = wdAlignParagraphCenter Then
        ' WHY THIS LINE IS OFF ??? ParaText = Left(para.Range.Text, Len(para.Range.Text) - 1)
        para.Alignment = wdAlignParagraphLeft
        Set oRg = para.Range
        oRg.MoveEndWhile Cset:=vbCr, Count:=wdBackward 'enter not taken ???
        oRg.InsertBefore "<center>"
        oRg.InsertAfter "</center>"
    End If
Next para

I even wrote "WHY THIS LINE IS OFF" because there is a character that colours the whole code line green, and a Microsoft site says that all green lines are simply comments and not code. Maybe it is a mistake? I say that because Toxaris usually comments everything, and this line has no explanation, so it looks like code rather than a comment... Sorry, but I hope this is useful.

I attach the macro here, ready as version 2.54, and I hope Toxaris can approve my work and that all is OK. Thank you very much again, Toxaris, for this clean work, and thanks to Microsoft for its online beginners' FAQ, which allowed me to debug my problem.

I hope to see this file in Toxaris' first post, and now... no more headache! And well, Best Wishes from me! ;)

Toxaris
12-27-2012, 04:42 PM
The changes seem OK, but should not be necessary. The quote conversion is required for the normal quotes because they have a certain usage in the normal code. The chevrons should not have that. However, I have been surprised by Microsoft's handling of regional settings before.
Don't worry about the correct translation of the comments. I can do them in a next version if you want. The commented line is an old line which has been replaced by some better code. Since it is commented out, it is not being used.

Start
12-27-2012, 06:33 PM
The changes seem OK, but should not be necessary. The quote conversion is required for the normal quotes because they have a certain usage in the normal code. The chevrons should not have that. However, I have been surprised by Microsoft's handling of regional settings before.
Don't worry about the correct translation of the comments. I can do them in a next version if you want. The commented line is an old line which has been replaced by some better code. Since it is commented out, it is not being used.

Thank you for your reply. I hope my solved problem will be useful for anyone who has the same trouble... Just when I was deciding to stop using your macro, the solution turned up, I don't know how...

I hope you can use that file for your next release.

Turtle91
12-28-2012, 02:08 AM
Ah...Sweet! Thanks Toxaris!!
I had built up a bunch of macros over the years to do exactly what you have here. Did you know that when you upgrade windows on your computer (clean install) that your "normal" module does NOT get backed up with everything else??!!?? Needless to say I was rather upset that all that work was gone...but now I found this and that saved a bunch of work redoing it all. :)
I also liked how you accomplished some things - I've been working through the code for the last couple of hours. I added a couple things that you might like. Can I send this file to you? I'd rather not post it on here until you've had a chance to look it over - don't want to confuse anyone!! Cheers!

Toxaris
12-28-2012, 05:26 AM
Always looking for improvements! Send me a PM and I will look at it.

Start
12-28-2012, 09:21 AM
...
I'd rather not post it on here until you've had a chance to look it over - don't want to confuse anyone!!
...

I've read your message and I see your point about not confusing anyone by posting something before Toxaris approves it. :smack:

I never thought my post would create a problem, so I beg everybody's pardon; I've just removed my file. ;)

Toxaris has downloaded my file, so if it is useful he can fix the macro. :bookworm:

Turtle91
12-29-2012, 01:44 AM
Sorry Start, I didn't mean anything towards you by the "confusing" remark! I just didn't want to step on anyone's toes by posting a change to his macro without him seeing it first....especially as it was my FIRST post on this forum! :D

I did send a PM to Toxaris with the file. We'll see what he thinks of it...

Cheers!

Start
12-29-2012, 04:05 AM
All's right now, thank you very much Turtle91 for your kind and smart reply ;)

Toxaris
12-29-2012, 08:25 AM
I am having a week off with almost no internet, so you all have to wait for a week, sorry.
Will pick it up as soon as I can.

Start
01-02-2013, 02:58 PM
Best 2013 to everybody!

Here is an idea for Toxaris' macro, as a wish-list item, if feasible: when the macro inserts an <i> tag, it should work out the right place for it, automatically deciding whether a space belongs before or after the <i> and </i>. The same for <b> </b>.

Is it possible to add a blank line <p>&nbsp;</p> when a paragraph in Word has extra space after it? That way the right amount of space is kept between 2 paragraphs.


Moreover, it's off-topic, but... does a Word macro exist that warns about an unclosed chevron?

Toxaris
01-04-2013, 04:35 PM
I am back and will take a look at the suggestions from Turtle tomorrow.

What exactly do you mean, Start? It does not matter if you have _<i> or <i>_. The renderer will pick it up as _<i>. The same applies to the closing tags. It remains a question whether you want to include your quotes, commas, question marks and so on. It is quite possible to create a small macro that colorizes the bold/italic things in the document. A kind of pre-macro to check before conversion.
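
Such a pre-check could be sketched with Word's formatted find-and-replace, highlighting every italic or bold run so that stray spaces and punctuation inside them stand out. The macro names and the colours are assumptions made for this sketch; it is not part of the export macro.

```vba
' Hypothetical helper: highlight all italic (or all bold) text in the given colour.
Private Sub HighlightFormatting(ByVal wantItalic As Boolean, ByVal colour As WdColorIndex)
    Options.DefaultHighlightColorIndex = colour
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ""                      ' empty text: match on formatting only
        .Replacement.Text = ""
        If wantItalic Then
            .Font.Italic = True
        Else
            .Font.Bold = True
        End If
        .Replacement.Highlight = True   ' apply the current highlight colour
        .Format = True
        .MatchWildcards = False
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With
End Sub

Sub HighlightItalicAndBold()
    HighlightFormatting True, wdYellow      ' italic runs
    HighlightFormatting False, wdTurquoise  ' bold runs (bold italic ends up turquoise)
End Sub
```

The highlighting changes the document, so it is best used on a working copy or removed again before the export.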

I am not quite sure what you mean with the <p>&nbsp;</p>. If you have an empty line in Word, it will get converted to <p>&nbsp;</p>. Multiple spaces will be ignored in HTML.

The answer to your last question is yes and no. As you know, there is a check dialogues macro which finds almost all broken dialogues. I believe your question is about languages which use chevrons as dialogue markers. However, this macro needs an overhaul, and when I do that I will include chevrons.

Start
01-05-2013, 09:23 AM
Hi Toxaris,
the idea of a sort of pre-macro is interesting. Maybe something like an option to choose some parameters before running the real macro.

About <p>&nbsp;</p>: it's a little difficult for me to explain in English, but I'll try to be clearer. In a book, every chapter normally has the same spacing between lines and between paragraphs throughout. For that your macro is perfect: every </p> is fine and we don't need any further <p>&nbsp;</p>. The problem is where my scanned pages have a double space between 2 paragraphs, because extra space in a book usually has an important meaning, such as a change of time or a change of idea. So is it possible to add an extra <p>&nbsp;</p> where Word has this gap between paragraphs, so that the break doesn't go unrecognized?

At present, before running your macro I check my book for all double (or larger) spaces between paragraphs. When I see one, I add an empty line in Word by pressing the Return key. That way I'm sure that after conversion everything is like the original old book. The problem is that Word has "space after paragraph" turned on for some paragraphs only, so they are difficult to detect. I have to check them manually, and sometimes I make mistakes...

And about "As you know, there is a check dialogues macro which finds almost all broken dialogues": sorry, I don't know where I can find that in the forum. Could you please post the link where I can download it? I hope to see an idea from you on this. It's difficult to do it all by "hand" and with my poor eyes! ;)

Toxaris
01-05-2013, 02:13 PM
I think I know what you mean now. I am testing a small macro, or actually a routine, as part of another macro that tries to detect scene breaks and places a holder. It is not working exactly as I want yet, but it is almost there.
I'll place it online when it is done.
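
Until that routine is available, the manual step described above (adding an empty line wherever a paragraph has extra space after it) could be approximated as follows. This is not the routine mentioned in this post, just a sketch under assumptions: MarkSceneBreaks and the 6-point threshold are made up, headings with space after them will be flagged as well, and the behaviour inside tables is untested.

```vba
' Hypothetical sketch: turn "space after paragraph" into an explicit empty paragraph.
Sub MarkSceneBreaks()
    Const EXTRA_SPACE_PT As Single = 6   ' assumed threshold in points; tune per book
    Dim i As Long
    Dim para As Paragraph

    ' Walk backwards so that inserted paragraphs do not shift the ones still to visit.
    For i = ActiveDocument.Paragraphs.Count To 1 Step -1
        Set para = ActiveDocument.Paragraphs(i)
        If para.SpaceAfter > EXTRA_SPACE_PT Then
            para.SpaceAfter = 0                 ' the empty paragraph now carries the gap
            para.Range.InsertParagraphAfter     ' empty line after this paragraph
        End If
    Next i
End Sub
```

Since an empty line in Word is converted to <p>&nbsp;</p> by the export, running something like this first should carry the break through to the HTML.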

The dialogue macro is one of the macros that I did not create alone. That one is due for an overhaul, so it is not available at the moment. I hope to have it in a working form in about 3-4 weeks. It handles dialogues in single quotes, double quotes (including low/high) and chevrons. When it is done, a separate thread will be opened.

Toxaris
01-13-2013, 05:06 AM
A new version is online. I have incorporated some code changes given to me by Turtle91 and fixed a few small bugs/strange things. It is in the start post.

No answer on the other macros yet.

Soxendom
01-16-2013, 10:07 AM
I downloaded the version from the first page, imported it and ran it with one of my files. I received a "variable not defined" error.

Turtle91
01-16-2013, 10:44 AM
Here's how to fix the "Variable not defined" error.

Add "dim i" (without the quotes) to the top of the Sub Transform_HTML() macro...just below the green remarks

Also add "Public Entities, Quotes, CharFonts As Variant" (without the quotes) just below the Option Explicit statement at the very top of the module.

It should look like this when you are done:
Option Explicit
Public Entities, Quotes, CharFonts As Variant

Sub Transform_HTML()
' This macro converts the active document into a document with HTML tags for layout.
' Version 2.6 - Toxaris
' - fixed some strange effects with tables
' - fixed unnecessary code in HTML when there are no notes
' - added guillemets as quote characters
' - streamlined replacements
Dim i

ActiveDocument.Save
Application.ScreenUpdating = False

I hope that helps!

Cheers,
Dion

Toxaris
01-16-2013, 02:11 PM
That is the result of maintaining two versions manually outside the code check... The version in the first post is updated. Next time I will check better... Sorry.

gonzo115
01-27-2013, 12:05 PM
Thanks for the great work but I'm using Word 2011 on Mac and the macro doesn't run. What do I need to fix to get it to work on Word 2011?

Thanks

Turtle91
01-27-2013, 02:38 PM
A big hammer through the screen fixes ALL of my Mac issues!! :D

I don't remember there being any Windows API calls that would cause it to hang on a Mac...I also think Toxaris was very good at incorporating multi-platform language for the file/folder calls.

Is there a particular error that pops up? Can you get a screen print??

Toxaris
01-27-2013, 03:22 PM
I actually made some changes to make it more Mac proof. As I don't have a Mac, I cannot really test. As Turtle91 said, what is the error you get?

rhyous
01-30-2013, 06:23 PM
The With in line 160 is not closed so I got an error.

I closed it and it worked.

Also, I normally don't replace the curly quotes with the special character. Why are the quotes important to replace? I think I will comment out the code to replace the quotes as I like the look with the quotes better and my ebook seems to work fine.

rhyous
01-30-2013, 06:44 PM
Also, there is no license associated with your code?

So right now, the way it is posted, it is free, unlicensed, and public domain so anyone can take and do what they want with it. Is that what you want?

You might want to look at the BSD Licenses.
http://opensource.org/licenses/BSD-2-Clause

Toxaris
01-31-2013, 07:14 AM
Correct, the With is not closed correctly. Corrected it in the start post.

The quotes are important to replace for several reasons. One is that Word will convert them back to straight quotes upon saving and that is not what I want...

There is no license to it, that is correct. I use this macro myself and if anyone finds it useful, they can use it. I can put in a BSD or other license reference in it, but that will not stop anyone from sharing if they want.

rhyous
01-31-2013, 12:49 PM
OK. Changing the quotes is only for saving purposes.

I wondered if there was a reason epub needed the character sequence as opposed to the characters.

Cool.

And no need to license it. I plan to use it in a blog post, which is why I asked about license first. Of course, I'll make sure you have credit.

steppe
03-19-2013, 01:00 AM
Thank you again for this neat gadget.

Internal links seem to be deleted in the resulting HTML file (Word 2010 and Word 2013). The links were created using the "Insert" ribbon and "Bookmark" and "Hyperlink" buttons. In the resulting HTML file, the bookmark (anchor) is deleted and the link holds an empty reference like this:
<a href="">Text here</a>

Any chance this can be fixed? :wink:

I am attaching the test files. To create the HTML file I click either "No" or "Cancel" for all questions except the last one: "Save HTML?" where I select "Yes."

Aside from that, the macro seems to work fine in Word 2013 (it's available as a free 30-day trial of Microsoft Office 365 if you are curious).

Toxaris
03-19-2013, 03:13 AM
I will look into it, but since I hardly use internal links in a Word document, I haven't programmed it.

steppe
03-19-2013, 08:13 AM
Your time would be greatly appreciated. I have a last-minute wish :smile: : so that the internal links are Kindle-proof it would be best to code the anchors like this:

either:
<h1 id="anchor1">Chapter One</h1>

or:
<a name="anchor1"></a><br /><h1>Chapter One</h1>

if an anchor precedes a heading tag. The following code will work badly on older Kindle devices:
<a name="anchor1"></a><h1>Chapter One</h1>

(The formatting of the heading will change to regular text when a Kindle user follows that link.) If an anchor is within regular text, then it doesn't matter how it is coded. :thanks:
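
For files that are already converted, the first (preferred) form can be produced afterwards. A minimal Python sketch, assuming the anchor is an empty <a name="..."></a> placed directly before a heading tag that has no attributes of its own; the function name is made up and this is not part of the macro:

```python
import re

# Matches an empty named anchor immediately followed by a bare heading tag,
# e.g. <a name="anchor1"></a><h1>. Headings with attributes are not handled.
ANCHOR_BEFORE_HEADING = re.compile(
    r'<a\s+name="([^"]+)"\s*>\s*</a>\s*<(h[1-6])>',
    re.IGNORECASE,
)

def move_anchor_into_heading(html):
    """Rewrite <a name="x"></a><h1> as <h1 id="x">."""
    return ANCHOR_BEFORE_HEADING.sub(r'<\2 id="\1">', html)
```

Anchors inside regular text, and the <br /> variant shown above, are left untouched.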

Turtle91
03-19-2013, 09:22 AM
I thought the NAME attribute was going away in favor of ID, so the first example would be preferable.

Toxaris
03-19-2013, 12:11 PM
If I include this in the macro (I am not sure yet), it will be with 'ID'. I will take a look later this week.

Toxaris
03-19-2013, 02:24 PM
New version in the startpost. Bookmarks and internal links are supported...

Turtle91
03-19-2013, 05:55 PM
Wow!! You have REALLY short weeks in the Netherlands!! :D

Thanks!

steppe
03-19-2013, 09:14 PM
Bookmarks and internal links are supported...

Thank you, Sir, for the prompt action! :thumbsup:

Unfortunately, the macro seems to crash when processing 6 or more internal links. :chinscratch: I am attaching the test file with screenshots.

Again, :thanks:

steppe
03-19-2013, 09:30 PM
Also, when I try to delete the Macro from Word 2010, I get an error message :chinscratch:

http://www.winterwallpub.com/z_delete_error.jpg

your help would be much appreciated.

Toxaris
03-20-2013, 04:46 AM
Fixed. The first error is a simple one and was easy to fix. It actually doesn't matter how many links are in the document. Again foiled by having to maintain two versions... The second screen showed me that there was something weird. The nestings were wrong. I am glad you sent the word document you used for testing here, because I could use that. You used soft enters and I didn't check those for formatting. I personally would be cautious with soft enters.
I also changed some things in the processing sequence, because links get underline tags. This is unwanted, since anchors already have an underline formatting in standard HTML.

With regards to your last screenshot. You cannot delete the macro from there. The macro contains many routines and functions. If you need to delete the macro, you need to enter the VBA editor (the edit button or Alt-F11). There you can remove the file.

Toxaris
03-20-2013, 05:29 AM
Wow!! You have REALLY short weeks in the Netherlands!! :D

Unfortunately not, but I had some spare time and this was actually a lot easier than I expected. Some small quirks though. Ranges work differently for bookmarks (sigh...).

Then again, I said later this week...

steppe
03-20-2013, 08:23 AM
Fixed. The first error is a simple one and was easy to fix. It actually doesn't matter how many links are in the document.

Thanks. Now the macro completes the job successfully, but one tiny detail is unfinished: the references miss the # sign :)

With regards to your last screenshot. You cannot delete the macro from there. The macro contains many routines and functions. If you need to delete the macro, you need to enter the VBA editor (the edit button or Alt-F11). There you can remove the file.

Oh I see. Removed the old macro successfully now. :thumbsup:

Toxaris
03-20-2013, 10:58 AM
Thanks. Now the macro completes the job successfully, but one tiny detail is unfinished: the references miss the # sign :)

Have I said before I never used them? :D Silly mistake and solved in the version of the startpost. Only the addition of 5 characters... If you don't want to download again, you can change the following line:

addr = hyper.SubAddress 'internal hyperlink


in:
addr = "#" & hyper.SubAddress 'internal hyperlink
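
If the HTML was already generated with the older version, the missing # can also be added afterwards instead of re-running the macro. A rough Python sketch, assuming internal links point at id or name values defined in the same file; the function name is hypothetical:

```python
import re

def add_missing_hash(html):
    """Prefix '#' to href values that exactly match an anchor in the same file."""
    # Collect every id="..." and name="..." value as a possible link target.
    targets = set(re.findall(r'\b(?:id|name)="([^"]+)"', html))

    def fix(match):
        value = match.group(1)
        if value in targets:
            return f'href="#{value}"'
        # External links and links that already start with '#' are kept as-is.
        return match.group(0)

    return re.sub(r'href="([^"]*)"', fix, html)
```

Running it twice is harmless, because a value that already starts with # never matches a target name.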

steppe
03-20-2013, 03:13 PM
It's all good now. Thanks a lot, Tox, for your patience and hard work.

Big Kev
03-31-2013, 09:07 PM
Hi Toxaris and all other ePub experts

Could you give a newbie a little guidance please.

I would like to start creating eBooks commencing with a 120 page family history I have put together which consists of text and images. As a complete newbie I would like to do it the best I can without getting into any bad habits from the start.

I have done a couple of the w3 courses and Pablo's tutorial and whilst I still have no experience with HTML, etc. I think I have sufficient understanding of the concepts to give it a go and at least know what questions I will need to ask as I go along.

Currently I have the raw final manuscript in MSWord (2007) format with no images inserted. (I also have the printed version of the book which was printed by an online publisher.)

I also have Sigil and Toxaris's macro downloaded and ready to go but am still a little confused about the best way to prepare my Word document for import to Sigil. More accurately it is not clear to me how far I should go with formatting in my Word document before running Toxaris's macro and moving over to Sigil.

After researching this forum my plan of attack is:

1. Strip the Word document of the minimal formatting it contains (including blank lines between paragraphs).

2. Format paragraphs in Word (no indent, small blank space between paragraphs, justified) using a custom named style.

3. Insert images and captions in Word.

4. Run Toxaris's macro and open resulting HTML file in Sigil.

5. Attend to metadata, headers, cover, TOC, splitting chapters in Sigil.

6. Produce ePub.

7. Convert ePub to other formats in Calibre.

So before I make a start I would really appreciate any advice/suggestions regarding my proposed approach as my ignorance may well mean I am wide of the mark.

Regards
Kevin

Toxaris
04-01-2013, 04:43 AM
Sounds about right. A couple of pointers though. If you have blank lines between sections, you can leave the blank line. It will be converted to a blank line. You can also use a placeholder there if you want it later to be done by margins.
Don't forget that the macro will not create a stylesheet, but a reference to it. If the names in the stylesheets are the same as in Word, the references are correct.
Check the images afterwards, sometimes something goes wrong with the numbering, probably something typical Word. I personally place the pictures in Sigil.
Take note that there is almost no way to force captions on the same page as the picture. At least not a method supported by all readers.

mrmikel
04-01-2013, 11:20 AM
You can try to force images onto the same page as the caption by putting them in a paragraph together. But you can create a problem with it bleeding over onto the next page that way, because the whole thing is too big for a page.

If they must stay together, use your graphics program and put them on the picture. But this lacks flexibility, is a lot of work and may not be appropriate for some pictures. However, it does have a sort of old-time appeal. They used to write the date and so on right on the picture in a number of older ones I have seen.

Since it seems primarily for your family's use, whatever you choose should govern.

You might also want to consider how you would add pictures that relatives offer you once they have seen the book (for example, of people covered in its earlier sections), and how you would add pictures of the next generation and beyond, if this is to be an ongoing thing.

Big Kev
04-01-2013, 08:01 PM
Hi Toxaris & mrmikel

Thanks for the comments. I will now make a start with some comfort that at least I am on the right track. I'm sure there will be more questions as I progress.

With regard to the images I thought that there may have been a problem with captions. Whilst it will mean a little extra work I think your suggestion of adding the caption to the image is the safest way to go.

As you indicated this is easily done with appropriate graphics software and for those who don't like the look of the caption in the original image you can always locate it outside the original image and create a new image. Like in the attached example which shows both options.

Adding images to this particular book won't be a problem as its specific subject matter will not require any further images. Notwithstanding it does have a family photo album at the end which lends itself to future additions if necessary.

Thanks again for the advice
Regards
Kevin
[Attachment 103784]

Well I certainly stuffed that up, I'll have to learn how to insert images properly. Notwithstanding I think you get the point. The "Boy Tom" below the image actually has a transparent background so it appears as a caption normally would in a book. It doesn't show well when enlarged here as the default background seems to be black, which obviously doesn't go too well with black text. Now I'm going to leave it alone before I muck it up any more.

mrmikel
04-01-2013, 10:45 PM
One parting shot: preview what you are doing on the device they are likely to use, and try changing text sizes or anything else that is adjustable to make sure you are happy with it. None of the readers or software is perfect, so you don't want to stumble on one of the oddities of the particular device.

Big Kev
04-01-2013, 11:09 PM
Hi mrmikel

Thanks for that. Thought that would be the final step.

Regards
Kevin

steppe
04-20-2013, 08:33 AM
7. Convert ePub to other formats in Calibre.

If you wish to convert to the Kindle format, IMO, the best course of action is to upload your EPUB file directly to KDP. This will give you an Amazon-generated .mobi file in the "Downloadable Previewer," which you can publish as your final Kindle book.

Calibre produces .mobi files that can be rejected by KDP, especially if you try the new AZW3 format. The old MOBI format (KF6) generated by Calibre is less likely to be rejected, but I have seen enough formatting problems with this sort of .mobi file on the KDP Formatting forum to not recommend this method for publishing on KDP.

steppe
05-04-2013, 08:13 AM
Hi again.

Thanks for this great macro, it helped me immensely during my last ebook project.

When I run the Macro in Word 2010, I click either "No" or "Cancel" for all questions except the last one "Save HTML?" where I click "Yes".

I seem to be having the following problems with the Macro:

1) unordered lists are converted to ordered lists

2) some paragraphs fail to convert, and become a simple hard return (new line) in HTML code

3) page breaks are ignored; I find that the best way to code a page break for Kindle books is
<br style="page-break-after:always;" />
because it produces an empty line in the "Click to LOOK INSIDE" feature on Amazon.com.

4) Word's non-breaking spaces (Ctrl+Shift+Space) are not converted to &nbsp;

Am I doing something wrong with the Macro? Would it be possible to fix some of those issues?

Just in case, I am attaching a test file, with the problem areas highlighted in bubblegum pink color.

:thanks:

Toxaris
05-04-2013, 01:42 PM
Ad 1. The lists might be a typo, will check that one out.
Ad 2. I hope that there are examples in the test files, otherwise I would need some examples.
Ad 3. Correct. That is intentional, since a page break does not make a lot of sense and post activities are required anyway. You need to do splitting etc. later on anyway. No intention in changing this. Also a lot of readers can act funny on the <br /> tag.
Ad 4. Probably, I never use those. Will check it out, should not be a big issue.

I will look into it, perhaps an update later today or tomorrow. I am working on a big project right now, a Word add-in with a lot of helpful procedures, including this one. No messing around with macros anymore.

Toxaris
05-04-2013, 06:22 PM
Looked at it. Number 1 will be a pickle. I hit some Word strangeness. Every type of list is regarded the same in VBA, so it seems that there is no way to make a distinction. That would be a real bummer, but I do not intend to give up yet.
Number 2 is caused by the fact that you sometimes have a 'space enter'. The way the routine is built means such a paragraph end is not seen. I will remove spaces before enters.
Number 3 will not be implemented.
Number 4 is implemented in my version already, that one was easy.
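
For illustration only, numbers 2 and 4 amount to two simple text replacements. The sketch below is in Python, not the macro's VBA, and assumes the document text is available as a plain string; tidy_text is a made-up name:

```python
import re

NBSP = "\u00a0"  # the character Word inserts for Ctrl+Shift+Space

def tidy_text(text):
    """Remove spaces before line breaks and write non-breaking spaces as entities."""
    # Drop ordinary spaces and tabs that sit directly before a line break.
    text = re.sub(r"[ \t]+(?=\r?\n)", "", text)
    # Replace each non-breaking space with its HTML entity.
    return text.replace(NBSP, "&nbsp;")
```

The non-breaking space is not in the [ \t] class, so it is never stripped by the first step.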

steppe
05-05-2013, 08:33 AM
Thanks, Tox. Much appreciated.

Toxaris
05-06-2013, 01:33 PM
I have uploaded a new version. There is something strange with the lists. If I create a document with the lists manually or in the test document remove the list and then reapply the list, the function works fine. It seems that there is something not quite right, but I cannot make the distinction in Word as it is now. The only advice I can give is to check your lists afterwards. Perhaps I will find a fix later.

EinfachNurIch
05-11-2013, 12:08 PM
There is a typo in the macro. In the function LoadArrays "&dbquo;" instead of "&bdquo;".

Toxaris
05-11-2013, 01:32 PM
Right you are.

steppe
05-13-2013, 05:20 AM
I have uploaded a new version. There is something strange with the lists.

Thanks for fulfilling the requests. Regarding lists, maybe this has to do with different versions of Word. Anyways, changing <ol> to <ul> by hand is not that difficult if you know in advance that the problem may occur.

steppe
05-13-2013, 05:37 AM
This is not really a problem, but the new version of the Macro creates a folder Save_As_HTML.files (attached). This folder is not mentioned in the HTML file.

Toxaris
05-13-2013, 05:52 PM
Correct, it is part of the conversion process to determine which images are there. They should be deleted afterwards, but that does not always work. Do not blame me, blame MS. Handling images is handled poorly in VBA... So much better in VB.Net or C#.
I do not recommend using those images anyway, but inserting them in Sigil manually. The quality is usually lower than what you get from OCR. That is why I write the image links in Sigil style rather than as real links to the extracted images.

steppe
05-23-2013, 06:56 AM
Do not blame me, blame MS. Handling images is handled poorly in VBA... So much better in VB.Net or C#.

I am not blaming you :) Thanks for the explanation.

cathytas
05-27-2013, 02:03 AM
I'm a Mac user and new to this. Do I just drag the macro onto my desktop? Where does it work from? Very nervous about altering Word and then not able to delete it.

Toxaris
05-27-2013, 02:48 AM
You need to add it to Word via the Visual Basic Editor. On Windows this opens with Alt-F11; I have no idea how it works on a Mac.

steppe
06-12-2013, 04:02 AM
I'm a Mac user and new to this. Do I just drag the macro onto my desktop? Where does it work from? Very nervous about altering Word and then not able to delete it.

Unpack the .zip file that you downloaded from the first post in this thread. Launch Microsoft Word, hold down the "Option" key and press the "F11" key. The Visual Basic window will appear. In the left-hand panel, click the "Normal" project, then click the menu "File" of the Visual Basic window and select "Import file." Navigate to Toxaris's .bas file, select it, and press "OK." Close the Visual Basic window. Open your document in Word, select the "View" ribbon, and click the "Macros" button. In the dialog window, select "Transform_HTML" and press "Run." Click either "No" or "Cancel" when presented with various questions, except for the last question "Save HTML?" where you have to select "Yes."

If your version of Word doesn't have the "ribbon interface" (Word 2008 for Mac or older), then you can download a free 30-day trial of the latest Microsoft Office. The macro works fine in the latest Word for PC.

varlog
06-12-2013, 06:52 PM
As I left Windows behind me for several years now I've tried your macro with LO on Linux. No go of course. Some comments? A pointer?

Toxaris
06-13-2013, 05:04 AM
As I left Windows behind me for several years now I've tried your macro with LO on Linux. No go of course. Some comments? A pointer?

Sure... It only works on MS Word. It can never work on LO/OO, since that is a totally different program...

varlog
06-13-2013, 05:07 PM
Being simple as I am, I just thought it is a Basic program operating on a text document - so the "only" problems are compatibility of program and document. From your answer I take it nobody has played (me excluded:)) with your macro in LO yet.
I added:
Option VBASupport 1
Option Compatible
I get:
BASIC syntax error.
Variable CharFonts already defined.

which is Basic incompatibility probably.
If time allows, your permission presumed, I'll play some more...

Notjohn
06-17-2013, 06:45 AM
This is not an in-depht analysis, but a quick check shows, that it works well in W2003. So W97 shoudn't cause much problems.

It's great !!!!!!:thanks:

Is there any hope for Word 2000?

Thanks - NJ

Toxaris
06-17-2013, 08:36 AM
All I can say is try. I haven't had a copy of Word 2000 for many years now. It might work. It does not depend on ribbons or the like.

DaleDe
06-17-2013, 02:20 PM
All I can say is try. I haven't had a copy of Word 2000 for many years now. It might work. It does not depend on ribbons or the like.

It certainly works in Word 2002 (XP). Just to head off yet another question. It is easy to try.

Dale

mrmikel
06-17-2013, 03:02 PM
Sure... It only works on MS Word. It can never work on LO/OO, since that is a totally different program...

I believe OO/LO do macros by Java, so a completely different language.

elibrarian
06-17-2013, 05:23 PM
I believe OO/LO do macros by Java, so a completely different language.

Nope, LO does BeanShell, JavaScript (not Java), Python and LibreOffice Basic, which has a VBA compatibility option, but the VBA macros will still need some tweaking to run. (I also prefer the original Microsoft VBA - probably because I've done macros in Word for over 15 years (back then VBA was an Excel-only thing and the Word macros were rather different - ah, those memories ...), but have only just "dipped my toes" in LibreOffice, and don't find it very intuitive compared to VBA.)

But to conclude: Toxaris' brilliant macro will not run as-is - but it is probably doable with a little effort. (Any LibreOffice macromakers around here :) ?) It should be LibreOffice, OpenOffice has a sort of Basic too, but it is not up to par with LO, and the Apache developer team may be brilliant, but they're not fast ...

Regards,

Kim

mmoleon
07-22-2013, 02:55 PM
Hi Tox:
I downloaded your file but I get lost on the other steps.
Should I create a style sheet?
Could you be patient and explain step by step?
I really appreciate it. Thanks.
I tried to import with "filtered HTML" but it is a mess.
Look at this:
<p class="MsoListParagraphCxSpFirst" style="margin-left:0in"><span style="font-size:12.0pt;line-height:115%;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;" xml:lang="ES-TRAD">&nbsp;</span></p>

<p class="MsoListParagraphCxSpLast" style="margin-left:0in"><span style="font-size:12.0pt;line-height:115%;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;" xml:lang="ES-TRAD">Vivir en Miami Dade County y trabajar en Miami Dade College suponen convivir a diario con personas de toda América Latina y el Caribe, que llegaron a Estados Unidos en busca de seguridad y prosperidad y terminaron asentándose en una ciudad que es hoy más latina que anglosajona.</span></p>

<p class="MsoNormal"><span style="font-size:12.0pt;line-height:115%; font-family:&quot;Times New Roman&quot;,&quot;serif&quot;" xml:lang="ES-TRAD">Miami Dade es uno de los condados de Estados Unidos que más alta representación demográfica de latinos muestra en comparación con el número total de habitantes. El Censo de 2010 indicaba que el 70% de los habitantes de Miami Dade County son de origen hispano. En el Miami Dade College más del 80% de los estudiantes tienen igual procedencia.</span></p>

<p class="MsoListParagraphCxSpFirst" style="margin-left:0in"><span style="font-size:12.0pt;line-height:115%;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;" xml:lang="ES-TRAD">En Miami cohabitan, como en cualquier urbe latinoamericana, el empresario hispano exitoso y acaudalado con otros emigrados humildes, como el trabajador de un restaurante, el profesor de una escuela o universidad, los artistas y locutores de la radio y TV local hispana y los refugiados recién llegados de países donde ocurren desastres naturales, o donde dictaduras arbitrarias los empujan a buscar mejores perspectivas en otro país. Miami es una suerte de “Ellis Island hispano” de EEUU a inicios del nuevo milenio.</span></p>

Toxaris
07-22-2013, 03:28 PM
I don't know what you did, but you did not use my macro. That output cannot come out of it. This seems regular filtered HTML output from Word.

You need to run the macro in order to have the clean output. A stylesheet needs to be provided by you, the macro will not create it.

You can also try my add-in. It has many more features and can prepare an ePUB. The ePUB needs to be touched up (or at least I think so), but the start is there. That one will also provide a simple stylesheet or you can provide your own.

jation
11-14-2013, 09:24 AM
Thanks, everything works fine. But I have one question:

Is it possible to keep scene changes from the original docx?
Example:
Test line 1. Test line 1. Test line 1. Test line 1.

Test line 2. Test line 2. Test line 2. Test line 2.

I always get:

Test line 1. Test line 1. Test line 1. Test line 1.
Test line 2. Test line 2. Test line 2. Test line 2.

Hitch
11-14-2013, 03:13 PM
Thanks, everything works fine. But I have one question:

Is it possible to keep scene changes from the original docx?
Example:
Test line 1. Test line 1. Test line 1. Test line 1.

Test line 2. Test line 2. Test line 2. Test line 2.

I always get:

Test line 1. Test line 1. Test line 1. Test line 1.
Test line 2. Test line 2. Test line 2. Test line 2.

Hi, jation:

You know, when you use the macro, it will indicate scene-breaks (like what you've displayed) with a "[scbreak]" which you can optionally highlight in Red. I leave them in, and then regex them out, using the correct style for the following paragraph, in either the output HTML or the output ePUB (whichever way you go).

Does that help? I actually find the [scbreak] indicator more useful than simply leaving an extra empty paragraph which would have to be coded properly to work in any event.

HTH,
Hitch
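
The regex step described above could look roughly like this in Python. It assumes the placeholder is written as its own paragraph, <p>[scbreak]</p>, directly before the paragraph that opens the new scene; that markup, the function name and the scene-first class are illustrative guesses, not the tool's documented output.

```python
import re

# A placeholder paragraph followed by the opening tag of the next paragraph.
SCBREAK = re.compile(r'<p>\s*\[scbreak\]\s*</p>\s*<p>')

def apply_scene_breaks(html, css_class="scene-first"):
    """Drop each [scbreak] paragraph and style the paragraph after it."""
    return SCBREAK.sub(f'<p class="{css_class}">', html)
```

The extra space above the scene is then set once in the stylesheet, for example as a top margin on the chosen class.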

Toxaris
11-14-2013, 03:17 PM
Thanks, everything works fine. But I have one question:

Is it possible to keep scene changes from the original docx?
Example:
Test line 1. Test line 1. Test line 1. Test line 1.

Test line 2. Test line 2. Test line 2. Test line 2.

I always get:

Test line 1. Test line 1. Test line 1. Test line 1.
Test line 2. Test line 2. Test line 2. Test line 2.
Depends. If it is done via margins, then no. I do not check that kind of formatting at all. If it is done by a simple enter, then yes.
However, there is another way. If you have that, you probably also have named styles. You can retain those style names in the HTML. If you then also set up the margins in your stylesheet, the margins are retained.

I would actually recommend not using the macro anymore, but the add-in if you can. That has many more features and will be updated every now and then.

Hitch
11-14-2013, 03:32 PM
Depends. If it is done via margins, then no. I do not check that kind of formatting at all. If it is done by a simple enter, then yes.
However, there is another way. If you have that, you probably also have named styles. You can retain those style names in the HTML. If you then also set up the margins in your stylesheet, the margins are retained.

I would actually recommend not using the macro anymore, but the add-in if you can. That has many more features and will be updated every now and then.

OOOPS!

Sorry--I answered about the wrong program of Toxaris'. My bad. I clicked here from my CP and didn't realize that this was about the clean HTML code, as opposed to the ePUB Tools program. @jation: please ignore my last, but it's accurate about Toxaris' ePUB Tools program.

Sorry again,
Hitch

jation
11-15-2013, 07:07 AM
@Toxaris
Thanks for info and great macro/add-in, I didn't notice e-Book Tools thread when searching for cleaning html. I will try today.

Thanks!!!

MalcolmRM
05-19-2014, 11:44 AM
Hi Toxaris: I took your advice, downloaded Transform_html2.8.eng.bas, and installed it in Normal.dot. But I get a syntax error on the very first line! Word doesn't like the word Attribute. Am I meant to delete the first 3 lines and start right in on the first Sub?
MalcolmRM

Toxaris
05-19-2014, 03:22 PM
You can delete the Attribute line if you want; it only sets the module's name. I would not delete the other lines, to prevent issues.

However, I would advise not using the macro at all to be honest. I don't actively maintain it anymore. If you can, I advise you to look at my add-in. It has much, much, much more features and can create an ePUB directly as final step. The add-in is maintained and improved regularly. An update can even be expected within a week (I just need to update the manual before the release).
Be aware that it will only work if you work on Windows and have Office 2007 or later.

MalcolmRM
05-23-2014, 06:32 AM
Thanks, Toxaris. Hitch emailed me directly with the same advice so I downloaded the manual yesterday! I'll keep an eye peeled for the new version and I guess I'll just have to bite the bullet and move to a ribbon version of Office.

boki
08-08-2014, 07:13 AM
I translated the macro into Slovenian and added code to create an XHTML-compatible head.
The macro works with no problems.