Glad to be of help, GT. I am fairly new to ebooks, HTML and Calibre myself and have spent many a long hour trying to find ways to get nice, neat HTML out of Word so I could store it in Calibre. The solution I suggested was the biggest help. I'll also pass on a few more findings in case they are of use to you or anyone else.
1. If you style your main text paragraphs with the Word Style "Normal (Web)" rather than "Normal" then the HTML output will look like
<p>This is my paragraph</p>
instead of
<p class=MsoNormal>This is my paragraph</p>
which is easier to read if you have to edit the HTML afterwards.
2. Make sure you apply Word's built-in Styles "Heading 1", "Heading 2" etc to style your Titles and Chapter headings, as these will result in neat HTML like
<h1>My Book Title</h1>
<h2>Chapter 1</h2>
You can then use these h1, h2 etc tags to tell Calibre how to detect chapters and page breaks.
If you don't like the look of the existing Word "Heading n" styles then modify the Word style itself. Don't be tempted to format your heading text directly.
3. If you want to go a step further, you can create your own standard CSS file containing all your styling info. Once you've got it just right for your needs you don't need to touch it again. Just link to it in each Word doc before saving as type "Web Page, Filtered". Then you can strip out all the generated styling code that Word produces between (and including) the
<style> and </style> tags
(and that can be an awful lot of code!), which makes for a smaller HTML file.
4. I didn't find RTF very satisfactory as a format for storing in Calibre, mainly because some formatting was lost when it was converted to LRF/EPUB, namely centre- and right-alignment, graphics and line-breaks. I don't know whether this is still the case.
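For anyone who would rather script the clean-up in point 3 than do it by hand, here is a rough Python sketch. It is only an illustration and not something Word or Calibre provides: the function name strip_style_blocks and the sample HTML are made up for the example, and it assumes the link to your own CSS file is already in the saved HTML, so all it has to do is remove the embedded style block.

```python
import re

# Matches one <style ...> ... </style> block, tags included.
# DOTALL lets "." span line breaks, since the block covers many lines.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>",
                         re.IGNORECASE | re.DOTALL)


def strip_style_blocks(html: str) -> str:
    """Return the HTML with every embedded <style> block removed.

    Hypothetical helper: it leaves everything else alone, including
    any <link> to an external stylesheet.
    """
    return STYLE_BLOCK.sub("", html)


if __name__ == "__main__":
    # Made-up sample standing in for a file saved from Word.
    sample = (
        '<html><head><link rel=Stylesheet href="mybook.css">\n'
        "<style>\n<!--\n p.MsoNormal {margin:0cm;}\n-->\n</style>\n"
        "</head><body><h1>My Book Title</h1>"
        "<p>This is my paragraph</p></body></html>"
    )
    print(strip_style_blocks(sample))
```

To run it on a real file you would read the file into a string, pass it through strip_style_blocks and write the result back out. Keep a copy of the original file, and check which text encoding your Word version saved it in before reading it.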
Jackie
P.S. I think the problems you were having with the Street names etc were to do with Word's smart-tags "features". You could investigate switching these off.
Last edited by jackie_w; 10-15-2009 at 07:20 AM.