Old 12-31-2009, 05:24 PM   #5
jackie_w
Hi PhyrePhox, Me again ...

Quote:
Originally Posted by PhyrePhox
It appears that the original HTML was produced by Word, which has a reputation for producing gnarly code. A poor source indeed!

In my opinion, MSWord only produces poor HTML if you let it. The HTML output can be greatly improved by:
  1. Using MSWord styles correctly.
  2. Removing any incorrect hard line breaks before saving.
  3. Making sure the file is saved as type "Web Page, Filtered" to get simpler HTML without some of the MS "excess baggage".

Quote:
Originally Posted by PhyrePhox
Is there a summary somewhere here of what html tags are meaningful for ebooks? Also, how can I feed the resulting html back into Calibre to convert to epub?

I'm afraid I know nothing about editing on a Mac as I have a PC/Windows setup, but if you used MSWord as your editor-of-choice these are the steps I'd take. Perhaps some of them can be "translated" into Mac steps.
  1. Open a new blank Word doc and import the Calibre-output HTML file you've already got.

  2. Try to remove the hard line breaks using the editor's Find-and-Replace for mass changes. If you're lucky, the "real" end-of-paragraphs may have a blank line immediately following, or the "real" start-of-paragraphs may have some leading blank spaces. I could elaborate on this if it was relevant to your particular file.

  3. Use one (or more) of the Word built-in Heading styles (e.g. Heading 2) to mark your chapter headings. Any paragraphs styled as "Heading 2" in Word are created with
    <h2> ... </h2> tags in the HTML output.
    Similarly, "Heading 1" creates <h1>...</h1> tags etc. Calibre can use these <h1>, <h2> etc tags during conversion to EPUB to specify the TOC.

    Any paragraph styled as "Normal" in Word outputs as
    <p class=MsoNormal>...</p> in the HTML output.

    Any paragraph styled as "Normal (Web)" in Word outputs as
    <p>...</p> in the HTML output.

    Any paragraph styled as "Plain Text" in Word outputs as
    <p class="MsoPlainText">...</p> in the HTML output -- which you've already come across. I'd restyle all of these as "Normal" or "Normal (Web)".

    Any text marked as Italic or Bold in Word is output as
    <i>...</i> or <b>...</b> in the HTML output.

    I tend to use <h1> for Book Title and Author and <h2>, <h3> for Chapters, Sub-titles.

  4. Save the doc as HTML (as detailed above)

  5. If you're proficient with CSS, I'd then open the HTML file in a text editor, remove everything between the <style>...</style> tags, and put in a link to an external CSS file containing all the styling I wanted, e.g. lines like:
    body {font-size: 100%; font-family: serif; ... ...}
    h1 {...}
    h2 {...}
    p {...}
    .MsoNormal {text-indent: 1.5em; ...}

    If you're not good with CSS then leave the HTML alone.

  6. Once you're happy with the HTML, re-import it into Calibre by drag-and-drop in the normal way, or via the Edit-Metadata feature if you've already set up the book's metadata. Calibre will zip up the HTML file with any linked CSS file and/or images.

  7. Convert away ... Don't forget to specify the appropriate h1 h2 h3 levels in the "Structure detection" option.
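
If you'd rather script steps 2 and 5 than do them by hand, here is a rough Python sketch. It is only an illustration under some assumptions: that "real" paragraphs in the plain text are separated by blank lines, that the HTML has a </head> tag, and that the external stylesheet is called styles.css. The function names and the file name are made up for the example; they are not part of Word or Calibre.

```python
import re

def unwrap_hard_breaks(text: str) -> str:
    """Step 2 (assumed case): blank lines mark real paragraph ends,
    so every other line break inside a paragraph is joined with a space."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        joined.append(" ".join(lines))
    return "\n\n".join(joined)

# Matches an embedded <style>...</style> block, including the tags.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style>", re.IGNORECASE | re.DOTALL)

def use_external_css(html: str, css_href: str = "styles.css") -> str:
    """Step 5: drop embedded <style> blocks and link an external CSS file.
    Also restyles MsoPlainText paragraphs as MsoNormal (quoted or unquoted)."""
    link = f'<link rel="stylesheet" type="text/css" href="{css_href}">'
    html = STYLE_BLOCK.sub("", html)
    html = re.sub(r'class=(["\']?)MsoPlainText\b\1', r'class=\1MsoNormal\1', html)
    # Insert the link just before the first </head>; if there is no
    # </head>, the HTML is returned without a link.
    html = re.sub(r"</head>", lambda m: link + "\n" + m.group(0),
                  html, count=1, flags=re.IGNORECASE)
    return html
```

This replaces the whole <style> block (tags included) with a <link> line rather than emptying it, which should have the same effect. Regular expressions are a blunt tool for HTML, so check the result in a text editor before feeding it back to Calibre.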

Anyway, that's enough from me for the time being. I don't know how much is relevant for your circumstances but feel free to ask if you think I could help.

Happy New Year