07-25-2019, 08:48 PM   #7
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by lizarddreaming
[...] Copy and paste the Word doc into the Book Viewer as "plain" text, find and replace all the extraneous <p><br/><p> and such, and it was really good at getting rid of the other crap that Word tries to insert.

[...]

it would take me days to remove all the extra stuff it had in there from Word when I loaded it into Sigil).
First, if you're using Word, you should learn how to use Styles. In this post, I linked to a few videos on why/how to use Styles.

Besides making your life so much easier, Styles will also help every one of these methods output cleaner HTML from your Word documents.

Method #1 (Highly Recommended)

The best way to go from Microsoft Word->EPUB is Toxaris's EPUB Tools:

https://www.mobileread.com/forums/sh...d.php?t=213372

Whenever I work from DOCX, this is the method that I use.

Toxaris's conversion mentality is to strip out as much garbage as possible, and give you super clean, minimal HTML.

Note: Sadly, this is a Windows-only addon. Word on Mac =/= Word on Windows.

Method #2

There are also a few Sigil plugins that help with DOCX import + cleanup:

DOCXImport
CustomCleanerPlus

Method #3

You can also convert that DOCX->EPUB using Calibre.
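
If you'd rather script that conversion than click through the GUI, here is a minimal sketch in Python. It assumes Calibre's ebook-convert command-line tool is installed and on your PATH. The helper names and file names are made up for illustration, and no extra conversion options are passed.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target=None):
    """Build the ebook-convert command line (hypothetical helper).

    If no target is given, the output sits next to the source
    with an .epub extension.
    """
    source = Path(source)
    target = Path(target) if target else source.with_suffix(".epub")
    return ["ebook-convert", str(source), str(target)]


def convert_to_epub(source, target=None):
    """Run the conversion; assumes ebook-convert is on the PATH."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    command = build_convert_command(source, target)
    # check=True raises CalledProcessError if Calibre reports a failure.
    subprocess.run(command, check=True)
    return Path(command[2])
```

Calling convert_to_epub("mybook.docx") would write mybook.epub next to the source file. Calibre picks the input format from the file extension, so the same call should also work on an RTF file.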

If you don't use Styles, you'll get a mess of calibre## styles. If you do use Styles (which you should), you'll get pretty clean code out of it.
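
To get a rough idea of how messy a converted file is, you could count the calibre## classes in its HTML. This is only a sketch: it assumes the generated class names are "calibre" followed by optional digits, and the function name is made up. It uses a simple regex rather than a real HTML parser, so treat the numbers as an estimate.

```python
import re
from collections import Counter

# Matches class="..." attributes (double-quoted only, for simplicity).
CLASS_ATTR = re.compile(r'class="([^"]*)"')
# Assumed naming pattern for Calibre-generated classes.
CALIBRE_CLASS = re.compile(r"calibre\d*")


def count_calibre_classes(html):
    """Count how often each calibre## class appears in an HTML string."""
    counts = Counter()
    for attr_value in CLASS_ATTR.findall(html):
        for name in attr_value.split():
            if CALIBRE_CLASS.fullmatch(name):
                counts[name] += 1
    return counts
```

A chapter with dozens of different calibre## classes usually means the source document was formatted by hand instead of with Styles.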

Calibre's conversion mentality is more: Garbage In, Garbage Out.

This is definitely a few steps above Word Filtered HTML though.

Method #4 (Not really recommended)

If you want another, more minimal conversion, save your DOCX as RTF, and convert RTF->EPUB via Calibre.

That would carry over the bold/italics, while throwing out a lot of the other extraneous formatting.

This is a method I used to use, but you'll still have to add the headings and other more complicated formatting back in yourself.

It'll probably be a million times better than your current method though, where you're manually re-adding the bold/italics.

But this would be better handled by using Word Styles in the first place, and cleaning up your source documents.

Quote:
Originally Posted by lizarddreaming
Any advice or suggestions would be GREATLY GREATLY appreciated!
Learn Styles! Learn Styles! Learn Styles!

Quote:
Originally Posted by lizarddreaming
Okay, I did get the macro to work after some research (never used macros on Word before, so had to figure out how to get it in there and then to run it).
You keep saying "the macro". What macro?

Last edited by Tex2002ans; 07-26-2019 at 04:40 AM.