MobileRead Forums


holdit 10-09-2013 02:33 AM

Clean HTML from word For EPub
 
I have a Word doc which I saved as HTML. I need to know how I can clean up the HTML while retaining the styling that was originally done in Word, so I can create an EPUB in Sigil.

Any help?

HI

theducks 10-09-2013 05:41 AM

Moderator Notice
This is still not a Sigil question. This is a WORD-EPUB question. Moving to EPUB per your reply to the previous thread

Toxaris 10-09-2013 06:28 AM

You can clean it up manually (there are some guides out there) or use other tooling to create either clean HTML or an ePUB out of Word.

mrmikel 10-10-2013 09:34 AM

Toxaris has modestly refrained from tooting his own horn for a macro he has created, which is shown in his signature line and which a number of people have reported useful.

It does NOT have the capacity to change the essential nature of epubs... nothing does. Epub text is reflowable, so if you want something at a specific position on a certain page you are SOL (simply out of luck). There are other limitations, such as the aggravation of tables and the fact that it takes an enormous amount of work to index, because of all the links which have to be created; indexing to multiple return locations is an invitation to insanity. Sigil has a function which helps with indexing. Everything in epubs will vary across devices, and this is a headache for Hitch, who heads a company which produces epubs.

For some devices there are fixed layout epubs, which are not full featured. It is a bit like going to the beach and complaining that the sand is not solid concrete.

Toxaris 10-10-2013 01:02 PM

I would actually not recommend the macro, but the add-in if you can run it. It offers many more features and will actually allow you to create an ePUB from a Word document, ready to be finalized with Sigil.

bladex01 10-17-2013 03:36 AM

I use TextPipe to automate the cleaning process.

Notjohn 10-18-2013 01:34 PM

I run my Word docs through word2cleanhtml.com online. Requires a template and preferably a style sheet, both of which are on my blog:

The blog: Notjohn's KDP Guide (http://notjohnkdp.blogspot.com)

DaleDe 10-18-2013 01:47 PM

You can use Atlantis Word Processor to read your doc file and create an ePub directly from the app. It will retain your formatting and make a clean ePub. You can read about it in our wiki, and I am working on a review (AWP Review).

mncowboy 10-20-2013 10:03 AM

If you are going to use the cleaned up file in Sigil, or another tool that uses external style sheets, the process is quite easy.
Make sure that all (I mean ALL!) text in the Word document has been formatted using styles.
Create an external .css file that contains the styles you used in the Word document.
Save the Word document as a filtered web file.
Open up the resulting file in an editor and delete everything from <style> to </style>.
Insert a link to the stylesheet and you are done!
Bob
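The last two steps above (delete the embedded style block, then link the stylesheet) can also be scripted. This is only a minimal sketch, not Bob's actual procedure: it assumes the filtered HTML has already been read into a string, and the function name swap_embedded_styles and the file name styles.css are made up for illustration.

```python
import re

# Matches an embedded <style> block with its contents (any letter case, may span lines).
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)
# Matches the closing head tag, where the stylesheet link will be placed.
HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def swap_embedded_styles(html: str, css_href: str = "styles.css") -> str:
    """Remove every <style>...</style> block and link an external stylesheet.

    css_href is a hypothetical file name; use the name of your own .css file.
    """
    stripped = STYLE_BLOCK.sub("", html)
    link = f'<link rel="stylesheet" type="text/css" href="{css_href}" />\n'
    # Put the link just before </head>. If there is no </head>, no link is added.
    return HEAD_CLOSE.sub(lambda m: link + m.group(0), stripped, count=1)
```

Regular expressions are fine for a one-off cleanup like this, but they are not a general HTML parser, so check the result in Sigil before trusting it.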

Toxaris 10-20-2013 01:08 PM

Quote:

Originally Posted by mncowboy (Post 2660592)
If you are going to use the cleaned up file in Sigil, or another tool that uses external style sheets, the process is quite easy.
Make sure that all (I mean ALL!) text in the Word document has been formatted using styles.
Create an external .css file that contains the styles you used in the Word document.
Save the Word document as a filtered web file.
Open up the resulting file in an editor and delete everything from <style> to </style>.
Insert a link to the stylesheet and you are done!
Bob

That is not clean. It is still full of unnecessary spans and font statements at least.
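One way to settle whether a given file still carries this clutter is to count it. The sketch below is an illustration only, assuming the saved HTML is available as a string; the function name count_leftovers and the choice of what counts as a "font statement" (font tags, plus style attributes that set font-family or font-size) are assumptions, not anything from Toxaris's tools.

```python
import re
from collections import Counter

# Opening <span ...> and <font ...> tags (closing tags are not counted).
SPAN_OR_FONT_TAG = re.compile(r"<(span|font)\b", re.IGNORECASE)
# A style attribute whose value sets font, font-family or font-size.
INLINE_FONT = re.compile(
    r"style\s*=\s*(['\"])[^'\"]*\bfont(?:-family|-size)?\s*:", re.IGNORECASE
)


def count_leftovers(html: str) -> Counter:
    """Count span tags, font tags and inline font declarations in the HTML."""
    counts = Counter()
    for match in SPAN_OR_FONT_TAG.finditer(html):
        counts[match.group(1).lower()] += 1
    counts["inline_font_style"] = len(INLINE_FONT.findall(html))
    return counts
```

If every count comes back as zero, the file is clean by this particular measure; any other number tells you roughly how much manual work is left.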

mncowboy 10-21-2013 08:00 AM

Toxaris: I have not found that to be true. As long as I do everything using styles, there are no extra spans or fonts, even when having Word create the TOC and endnotes.
The only extra thing I deal with is replacing <p class=MsoNormal> with just <p>.
I am using Word 2010 on a Windows machine.
Bob
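The paragraph replacement described in this post can be done with a search and replace in any editor, or scripted. A minimal sketch, assuming the class appears as MsoNormal (the name Word normally writes for its default paragraph style) and that it is the only attribute on the tag; the function name drop_msonormal is made up for illustration.

```python
import re

# Matches <p class=MsoNormal>, with or without quotes or spaces around "=",
# in any letter case, and only when class is the sole attribute.
MSO_NORMAL_P = re.compile(
    r"<p\s+class\s*=\s*([\"']?)MsoNormal\1\s*>", re.IGNORECASE
)


def drop_msonormal(html: str) -> str:
    """Replace plain MsoNormal paragraph tags with a bare <p>."""
    return MSO_NORMAL_P.sub("<p>", html)
```

Paragraphs that carry other attributes, or another class, are deliberately left alone so that nothing styled on purpose is lost.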


All times are GMT -4.