Old 10-02-2010, 10:57 AM   #1
PatNY
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
MS Word "crap" at beginning of html files

Is there an easy way to clean up the styling "crap" that Word puts at the top of all html files? Like maybe through a regex search and replace?

Why does Word put a gazillion font items at the start of html files even when those fonts are not being used in the file? It makes it much harder to edit the files in Sigil or any other editor. I manually selected the Word stuff in one file and deleted it and there didn't seem to be any negative impact, so I assume it's safe to delete it. But how does one do it for every file?

I try to stay away from Word for epubs, but I find if a file needs heavy editing and also needs a new TOC, it's the easiest thing to use. I save my files as filtered html.

Maybe if I run the file through another utility first it will clean up that "crap"?

Anyone have any suggestions?
Old 10-02-2010, 11:04 AM   #2
Fabe
Dylanologist
Posts: 200
Karma: 146754
Join Date: Apr 2010
Location: Hanover, New Hampshire, USA
Device: none/all/any
PatNY - I use Word but never its HTML. After I do the editing and add the tags I want in Word, I clear all the formatting, select all, copy, and paste the text into a simple text editor, where I save the file as Unicode (UTF-8) with an HTML extension. I then open this file in Sigil and finish my ePub work there.

Does this idea help? - Fabe
Old 10-02-2010, 11:30 AM   #3
PatNY
Fabe, I am not sure what you are suggesting. Do you mean you add html code to the text while in Word? If so, that would not be easy for me as I am not very experienced with html coding.

The reason I use Word is because of its style gallery and easy way of applying styles (headers) which is a snap. The document map pane also makes it easy to navigate, and I like its user-friendly search and replace.

But if I use the gallery to apply header styles, then clearing all the formatting as you suggest will undo that, correct?

Also, some html books I have are linked to image files. If I do what you are suggesting, won't I lose all those links?

BTW, I love your part of the country. I was in Hanover just this summer and had a wonderful time. It's beautiful up there.
Old 10-02-2010, 11:57 AM   #4
Fabe
Quote:
Originally Posted by PatNY View Post
(1)I am not sure what you are suggesting. Do you mean you add html code to the text while in Word?

(2)I use Word because of its easy way of applying styles (headers). The document map pane also makes it easy to navigate,

(3)I like its user-friendly search and replace.

(4)if I use the gallery to apply header styles, then clearing all the formatting as you suggest will undo that, correct?

(5)Also, some html books I have are linked to image files. If I do what you are suggesting, won't I lose all those links?
PatNY -
1. Yes, I add some of the HTML code in Word first. A prime example is paragraph marks. Example:
Here is a body of text pasted into Word and made generic by clearing all formatting. Here is a body of text pasted into Word and made generic by clearing all formatting. Here is a body of text pasted into Word and made generic by clearing all formatting.
Here is a body of text pasted into Word and made generic by clearing all formatting. Here is a body of text pasted into Word and made generic by clearing all formatting. Here is a body of text pasted into Word and made generic by clearing all formatting.
Here is the same text with paragraph markings added with Find and Replace:
<p>Here is a body of text pasted into Word and made generic by clearing all formatting. Here is a body of text pasted into Word and made generic by clearing all formatting. Here is a body of text pasted into Word and made generic by clearing all formatting.</p>

<p>Here is a body of text pasted into Word and made generic by clearing all formatting. Here is a body of text pasted into Word and made generic by clearing all formatting. Here is a body of text pasted into Word and made generic by clearing all formatting.</p>
Find = ^p
Replace = </p>^p^p<p>

2. Yes, I cannot argue with the ease of use, but Word was designed for the printed page, with HTML web pages added as an afterthought, hence the "crap." Word does not generate "good" HTML for Sigil to turn into ePubs.

3. Yes, I primarily use Word for its Find & Replace sophistication. Several people have urged me to use TextWrangler instead. It is a fine program, but I feel like I need an engineering degree to use it, so I stick with Word.

4. Yes, "clear formatting" will undo styles.

5. I handle graphics in Sigil. But I must say, I stay away from embedded graphics as much as possible.

- Fabe
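
Fabe's Find/Replace step in item 1 can also be done outside Word. Here is a minimal Python sketch, assuming the cleared text has been saved as plain text with one paragraph per line. The function name `wrap_paragraphs` is made up for illustration and is not part of any tool mentioned in this thread.

```python
def wrap_paragraphs(text: str) -> str:
    """Wrap each non-empty line in <p>...</p>, separated by a blank line.

    Mirrors the Word replacement of ^p with </p>^p^p<p>, assuming each
    line of the input is one complete paragraph.
    """
    # Drop empty lines and surrounding whitespace so no empty <p></p> is produced.
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    return "\n\n".join(f"<p>{p}</p>" for p in paragraphs)


if __name__ == "__main__":
    sample = "First paragraph of text.\nSecond paragraph of text.\n"
    print(wrap_paragraphs(sample))
```

The sketch does not escape characters such as `&` or `<`, so any tags already typed into the text pass through unchanged, which matches the workflow of adding some tags in Word first.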
Old 10-02-2010, 05:12 PM   #5
rlauzon
Wizard
Posts: 1,018
Karma: 67827
Join Date: Jan 2005
Device: PocketBook Era
Quote:
Originally Posted by PatNY View Post
Is there an easy way to clean up the styling "crap" that Word puts at the top of all html files?

Anyone have any suggestions?
I usually run it through OpenOffice.org. It cleans it up nicely.
Old 10-04-2010, 09:13 AM   #6
DTM
Intentionally Left Blank
Posts: 172
Karma: 300106
Join Date: Feb 2006
Location: Royal Oak, MI, USA
Device: Nook STR
Quote:
Originally Posted by Fabe View Post
PatNY - I use Word but never its HTML. After I do the editing and add the tags I want in Word, clear all the formatting, select all, copy, and paste it into a simple text editor where I save the file as Unicode (UTF-8) with an HTML extension. I then open this file in Sigil and finish my ePub work there.

Does this idea help? - Fabe
I do exactly the same thing. The attached file will unzip into a Word template file that contains several macros. Use this for the document you want to convert to HTML, then run the macro called Word2HTML. It will clean up the double-paragraph markers and end-of-line paragraph markers you commonly get in text documents, mark Word's Heading 1 through Heading 5 styles with <h1> through <h5>, replace special characters with escape codes, replace double-hyphens with em dashes, and more. (If you don't want to do all of the operations--see CAUTION, below--you can run the other individual macros one at a time.)

Now save the document as a text file, add the proper <html>, <body>, etc. tags at the top and bottom, and you'll have something fit for a clean import into Sigil. Hope this helps.

CAUTION: This is quite useful, but not perfect and so is provided as-is, no warranty, use at your own risk, and the other usual disclaimers. It assumes you have a double-paragraph mark between paragraphs, as is common for Gutenberg and other text files. If you actually have just a single paragraph marker at the end of each paragraph, it'll turn the whole document into one huge paragraph. It also clears "unnecessary" white space, so if you have a table or tabs/spaces at the start of a paragraph, or other such formatting, you'll lose it. This is basically intended for documents that are paragraphs of text with chapter headings.
Attached Files
File Type: zip Word2HTML.zip (34.8 KB, 311 views)
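
For readers who cannot run the macros, here is a rough Python sketch of the same kind of cleanup on a plain-text file. It is not a port of the Word2HTML macro, whose source is not shown in this thread. It assumes blank lines separate paragraphs (the double-paragraph-mark layout described in the CAUTION), and the function names are invented for illustration. Heading detection is left out because plain text carries no Word style information.

```python
import html
import re


def text_to_html_body(text: str) -> str:
    """Turn blank-line-separated plain text into <p> elements."""
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A blank line (the "double paragraph mark") ends a paragraph.
    blocks = re.split(r"\n\s*\n", text.strip())
    out = []
    for block in blocks:
        # Join hard-wrapped lines and collapse runs of whitespace.
        para = " ".join(block.split())
        if not para:
            continue
        # Escape &, < and > so they survive as text in HTML.
        para = html.escape(para, quote=False)
        # Replace double hyphens with an em dash entity.
        para = para.replace("--", "&mdash;")
        out.append(f"<p>{para}</p>")
    return "\n\n".join(out)


def wrap_document(body: str, title: str = "Untitled") -> str:
    """Add the minimal <html>/<head>/<body> skeleton around the body."""
    return (
        "<html>\n<head>\n"
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    sample = "It was late--very\nlate & dark.\n\nNext paragraph.\n"
    print(wrap_document(text_to_html_body(sample), title="Sample"))
```

As with the macro, a file that has only a single line break between paragraphs will collapse into one large paragraph, and leading tabs or spaces are lost.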
Old 10-04-2010, 11:22 AM   #7
Fabe
Dylanologist
Fabe has survived committing the World's Second Greatest Blunder.Fabe has survived committing the World's Second Greatest Blunder.Fabe has survived committing the World's Second Greatest Blunder.Fabe has survived committing the World's Second Greatest Blunder.Fabe has survived committing the World's Second Greatest Blunder.Fabe has survived committing the World's Second Greatest Blunder.Fabe has survived committing the World's Second Greatest Blunder.Fabe has survived committing the World's Second Greatest Blunder.Fabe has survived committing the World's Second Greatest Blunder.Fabe has survived committing the World's Second Greatest Blunder.Fabe has survived committing the World's Second Greatest Blunder.
 
Fabe's Avatar
 
Posts: 200
Karma: 146754
Join Date: Apr 2010
Location: Hanover, New Hampshire, USA
Device: none/all/any
Thanks, DTM. I have never been good at creating Word macros, so I DO waste time with repetitive tasks. - Fabe
Old 10-06-2010, 08:05 PM   #8
PatNY
Zennist
PatNY ought to be getting tired of karma fortunes by now.PatNY ought to be getting tired of karma fortunes by now.PatNY ought to be getting tired of karma fortunes by now.PatNY ought to be getting tired of karma fortunes by now.PatNY ought to be getting tired of karma fortunes by now.PatNY ought to be getting tired of karma fortunes by now.PatNY ought to be getting tired of karma fortunes by now.PatNY ought to be getting tired of karma fortunes by now.PatNY ought to be getting tired of karma fortunes by now.PatNY ought to be getting tired of karma fortunes by now.PatNY ought to be getting tired of karma fortunes by now.
 
PatNY's Avatar
 
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
rlauzon, thanks for the tip about Open Office. I tried it this evening and it cleaned up the html very well. Not only that, but it kept all the links to the images, and everything imported into Calibre intact. And all you have to do is open the file in Open Office and then save it. Nifty trick!

DTM, thanks for the macros. I will give them a try, though I'm more inclined to use the Open Office fix as it's simpler.
Old 10-06-2010, 09:40 PM   #9
DTM
Intentionally Left Blank
DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.
 
DTM's Avatar
 
Posts: 172
Karma: 300106
Join Date: Feb 2006
Location: Royal Oak, MI, USA
Device: Nook STR
I don't have Open Office. Tried it years ago and was not impressed. Maybe it's time to try it again.
Old 10-07-2010, 02:00 PM   #10
edbro
Banned
Posts: 640
Karma: 4911
Join Date: Jul 2007
Location: Grapevine, TX
Device: iPad4
I use Word for most of my ebook creations. I save the Word files as Filtered HTML and import them into Calibre. My epubs usually come out looking perfect, so I gotta ask: what is the problem with having all the extra crap that Word adds?
Old 10-07-2010, 02:11 PM   #11
DTM
Intentionally Left Blank
DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.
 
DTM's Avatar
 
Posts: 172
Karma: 300106
Join Date: Feb 2006
Location: Royal Oak, MI, USA
Device: Nook STR
Hey, if you're happy, go with it.

But there are a couple of problems that can result. First, it can hard-code font sizes, which you probably don't want to have happen. There's another thread all about that sort of problem.

And second, having the style information at the top of each file makes it a major headache to change anything. You have to do it in every chapter. It's much better to put it all in an external CSS file.
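
As a rough illustration of moving the styles out of each file, the Python sketch below strips every `<style>` block from an HTML string and adds a link to an external stylesheet instead. It assumes the Word styling sits in `<style>` elements, as it typically does in Filtered HTML; the function name and the `styles.css` file name are placeholders. Regular expressions are a blunt tool for HTML, so treat this as a starting point and keep a backup of the original files.

```python
import re

# Matches a whole <style ...>...</style> element plus trailing whitespace.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style>\s*", re.IGNORECASE | re.DOTALL)


def externalize_styles(html_text: str, css_href: str = "styles.css") -> str:
    """Remove embedded <style> blocks and link an external stylesheet."""
    stripped = STYLE_BLOCK.sub("", html_text)
    link = f'<link rel="stylesheet" type="text/css" href="{css_href}" />\n'
    # Insert the link just before the first </head>, if there is one.
    return re.sub(
        r"</head>",
        lambda m: link + m.group(0),
        stripped,
        count=1,
        flags=re.IGNORECASE,
    )


if __name__ == "__main__":
    sample = (
        "<html><head><title>T</title>\n"
        "<style>\n<!--\n @font-face {font-family:Calibri;}\n"
        " p.MsoNormal {font-size:11.0pt;}\n-->\n</style>\n"
        "</head><body><p class=MsoNormal>Hi</p></body></html>"
    )
    print(externalize_styles(sample))
```

Any rules that the body still relies on (for example a class such as `MsoNormal`) would need to be copied into the external CSS file by hand; the sketch only removes the embedded copies.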
Old 10-07-2010, 02:31 PM   #12
theducks
Well trained by Cats
Posts: 29,799
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by edbro View Post
I use Word for most of my ebook creations. Word files saved as Filtered Html and imported into Calibre. My epubs usually come out looking perfect so I gotta ask; what is the problem with having all the extra crap that Word adds?
Another point is that the files become massively bloated (as much as 3x), which slows down performance on portable readers.
Old 10-12-2010, 03:04 AM   #13
sassanik
Guru
Posts: 774
Karma: 1211741
Join Date: May 2008
Location: Oregon
Device: EB1150, iPhone, Cool-er Purple, Pocketbook 360, Kindle Fire
Save as RTF

I just save the document as an RTF instead of a Word doc or an HTML file, then import it into Calibre and convert it to epub. This probably takes a bit longer since I can't import it directly into Sigil, but it seems to work with minimal problems.

You are right that Word does tend to add a ton of junk to its HTML.


Amy
Old 10-12-2010, 06:51 AM   #14
Hitch
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by DTM View Post
I do exactly the same thing. The attached file will unzip into a Word template file that contains several macros. Use this for the document you want to convert to HTML, then run the macro called Word2HTML. It will clean up the double-paragraph markers and end-of-line paragraph markers you commonly get in text documents, mark word heading1 - heading5 with <h1> - <h5>, replace special characters with escape codes, double-hyphens with em dashes, and more. (If you don't want to do all of the operations--see CAUTION, below--you can run the other individual macros one at a time, if you prefer.)

Now save the document as a text file, add the proper <html>, <body>, etc. tags at the top and bottom, and you'll have something fit for a clean import into Sigil. Hope this helps.

CAUTION: This is quite useful, but not perfect and so is provided as-is, no warranty, use at your own risk, and the other usual disclaimers. It assumes you have a double-paragraph mark between paragraphs, as is common for Gutenberg and other text files. If you actually have just a single paragraph marker at the end of each paragraph, it'll turn the whole document into one huge paragraph. It also clears "unnecessary" white space, so if you have a table or tabs/spaces at the start of a paragraph, or other such formatting, you'll lose it. This is basically intended for documents that are paragraphs of text with chapter headings.
DTM:

Is this for Word/Office 2007? My older Word 2003 for XP doesn't recognize the template format. I thought I'd try it on an itchy file I have, but it looks like I'll have to run it through the usual hoops. Thanks anyway.

Thanks,

Hitch
Old 10-12-2010, 08:18 AM   #15
DTM
Intentionally Left Blank
DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.DTM ought to be getting tired of karma fortunes by now.
 
DTM's Avatar
 
Posts: 172
Karma: 300106
Join Date: Feb 2006
Location: Royal Oak, MI, USA
Device: Nook STR
Quote:
Originally Posted by Hitch View Post
Is this for Word/Office 2007? My older Word 2003 for XP doesn't recognize the template format.
Yes, that was created under Word 2007. I opened it, saved it "down" to 97-2003 format, and have attached the resulting file. I have not tried it.
Attached Files
File Type: zip Word2HTML.zip (30.3 KB, 288 views)