#16
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
#17
Boo-Frickety-Hoo-Erizer
Posts: 251
Karma: 686
Join Date: Oct 2007
Device: Kobo Glo HD!
Did someone say Word?
(Bahoo-hoohoo-haha-haha.) "Clean" HTML from Word isn't really possible. That isn't to say you can't use Word to produce "viable" files that can (and do) convert well into ebook formats. But "clean"? Nooo, not in my experience.

Personally, I got over wanting clean. Most of the time I'm happy to let Word mangle the styles it embeds as "CSS" in the HTML file all it wants. There's just waaay too much other usefulness in Word for my fear of evil to win out.

Saving the file as [Web Page, Filtered] goes a long way toward stripping out the extra junk Word routinely implants. That junk is all well and fine if you need to rebuild an actual Word document, with all of Word's formatting tricks intact, from the HTML file, but that isn't the usual goal here. (MobiCreator, if you import a real Word document, converts it to a filtered HTML file before it converts it to a mobi file.)

If you just have to use CSS in Word, remember that it's CSS 1 only, and there are oddities in CSS you can't construct properly using Word. (See my thread in the ePub forum about using CSS in Word to make a drop cap work in an ePub: it doesn't look like a drop cap in Word, but it works out OK in the ePub.) And Word is all too eager to impose its own changes on the styles embedded in the HTML file. Just TRY to redefine "Normal" in CSS without forcing it into normal.dot, and see what happens to your HTML file.

Unless you intend to hand-edit the .htm file after you've made it in Word, why do you care if it's "clean"? Is it oh-so-much smaller? Is it really worth your time? Wouldn't better metatags be more useful in the long run, when the formats change (again)? Or more care in managing your style sets and where to use them?

pennypenny.

-bjc

p.s. Be sure to check the Word document properties. You might be surprised to find Word embedding your work computer name, company name, logon name, and other things you probably don't want in the HTML meta-info. If, you know, you use work machines to do any of this.
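Following up on the p.s., here is a minimal Python sketch for checking a saved .htm file for that kind of leftover metadata. It assumes that Word's unfiltered export puts document properties in an `<o:DocumentProperties>` block with child tags such as `<o:Author>` and `<o:Company>`; those tag names and the helper name `find_leaky_metadata` are assumptions for illustration, so open your own file in a text editor to see what your version of Word actually writes.

```python
import re
import sys

# Assumption: unfiltered Word HTML keeps document properties in an
# <o:DocumentProperties> block; these child tag names are a guess at the
# ones worth checking, not a documented list.
PROPERTY_TAGS = ("Author", "LastAuthor", "Company", "Manager")

# <meta name=... content=...>, with or without quotes (Word often omits them).
META_TAG = re.compile(
    r"""<meta\s+name=["']?([^"'\s>]+)["']?\s+content=["']?([^"'>]*)["']?""",
    re.IGNORECASE,
)


def find_leaky_metadata(html: str) -> dict:
    """Return {label: value} for metadata anyone opening the file could read."""
    found = {}
    for tag in PROPERTY_TAGS:
        m = re.search(rf"<o:{tag}>(.*?)</o:{tag}>", html, re.IGNORECASE | re.DOTALL)
        if m:
            found[f"o:{tag}"] = m.group(1).strip()
    for m in META_TAG.finditer(html):
        found[f"meta:{m.group(1)}"] = m.group(2).strip()
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_meta.py mybook.htm
    # errors="replace" because Word often saves as windows-1252, not UTF-8.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for label, value in find_leaky_metadata(f.read()).items():
            print(f"{label}: {value}")
```

It is only a regex scan, not a real HTML parser, so treat an empty result as "nothing obvious found" rather than proof the file is clean.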
#18
Grand Sorcerer
Posts: 9,707
Karma: 32763414
Join Date: Dec 2008
Location: Krewerd
Device: Pocketbook Inkpad 4 Color; Samsung Galaxy Tab S6
Quote:
#19
Boo-Frickety-Hoo-Erizer
Posts: 251
Karma: 686
Join Date: Oct 2007
Device: Kobo Glo HD!
Not to pick on Sweetpea, but let's try something.
The attached test.zip contains HTML files of Sweetpea's post, made by copying and pasting it into Word, plus the resulting mobi files. I saved them out of Word as [Full Web Page] and [Web Page, Filtered]. Sure enough, the HTML file for [Full Web Page] is twice as big as the [Web Page, Filtered] file.

Funny thing: when I open the files in a browser, I can't see the picture in the [Full Web Page] file. Same thing in the mobi files; that's why the mobi file for [Full Web Page] is smaller. But when I look at the HTML code in the filtered file, it's not too bad. The style names are longer than [h1] etc., and since the styles copied straight off the web site are expressed as on-the-fly modifications of existing styles, sure, we could trim some file size by defining the styles better. How much time do I have to do that? (Zilch.)

But give up Word because we hate evil so bad? Not a chance. In Notepad (or vi, or TextPad, ted, whatever) I miss out on selecting by style, search and replace on invisible characters like hard vs. soft carriage returns (do I remember the code? is this Western or Unicode?), grammar check, spell check, multiple columns, tables, drag-and-drop picture embedding, MACROS, and automated TOCs, just to scratch the surface.

To make a TOC in Notepad, I get to hand-code it. To make a table in Notepad, I get to hand-code it. To embed a picture in Notepad, I get to hand-code it. To change all instances of a style in Notepad, I get to search and replace. I get to be the spell checker and grammar checker (semicolon rules, anyone?). I could go on all day.

I'd rather have the machine help me through my own ineptitudes than say "oh, I don't need any help here" and do it the hard way... being as lazy as I am.

-bjc
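On trimming file size: one rough way to see how much of the bulk is Word-specific styling is to strip the `mso-` properties out of inline `style` attributes and compare byte counts. The Python sketch below is hypothetical (`strip_mso_declarations` is a made-up name) and assumes Word's proprietary CSS properties all start with `mso-`. It uses plain regular expressions rather than an HTML parser, so it only touches `style="..."` / `style='...'` attributes and leaves `<style>` blocks, `class=Mso...` names and conditional comments alone. Work on a copy of the file.

```python
import re

# One "mso-whatever: value" declaration plus its trailing semicolon, if any.
# Assumption: Word's proprietary CSS properties all start with "mso-".
MSO_DECL = re.compile(r"mso-[\w-]+\s*:[^;]*;?\s*", re.IGNORECASE)

# An inline style attribute quoted with either " or ' (group 1 = quote char).
STYLE_ATTR = re.compile(r"""\s*style=(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def strip_mso_declarations(html: str) -> str:
    """Remove mso-* declarations from inline style attributes.

    If an attribute ends up empty, the whole style attribute is dropped.
    <style> blocks and everything else are left untouched.
    """
    def clean_style(match: re.Match) -> str:
        quote, body = match.group(1), match.group(2)
        body = MSO_DECL.sub("", body).strip()
        return f" style={quote}{body}{quote}" if body else ""

    return STYLE_ATTR.sub(clean_style, html)


if __name__ == "__main__":
    sample = ('<p class=MsoNormal style="margin-bottom:0in;'
              'mso-line-height-alt:12pt">Hi</p>'
              "<span style='mso-fareast-font-family:\"Times New Roman\"'>x</span>")
    cleaned = strip_mso_declarations(sample)
    print(cleaned)
    print(len(sample.encode()), "->", len(cleaned.encode()), "bytes")
```

Comparing `len(html.encode())` before and after on a filtered file gives a quick answer to "is it oh-so-much smaller?" before you spend any time hand-tuning styles.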