Old 04-26-2017, 03:53 PM   #55
Hitch
Bookmaker & Cat Slave
Quote:
Originally Posted by elibrarian
I just couldn't resist - so here are my findings:

I think I've found the explanation. Whenever you apply some formatting, be it by keyboard shortcuts, setting a style or by a macro/VBA program/whatever, Word (2013/2016, I'm on 2016) internally resets the style to the "DefaultParagraphFont" style (in French, "Policepardfaut") before applying the new style. This is not visible to the naked eye, but you will see it when exporting to .odf and importing in Sigil (you can see something similar when exporting to unfiltered .htm and even in the internal .xml, if you feel brave enough to unzip a .docx file and analyse the contents).
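For anyone who wants to look inside the .docx without unzipping it by hand, here is a minimal sketch in Python. It assumes the character style shows up as a `w:rStyle` reference in `word/document.xml`; the function name `count_run_styles` is made up for illustration, and the style IDs you actually see will depend on the document and on Word's language.

```python
import re
import zipfile
from collections import Counter

# Matches <w:rStyle w:val="..."/>, the element that points a run at a character style.
RSTYLE = re.compile(r'<w:rStyle\b[^>]*\bw:val="([^"]+)"')

def count_run_styles(docx_path):
    """Count the character-style IDs referenced by runs in the main document part.

    docx_path may be a file name or a file-like object; a .docx is an ordinary zip archive.
    """
    with zipfile.ZipFile(docx_path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    return Counter(RSTYLE.findall(xml))
```

Calling `count_run_styles("book.docx")` (file name hypothetical) returns a `Counter`, so a large count for one style ID is a quick hint of where the extra spans come from. This only inspects the main document part, not headers, footers or footnotes.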

The DefaultParagraphFont (which is not the msoNormal style, as I assumed in a previous post) is special, and does not behave like any other (ordinary) style. It's normally hidden (in 2013/2016 anyway), and it can't be deleted. It's used for resetting text more or less to the bare necessities used in the actual paragraph.

There's an article concerning the DefaultParagraphFont here:

http://word.mvps.org/faqs/customization/DefParaFont.htm

with a link to an article about customizing the standard settings for the various Word versions up to and including 2016.

I enclose a couple of screenshots (please substitute "Policepardfaut"/"DefaultParagraphFont" for "Standardskrifttypeiafsnit"). The first one is the unedited Word file (I've used Roger Farney: Les Anekphantes from "Ebooks libres et gratuits"), exported to odt and imported in Sigil using the ODTImport 0.3.1 plugin by Doitsu. In shot no. 2 I've done a little editing in the original Word document before exporting (setting bold and italics using a style in the first paragraph and using keyboard shortcuts in the next). The third is an unfiltered htm export from the same Word document used as the source for no. 2 (just for the sake of illustration).

Attachment 156378

Attachment 156379

Attachment 156380

So it seems you can't avoid the "Policepardfaut" in the Word exports to ODT (I tried importing some of the odt files from "Ebooks libres et gratuits", and the "Policepardfaut" spans are also found in some of them, as is also seen in screenshot no. 1). The best solution, since the source of the problem isn't your template or the VBA macro but Word itself, is probably to make a saved regex in Sigil and run it on all html files before doing anything else.
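As a rough illustration of that kind of clean-up, here is a minimal Python sketch of the same idea rather than an actual Sigil saved search. It assumes the junk spans look like `<span class="Policepardfaut">…</span>` with `class` as the only attribute; the real class name and markup produced by the import may differ, and `strip_spans` is a made-up name.

```python
import re

def strip_spans(html, class_name="Policepardfaut"):
    """Unwrap <span class="class_name">...</span>, keeping the text inside.

    Only spans with no other <span> opening inside them are unwrapped, so a
    junk span wrapped around a span of a different class is left untouched.
    """
    pattern = re.compile(
        r'<span\s+class="' + re.escape(class_name) + r'"\s*>'
        r'((?:(?!<span\b).)*?)</span>',
        re.DOTALL,
    )
    previous = None
    # Repeat until nothing changes, so nested junk spans are peeled from the inside out.
    while previous != html:
        previous = html
        html = pattern.sub(r"\1", html)
    return html
```

The search pattern itself should carry over to Sigil's regex mode more or less as-is, with `\1` as the replacement, but test it on a copy of the book first.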

Regards,

Kim

Kim, I have mad respect for your skills, but we do this ALL DAY LONG, and we don't get those spans when exporting to HTML/XML/etc. Unless I've misread your post, you didn't get them in the third image, where you exported directly to HTML. Right?

Are you saying that the exports to .odt format can't avoid the spans?

My take on this is that these are effectively junk spans, created because the paragraphs are unstyled. Period. If the file is made correctly, using normal named styles, these should not appear.

What happens if these files are styled as they should be, rather than just using "Normal," before these save-as and export experiments?

Here's my stupid question, then: why is the .odt file being subsequently exported to .odt, anyway? Hasn't the process been described as .odt-->Macro-->Word-->HTML?

The only time we use the default para (class) is, as Barnhill notes, when we're swapping out direct styling for spans and vice-versa, because it's useful in Word searches, if those need to be done there instead of in HTML and/or INDD. We don't see this unless we suffer some type of brain-seizure and export an uncleaned, unfixed client's file to HTML, and even THEN, we typically don't see them.

Of course, we're not working in .odt, as I see no reason to do so.

Whatever. As long as Roger's issue or non-issue, or whatever, is solved.

Hitch