09-22-2021, 02:19 PM #107
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Hitch
Okay. That wouldn't be one of ours (our production I mean) but it's entirely possible that I have files like that, from other designers that we used for export to ePUB or HTML and subsequent conversion. I will take a look. I mean, to be clear--I know we have had those, but I don't know if I still have one in-house that would be available to Borkify for you. I'll check.
I just PMed KevinH 3 of my examples.

2 InDesign EPUBs -> Merged -> Calibre EPUB->EPUB conversion.

This is where you can see 1 CSS file per chapter + overlapping class names:

Spoiler:

CSS #1:

Code:
p.ParaOverride-1 {
	margin-bottom:0px;
}
p.ParaOverride-2 {
	margin-top:1px;
	text-indent:18px;
}
p.ParaOverride-3 {
	text-indent:14px;
}
p.ParaOverride-4 {
	text-indent:18px;
}
span.CharOverride-1 {
	font-size:1.454em;
}
span.CharOverride-2 {
	font-size:1.091em;
}
span.CharOverride-3 {
	font-size:58%;
	vertical-align:super;
}
CSS #2:

Code:
p.ParaOverride-1 {
	text-align:center;
}
p.ParaOverride-2 {
	margin-top:0px;
	text-align:center;
	text-indent:0px;
}
p.ParaOverride-3 {
	text-align:center;
	text-indent:0px;
}
p.ParaOverride-4 {
	margin-top:2px;
	text-align:center;
	text-indent:0px;
}
span.CharOverride-1 {
	font-family:"Myriad Pro Semibold", sans-serif;
	font-size:1.801em;
	font-style:normal;
	font-weight:normal;
}
span.CharOverride-2 {
	font-family:"Minion Pro Medium";
	font-size:0.909em;
	font-style:normal;
	font-weight:normal;
}
span.CharOverride-3 {
	font-family:"Myriad Pro Semibold", sans-serif;
	font-style:normal;
	font-weight:normal;
}
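
A quick way to spot these collisions before merging is to parse each chapter's stylesheet and list every selector that more than one file defines differently. Below is a minimal Python sketch. It assumes flat, comment-free CSS like the InDesign output above (no @media blocks, one selector per rule). The sheet names and the trimmed rules are only for illustration.

```python
import re

# Assumption: flat, comment-free CSS (no @media blocks, one selector per rule).
RULE_RE = re.compile(r"([^{}]+)\{([^}]*)\}")

def parse_css(text):
    """Return {selector: {property: value}} for a simple stylesheet."""
    rules = {}
    for selector, body in RULE_RE.findall(text):
        decls = {}
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                decls[prop.strip().lower()] = value.strip()
        rules[selector.strip()] = decls
    return rules

def find_conflicts(sheets):
    """Map each selector that two or more sheets define differently to its variants."""
    seen = {}
    for sheet_name, text in sheets.items():
        for selector, decls in parse_css(text).items():
            seen.setdefault(selector, {})[sheet_name] = decls
    conflicts = {}
    for selector, variants in seen.items():
        distinct = {tuple(sorted(d.items())) for d in variants.values()}
        if len(variants) > 1 and len(distinct) > 1:
            conflicts[selector] = variants
    return conflicts

# Trimmed copies of two rules from CSS #1 and CSS #2 above.
chapter1 = """
p.ParaOverride-1 { margin-bottom:0px; }
span.CharOverride-3 { font-size:58%; vertical-align:super; }
"""
chapter2 = """
p.ParaOverride-1 { text-align:center; }
span.CharOverride-3 { font-family:"Myriad Pro Semibold", sans-serif; }
"""

conflicts = find_conflicts({"chapter1.css": chapter1, "chapter2.css": chapter2})
for selector, variants in conflicts.items():
    print(selector)
    for sheet_name, decls in variants.items():
        print("   ", sheet_name, decls)
```

Selectors that are byte-for-byte identical in every sheet are left out, because they can be merged safely.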


1 Word -> HTML -> Calibre EPUB->EPUB conversion.

This is where you can see a typical CSS mess:

Spoiler:

Code:
.calibre7 {
    font-family: "Times New Roman", serif
    }

[...]

.calibre12 {
    font-size: 1em
    }
.calibre13 {
    font-family: "Times New Roman", serif;
    font-size: 1em
    }
[...]

.calibre14 {
    font-size: 1.125em;
    line-height: 1.2
    }
.calibre15 {
    color: black;
    font-family: "Garamond", serif;
    font-size: 1em;
    line-height: 1.2
    }
[...]

.calibre17 {
    line-height: 1.2
    }
.calibre18 {
    color: black;
    display: none;
    text-decoration: none
    }
[...]
.calibre20 {
    color: black;
    display: block;
    font-family: "Garamond", serif;
    font-size: 1.48148em;
    font-weight: normal;
    line-height: 1.2;
    page-break-after: avoid;
    text-align: center;
    text-autospace: none;
    margin: 30pt 0
    }
.calibre21 {
    color: black;
    display: block;
    font-family: "Garamond", serif;
    font-size: 1.25926em;
    font-weight: normal;
    line-height: 1.2;
    page-break-after: avoid;
    text-align: justify;
    text-autospace: none;
    margin: 20pt 0
    }


Quote:
Originally Posted by KevinH
I am looking for an epub created from InDesign that uses individual stylesheets (one per chapter) with many chapters (many stylesheets) that I can use to test some ideas for techniques to merge the large number of stylesheets down into a small handful of stylesheets and in the process remap styles if possible.
InDesign's EPUB export actually only outputs a single CSS file.

When designing a print book, one type of workflow is:

- individual "chapter file"s
- then link them together into a single "book file".

(This allows you to easily swap/remove chapters, auto-renumber pages/endnotes, etc.)

In my case though, as a converter, I don't have that single "book file"... I only get the 20 separate "chapter file"s.

So, when I'm exporting, I export each individual chapter -> EPUB... hence the 20 different similar-but-not-quite CSS files.

Mix in Direct Formatting and lots of other cruft, and you get a giant, conflicting mess on your hands.

IF I had the monolithic "book file", I'd be able to export a single EPUB. You'd still have a spaghetti mess, but at least no conflicting names. :P
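
Without the book file, one possible workaround (not something InDesign does for you) is to namespace each chapter's classes before merging. That way chapter 1's p.ParaOverride-1 and chapter 2's p.ParaOverride-1 stop colliding. Here is a rough Python sketch. The "ch01" prefix and the helper names are hypothetical, and it assumes simple CSS and double-quoted class attributes.

```python
import re

# Hypothetical helpers. Assumes flat CSS (no @media blocks or comments) and
# class attributes written with double quotes, as in typical InDesign output.
RULE_RE = re.compile(r"([^{}]+)(\{[^}]*\})")
CLASS_SELECTOR_RE = re.compile(r"\.(-?[_a-zA-Z][\w-]*)")
CLASS_ATTR_RE = re.compile(r'class="([^"]*)"')

def prefix_css(css, prefix):
    """Prefix class names in selectors only, never inside declaration values."""
    def fix_rule(m):
        selector = CLASS_SELECTOR_RE.sub(lambda c: f".{prefix}-{c.group(1)}", m.group(1))
        return selector + m.group(2)
    return RULE_RE.sub(fix_rule, css)

def prefix_html(html, prefix):
    """Apply the same prefix to every class listed in a class attribute."""
    def fix_attr(m):
        names = " ".join(f"{prefix}-{name}" for name in m.group(1).split())
        return f'class="{names}"'
    return CLASS_ATTR_RE.sub(fix_attr, html)

css = "p.ParaOverride-1 {\n\tmargin-bottom:0px;\n}\n"
html = '<p class="ParaOverride-1">It was a dark and stormy night.</p>'
print(prefix_css(css, "ch01"))
print(prefix_html(html, "ch01"))
```

After prefixing, rules that turn out to be identical across chapters can still be folded back together in a later consolidation pass.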

(Same as cleaning up Word->HTML, etc. etc.)

Quote:
Originally Posted by KevinH
All hopefully *without* having to convert all selectors to class selectors with non-mnemonic numbered names that end up littering the html.
Yeah, I don't believe InDesign or Word/LibreOffice generates complicated selectors.

I think they all just break it down to individual classes.

So the bulk of the consolidation/cleanup would probably be this simple conversion cruft:

Code:
.class1 {
	text-align: center;
}
.class2 {
	text-align: center;
	font-size: 1em;
}
.class3 {
	text-align: center;
	font-size: .9em;
}
Not necessarily trying to tackle all the advanced CSS3 selectors, etc.
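
Because so much of this cruft consists of one-property variations, a first pass can drop declarations that change nothing. It can then group classes whose remaining declarations are identical. In this sketch, treating font-size: 1em as a no-op is an assumption. That is usually true, unless a less specific rule sets a different size on the same element.

```python
import re
from collections import defaultdict

# Only handles plain ".name { ... }" rules like the example above.
RULE_RE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")

# Assumption: declarations that are no-ops in this book. font-size: 1em usually
# is, unless a less specific rule sets another size on the same element.
NO_OPS = {("font-size", "1em")}

def parse_classes(css):
    """Return {class_name: frozenset of (property, value)} minus the no-ops."""
    classes = {}
    for name, body in RULE_RE.findall(css):
        decls = set()
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                decls.add((prop.strip().lower(), value.strip().lower()))
        classes[name] = frozenset(decls - NO_OPS)
    return classes

def group_duplicates(css):
    """List groups of classes that are identical once the no-ops are dropped."""
    groups = defaultdict(list)
    for name, decls in parse_classes(css).items():
        groups[decls].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

css = """
.class1 { text-align: center; }
.class2 { text-align: center; font-size: 1em; }
.class3 { text-align: center; font-size: .9em; }
"""
print(group_duplicates(css))  # [['class1', 'class2']]
```

class1 and class2 can then share one human-readable name. class3 stays separate because .9em is a real difference.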

Quote:
Originally Posted by KevinH
If In-Design can handle style mapping from .docx styles, why isn't this handled by In-Design when inputting the .docx files?
Hmm... the Export (Styles Mapper) is definitely there.

I'm not familiar with Import. (I don't actually use InDesign; I only know enough to get text OUT OF IT as soon as possible.)

I believe it's built-in. See this video as one example:

Nukefactory: "How to import text into InDesign without losing basic formatting"

But as usual, the thing is:
  • 99+% of people don't use Styles
  • they don't use them consistently
    • lots of Direct Formatting
  • and InDesign Styles =/= Word Styles
    • InDesign is much more powerful.
  • Print-focused designers probably don't have one clue about HTML or ebooks
    • That's just technical gobbledygook. Everything looks fine to my eyes!
    • And hey, great, InDesign "exports" EPUBs. Looks "perfect" on my iPad!!! What's the problem?

So each stage in the conversion workflow has the potential to introduce nonsense or lose key information.

And again, as a converter... I don't have control over what these people are doing in intermediate steps. I just have to clean up the cruft and create the ultimate ebook.

Minor Rant:

Spoiler:
Grumble, grumble.

My latest battle is trying to get them to understand that the text:

Code:
For more information, click here and here.
might be 'usable' in a web article... but this type of text CANNOT be used in a physical book (and is very very bad in an ebook).


Quote:
Originally Posted by KevinH
I am thinking of using ngram scoring to try to identify the most similar set of selector properties (after a filtering step) and presenting those for the user to approve, then doing the merge.
Yeah, I was thinking of something similar. A similarity score.

You click on a class, it ranks everything that's close.

Then you can Shift+Click or Ctrl+Click and merge the classes together.
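
Here is a rough sketch of that ranking. KevinH mentioned n-gram scoring. As a simpler stand-in, this uses Jaccard similarity over normalized property/value pairs: shared declarations divided by all distinct declarations. The class bodies are copied from the Calibre CSS above, and the function names are hypothetical.

```python
def decl_set(body):
    """Turn 'prop: value; prop: value' into a set of normalized pairs."""
    pairs = set()
    for decl in body.split(";"):
        if ":" in decl:
            prop, value = decl.split(":", 1)
            pairs.add((prop.strip().lower(), " ".join(value.split()).lower()))
    return frozenset(pairs)

def jaccard(a, b):
    """Shared declarations divided by all distinct declarations."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def rank_similar(target, classes, top=5):
    """Rank every other class by similarity to the clicked one, best first."""
    ref = classes[target]
    scored = [(jaccard(ref, decls), name)
              for name, decls in classes.items() if name != target]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top]

# Declarations shared by .calibre20 and .calibre21 above.
HEADING = ("color: black; display: block; font-family: \"Garamond\", serif; "
           "font-weight: normal; line-height: 1.2; page-break-after: avoid; "
           "text-autospace: none")
classes = {
    "calibre15": decl_set('color: black; font-family: "Garamond", serif; font-size: 1em; line-height: 1.2'),
    "calibre17": decl_set("line-height: 1.2"),
    "calibre20": decl_set(HEADING + "; font-size: 1.48148em; text-align: center; margin: 30pt 0"),
    "calibre21": decl_set(HEADING + "; font-size: 1.25926em; text-align: justify; margin: 20pt 0"),
}

for score, name in rank_similar("calibre20", classes):
    print(f"{name}: {score:.2f}")
# calibre21: 0.54
# calibre15: 0.27
# calibre17: 0.10
```

Clicking .calibre20 puts .calibre21 on top: the two differ only in font-size, text-align, and margin. That is exactly the kind of pair a human would then decide to merge or keep.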

* * *

Usually, I try to do this stripping/consolidating in passes. Clean up:
  • Fonts
  • Colors
  • font-size
  • italics
  • superscripts
  • [...]

and at each stage, I try to merge what I can to my "normalized" (human-readable) classes:
  • All classes with "vertical-align: super"
    • I'll try to convert to class="super" or <sup>.
  • Many classes with "font-style: italic"
    • I'll try to convert to class="italics" or <i>/<em>.
  • Colors (black text + blue links): I'll strip these instantly.
    • Then take a closer look at oddities (red, orange, green text, etc.).
      • Sometimes these things slip in (especially when authors are doing "Track Changes").
    • Commonly see very dark gray text instead of black.
      • CMYK -> RGB or copy/paste-from-other-source issue.
      • Once I spot the shade of gray and see it's irrelevant, I strip it.
  • All classes with "font-size: 1em;"
    • I remove that line.
  • Most fonts
    • I'll remove the CSS for main text font, then take a look at classes that DON'T use that font.
    • For example, the book is "Times New Roman", but there's a few classes with "Arial" or "Symbol" or something different. I'll take a closer look to see exactly where/how that was used.
      • Very common when there's Greek letters or Maths symbols.
  • [...]
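
Some of these passes can be expressed as per-declaration rules. Below is a hedged Python sketch of one pass. The following are all assumptions to adjust per book: which values count as noise, the near-black threshold (0x33 per channel), and the class names super/italics. Anything the rules don't recognize is kept for a human to review.

```python
import re

# Example rules for one cleanup pass; what counts as noise is a per-book
# judgment, so treat these sets and the threshold as assumptions.
NO_OPS = {("font-size", "1em")}
SEMANTIC = {
    ("vertical-align", "super"): "super",   # candidate for class="super" / <sup>
    ("font-style", "italic"): "italics",    # candidate for class="italics" / <i>, <em>
}
HEX_RE = re.compile(r"#([0-9a-f]{6})$")

def is_near_black(value, max_channel=0x33):
    """True for 'black' or a #rrggbb color whose channels are all very dark."""
    m = HEX_RE.match(value)
    if not m:
        return value == "black"
    return max(bytes.fromhex(m.group(1))) <= max_channel

def clean_class(decls):
    """Return (remaining declarations, suggested semantic class or None).

    Expects property names already lowercased by whatever parsed the CSS.
    """
    remaining, suggestion = {}, None
    for prop, value in decls.items():
        v = value.strip().lower()
        if (prop, v) in NO_OPS:
            continue  # redundant declaration: drop it
        if prop == "color" and is_near_black(v):
            continue  # black or near-black gray text: drop it
        if (prop, v) in SEMANTIC:
            suggestion = SEMANTIC[(prop, v)]
            continue  # expressed by the semantic class instead
        remaining[prop] = value  # keep anything else for manual review
    return remaining, suggestion

print(clean_class({"font-size": "58%", "vertical-align": "super"}))
# ({'font-size': '58%'}, 'super')
print(clean_class({"color": "#231F20", "font-size": "1em", "line-height": "1.2"}))
# ({'line-height': '1.2'}, None)
```

Odd colors such as red or green fall through untouched. That matches the "take a closer look at oddities" step rather than silently deleting them.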

This is where I got excited when I stumbled upon that Calibre "Transform Styles" tab.

It will at least let me come up with a set of property-stripping rules that would save some time.

But the frustrating thing about Calibre EPUB->EPUB is it changes the class names.

And it's hard to know ahead of time what junk is going to be in a specific book! Each one will introduce its own unique niggles:

For example, one book might use font-size: .88889em, while another has both .888em and .8em.
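
One way to tame this is to snap em sizes onto a small house scale, but only when they are close to a step. Then .88889em and .888em land on the same value while .8em stays distinct. The scale and the 0.05em tolerance below are hypothetical.

```python
import re

# Hypothetical house scale of normalized sizes (in em) and snapping tolerance;
# both are assumptions to tune per book.
SCALE = [0.75, 0.8, 0.9, 1.0, 1.1, 1.2, 1.5, 2.0]
EM_RE = re.compile(r"^\s*(\d*\.?\d+)em\s*$")

def snap_font_size(value, tolerance=0.05):
    """Snap an em size to the nearest scale step if it is within tolerance."""
    m = EM_RE.match(value)
    if not m:
        return value  # %, px, keywords: leave for manual review
    size = float(m.group(1))
    nearest = min(SCALE, key=lambda step: abs(step - size))
    if abs(nearest - size) > tolerance:
        return value  # too far from any step: probably intentional
    return f"{nearest:g}em"

for raw in [".88889em", ".888em", ".8em", "1.25926em", "58%"]:
    print(raw, "->", snap_font_size(raw))
```

A tighter tolerance keeps more of the original variety; a looser one collapses more classes but risks merging sizes the designer meant to be different.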

One book might be typeset in "Times New Roman" with some "Arial" creeping in; another might use "Arial" as the main font with "Times New Roman" creeping in.
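
Counting classes alone can mislead when deciding which font is the main one, since a single class may cover most of the paragraphs. So this sketch weights each font-family by how many elements actually use the class. It then lists the classes that use anything else. The class calibre30 and the sample pages are made up for illustration.

```python
import re
from collections import Counter

CLASS_ATTR_RE = re.compile(r'class="([^"]*)"')

def primary_family(value):
    """First family in a font-family list, unquoted and lowercased."""
    return value.split(",")[0].strip().strip("\"'").lower()

def count_class_usage(pages):
    """Count how many elements reference each class across XHTML pages."""
    usage = Counter()
    for html in pages:
        for attr in CLASS_ATTR_RE.findall(html):
            usage.update(attr.split())
    return usage

def font_outliers(classes, usage):
    """Return the most-used primary font and the classes using any other font."""
    weight = Counter()
    for name, decls in classes.items():
        if "font-family" in decls:
            weight[primary_family(decls["font-family"])] += usage[name]
    if not weight:
        return None, []
    main = weight.most_common(1)[0][0]
    outliers = sorted(name for name, decls in classes.items()
                      if "font-family" in decls
                      and primary_family(decls["font-family"]) != main)
    return main, outliers

classes = {
    "calibre13": {"font-family": '"Times New Roman", serif'},
    "calibre15": {"font-family": '"Garamond", serif'},
    "calibre30": {"font-family": "Symbol"},  # hypothetical class
}
pages = [  # hypothetical XHTML snippets
    '<p class="calibre13">One</p><p class="calibre13">Two</p>',
    '<p class="calibre13">Let <span class="calibre30">p</span> be prime.</p>',
    '<h2 class="calibre15">Chapter 2</h2>',
]
main, outliers = font_outliers(classes, count_class_usage(pages))
print(main, outliers)
```

The outlier list is the short set of classes worth opening by hand, e.g. to check whether a "Symbol" span is really a Greek letter or a maths symbol.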

This is why I mostly do CSS consolidation as THE VERY FIRST STEP after merging, then do successive rounds of EPUB->EPUB to make sure I get down to more bare bones.

But, of course, at later stages, when looking at CSS details, that's when you spot more consolidation that could've been done.

(Hence, a nice GUI, CSS Comparison/Merger, Style Mapper, etc.)

Quote:
Originally Posted by KevinH
I am thinking that by Pareto's rule we should be able to take a large number of stylesheets and merge them into a much smaller number but keep most of the individuality present.

Last edited by Tex2002ans; 09-22-2021 at 03:55 PM.