Quote:
Originally Posted by Hitch
Okay. That wouldn't be one of ours (our production I mean) but it's entirely possible that I have files like that, from other designers that we used for export to ePUB or HTML and subsequent conversion. I will take a look. I mean, to be clear--I know we have had those, but I don't know if I still have one in-house that would be available to Borkify for you. I'll check.
I just PMed KevinH 3 of my examples.
2 InDesign EPUBs -> Merged -> Calibre EPUB->EPUB conversion.
This is where you can see 1 CSS file per chapter + overlapping class names:
Spoiler:
CSS #1:
Code:
p.ParaOverride-1 {
margin-bottom:0px;
}
p.ParaOverride-2 {
margin-top:1px;
text-indent:18px;
}
p.ParaOverride-3 {
text-indent:14px;
}
p.ParaOverride-4 {
text-indent:18px;
}
span.CharOverride-1 {
font-size:1.454em;
}
span.CharOverride-2 {
font-size:1.091em;
}
span.CharOverride-3 {
font-size:58%;
vertical-align:super;
}
CSS #2:
Code:
p.ParaOverride-1 {
text-align:center;
}
p.ParaOverride-2 {
margin-top:0px;
text-align:center;
text-indent:0px;
}
p.ParaOverride-3 {
text-align:center;
text-indent:0px;
}
p.ParaOverride-4 {
margin-top:2px;
text-align:center;
text-indent:0px;
}
span.CharOverride-1 {
font-family:"Myriad Pro Semibold", sans-serif;
font-size:1.801em;
font-style:normal;
font-weight:normal;
}
span.CharOverride-2 {
font-family:"Minion Pro Medium";
font-size:0.909em;
font-style:normal;
font-weight:normal;
}
span.CharOverride-3 {
font-family:"Myriad Pro Semibold", sans-serif;
font-style:normal;
font-weight:normal;
}
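The overlap between the two files above can be found mechanically. Here is a minimal Python sketch of that check. It is only an illustration, not something Calibre or Sigil does: `parse_css` and `find_conflicts` are made-up helper names, the file names are placeholders, and the regex only copes with flat rules like the ones InDesign produces (no @media blocks, no comments).

```python
import re

# Matches "selector { declarations }" for flat, un-nested rules only.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")

def parse_css(css):
    """Map each selector to a dict of property -> value."""
    rules = {}
    for selector, body in RULE_RE.findall(css):
        decls = {}
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                decls[prop.strip().lower()] = value.strip()
        rules.setdefault(selector.strip(), {}).update(decls)
    return rules

def find_conflicts(sheets):
    """Return selectors that appear in several sheets with different declarations."""
    seen = {}  # selector -> list of (sheet name, declarations)
    for name, css in sheets.items():
        for selector, decls in parse_css(css).items():
            seen.setdefault(selector, []).append((name, decls))
    return {
        sel: defs
        for sel, defs in seen.items()
        if len(defs) > 1 and any(d != defs[0][1] for _, d in defs[1:])
    }

# Cut-down versions of CSS #1 and CSS #2 (hypothetical file names).
sheets = {
    "chapter01.css": "p.ParaOverride-1 { margin-bottom:0px; } "
                     "p.ParaOverride-3 { text-indent:14px; }",
    "chapter02.css": "p.ParaOverride-1 { text-align:center; } "
                     "p.ParaOverride-3 { text-align:center; text-indent:0px; }",
}
for selector, defs in find_conflicts(sheets).items():
    print(selector)
    for name, decls in defs:
        print("   ", name, decls)
```

Every selector it prints is one that would silently change meaning if the stylesheets were simply concatenated.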
1 Word -> HTML -> Calibre EPUB->EPUB conversion.
This is where you can see a typical CSS mess:
Quote:
Originally Posted by KevinH
I am looking for an epub created from InDesign that uses individual stylesheets (one per chapter) with many chapters (many stylesheets) that I can use to test some ideas for techniques to merge the large number of stylesheets down into a small handful of stylesheets and in the process remap styles if possible.
InDesign's EPUB export actually only outputs a single CSS file.
When designing a print book, one type of workflow is:
- individual "chapter file"s
- then link them together into a single "book file".
(This allows you to easily swap/remove chapters, auto-renumber pages/endnotes, etc.)
In my case though, as a converter, I don't have that single "book file"... I only get the 20 separate "chapter file"s.
So, when I'm exporting, I export each individual chapter -> EPUB... hence the 20 different similar-but-not-quite CSS files.
Mix Direct Formatting and lots of other cruft in there, and you get a giant, conflicting mess on your hands.
If I had the monolithic "book file", I'd be able to export a single EPUB... you'd still have a spaghetti mess, just without the conflicting names. :P
(Same as cleaning up Word->HTML, etc. etc.)
Quote:
Originally Posted by KevinH
All hopefully *without* having to convert all selectors to class selectors with non-mnemonic numbered names that end up littering the html.
Yeah, I don't believe InDesign or Word/LibreOffice generates complicated selectors.
I think they all just break it down to individual classes.
So the bulk of the consolidation/cleanup work would probably be this simple conversion cruft:
Code:
.class1 {
text-align: center;
}
.class2 {
text-align: center;
font-size: 1em;
}
.class3 {
text-align: center;
font-size: .9em;
}
not necessarily trying to tackle all the advanced CSS3 selectors, etc.
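For cruft this simple, it helps to turn the stylesheet inside out and ask which classes share each declaration. A minimal Python sketch, assuming flat class rules only (the helper names `parse_css` and `declaration_usage` are mine, not from any existing tool):

```python
import re
from collections import defaultdict

# Matches "selector { declarations }" for flat, un-nested rules only.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")

def parse_css(css):
    """Map each selector to a dict of property -> value."""
    rules = {}
    for selector, body in RULE_RE.findall(css):
        decls = {}
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                decls[prop.strip().lower()] = value.strip()
        rules.setdefault(selector.strip(), {}).update(decls)
    return rules

def declaration_usage(rules):
    """Map each (property, value) pair to the selectors that declare it."""
    usage = defaultdict(list)
    for selector, decls in rules.items():
        for item in decls.items():
            usage[item].append(selector)
    return dict(usage)

css = """
.class1 { text-align: center; }
.class2 { text-align: center; font-size: 1em; }
.class3 { text-align: center; font-size: .9em; }
"""
usage = declaration_usage(parse_css(css))
# Most widely shared declarations first.
for (prop, value), selectors in sorted(usage.items(), key=lambda kv: -len(kv[1])):
    print(f"{prop}: {value} -> {', '.join(selectors)}")
```

A declaration shared by every class in a group (here `text-align: center`) is a candidate for one common class, and whatever is left over shows how the classes really differ.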
Quote:
Originally Posted by KevinH
If In-Design can handle style mapping from .docx styles, why isn't this handled by In-Design when inputting the .docx files?
Hmm... the Export (Styles Mapper) is definitely there.
I'm not familiar with Import. (I don't actually use InDesign, I only know enough to get text OUT OF IT as soon as possible.)
I believe it's built-in. See this video as one example:
Nukefactory: "How to import text into InDesign without losing basic formatting"
But as usual, the thing is:
- 99+% of people don't use Styles
- they don't use them consistently
- lots of Direct Formatting
- and InDesign Styles =/= Word Styles
  - InDesign is much more powerful.
- Print-focused designers probably don't have one clue about HTML or ebooks
  - That's just technical gobbledygook. Everything looks fine with my eyes!
  - And hey, great, InDesign "exports" EPUBs. Looks "perfect" on my iPad!!! What's the problem?
So each stage in the conversion workflow has the potential to introduce nonsense or lose key information.
And again, as a converter... I don't have control over what these people are doing in intermediate steps. I just have to clean up the cruft and create the ultimate ebook.
Minor Rant:
Spoiler:
Grumble, grumble.
My latest is trying to get them to understand the text:
Code:
For more information, click here and here.
might be 'usable' in a web article... but this type of text CANNOT be used in a physical book (and is very very bad in an ebook).
Quote:
Originally Posted by KevinH
I am thinking of using ngram scoring to try to identify the most similar set of selector properties (after a filtering step) and presenting those for the user to approve of, then doing the merge.
Yeah, I was thinking of something similar. A similarity score.
You click on a class, it ranks everything that's close.
Then you can Shift+Click or Ctrl+Click and merge the classes together.
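To make the "click a class, rank what's close" idea concrete, here is a minimal Python sketch. Note that it uses plain Jaccard similarity over the sets of declarations, which I picked because it is the simplest thing that works, not the ngram scoring KevinH mentioned. The function names are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two rules: shared declarations / all declarations."""
    a, b = set(a.items()), set(b.items())
    if not a and not b:
        return 1.0  # two empty rules are identical
    return len(a & b) / len(a | b)

def rank_similar(target, rules):
    """Rank every other selector by similarity to `target`, best first."""
    scores = [
        (jaccard(rules[target], decls), selector)
        for selector, decls in rules.items()
        if selector != target
    ]
    # Highest score first; ties broken alphabetically by selector.
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))

rules = {
    ".class1": {"text-align": "center"},
    ".class2": {"text-align": "center", "font-size": "1em"},
    ".class3": {"text-align": "center", "font-size": ".9em"},
}
for score, selector in rank_similar(".class2", rules):
    print(f"{score:.2f}  {selector}")
```

For `.class2` this ranks `.class1` (1 of 2 declarations shared) above `.class3` (1 of 3 shared). A real tool would probably want to run a stripping pass first, so that junk like `font-size: 1em` doesn't drag the scores down.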
* * *
Usually, I try to do this stripping/consolidating in passes. Clean up:
- Fonts
- Colors
- font-size
- italics
- superscripts
- [...]
and at each stage, I try to merge what I can to my "normalized" (human-readable) classes:
- All classes with "vertical-align: super"
  - I'll try to convert to class="super" or <sup>.
- Many classes with "font-style: italic"
  - I'll try to convert to class="italics" or <i>/<em>.
- Colors (black text + blue links), I'll instantly strip.
  - Then take a closer look at oddities (red, orange, green text, etc.).
    - Sometimes these things slip in (especially when authors are doing "Track Changes").
  - Commonly see very dark gray text instead of black.
    - CMYK -> RGB or copy/paste-from-other-source issue.
    - Once I spot the shade of gray and see it's irrelevant, I strip it.
- All classes with "font-size: 1em;"
- Most fonts
  - I'll remove the CSS for main text font, then take a look at classes that DON'T use that font.
    - For example, the book is "Times New Roman", but there are a few classes with "Arial" or "Symbol" or something different. I'll take a closer look to see exactly where/how that was used.
    - Very common when there's Greek letters or Maths symbols.
- [...]
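One pass of this kind could be sketched roughly as below. This is a minimal Python illustration, not my actual tooling: the `STRIP` and `SEMANTIC` tables are example rule sets I made up for the demo, and in practice they'd differ per book.

```python
# Declarations considered noise for this (hypothetical) book.
STRIP = {("color", "black"), ("color", "#000000"), ("font-size", "1em")}

# Declarations that hint at a human-readable replacement class.
SEMANTIC = {
    ("vertical-align", "super"): "super",
    ("font-style", "italic"): "italics",
}

def strip_pass(rules, strip=STRIP):
    """Return a copy of the rules with the unwanted declarations removed."""
    return {
        selector: {p: v for p, v in decls.items() if (p, v) not in strip}
        for selector, decls in rules.items()
    }

def suggest_renames(rules, semantic=SEMANTIC):
    """Suggest a readable class for any rule containing a known declaration.

    These are candidates for manual review, not automatic merges.
    """
    suggestions = {}
    for selector, decls in rules.items():
        for item in decls.items():
            if item in semantic:
                suggestions[selector] = semantic[item]
                break
    return suggestions

rules = {
    "span.CharOverride-3": {"font-size": "58%", "vertical-align": "super"},
    "span.CharOverride-7": {"font-style": "italic", "color": "black"},
    "p.ParaOverride-4": {"font-size": "1em", "text-indent": "18px"},
}
cleaned = strip_pass(rules)
print(cleaned)
print(suggest_renames(cleaned))
```

The rename step only flags classes. Something like the 58% font-size on the superscript class still needs a human to decide whether it matters.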
This is where I got excited when I stumbled upon that Calibre "Transform Styles" tab.
It will allow me to at least come up with a set of some property-stripping rules that would save some time.
But the frustrating thing about Calibre EPUB->EPUB is it changes the class names.
And it's hard to know ahead-of-time what junk is going to be in this specific book! Each one will introduce its own unique niggles:
Like one book might use font-size: .88889em, another might have .888em and .8em.
One book might be typeset in "Times New Roman" with "Arial" crept in, another book "Arial" as the main with "Times New Roman" crept in.
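Since the junk differs per book, a quick survey of which values actually occur is a sensible first move before writing any stripping rules. A minimal Python sketch (the `survey` helper and the sample classes are invented for illustration, and it counts classes, not how often each class is used in the text):

```python
from collections import Counter

def survey(rules, prop):
    """Count how many rules use each value of one property."""
    return Counter(decls[prop] for decls in rules.values() if prop in decls)

# Hypothetical classes from one converted book.
book = {
    ".c1": {"font-family": "Times New Roman", "font-size": ".88889em"},
    ".c2": {"font-family": "Times New Roman", "font-size": ".888em"},
    ".c3": {"font-family": "Arial", "font-size": ".8em"},
    ".c4": {"text-indent": "18px"},
}

fonts = survey(book, "font-family")
main_font, _count = fonts.most_common(1)[0]
print("likely main font:", main_font)
print("other fonts to inspect:", [f for f in fonts if f != main_font])
print("font sizes in use:", sorted(survey(book, "font-size").items()))
```

The most common font is only a guess at the main text font, but it tells you which classes to look at first.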
This is why I mostly do CSS consolidation as THE VERY FIRST STEP after merging, then do successive rounds of EPUB->EPUB to make sure I get down to more bare bones.
But, of course, at later stages, when looking at CSS details, that's when you spot more consolidation that could've been done.
(Hence, a nice GUI, CSS Comparison/Merger, Style Mapper, etc.)
Quote:
Originally Posted by KevinH
I am thinking that by Pareto's rule we should be able to take a large number of stylesheets and merge them into a much smaller number but keep most of the individuality present.