MobileRead Forums

Sigil > Adding a limited Automate Feature To Sigil (https://www.mobileread.com/forums/showthread.php?t=341347)

phillipgessert 09-21-2021 12:17 PM

There’s surely a more elegant way, but I’d just pull em all out via unzip and concatenate them outside of Sigil, go back to Sigil to trash all 206 within it, pull in the new megastylesheet, and finally run a find/replace on all the xhtml files to repair the now-broken stylesheet link/s in the head.

Tex2002ans 09-21-2021 01:29 PM

Quote:

Originally Posted by RbnJrg (Post 4156224)
By the way, does anyone by chance know of a tool to consolidate stylesheets? I noticed that there are many styles that are identical, but they are in different sheets with different names.

Calibre EPUB->EPUB conversion.

It's what I do when I export individual chapters as InDesign EPUBs.

So let's say there are 26 different chapters, 26 different EPUBs, all with almost-the-same-but-slightly-different CSS, all with conflicting class names.

I merge all EPUBs together (using Calibre's EPUBMerge plugin), then run a Calibre EPUB->EPUB conversion.

This will convert each unique style into 1 class.

So these matching classes:

Spoiler:
CSS #1:

Code:

span.CharOverride-3 {
        font-family:"Adobe Garamond Pro Regular", sans-serif;
        font-size:1.181em;
        font-style:normal;
        font-variant:small-caps;
        font-weight:normal;
        text-transform:none;
}

CSS #2:

Code:

span.CharOverride-4 {
        font-family:"Adobe Garamond Pro Regular", sans-serif;
        font-size:1.181em;
        font-style:normal;
        font-variant:small-caps;
        font-weight:normal;
        text-transform:none;
}



will convert into a single "charoverride3".
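For anyone who wants to see the idea outside of Calibre, here is a minimal Python sketch of that deduplication step. It assumes flat stylesheets made only of simple "element.class" rules (no @media blocks, no selector lists) and that a class name is used on only one element type. It is not Calibre's implementation: it keeps the first class name it sees instead of generating "calibre##"-style names, and dedupe_classes / remap_html are hypothetical helpers.

```python
import re

# Flat "selector { declarations }" rules only (no @media or nested blocks).
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")
# Only simple "element.class" selectors are considered for merging.
SIMPLE_CLASS_RE = re.compile(r"^([a-zA-Z0-9]*)\.([\w-]+)$")
CLASS_ATTR_RE = re.compile(r'class="([^"]*)"')


def normalize(body):
    """Return the declarations as a sorted tuple of (property, value) pairs."""
    decls = []
    for part in body.split(";"):
        if ":" in part:
            prop, value = part.split(":", 1)
            decls.append((prop.strip().lower(), " ".join(value.split())))
    return tuple(sorted(decls))


def dedupe_classes(css):
    """Drop class rules whose element and declarations repeat an earlier rule.

    Returns (new_css, mapping); mapping sends each dropped class name to the
    first class name seen with the same element and declarations.
    """
    first_class = {}  # (element, decls) -> first class name with that body
    rules = []        # surviving (selector, decls) pairs, in source order
    mapping = {}
    for raw_selector, body in RULE_RE.findall(css):
        selector = raw_selector.strip()
        decls = normalize(body)
        match = SIMPLE_CLASS_RE.match(selector)
        if match:
            key = (match.group(1), decls)
            if key in first_class:
                mapping[match.group(2)] = first_class[key]
                continue
            first_class[key] = match.group(2)
        rules.append((selector, decls))
    new_css = "\n".join(
        selector + " {\n" + "".join(f"    {p}: {v};\n" for p, v in decls) + "}"
        for selector, decls in rules
    )
    return new_css, mapping


def remap_html(html, mapping):
    """Rewrite class="..." attributes with the mapping, dropping repeated names."""
    def fix(match):
        names = [mapping.get(name, name) for name in match.group(1).split()]
        return 'class="' + " ".join(dict.fromkeys(names)) + '"'
    return CLASS_ATTR_RE.sub(fix, html)


css = """p.first { text-indent: 0; }
p.noindent { text-indent: 0; }"""
new_css, mapping = dedupe_classes(css)
print(mapping)  # {'noindent': 'first'}
print(remap_html('<p class="noindent">Text</p>', mapping))  # <p class="first">Text</p>
```

Note that this reproduces the same "everything becomes the first matching name" effect described next, which is why human-readable class names can end up scrambled.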

Side Note: If you've used human-readable names for CSS, you'll probably get a big mess in your HTML, since Calibre will convert everything to the very first matching name:

Code:

<p class="first">This is a first line.</p>
<p class="noindent">This is a typical no indent paragraph.</p>

Code:

p.first {
        text-indent: 0;
}

p.noindent {
        text-indent: 0;
}

after Calibre EPUB->EPUB conversion would turn into:

Code:

<p class="first">This is a first line.</p>
<p class="first">This is a typical no indent paragraph.</p>

But if you already have a giant spaghetti mess of auto-generated classes, it'll make it infinitely easier. :)

RbnJrg 09-21-2021 03:43 PM

Quote:

Originally Posted by Tex2002ans (Post 4156269)
Calibre EPUB->EPUB conversion.

It's what I do when I export individual chapters as InDesign EPUBs.

So let's say there are 26 different chapters, 26 different EPUBs, all with almost-the-same-but-slightly-different CSS, all with conflicting class names.

I merge all EPUBs together (using Calibre's EPUBMerge plugin), then run a Calibre EPUB->EPUB conversion.

This will convert each unique style into 1 class.

Thanks a lot! I did what you said and Calibre did a great job. It was not perfect, but I can fix the minor "issues" caused by the EPUB -> EPUB conversion. And most importantly, now I have ONLY ONE stylesheet. I didn't know about that feature of Calibre; thanks for sharing the info. :thanks:

Tex2002ans 09-21-2021 03:51 PM

Quote:

Originally Posted by RbnJrg (Post 4156322)
Thanks a lot! I did what you said and Calibre did a great job. It was not perfect, but I can fix the minor "issues" caused by the EPUB -> EPUB conversion. And most importantly, now I have ONLY ONE stylesheet. I didn't know about that feature of Calibre; thanks for sharing the info. :thanks:

:thumbsup:

There's also:

Convert books > Convert individually > Look & Feel > Styling

At the very bottom, there's a "Filter Style Information" section with checkboxes for:
  • Fonts
  • Margins
  • Padding
  • Floats
  • Colors

This would remove all that cruft from the CSS files on conversion as well.

Might be helpful for condensing down even more of those "unique" CSS classes.

I only noticed those options a few weeks ago though, so I haven't done in-depth testing.

Might also help convert those hundreds of styles down to a few dozen.

Tex2002ans 09-21-2021 06:08 PM

Quote:

Originally Posted by phillipgessert (Post 4156235)
There’s surely a more elegant way, but I’d just pull em all out via unzip and concatenate them outside of Sigil, go back to Sigil to trash all 206 within it, pull in the new megastylesheet, and finally run a find/replace on all the xhtml files to repair the now-broken stylesheet link/s in the head.

Be very careful, that wouldn't work with conflicting class names:

class="blockquote" in CSS#1

might not be the same as

class="blockquote" in CSS#2.

The Calibre EPUB->EPUB approach would thoroughly go through all HTML+CSS and merge/rename everything for you.

If the two classes are exactly the same, great.

If the two classes are same name, but different CSS, Calibre will merge/create a new class + make sure to properly update the HTML too:

Spoiler:
Before:

Book #1:

Code:

<blockquote>
<p class="blockquote">This is an example.</p>
</blockquote>

Code:

p.blockquote {
        margin-top: 1em;
        margin-bottom: 1em;
        margin-left: 5%;
        margin-right: 5%;
}

Book #2:

Code:

<blockquote>
<p class="blockquote">This is a second example.</p>
</blockquote>

Code:

p.blockquote {
        margin-left: 5%;
}

After:

Book #1:

Code:

<blockquote>
<p class="blockquote">This is an example.</p>
</blockquote>

Book #2:

Code:

<blockquote>
<p class="blockquote2">This is a second example.</p>
</blockquote>

Code:

p.blockquote {
        margin-top: 1em;
        margin-bottom: 1em;
        margin-left: 5%;
        margin-right: 5%;
}

p.blockquote2 {
        margin-left: 5%;
}
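The same-name/different-CSS case can be sketched in Python roughly like this. merge_sheets is a hypothetical helper, not Calibre's code; it assumes simple "element.class" rules and that each class name is used on only one element type. Each original sheet gets its own rename map, which would then be applied to the HTML files that linked that sheet.

```python
import re

RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")
SIMPLE_CLASS_RE = re.compile(r"^([a-zA-Z0-9]*)\.([\w-]+)$")


def normalize(body):
    """Return the declarations as a sorted tuple of (property, value) pairs."""
    decls = []
    for part in body.split(";"):
        if ":" in part:
            prop, value = part.split(":", 1)
            decls.append((prop.strip().lower(), " ".join(value.split())))
    return tuple(sorted(decls))


def merge_sheets(sheets):
    """Merge simple 'element.class' rules from several stylesheets.

    sheets is a list of CSS strings, one per original stylesheet. Returns
    (rules, sheet_maps): rules maps each final selector to its declarations,
    and sheet_maps[i] maps the class names used by sheet i to their new names.
    """
    rules = {}
    sheet_maps = []
    for css in sheets:
        renames = {}
        for raw_selector, body in RULE_RE.findall(css):
            match = SIMPLE_CLASS_RE.match(raw_selector.strip())
            if not match:
                continue  # this sketch ignores everything except element.class rules
            element, cls = match.groups()
            decls = normalize(body)
            name, n = cls, 1
            # Same name + same CSS: reuse it. Same name + different CSS: try 2, 3, ...
            while rules.get(f"{element}.{name}", decls) != decls:
                n += 1
                name = f"{cls}{n}"
            rules[f"{element}.{name}"] = decls
            renames[cls] = name
        sheet_maps.append(renames)
    return rules, sheet_maps


book1 = "p.blockquote { margin-top: 1em; margin-bottom: 1em; margin-left: 5%; margin-right: 5%; }"
book2 = "p.blockquote { margin-left: 5%; }"
rules, maps = merge_sheets([book1, book2])
print(list(rules))  # ['p.blockquote', 'p.blockquote2']
print(maps)         # [{'blockquote': 'blockquote'}, {'blockquote': 'blockquote2'}]
```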



Side Note: Similar logic applies to "Removing Unused Styles". You have to pay very close attention to what's happening with edge cases.

In 2021: "Indesign-epub-kindle formatting problem: footnotes export with massive indent", I also explained a more "surgical" approach + discussed a few things to look out for (like accidentally stripping important/busted font information).

phillipgessert 09-21-2021 06:52 PM

Quote:

Originally Posted by Tex2002ans (Post 4156377)
Be very careful, that wouldn't work with conflicting class names:

<snip>

That's a good point, my suggestion would give undue priority to any same-named stuff pulled in from the later sheets. That fix you proposed seems incredibly useful.

KevinH 09-21-2021 07:52 PM

So are you saying a possible plugin to "merge" stylesheets might be useful?

Does it work on all selectors or only class selectors? How does it treat element selectors that differ across the sheets?

Hmm ... it would also need to:

- split all selector lists out

- find all styles/selectors with identical property value lists and assign them a common class name

- make sure the selectors for unique styles are themselves unique.

- create a class name mapping to fix up the assigned classes in all html files

Anything else?
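The first two steps of that plan (splitting selector lists and grouping identical property lists) might look like this in Python. It is a sketch under stated assumptions: flat CSS with no @media or nested blocks, and a naive comma split that would break on selectors such as :is(a, b). split_rules and group_identical are hypothetical names; the class-naming and HTML fix-up steps are left out.

```python
import re
from collections import defaultdict

RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def normalize(body):
    """Return the declarations as a sorted tuple of (property, value) pairs."""
    decls = []
    for part in body.split(";"):
        if ":" in part:
            prop, value = part.split(":", 1)
            decls.append((prop.strip().lower(), " ".join(value.split())))
    return tuple(sorted(decls))


def split_rules(css):
    """Step 1: turn 'h1, h2 { ... }' into one (selector, decls) pair per selector."""
    pairs = []
    for selector_list, body in RULE_RE.findall(css):
        decls = normalize(body)
        # Naive split: breaks on commas inside :is(), :not() or attribute values.
        for selector in selector_list.split(","):
            pairs.append((selector.strip(), decls))
    return pairs


def group_identical(pairs):
    """Step 2: group selectors that share exactly the same property/value list."""
    groups = defaultdict(list)
    for selector, decls in pairs:
        groups[decls].append(selector)
    # Only groups with two or more selectors are candidates for a common class.
    return [selectors for selectors in groups.values() if len(selectors) > 1]


css = """h1, h2 { margin: 0; }
p.a { margin:0 }
p.b { text-indent: 1em; }"""
print(group_identical(split_rules(css)))  # [['h1', 'h2', 'p.a']]
```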

RbnJrg 09-21-2021 08:49 PM

Quote:

Originally Posted by Tex2002ans (Post 4156327)
:thumbsup:

There's also:

Convert books > Convert individually > Look & Feel > Styling

At the very bottom, there's a "Filter Style Information" section with checkboxes for:
  • Fonts
  • Margins
  • Padding
  • Floats
  • Colors

This would remove all that cruft from the CSS files on conversion as well.

Might be helpful for condensing down even more of those "unique" CSS classes.

I only noticed those options a few weeks ago though, so I haven't done in-depth testing.

Might also help convert those hundreds of styles down to a few dozen.

Many thanks for that too!

RbnJrg 09-21-2021 09:26 PM

Quote:

Originally Posted by KevinH (Post 4156395)
So are you saying a possible plugin to "merge" stylesheets might be useful?

Imagine :) Having to work with 206 stylesheets versus only one. Calibre saved my day, but it would be nice for Sigil to have that feature too (even by means of a plugin).

Quote:

Does it work on all selectors or only class selectors?
I think it should work on all selectors.

Quote:

How does it treat element selectors that differ across the sheets?
Good question. I suppose you are referring to selectors based on tag names (those based on #id or classes are not problematic: if they have the same properties, they should be treated as the same style; otherwise, as different styles). But if one sheet has styles for p, h*, blockquote, etc., and another sheet has different styles for those same selectors, that can be an issue. I can't see any other way to solve the problem than to assign them a class (p.sheet1 or p.s1, p.sheet2 or p.s2, and so on).
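That per-sheet class idea (p.s1, p.s2) could be sketched in Python like this. It only scopes bare element selectors such as "p" (not descendant selectors like "blockquote p"), and it edits HTML with regular expressions, which is fragile; a real plugin would want a proper parser. The helper names and the "s1" suffix are illustrative assumptions.

```python
import re

RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")
BARE_ELEMENT_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9]*$")


def scope_element_rules(css, suffix):
    """Rewrite bare element selectors such as 'p' to 'p.<suffix>'.

    Returns (new_css, elements), where elements is the set of tag names scoped.
    """
    elements = set()
    out = []
    for raw_selector, body in RULE_RE.findall(css):
        selector = raw_selector.strip()
        if BARE_ELEMENT_RE.match(selector):
            elements.add(selector.lower())
            selector = f"{selector}.{suffix}"
        out.append(f"{selector} {{{body}}}")
    return "\n".join(out), elements


def tag_html(html, elements, suffix):
    """Add class <suffix> to every opening tag whose name is in elements."""
    if not elements:
        return html
    tag_re = re.compile(r"<(%s)\b([^>]*)>" % "|".join(sorted(elements)), re.IGNORECASE)

    def add_class(match):
        name, attrs = match.group(1), match.group(2)
        existing = re.search(r'class="([^"]*)"', attrs)
        if existing:
            new_attr = f'class="{existing.group(1)} {suffix}"'
            attrs = attrs.replace(existing.group(0), new_attr, 1)
        else:
            attrs = f' class="{suffix}"' + attrs
        return f"<{name}{attrs}>"

    return tag_re.sub(add_class, html)


css1, els = scope_element_rules("p { margin: 0; }\np.note { color: red; }", "s1")
print(css1)
# p.s1 { margin: 0; }
# p.note { color: red; }
print(tag_html('<p>a</p><p class="note">b</p><pre>c</pre>', els, "s1"))
# <p class="s1">a</p><p class="note s1">b</p><pre>c</pre>
```

Adding "s1" to every p in that sheet's files (including p.note) keeps the original behaviour, since the bare "p" rule applied to all of them before.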

Quote:

Hmm ... it would also need to:

- split all selector lists out

- find all styles/selectors with identical property value lists and assign them a common class name

- make sure the selectors for unique styles are themselves unique.

- create a class name mapping to fix up the assigned classes in all html files

Anything else?
In principle, that plan seems to cover all the points.

Tex2002ans 09-21-2021 11:21 PM

Quote:

Originally Posted by KevinH (Post 4156395)
So are you saying a possible plugin to "merge" stylesheets might be useful?

Perhaps...

I still think Style Mapping would be much more powerful.

I discussed that + "Consolidate Stylesheet" a few months back:

2021: "What Features or Tools does Sigil Still Need Yet?" (Post #163+)

InDesign has such a mapping function when doing EPUB Export.

For import, I believe it also maps Word (DOCX) Styles -> InDesign Styles, so when you import those documents, you can quickly go through a table and say what gets assigned to what.

It speeds up the to-Print workflow dramatically.

Don't see why it couldn't speed up the to-clean-EPUB workflow as well.

Quote:

Originally Posted by KevinH (Post 4156395)
Does it work on all selectors or only class selectors? How does it treat element selectors that differ across the sheets?

Unsure.

I rarely use anything beyond very basic classes in my ebooks, so I haven't done extensive testing into Calibre's innards to see exactly what it does with more complicated selectors.

During a Calibre EPUB->EPUB, I think it converts everything down to individual "calibre##" classes. For example:

Spoiler:

Code:

  <p>Testing</p>

  <blockquote>
    <p>This is an example</p>
    <p>of a larger blockquote.</p>
  </blockquote>

  <p>Testing</p>

Code:

p {
        margin-top: 0;
        margin-bottom: 0;
        text-align: justify;
        text-indent: 2em;
}

blockquote > p:first-child {
        background-color: red;
        margin-top: 1em;
        margin-bottom: 1em;
        text-indent: 0;
}

blockquote > p {
        background-color: yellow;
        padding-top: 1em;
        margin-bottom: 1em;
}




Calibre EPUB->EPUB turned into:

Spoiler:

Code:

  <p class="calibre1">Testing</p>

  <blockquote class="calibre2">
    <p class="calibre3">This is an example</p>
    <p class="calibre4">of a larger blockquote.</p>
  </blockquote>

  <p class="calibre1">Testing</p>

Code:

.calibre {
    display: block;
    font-size: 1em;
    padding-left: 0;
    padding-right: 0;
    margin: 0 5pt
    }
.calibre1 {
    display: block;
    text-align: justify;
    text-indent: 2em;
    margin: 0
    }
.calibre2 {
    display: block;
    margin: 1em
    }
.calibre3 {
    background-color: yellow;
    display: block;
    padding-top: 1em;
    text-align: justify;
    text-indent: 0;
    margin: 1em 0
    }
.calibre4 {
    background-color: yellow;
    display: block;
    padding-top: 1em;
    text-align: justify;
    text-indent: 2em;
    margin: 0 0 1em
    }



(Side Note: The red background-color went poof. Suspecting it's a conversion bug.)
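Standard CSS specificity backs up the bug suspicion: "blockquote > p:first-child" counts one pseudo-class and two type selectors, (0,1,2), which beats "blockquote > p" at (0,0,2), so the first paragraph should have stayed red. A rough Python sketch of that count, assuming simple selectors without :not(), :is() or :where():

```python
import re


def specificity(selector):
    """Approximate (a, b, c) specificity for selectors without :not(), :is(), :where().

    a = ID selectors; b = classes, attribute selectors and pseudo-classes;
    c = type selectors and pseudo-elements.
    """
    s = selector.strip()
    pseudo_elements = len(re.findall(r"::[\w-]+", s))
    s = re.sub(r"::[\w-]+", "", s)
    attributes = len(re.findall(r"\[[^\]]*\]", s))
    s = re.sub(r"\[[^\]]*\]", "", s)  # so dots/hashes inside [...] are not counted
    ids = len(re.findall(r"#[\w-]+", s))
    classes = len(re.findall(r"\.[\w-]+", s))
    pseudo_classes = len(re.findall(r":[\w-]+", s))
    # Type selectors: a name at the start or right after whitespace or a combinator.
    types = len(re.findall(r"(?:^|[\s>+~])[a-zA-Z][\w-]*", s))
    return (ids, classes + attributes + pseudo_classes, types + pseudo_elements)


print(specificity("blockquote > p:first-child"))  # (0, 1, 2)
print(specificity("blockquote > p"))              # (0, 0, 2)
```

Tuples compare element by element in Python, so the first result is the larger one and its declarations should win.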

Quote:

Originally Posted by KevinH (Post 4156395)
Hmm ... it would also need to:

Hmmm... similar to those Calibre checkboxes, it would be nice to completely strip/ignore certain properties.

Nice to have broad/easy-mode checkbox categories like "Colors" + "Margins" + "Floats".

But also a surgical/advanced-mode where you could specify properties to strip:
  • letter-spacing
  • orphans
  • widows
  • text-transform
  • [...]

(Maybe a live list of all currently used properties within the CSS?)

So some InDesign cruft like this:

Spoiler:
Code:

p.Block-indent {
        color:#000000;
        font-family:"Minion Pro Medium", sans-serif;
        font-size:0.917em;
        font-style:normal;
        font-variant:normal;
        font-weight:normal;
        line-height:1.182;
        margin-bottom:5px;
        margin-left:36px;
        margin-right:36px;
        margin-top:5px;

        orphans:2;
        page-break-after:auto;
        page-break-before:auto;
        text-align:justify;
        text-decoration:none;
        text-indent:0;
        text-transform:none;
        widows:2;
}

Code:

p.Body-text {
        color:#000000;
        font-family:"Minion Pro Medium", sans-serif;
        font-size:0.917em;
        font-style:normal;
        font-variant:normal;
        font-weight:normal;
        line-height:1.2;
        margin-bottom:0;
        margin-left:0;
        margin-right:0;
        margin-top:1px;

        orphans:2;
        page-break-after:auto;
        page-break-before:auto;
        text-align:justify;
        text-decoration:none;
        text-indent:18px;
        text-transform:none;
        widows:2;
}



* * *

If you check 3... Remove:
  • Margins
  • line-height
  • text-indent

Those classes would now be considered equivalent.

So consolidate .Body-text -> Block-indent in CSS:

Spoiler:
Code:

p.Block-indent {
        color:#000000;
        font-family:"Minion Pro Medium", sans-serif;
        font-size:0.917em;
        font-style:normal;
        font-variant:normal;
        font-weight:normal;
        orphans:2;
        page-break-after:auto;
        page-break-before:auto;
        text-align:justify;
        text-decoration:none;
        text-transform:none;
        widows:2;
}



+ go through and update any HTML:

<p class="Body-text"> -> <p class="Block-indent">
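The equivalence test in that first example boils down to comparing the remaining declarations once the ignored properties are dropped. A minimal Python sketch, assuming declarations are plain "property:value;" pairs and that properties are ignored by name prefix ("margin" covers margin-top, margin-left, and so on). equivalent is a hypothetical helper, and the sample bodies are trimmed from the two InDesign classes above.

```python
def parse_decls(body):
    """Parse 'prop:value; ...' into {property: value} with whitespace normalized."""
    decls = {}
    for part in body.split(";"):
        if ":" in part:
            prop, value = part.split(":", 1)
            decls[prop.strip().lower()] = " ".join(value.split())
    return decls


def equivalent(body_a, body_b, ignore_prefixes=()):
    """True if both rules match once properties starting with an ignored prefix are dropped."""
    prefixes = tuple(ignore_prefixes)

    def kept(body):
        return {p: v for p, v in parse_decls(body).items() if not p.startswith(prefixes)}

    return kept(body_a) == kept(body_b)


block_indent = "color:#000000; line-height:1.182; margin-bottom:5px; margin-left:36px; text-indent:0; widows:2;"
body_text = "color:#000000; line-height:1.2; margin-bottom:0; margin-left:0; text-indent:18px; widows:2;"
print(equivalent(block_indent, body_text, ["margin", "line-height", "text-indent"]))  # True
print(equivalent(block_indent, body_text, ["margin", "line-height"]))  # False
```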

* * *

If you say... Remove:
  • Margins
  • line-height

they'd be extremely close, but at least you'll strip/remove some trash:

Spoiler:
Code:

p.Block-indent {
        color:#000000;
        font-family:"Minion Pro Medium", sans-serif;
        font-size:0.917em;
        font-style:normal;
        font-variant:normal;
        font-weight:normal;
        orphans:2;
        page-break-after:auto;
        page-break-before:auto;
        text-align:justify;
        text-decoration:none;
        text-indent:0;
        text-transform:none;
        widows:2;
}

p.Body-text {
        color:#000000;
        font-family:"Minion Pro Medium", sans-serif;
        font-size:0.917em;
        font-style:normal;
        font-variant:normal;
        font-weight:normal;
        orphans:2;
        page-break-after:auto;
        page-break-before:auto;
        text-align:justify;
        text-decoration:none;
        text-indent:18px;
        text-transform:none;
        widows:2;
}



Update CSS, but do not update the HTML.
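The checkbox-plus-advanced-list idea could be sketched in Python like this. The category-to-property lists are assumptions loosely modelled on Calibre's "Filter Style Information" checkboxes, not Calibre's actual lists, and used_properties stands in for the "live list of all currently used properties". Both are hypothetical helpers for flat stylesheets.

```python
import re

RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")

# Hypothetical category -> property lists; adjust to taste.
CATEGORIES = {
    "fonts": {"font-family"},
    "margins": {"margin", "margin-top", "margin-right", "margin-bottom", "margin-left"},
    "padding": {"padding", "padding-top", "padding-right", "padding-bottom", "padding-left"},
    "floats": {"float"},
    "colors": {"color", "background-color"},
}


def strip_properties(css, categories=(), extra=()):
    """Drop declarations whose property is in a chosen category or in extra."""
    banned = {p.lower() for p in extra}
    for name in categories:
        banned |= CATEGORIES[name]
    out = []
    for selector, body in RULE_RE.findall(css):
        kept = []
        for part in body.split(";"):
            if ":" not in part:
                continue
            prop, value = part.split(":", 1)
            if prop.strip().lower() not in banned:
                kept.append(f"{prop.strip()}: {value.strip()};")
        out.append(selector.strip() + " { " + " ".join(kept) + " }")
    return "\n".join(out)


def used_properties(css):
    """Sorted list of every property in use: the 'live list' to pick from."""
    props = set()
    for _, body in RULE_RE.findall(css):
        for part in body.split(";"):
            if ":" in part:
                props.add(part.split(":", 1)[0].strip().lower())
    return sorted(props)


css = "p.Block-indent { color:#000000; margin-top:5px; orphans:2; text-indent:0; widows:2; }"
print(used_properties(css))
# ['color', 'margin-top', 'orphans', 'text-indent', 'widows']
print(strip_properties(css, ["margins", "colors"], ["orphans", "widows"]))
# p.Block-indent { text-indent: 0; }
```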

* * *

Would be Helpful: After this stage, if you had a "Style Mapper", you'd be able to select these 2 classes, then see their CSS compared side-by-side, highlighting the diffs.

Then you'd be able to:
  • Edit
    • Remove the "text-indent:18px;" line
    • Sigil updates the CSS.
      • (Optionally checks again to see if there's any matching classes that it can consolidate into now.)
  • Merge Left/Right
    • Be able to say:
      • Block-indent -> Body-text
      • OR Block-indent <- Body-text
    • Sigil updates CSS + HTML.
  • Rename
    • Block-indent now called "normal"
    • Sigil updates CSS + HTML:
      • p.Block-indent -> p.normal
      • <p class="Block-indent"> -> <p class="normal">
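The compare and rename parts of such a "Style Mapper" can be sketched in a few lines of Python. diff_rules and rename_class are hypothetical helpers: the first lists only the properties whose values differ (None means the property is missing on that side), the second rewrites one class name inside class attributes, which covers both Merge and Rename on the HTML side.

```python
import re


def parse_decls(body):
    """Parse 'prop:value; ...' into {property: value} with whitespace normalized."""
    decls = {}
    for part in body.split(";"):
        if ":" in part:
            prop, value = part.split(":", 1)
            decls[prop.strip().lower()] = " ".join(value.split())
    return decls


def diff_rules(body_a, body_b):
    """Return {property: (value_in_a, value_in_b)} for properties that differ."""
    a, b = parse_decls(body_a), parse_decls(body_b)
    return {
        prop: (a.get(prop), b.get(prop))
        for prop in sorted(a.keys() | b.keys())
        if a.get(prop) != b.get(prop)
    }


def rename_class(html, old, new):
    """Swap one class name inside class="..." attributes (Merge or Rename)."""
    def fix(match):
        names = [new if name == old else name for name in match.group(1).split()]
        return 'class="' + " ".join(dict.fromkeys(names)) + '"'
    return re.sub(r'class="([^"]*)"', fix, html)


print(diff_rules("color:#000000; text-indent:0;", "color:#000000; text-indent:18px;"))
# {'text-indent': ('0', '18px')}
print(rename_class('<p class="Block-indent">Text</p>', "Block-indent", "normal"))
# <p class="normal">Text</p>
```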

* * *

Side Note: Like I mentioned, I only ran across those Look & Feel screens in Calibre very recently, so I believe there's a way to do this property stripping already...

In Calibre, there is the Transform Styles tab (right next to the Styling tab):


... but documentation is sparse + I don't exactly know how useful it would be (yet), since I'm typically dealing with all types of nonsense on a per-book level. (I do see Import/Export button + a GUI to create rules though.)

Right now, I do CSS cleanup manually (using regex) + multiple rounds of Calibre EPUB->EPUB conversions... until I'm satisfied and have a relatively clean base to work from.

But to have a quicker way to:
  • strip/consolidate CSS
  • compare CSS
  • convert/map to human-readable/standard class names

would be absolutely fantastic.

Many times, I'm just looking through tons and tons of cruft only to finally spot the single difference being a:

- font-style: italic;

then I know: "Oh, this should just be a class="italic" (or <i> / <em>)."

Then I do a simple S&R or open up Diap's Toolbag and convert it.

KevinH 09-22-2021 10:16 AM

IMHO, adding all those unneeded classes to the code is really a shame. It creates quite the mess.

And it converts all selectors into class selectors with *non* mnemonic class names destroying the structure of the original css completely.

Perhaps something that simply identifies and removes identical classes would be better / safer / cleaner.

As for a properties filter, that should now be easy to do with a Saved Search group, run with the target set to all CSS stylesheets, all in one command.

Running that first, followed by something that identifies and removes extra identical selectors, might be enough.
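As a rough illustration of the Saved Search route, here is a find pattern of the sort one might store and run against all stylesheets with an empty replacement, shown with Python's re so it can be tested. Whether Sigil's regex engine accepts it unchanged is an assumption worth checking, and the property list is only an example drawn from the InDesign cruft earlier in the thread.

```python
import re

# Example find pattern; the replacement is empty. MULTILINE makes ^ match at each line.
STRIP_RE = re.compile(
    r"^[ \t]*(?:orphans|widows|page-break-(?:before|after))\s*:[^;]*;[ \t]*\r?\n",
    re.MULTILINE,
)


def strip_cruft(css):
    """Delete whole declaration lines for the listed properties."""
    return STRIP_RE.sub("", css)


css = (
    "p.Body-text {\n"
    "        orphans:2;\n"
    "        page-break-after:auto;\n"
    "        text-indent:18px;\n"
    "        widows:2;\n"
    "}\n"
)
print(strip_cruft(css))
# p.Body-text {
#         text-indent:18px;
# }
```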

But I have never seen anything with 206 stylesheets, I must admit.

If InDesign can handle style mapping from .docx styles, why isn't this handled by InDesign when importing the .docx files?

KevinH 09-22-2021 11:39 AM

It would be nice to see what one of these real epubs with lots and lots of stylesheets from InDesign looks like to test cleanup ideas on.

If you have access to such an epub *before* the css was converted by Calibre, please run the Borkify Epub plugin on it and post it here or privately PM me with a link so that I can see some of the issues involved and test some approaches.

Thanks

Hitch 09-22-2021 11:47 AM

Quote:

Originally Posted by KevinH (Post 4156550)
It would be nice to see what one of these real epubs with lots and lots of stylesheets from InDesign looks like to test cleanup ideas on.

If you have access to such an epub *before* the css was converted by Calibre, please run the Borkify Epub plugin on it and post it here or privately PM me with a link so that I can see some of the issues involved and test some approaches.

Thanks

Kevin:

Sorry, late to this party. Or...was here, had a work crisis, back. What exactly do you need from INDD?

(I freely admit that Tex's work might be better suited to help you, but I have a fairly huge collection of INDD files...)

Hitch

KevinH 09-22-2021 12:09 PM

I am looking for an epub created from InDesign that uses individual stylesheets (one per chapter) with many chapters (many stylesheets) that I can use to test some ideas for techniques to merge the large number of stylesheets down into a small handful of stylesheets and in the process remap styles if possible.

All hopefully *without* having to convert all selectors to class selectors with non-mnemonic numbered names that end up littering the html.

I am thinking of using ngram scoring to try to identify the most similar set of selector properties (after a filtering step) and presenting those for the user to approve of, then doing the merge.

I am thinking that by Pareto's rule we should be able to take a large number of stylesheets and merge them into a much smaller number but keep most of the individuality present.
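One possible reading of "ngram scoring", sketched in Python: break each normalized "property:value" declaration into character trigrams, pool them per rule, and rank rule pairs by Jaccard similarity so the closest pairs can be shown to the user for approval. The trigram size, the Jaccard measure and the 0.5 threshold are all assumptions for illustration, not a description of the planned tool.

```python
from itertools import combinations


def ngrams(text, n=3):
    """Set of character n-grams (the whole string if it is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)} or {text}


def rule_profile(body, n=3):
    """Pool the n-grams of every normalized 'property:value' declaration in a rule."""
    grams = set()
    for part in body.split(";"):
        if ":" in part:
            prop, value = part.split(":", 1)
            grams |= ngrams(prop.strip().lower() + ":" + " ".join(value.split()), n)
    return grams


def jaccard(a, b):
    """Shared n-grams divided by all n-grams (1.0 for two empty rules)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def rank_pairs(rules, threshold=0.5):
    """Score every pair of rules; return (score, name_a, name_b), best first."""
    profiles = {name: rule_profile(body) for name, body in rules.items()}
    scored = [
        (jaccard(profiles[x], profiles[y]), x, y)
        for x, y in combinations(sorted(profiles), 2)
    ]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


rules = {
    "Block-indent": "color:#000000; text-indent:0; widows:2;",
    "Body-text": "color:#000000; text-indent:18px; widows:2;",
    "Heading": "font-weight:bold; text-align:center;",
}
for score, a, b in rank_pairs(rules):
    print(f"{score:.2f}  {a} <-> {b}")  # 0.83  Block-indent <-> Body-text
```

Character n-grams reward near-identical values ("text-indent:0" vs "text-indent:18px" still share most trigrams), which suits a "suggest, then let the user approve" workflow better than an exact-match test.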

Hitch 09-22-2021 01:14 PM

Quote:

Originally Posted by KevinH (Post 4156561)
I am looking for an epub created from InDesign that uses individual stylesheets (one per chapter) with many chapters (many stylesheets) that I can use to test some ideas for techniques to merge the large number of stylesheets down into a small handful of stylesheets and in the process remap styles if possible.

All hopefully *without* having to convert all selectors to class selectors with non-mnemonic numbered names that end up littering the html.

I am thinking of using ngram scoring to try to identify the most similar set of selector properties (after a filtering step) and presenting those for the user to approve of, then doing the merge.

I am thinking that by Pareto's rule we should be able to take a large number of stylesheets and merge them into a much smaller number but keep most of the individuality present.

Okay. That wouldn't be one of ours (our production I mean) but it's entirely possible that I have files like that, from other designers that we used for export to ePUB or HTML and subsequent conversion. I will take a look. I mean, to be clear--I know we have had those, but I don't know if I still have one in-house that would be available to Borkify for you. I'll check.

Hitch


All times are GMT -4.
