MobileRead Forums

[REQ] A plugin that delete class in the chapter (https://www.mobileread.com/forums/showthread.php?t=278615)

Auramazda 09-19-2016 03:40 PM

[REQ] A plugin that delete class in the chapter
 
Hi all, I have a request. Sometimes I have to re-edit an old epub, or an epub that someone has given me, and I find the code full of classes and spans, for example: calibre1, calibre2, calibre3, calibre52, ...
Normally, if there isn't too much editing, I copy the text as plain text into a new epub, but sometimes I can't do that, so I must delete them with Find.
I use CSS Remove Unused Selector, which deletes the unused classes in the CSS (very useful). Is there a possibility to delete the classes in the chapters through a similar menu?

Toxaris 09-19-2016 04:30 PM

What is wrong with a simple search and replace?

nabsltd 09-19-2016 05:51 PM

With just a regex, there's nothing simple about a search for a tag with a specific "class=" and then removing that tag (and its closing tag). For example, build a general-purpose regex that successfully deletes the tag with class "deleteme", along with the matching closing tag:
Code:

<span class="keep01">Here <span class="deleteme">is <span class="alsokeep">the</span> text <span class="dontdelete">in the</span> book</span> that should all remain intact.</span>
This operation is trivial with a real HTML editor (I use Expression Web), but Sigil (and Calibre) don't have anything that can do this, and that's a simple operation compared to what you can do with a real editor.
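
For readers who want to see why a parser copes where a single regex struggles, here is a minimal Python sketch of the "deleteme" challenge. It is not how TagMechanic, Diap's Editing Toolbag, or Expression Web work internally; the class name ClassUnwrapper and the function unwrap_class are made up for illustration, and the sketch assumes well-formed XHTML in which every opened element is closed (void elements written as <br/>).

```python
from html.parser import HTMLParser


class ClassUnwrapper(HTMLParser):
    """Re-emit markup unchanged, except for tags that carry target_class."""

    def __init__(self, target_class):
        super().__init__(convert_charrefs=False)
        self.target_class = target_class
        self.out = []
        self.stack = []  # one flag per open element: True = its tags are dropped

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        drop = self.target_class in classes
        self.stack.append(drop)
        if not drop:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The flag on top of the stack belongs to the matching opening tag.
        drop = self.stack.pop() if self.stack else False
        if not drop:
            self.out.append("</%s>" % tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never open a scope; copy them through.
        self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)


def unwrap_class(markup, target_class):
    """Remove opening and closing tags of elements with target_class; keep content."""
    parser = ClassUnwrapper(target_class)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

The stack is the part a plain regex lacks: each closing tag is paired with whichever opening tag is still open, so the nested "alsokeep" and "dontdelete" spans keep their own closing tags.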

Terisa de morgan 09-19-2016 05:53 PM

But you have a plugin for calibre editor which does exactly that (and more).

DiapDealer 09-19-2016 06:08 PM

Quote:

Originally Posted by Terisa de morgan (Post 3396215)
But you have a plugin for calibre editor which does exactly that (and more).

There's a Sigil version of the very same plugin. ;)

Quote:

Originally Posted by nabsltd (Post 3396214)
With just a regex, there's nothing simple about a search for a tag with a specific "class=" and then removing that tag (and its closing tag). For example, build a general-purpose regex that successfully deletes the tag with class "deleteme", along with the matching closing tag:
Code:

<span class="keep01">Here <span class="deleteme">is <span class="alsokeep">the</span> text <span class="dontdelete">in the</span> book</span> that should all remain intact.</span>
This operation is trivial with a real HTML editor (I use Expression Web), but Sigil (and Calibre) don't have anything that can do this, and that's a simple operation compared to what you can do with a real editor.

First off: you're right. Regex is not the best tool for removing tags based on attribute values. But I don't think that's what the OP was asking for (nor what Toxaris was suggesting using regex for). The OP is asking for something that removes classes (classes that are no longer being used by CSS) from attribute strings. For this purpose, regex is quite safe/sufficient (as Toxaris noted).

By the way: both the TagMechanic plugin for Sigil, and "Diap's Editing Toolbag" for calibre make it trivial to successfully remove the tag with the class "deleteme" (along with the matching closing tag), per your example. ;)

JSWolf 09-19-2016 06:28 PM

In Calibre's eBook editor, "remove unused CSS rules" also removes classes from the XHTML that are not used.

DiapDealer 09-19-2016 06:37 PM

There you go.

Auramazda 09-20-2016 07:54 PM

Quote:

Originally Posted by Toxaris (Post 3396158)
What is wrong with a simple search and replace?

When you open a CSS file and read .calibre92, you want a button that can delete every calibre class in the book in one click. And yes, I have already done the search and replace 92 times, with very great happiness, so I hope that the next time they send me something like this I will have something faster to correct it.

Quote:

Originally Posted by DiapDealer (Post 3396228)
First off: you're right. Regex is not the best tool for removing tags based on attribute values. But I don't think that's what the OP was asking for (nor what Toxaris was suggesting using regex for). The OP is asking for something that removes classes (classes that are no longer being used by CSS) from attribute strings. For this purpose, regex is quite safe/sufficient (as Toxaris noted).

Yes this is my dream

Quote:

By the way: both the TagMechanic plugin for Sigil, and "Diap's Editing Toolbag" for calibre make it trivial to successfully remove the tag with the class "deleteme" (along with the matching closing tag), per your example.
I had missed this plugin. It is a great tool (I will use it in the near future), but it is a manual tool and I would still have to type all 92 calibre classes to delete them.

I found something a little extreme that can solve it: the smoothRemove plugin (it deletes every class and span, but not i and b). With TagMechanic I convert the classes to italic and bold, and the rest goes into the mincer of smoothRemove.

nabsltd 09-20-2016 09:45 PM

Quote:

Originally Posted by DiapDealer (Post 3396228)
First off: you're right. Regex is not the best tool for removing tags based on attribute values. But I don't think that's what the OP was asking for (nor what Toxaris was suggesting using regex for). The OP is asking for something that removes classes (classes that are no longer being used by CSS) from attribute strings.

I didn't get that from the post, as he was asking to delete unused stuff from the HTML file. If you just want to delete an attribute on an HTML element, then it's not too hard, although you have to account for things like this:
Code:

class="first second"
That might leave you with a span that does nothing effective, but if it has other attributes, it won't be deleted as "empty" by the various tools. In other words, I couldn't figure out how he ended up with attributes that do nothing so they can be deleted safely, so I assumed he meant deleting the span/div/etc.
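
The multi-class case can be sketched in a few lines of Python. This is only an illustration: drop_classes is a made-up helper, it assumes double-quoted class attributes, and it would also touch literal text that happens to look like a class attribute.

```python
import re

# Matches a double-quoted class attribute together with its leading whitespace.
CLASS_ATTR = re.compile(r'\s+class="([^"]*)"')


def drop_classes(markup, unwanted):
    """Remove class names matching the regex `unwanted` from every class attribute."""
    unwanted_re = re.compile(unwanted)

    def fix(match):
        kept = [c for c in match.group(1).split() if not unwanted_re.fullmatch(c)]
        # Drop the whole attribute when no class name survives.
        return ' class="%s"' % " ".join(kept) if kept else ""

    return CLASS_ATTR.sub(fix, markup)


print(drop_classes('<span class="first second">x</span>', r"second"))
```

Note that a span carrying other attributes (an id, for example) keeps them, so it will not count as "empty" afterwards, which is exactly the situation described above.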

Quote:

By the way: both the TagMechanic plugin for Sigil, and "Diap's Editing Toolbag" for calibre make it trivial to successfully remove the tag with the class "deleteme" (along with the matching closing tag), per your example. ;)
Definitely a good start, but I'm spoiled by Expression Web and the selection process it has for elements to modify (with/without attribute, containing/not containing text/tag, inside/not inside tag, with infinite nesting of these rules) plus what it can do (replace tag/contents, add before/after start/end tag, remove tag/contents, change tag, change/remove attribute), and all of it can be regexed.

DiapDealer 09-20-2016 09:58 PM

Quote:

Originally Posted by nabsltd (Post 3397013)
Definitely a good start, but I'm spoiled by Expression Web and the selection process it has for elements to modify (with/without attribute, containing/not containing text/tag, inside/not inside tag, with infinite nesting of these rules) plus what it can do (replace tag/contents, add before/after start/end tag, remove tag/contents, change tag, change/remove attribute), and all of it can be regexed.

By all means, stick with what you're comfortable with. Sigil is never going to be a full-featured html editor. There's no need to reinvent the wheel after all. Plenty of those already exist if you require that sort of ability.

Toxaris 09-21-2016 03:58 AM

Quote:

Originally Posted by Auramazda (Post 3396947)
When you open a CSS file and read .calibre92, you want a button that can delete every calibre class in the book in one click. And yes, I have already done the search and replace 92 times, with very great happiness, so I hope that the next time they send me something like this I will have something faster to correct it.

...

I had missed this plugin. It is a great tool (I will use it in the near future), but it is a manual tool and I would still have to type all 92 calibre classes to delete them.

I found something a little extreme that can solve it: the smoothRemove plugin (it deletes every class and span, but not i and b). With TagMechanic I convert the classes to italic and bold, and the rest goes into the mincer of smoothRemove.

If you use, for example, calibre\d+ you would catch all of those classes. If you do not include the actual tag in the search, the tag stays in place. Spans, for example, would become empty (left without attributes) and can easily be removed in one go with the plugin from DiapDealer.

So, search for ' class="calibre\d+"' (in regex mode, and without the leading dot, which belongs only to the CSS selector) and replace it with nothing. Then run TagMechanic and remove the empty spans. Two actions only.

Another option could be to import the ePUB into Word, do some cleaning/fixing if needed (check quotation marks, for example) and then export the ePUB from Word. That would also remove the calibre classes if you want. It is also possible to keep the classes you do want.

I have to say that removing all the calibre classes in one go is tricky in all cases. You could easily get rid of formatting that you don't want to lose. That is the problem with a generic class name: you don't know what it is about. It is, however, a side effect of conversion that usually cannot be prevented. I would personally never remove all calibre classes in one go.
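
As a rough picture of those two actions, here is a small Python sketch. It is not what Sigil's Find & Replace or TagMechanic do internally; both function names are invented for the example, and it assumes well-formed XHTML with double-quoted attributes and a single calibreNN name per class attribute.

```python
import re

SPAN_TAG = re.compile(r"<(/?)span\b([^>]*)>")


def strip_calibre_classes(markup):
    """Action 1: delete class attributes whose only value is a calibreNN name."""
    return re.sub(r'\s+class="calibre\d+"', "", markup)


def unwrap_bare_spans(markup):
    """Action 2: drop <span> tags left without attributes, plus their closing tags."""
    stack = []  # one flag per open span: True = that span is being removed

    def fix(match):
        closing, attrs = match.group(1), match.group(2)
        if closing:
            bare = stack.pop() if stack else False
        elif attrs.strip().endswith("/"):
            return match.group(0)  # self-closing <span/>: nothing to pair, keep it
        else:
            bare = not attrs.strip()
            stack.append(bare)
        return "" if bare else match.group(0)

    return SPAN_TAG.sub(fix, markup)


page = '<p class="calibre1">A <span class="calibre3">plain <span class="calibre7" id="n1">kept</span> word</span>.</p>'
print(unwrap_bare_spans(strip_calibre_classes(page)))
```

The caution in the post still applies: any formatting that calibre3 or calibre7 carried in the stylesheet is gone after action 1, so it is worth checking what each class does before running something like this over a whole book.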

theducks 09-21-2016 11:05 AM

Quote:

Originally Posted by Toxaris (Post 3397128)
If you use, for example, calibre\d+ you would catch all of those classes. If you do not include the actual tag in the search, the tag stays in place. Spans, for example, would become empty (left without attributes) and can easily be removed in one go with the plugin from DiapDealer.

So, search for ' class="calibre\d+"' (in regex mode, and without the leading dot, which belongs only to the CSS selector) and replace it with nothing. Then run TagMechanic and remove the empty spans. Two actions only.

Another option could be to import the ePUB into Word, do some cleaning/fixing if needed (check quotation marks, for example) and then export the ePUB from Word. That would also remove the calibre classes if you want. It is also possible to keep the classes you do want.

I have to say that removing all the calibre classes in one go is tricky in all cases. You could easily get rid of formatting that you don't want to lose. That is the problem with a generic class name: you don't know what it is about. It is, however, a side effect of conversion that usually cannot be prevented. I would personally never remove all calibre classes in one go.

You can use that regex (calibre\d+) within Diap's tag tool. Just tick the Regex box at the end of the pattern.

JSWolf 09-21-2016 11:14 AM

Quote:

Originally Posted by Auramazda (Post 3396947)
When you open a CSS file and read .calibre92, you want a button that can delete every calibre class in the book in one click. And yes, I have already done the search and replace 92 times, with very great happiness, so I hope that the next time they send me something like this I will have something faster to correct it.

And if you remove every calibre-named class, you will have no idea what those classes did. You may be able to guess at some and get them right, but there is no way you'd get them all right.

What I do is go through the classes and replace them with the code I want and the names I want, so I'll have a better idea of what each class does from its name. A class name of calibre12 on its own doesn't say what it does.

JSWolf 09-21-2016 11:17 AM

If you have a class like say <span class="doesnotexist"> and you load the ePub into Calibre's editor, you can use the tool to remove unused CSS and it will remove the class from the span if the class is not in the CSS. Then you can use the modify ePub plugin to remove empty spans.
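
The idea of dropping only the classes that no stylesheet defines can be sketched in Python as follows. This is not calibre's implementation; css_class_names and drop_undefined_classes are hypothetical helpers, the CSS handling is a rough regex scan of selectors rather than a real CSS parser, and only double-quoted class attributes are handled.

```python
import re


def css_class_names(css):
    """Collect class names that appear in selectors (the text before each '{')."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # ignore comments
    names = set()
    for selector in re.findall(r"([^{}]+)\{", css):
        names.update(re.findall(r"\.(-?[_a-zA-Z][\w-]*)", selector))
    return names


def drop_undefined_classes(markup, css):
    """Remove class names from the markup that no selector in `css` mentions."""
    defined = css_class_names(css)

    def fix(match):
        kept = [c for c in match.group(1).split() if c in defined]
        return ' class="%s"' % " ".join(kept) if kept else ""

    return re.sub(r'\s+class="([^"]*)"', fix, markup)


css = ".calibre1 { font-style: italic } p.note, .big { margin: 0.5em }"
page = '<p class="note calibre9">x <span class="doesnotexist">y</span> <i class="calibre1">z</i></p>'
print(drop_undefined_classes(page, css))
```

Unlike removing every calibreNN class blindly, this keeps classes that still carry formatting (calibre1 here) and leaves behind attribute-less spans that a separate pass can remove.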

DiapDealer 09-21-2016 11:57 AM

Quote:

Originally Posted by JSWolf (Post 3397333)
If you have a class like say <span class="doesnotexist"> and you load the ePub into Calibre's editor, you can use the tool to remove unused CSS and it will remove the class from the span if the class is not in the CSS. Then you can use the modify ePub plugin to remove empty spans.

I let the first few instances go, Jon (because information is always good), but it's now time to stop promoting calibre solutions in a discussion thread about a Sigil plugin request. That there are ways to accomplish the OP's request in both Sigil and calibre has been established and noted. Now it's time to stay on topic (Sigil), or bow out.


All times are GMT -4.