Quote:
Originally Posted by eschwartz
OK - so I looked at my current reads-in-progress for another test case.
Your code did not fix the example below because there's a class attribute after the <a.
As far as I can tell, that will always be the case if the book has gone through a calibre epub-to-epub conversion, because calibre adds classes to every tag.
The previous example I gave was from a completely unedited, unconverted retail book, but my usual workflow for making a personal reading version is to load the original into calibre and immediately convert it epub-to-epub, then tweak only the resulting copy and never touch the original_epub backup.
I use the conversion to add extra CSS that zaps hyphenation and zaps widows & orphans at the same time.
Code:
<h1 class="calibre10" id="rw-h1_319849-00001"><a class="calibre7" href="../Text/9780857900135_toc.html">4</a></h1>
I'd want to reduce all that to:
Code:
<h1 class="calibre10">4</h1>
Is the id attribute redundant, i.e. does it have no impact on the reading experience in any way?
This find worked OK though:
Code:
<a class="calibre\d" href="[^<>]*">((?:(?!<(?:a|/a)).)*)</a>
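For anyone who wants to try the same thing outside the editor, here is a rough Python sketch of that find with a replacement of \1, run against the heading above. A few things in it are my own assumptions and not part of the find as posted: \d is widened to \d+ so a two-digit class like calibre10 on the <a> would also match, and the function name strip_heading_links, the drop_ids flag and the second pattern that removes the id are all made up for illustration. Note that the link pattern matches any such <a> in the file, not only those inside headings, and that dropping an id will break anything that links to it.

```python
import re

# The find from the post, with \d widened to \d+ (assumption) so that
# two-digit classes such as calibre10 on the <a> tag are matched too.
LINK_RE = re.compile(
    r'<a class="calibre\d+" href="[^<>]*">((?:(?!<(?:a|/a)).)*)</a>'
)

# Hypothetical extra pattern, not from the post: captures the opening of a
# heading tag up to (but not including) its id attribute.
HEADING_ID_RE = re.compile(r'(<h[1-6]\b[^<>]*?)\s+id="[^"]*"')


def strip_heading_links(html: str, drop_ids: bool = True) -> str:
    """Unwrap calibre-classed links and optionally drop heading ids."""
    # Replace the whole <a ...>text</a> with just the captured text.
    html = LINK_RE.sub(r"\1", html)
    if drop_ids:
        # Keep everything before the id attribute, discard the id itself.
        html = HEADING_ID_RE.sub(r"\1", html)
    return html


sample = (
    '<h1 class="calibre10" id="rw-h1_319849-00001">'
    '<a class="calibre7" href="../Text/9780857900135_toc.html">4</a></h1>'
)
print(strip_heading_links(sample))
# -> <h1 class="calibre10">4</h1>
```

Calling strip_heading_links(sample, drop_ids=False) keeps the id and only unwraps the link, which is probably the safer choice if the TOC or any cross-reference points at those headings.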