11-11-2020, 10:37 AM #16
G2B
Quote:
Originally Posted by retiredbiker
There is actually a sticky for that: https://www.mobileread.com/forums/sh...d.php?t=237181
Thank you for that.

I do realize that calibre does not always use the same tags for the same situations. It depends on how many books I merge together, which means some searches have to be edited.
But I do run a few search/replace commands that are the same for every book. This is my list, copied from a saved search file.

{
"searches": [
{
"case_sensitive": false,
"dot_all": false,
"find": "<body [^>]*>",
"mode": "regex",
"name": "1-calibre-body",
"replace": "<body class=\"calibre\">"
},
{
"case_sensitive": false,
"dot_all": false,
"find": " id=\"[^\\\"]*\"",
"mode": "regex",
"name": "2-no-IDs",
"replace": ""
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<a [^>]*>",
"mode": "regex",
"name": "3-no-links",
"replace": ""
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<h([0-9]) [^>]*>",
"mode": "regex",
"name": "clean-Htags",
"replace": "<h\\1>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<b [^>]*>",
"mode": "regex",
"name": "clean-b",
"replace": "<b>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<i [^>]*>",
"mode": "regex",
"name": "clean-i",
"replace": "<i>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<em [^>]*>",
"mode": "regex",
"name": "clean-em",
"replace": "<em>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<em>",
"mode": "regex",
"name": "em-to-i-a",
"replace": "<i>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "</em>",
"mode": "regex",
"name": "em-to-i-b",
"replace": "</i>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<h([0-9])><b>",
"mode": "regex",
"name": "clean-Htags-b",
"replace": "<h\\1>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<h([0-9])><span [^>]*>",
"mode": "regex",
"name": "clean-Htags-c",
"replace": "<h\\1>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<br ([^<]*)>",
"mode": "regex",
"name": "clean-br",
"replace": "<br/>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<hr ([^<]*)>",
"mode": "regex",
"name": "clean-hr",
"replace": "<hr/>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<blockquote ([^<]*)>",
"mode": "regex",
"name": "clean-blockquote",
"replace": "<blockquote>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<span class=\"italic\">([^<]*)</span>",
"mode": "regex",
"name": "clean-span-italics",
"replace": "<i>\\1</i>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<span class=\"bold\">([^<]*)</span>",
"mode": "regex",
"name": "clean-span-bold-a",
"replace": "<b>\\1</b>"
}
],
"version": 1
}

I use <i>, <b>, <u>, <br/> and <hr/> without classes.
The above alone can get rid of dozens of classes and sometimes cut the size of an epub in half.
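To see what a list like this does to a file, here is a minimal Python sketch that reads searches in the same exported JSON shape and applies the regex-mode ones in order. It uses the standard-library re module as a stand-in for calibre's own regex engine, which may differ in details, and it assumes case_sensitive and dot_all behave like re.IGNORECASE and re.DOTALL. Only six of the searches are embedded; apply_searches and the sample markup are made up for illustration.

```python
import json
import re

# A subset of the saved searches above, in the same exported JSON shape.
SAVED = json.loads(r'''
{"searches": [
  {"case_sensitive": false, "dot_all": false, "find": "<body [^>]*>",
   "mode": "regex", "name": "1-calibre-body", "replace": "<body class=\"calibre\">"},
  {"case_sensitive": false, "dot_all": false, "find": " id=\"[^\\\"]*\"",
   "mode": "regex", "name": "2-no-IDs", "replace": ""},
  {"case_sensitive": false, "dot_all": false, "find": "<h([0-9]) [^>]*>",
   "mode": "regex", "name": "clean-Htags", "replace": "<h\\1>"},
  {"case_sensitive": false, "dot_all": false, "find": "<em [^>]*>",
   "mode": "regex", "name": "clean-em", "replace": "<em>"},
  {"case_sensitive": false, "dot_all": false, "find": "<em>",
   "mode": "regex", "name": "em-to-i-a", "replace": "<i>"},
  {"case_sensitive": false, "dot_all": false, "find": "</em>",
   "mode": "regex", "name": "em-to-i-b", "replace": "</i>"}
], "version": 1}
''')


def apply_searches(html, searches):
    """Run each regex-mode search over the whole text, in list order."""
    for search in searches:
        if search.get("mode") != "regex":
            continue  # this sketch ignores non-regex searches
        flags = 0
        if not search.get("case_sensitive", False):
            flags |= re.IGNORECASE
        if search.get("dot_all", False):
            flags |= re.DOTALL
        html = re.sub(search["find"], search["replace"], html, flags=flags)
    return html


SAMPLE = ('<body id="x" class="a"><h2 class="c" id="ch1">One</h2>'
          '<p><em class="e">word</em></p></body>')
print(apply_searches(SAMPLE, SAVED["searches"]))
# -> <body class="calibre"><h2>One</h2><p><i>word</i></p></body>
```

The order matters: clean-em has to run before em-to-i-a, otherwise <em class="..."> would never match the plain <em> search.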


Then there is this one, which gets rid of all the remaining DIV tags. It can only be used after checking that any text in DIV tags has been changed to <p> tags: if any text is still sitting in <div> tags, this search will leave that text without any tags at all.

{
"case_sensitive": false,
"dot_all": false,
"find": "<span class=\"b\">([^<]*)</span>",
"mode": "regex",
"name": "clean-span-bold-b",
"replace": "<b>\\1</b>"
}
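The check-then-strip order described above can be sketched in Python. This is a hypothetical version, not the search posted here: it uses the standard-library html.parser to look for text whose innermost open element is a <div>, and only if none is found does it remove every opening and closing div tag with a regex. strip_divs, BareDivText and the VOID tag set are names made up for the example.

```python
import re
from html.parser import HTMLParser

# Elements that never get a closing tag, so they are not pushed on the stack.
VOID = {"br", "hr", "img", "meta", "link", "input"}


class BareDivText(HTMLParser):
    """Flags non-whitespace text whose innermost open element is a <div>."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip() and self.stack and self.stack[-1] == "div":
            self.found = True


DIV_TAG = re.compile(r"</?div\b[^>]*>", re.IGNORECASE)


def strip_divs(html):
    checker = BareDivText()
    checker.feed(html)
    checker.close()
    if checker.found:
        raise ValueError("text sits directly inside a <div>; change it to <p> first")
    return DIV_TAG.sub("", html)


print(strip_divs('<div class="x"><p>One</p></div>'))  # -> <p>One</p>
```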


Then there is also ([a-zI])linebreak-tags([a-zA-Z]), replaced with \1 \2, and similar searches with a comma, among others, to remove mid-sentence line breaks.
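As a sketch of that kind of search, assume (hypothetically) that the line-break tags are a closing </p> followed by an opening <p> with any class. The letter version and the comma version could then look like this with Python's re module; BREAK, JOINS and join_broken_sentences are illustrative names.

```python
import re

# Hypothetical: the "line-break tags" are a paragraph close followed by a
# new paragraph open, e.g. '</p>\n<p class="calibre1">'.
BREAK = r"\s*</p>\s*<p[^>]*>\s*"

JOINS = [
    # a lowercase letter (or "I") before the break, any letter after it
    (re.compile(r"([a-zI])" + BREAK + r"([a-zA-Z])"), r"\1 \2"),
    # a comma before the break, any letter after it
    (re.compile(r"(,)" + BREAK + r"([a-zA-Z])"), r"\1 \2"),
]


def join_broken_sentences(html):
    for pattern, repl in JOINS:
        html = pattern.sub(repl, html)
    return html


print(join_broken_sentences(
    '<p>He walked</p>\n<p class="c1">into the room,</p>\n<p>then sat.</p>'))
# -> <p>He walked into the room, then sat.</p>
```

A short heading set as a plain paragraph (say "Chapter One" followed by a paragraph starting with "It") also matches the first pattern, so stepping through the matches is safer than Replace All.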

I search for chapter headers and change them to <h2> for 'parts' and <h3> for all the regular chapters; I use <h4> for subchapters that I don't want in the TOC. Since those will be different with each book, I cannot make a general rule for them.
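Because the patterns differ per book, this Python sketch covers only one hypothetical case: chapter titles that are plain paragraphs containing nothing but "Part" or "Chapter" plus a Roman or Arabic numeral. Those become <h2> and <h3> respectively; PART, CHAPTER and tag_headings are assumptions for illustration, not a general rule.

```python
import re

# Hypothetical, book-specific patterns: the whole paragraph must be just
# "Part <numeral>" or "Chapter <numeral>", so prose like "Part of me..." is left alone.
PART = re.compile(r"<p[^>]*>\s*(Part\s+(?:[IVXLC]+|\d+))\s*</p>")
CHAPTER = re.compile(r"<p[^>]*>\s*(Chapter\s+(?:[IVXLC]+|\d+))\s*</p>")


def tag_headings(html):
    html = PART.sub(r"<h2>\1</h2>", html)      # 'parts' go in <h2>
    html = CHAPTER.sub(r"<h3>\1</h3>", html)   # regular chapters go in <h3>
    return html


print(tag_headings('<p class="c">Part II</p><p class="c">Chapter 3</p>'
                   '<p>Part of me stayed.</p>'))
# -> <h2>Part II</h2><h3>Chapter 3</h3><p>Part of me stayed.</p>
```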

After that come searches on the stylesheet lines to remove the remaining nonsense rules, until I am left with only my standard ones, which are the same for every book I edit. Some ebooks have multiple tags on every line that cancel each other out: wasted space that only increases the size of the book needlessly.
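A rough Python sketch of that stylesheet step, assuming a flat stylesheet with no @media blocks or comments. KEEP is a made-up set standing in for the standard rules, and any rule whose selector is not in it gets dropped.

```python
import re

# Hypothetical set of "standard" selectors kept in every book.
KEEP = {"body.calibre", "p", "h2", "h3", "h4", "blockquote"}

# selector { declarations }; flat stylesheets only (no nesting, no comments)
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def prune_css(css, keep=KEEP):
    """Return only the rules whose selector text is in `keep`."""
    kept = []
    for selector, body in RULE.findall(css):
        selector = selector.strip()
        if selector in keep:
            kept.append(selector + " { " + body.strip() + " }")
    return "\n".join(kept)


CSS = """body.calibre { margin: 0 }
.calibre7 { font-weight: normal; font-style: normal }
p { text-indent: 1.5em }"""
print(prune_css(CSS))
```

Here .calibre7 is the kind of rule that only resets what the parent already sets, so dropping it changes nothing visible.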

Lately, I have started merging all the novels or series from one author together and processing them in bulk. I find it goes faster than doing each one separately. I rename the files of each book with its copyright date. After cleaning up the CSS, I export everything and bring all the files up to the top level; because of the renaming, the novels end up sorted chronologically and the series by name.
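The renaming works as long as the files are sorted as plain text: a zero-padded year at the front puts the books in publication order. A small Python sketch of such a naming scheme, where the year_title_part pattern and book_filename are hypothetical:

```python
import re


def book_filename(year, title, part):
    """Build a name like '1987_First_Book_001.xhtml' (hypothetical pattern)."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    return f"{year:04d}_{slug}_{part:03d}.xhtml"


books = [(1995, "Second Book"), (1987, "First Book"), (2003, "Third Book")]
names = [book_filename(year, title, 1) for year, title in books]
for name in sorted(names):  # a plain text sort gives publication order
    print(name)
```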

I am doing this to get rid of the interminable listings in my ebook reader when all the books are separate. This way I get one listing for the novels (I used to merge them in batches of 5 or 6, but even that can still make for quite a long list).

Last edited by theducks; 11-11-2020 at 02:01 PM. Reason: spoilered