Old 11-11-2020, 10:37 AM   #16
G2B
Enthusiast
G2B began at the beginning.
 
Posts: 28
Karma: 10
Join Date: Feb 2018
Device: PC / iPad
Quote:
Originally Posted by retiredbiker View Post
There is actually a sticky for that: https://www.mobileread.com/forums/sh...d.php?t=237181
Thank you for that.

I do realize that calibre does not always use the same tags for the same situations. It depends on how many books I merge together, which means some searches have to be edited.
But I do run a few search/replace commands that are the same for every book. This is my list, copied from a saved search file.
Spoiler:

{
"searches": [
{
"case_sensitive": false,
"dot_all": false,
"find": "<body [^>]*>",
"mode": "regex",
"name": "1-calibre-body",
"replace": "<body class=\"calibre\">"
},
{
"case_sensitive": false,
"dot_all": false,
"find": " id=\"[^\\\"]*\"",
"mode": "regex",
"name": "2-no-IDs",
"replace": ""
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<a [^>]*>",
"mode": "regex",
"name": "3-no-links",
"replace": ""
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<h([0-9]) [^>]*>",
"mode": "regex",
"name": "clean-Htags",
"replace": "<h\\1>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<b [^>]*>",
"mode": "regex",
"name": "clean-b",
"replace": "<b>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<i [^>]*>",
"mode": "regex",
"name": "clean-i",
"replace": "<i>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<em [^>]*>",
"mode": "regex",
"name": "clean-em",
"replace": "<em>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<em>",
"mode": "regex",
"name": "em-to-i-a",
"replace": "<i>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "</em>",
"mode": "regex",
"name": "em-to-i-b",
"replace": "</i>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<h([0-9])><b>",
"mode": "regex",
"name": "clean-Htags-b",
"replace": "<h\\1>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<h([0-9])><span [^>]*>",
"mode": "regex",
"name": "clean-Htags-c",
"replace": "<h\\1>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<br ([^<]*)>",
"mode": "regex",
"name": "clean-br",
"replace": "<br/>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<hr ([^<]*)>",
"mode": "regex",
"name": "clean-hr",
"replace": "<hr/>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<blockquote ([^<]*)>",
"mode": "regex",
"name": "clean-blockquote",
"replace": "<blockquote>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<span class=\"italic\">([^<]*)</span>",
"mode": "regex",
"name": "clean-span-italics",
"replace": "<i>\\1</i>"
},
{
"case_sensitive": false,
"dot_all": false,
"find": "<span class=\"bold\">([^<]*)</span>",
"mode": "regex",
"name": "clean-span-bold-a",
"replace": "<b>\\1</b>"
}
],
"version": 1
}
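
For anyone who wants to try a list like this outside the editor, here is a minimal Python sketch of what running the saved searches in order amounts to. It is an illustration under assumptions, not calibre's own code: apply_saved_searches is a made-up helper, it uses Python's standard re module (the editor's regex engine may behave differently in edge cases), and only a handful of the searches above are included.

```python
import json
import re

# A few of the saved searches from the list above, in the same JSON layout.
SAVED_SEARCHES = r'''
{
  "searches": [
    {"case_sensitive": false, "dot_all": false, "find": "<body [^>]*>",
     "mode": "regex", "name": "1-calibre-body", "replace": "<body class=\"calibre\">"},
    {"case_sensitive": false, "dot_all": false, "find": " id=\"[^\\\"]*\"",
     "mode": "regex", "name": "2-no-IDs", "replace": ""},
    {"case_sensitive": false, "dot_all": false, "find": "<h([0-9]) [^>]*>",
     "mode": "regex", "name": "clean-Htags", "replace": "<h\\1>"},
    {"case_sensitive": false, "dot_all": false, "find": "<em [^>]*>",
     "mode": "regex", "name": "clean-em", "replace": "<em>"},
    {"case_sensitive": false, "dot_all": false, "find": "<em>",
     "mode": "regex", "name": "em-to-i-a", "replace": "<i>"},
    {"case_sensitive": false, "dot_all": false, "find": "</em>",
     "mode": "regex", "name": "em-to-i-b", "replace": "</i>"}
  ],
  "version": 1
}
'''


def apply_saved_searches(html, searches):
    """Run each search/replace over the text, in list order (hypothetical helper)."""
    for search in searches:
        flags = 0
        if not search.get("case_sensitive", False):
            flags |= re.IGNORECASE
        if search.get("dot_all", False):
            flags |= re.DOTALL
        if search.get("mode") == "regex":
            html = re.sub(search["find"], search["replace"], html, flags=flags)
        else:
            # Any other mode is treated as plain text: match and insert literally.
            literal = search["replace"]
            html = re.sub(re.escape(search["find"]), lambda m: literal, html, flags=flags)
    return html


sample = ('<body class="x y"><h2 class="c1" id="a1">Title</h2>'
          '<p><em class="e">hi</em></p></body>')
print(apply_saved_searches(sample, json.loads(SAVED_SEARCHES)["searches"]))
```

The order matters: clean-em has to run before em-to-i-a, otherwise the <em> tags that still carry attributes would not be caught by the plain "<em>" search.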

I use <i>, <b> and <u> <br/> and <hr/> without classes.
The above alone can get rid of dozens of classes and sometimes reduce the size of an epub by half.


Then there is this one, which gets rid of all the remaining DIV tags. It can only be run after checking that any text in DIV tags has been changed to <p> tags: if the text is still in <div> tags, this search will leave the book's text without any tags at all.

{
"case_sensitive": false,
"dot_all": false,
"find": "<span class=\"b\">([^<]*)</span>",
"mode": "regex",
"name": "clean-span-bold-b",
"replace": "<b>\\1</b>"
}
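
The DIV search itself is not shown above (the snippet is the second span-bold search), so what follows is only a guess at the idea, sketched in Python rather than as a saved search. The patterns and the strip_divs name are assumptions for illustration: it refuses to strip anything while some <div> still opens directly onto text, which is the check described above.

```python
import re

# Any opening or closing div tag, with or without attributes (assumed pattern).
DIV_TAG = re.compile(r"</?div\b[^>]*>", re.IGNORECASE)
# An opening div followed directly by text rather than by another tag.
BARE_TEXT = re.compile(r"<div\b[^>]*>\s*[^<\s]", re.IGNORECASE)


def strip_divs(html):
    """Remove all div tags, but only if no div holds text directly."""
    if BARE_TEXT.search(html):
        raise ValueError("text found directly inside <div>; change it to <p> first")
    return DIV_TAG.sub("", html)


print(strip_divs('<div class="calibre2"><p>Some text.</p></div>'))
```

This check is deliberately crude: it only looks at what follows an opening <div>, so text sitting between an inner closing tag and </div> would slip through.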


Then there is also ([a-zI])linebreak-tags([a-zA-Z]), replaced with \1 \2, and similar ones for a comma and other characters, to remove mid-sentence line breaks.
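
A small Python sketch of that line-break fix, for illustration only. It assumes "linebreak-tags" stands for a closing </p> followed by a new <p>, which will differ from book to book, and join_broken_sentences is a made-up name. Unlike the saved searches above, this one has to be case sensitive, otherwise [a-zI] would also match capital letters at the end of a line.

```python
import re

# Assumed form of the "linebreak-tags": end of one paragraph, start of the next.
LINEBREAK_TAGS = r"</p>\s*<p[^>]*>"

# A lowercase letter, "I" or a comma before the break and a letter after it
# suggests the sentence was cut in two.
MID_SENTENCE_BREAK = re.compile(r"([a-zI,])" + LINEBREAK_TAGS + r"([a-zA-Z])")


def join_broken_sentences(html):
    """Replace a mid-sentence paragraph break with a single space."""
    return MID_SENTENCE_BREAK.sub(r"\1 \2", html)


sample = "<p>He walked to the</p>\n<p>door and opened it.</p>\n<p>Then he left.</p>"
print(join_broken_sentences(sample))
```

The break after "it." is left alone because a full stop is not in the first character class, so real paragraph ends survive.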

I search for chapter headers and change them to <h2> for 'parts' and <h3> for regular chapters; I use <h4> for subchapters that I don't want in the TOC. Since those differ with each book, I cannot make a general rule for them.

After that it comes down to searching the stylesheet line by line to remove the remaining nonsense rules, until I am left with only my standard ones, which are the same for every book I edit. Some ebooks have multiple tags on every line that cancel each other out: a waste of space that only increases the size of the book needlessly.

Lately, I have started merging all the novels or series from one author together and processing that in bulk. I find it goes faster than doing each one separately. I rename files per book with copyright date. After cleaning up css, I export everything, bring everything up to top-level and because of the renaming, I get the novels sorted chronologically and the series by name.

I am doing this to get rid of the interminable listings in my ebook reader when all the books are separate. This way I get one listing for novels (I used to group them by 5 or 6, but even that list can get quite long).

Last edited by theducks; 11-11-2020 at 02:01 PM. Reason: spoilered
Old 11-11-2020, 10:58 AM   #17
phossler
Wizard
phossler ought to be getting tired of karma fortunes by now.
 
Posts: 1,078
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Quote:
I use <i>, <b> and <u> <br/> and <hr/> without classes.
The above alone can get rid of dozens of classes and sometimes reduce the size of an epub by half.
1. I recently reformatted a book for my Kindle that had 4 "font-size: ..." classes just for the basic text. Some were so small that I couldn't read them on the eReader.

Made them all 1em

2. I would prefer to just have <p> for 99% of the text, and only add a class= when needed

3. Same for <Hx> - let the style sheet do all the heavy lifting for the formatting, and only tweak where absolutely needed

Last edited by phossler; 11-11-2020 at 11:01 AM. Reason: Can't type
Old 11-11-2020, 12:26 PM   #18
hobnail
Running with scissors
hobnail ought to be getting tired of karma fortunes by now.
 
Posts: 1,552
Karma: 14325282
Join Date: Nov 2019
Device: none
Quote:
Originally Posted by phossler View Post
2. I would prefer to just have <p> for 99% of the text, and only add a class= when needed

3. Same for <Hx> - let the style sheet do all the heavy lifting for the formatting, and only tweak where absolutely needed
Clever use of css combinators should let you get rid of 99.9% of all the class= warts. I do have a few in my standard css file but I rarely use them.
Old 11-12-2020, 12:12 AM   #19
G2B
Enthusiast
G2B began at the beginning.
 
Posts: 28
Karma: 10
Join Date: Feb 2018
Device: PC / iPad
Quote:
Originally Posted by phossler View Post
I would prefer to just have <p> for 99% of the text, and only add a class= when needed
I like an indent of 2 em for new paragraphs. I find it easier to read than when there is no indent throughout.

I have plenty of books that have only the 'calibre' class for page layout and 'calibre1' for the line indent.
Old 11-12-2020, 12:23 AM   #20
G2B
Enthusiast
G2B began at the beginning.
 
Posts: 28
Karma: 10
Join Date: Feb 2018
Device: PC / iPad
I noticed yesterday that one can also enter regex search/replace during the convert process. I thought, why not do it all in advance? I entered the same listing (Preferences > Common options > Search/replace), but that didn't seem to work: none of the commands were executed. I am probably not doing it right. But the saved searches in the editor work fine, and batching those saves me a lot of repetition and time.
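
One possible reason, and this is only a guess: the conversion search/replace may not read the editor's saved-search JSON at all. As far as I know, the --search-replace option of ebook-convert expects a plain UTF-8 text file with alternating lines, a regular expression followed by its replacement (which may be an empty line). A Python sketch of turning one layout into the other under that assumption; to_search_replace_text is a made-up helper, and carrying the case and dot-all settings over as inline flags such as (?i) is my own assumption.

```python
import json


def to_search_replace_text(saved_json):
    """Turn editor saved-search JSON into alternating regex/replacement lines."""
    lines = []
    for search in json.loads(saved_json)["searches"]:
        if search.get("mode") != "regex":
            continue  # only regex searches are carried over in this sketch
        inline = ""
        if not search.get("case_sensitive", False):
            inline += "i"
        if search.get("dot_all", False):
            inline += "s"
        prefix = "(?" + inline + ")" if inline else ""
        lines.append(prefix + search["find"])
        lines.append(search["replace"])  # may be an empty line
    return "\n".join(lines) + "\n"


example = json.dumps({
    "searches": [
        {"case_sensitive": False, "dot_all": False, "find": "<em>",
         "mode": "regex", "name": "em-to-i-a", "replace": "<i>"},
        {"case_sensitive": True, "dot_all": False, "find": "</em>",
         "mode": "regex", "name": "drop-em-close", "replace": ""},
    ],
    "version": 1,
})
print(to_search_replace_text(example))
```

The second entry ("drop-em-close") is invented just to show an empty replacement line; it is not one of the searches from this thread.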
Old 11-23-2020, 07:56 PM   #21
roger64
Wizard
roger64 ought to be getting tired of karma fortunes by now.
 
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Quote:
Originally Posted by phossler View Post
The Saved Searches list box is really one dimensional. I wish that there was a way to make it into an expandable 'tree' to improve the organization structure.
There is a very handy way to find what you need in the saved searches window.

You can call saved searches according to their names. Say, you just type "w" in the top "filter" field, and you just see all your saved searches which have a "w" in their name. So, it's up to you to just name your saved searches in a custom way.
Old 11-24-2020, 09:52 PM   #22
phossler
Wizard
phossler ought to be getting tired of karma fortunes by now.
 
Posts: 1,078
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Quote:
Originally Posted by roger64 View Post
There is a very handy way to find what you need in the saved searches window.
I haven't been taking full advantage of the filter

I have been using my own convention to give the first word a 'key' and grouping the saved searches that I typically run together next to each other, like DELETE, TAG, and Hx

One thing -- I'd prefer that filter started matching just from the left, and not anywhere in the string (personal opinion)
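
To make the difference concrete, a tiny Python sketch of the two matching styles. The names in the list are invented around the DELETE, TAG and Hx keys, and treating the filter as case-insensitive is an assumption.

```python
# Hypothetical saved-search names using a leading 'key' word.
names = ["DELETE empty spans", "DELETE ids", "TAG em to i", "Hx clean", "clean-Htags"]


def filter_anywhere(names, text):
    """Match the text anywhere in the name (what the filter does now)."""
    return [n for n in names if text.lower() in n.lower()]


def filter_from_left(names, text):
    """Match only names that start with the text (what I would prefer)."""
    return [n for n in names if n.lower().startswith(text.lower())]


print(filter_anywhere(names, "tag"))   # ['TAG em to i', 'clean-Htags']
print(filter_from_left(names, "tag"))  # ['TAG em to i']
```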
Attached thumbnail: Capture.JPG (53.3 KB)