Old 08-20-2015, 12:02 PM   #1
slantybard
my parent's oops...
Posts: 493
Karma: 1477572
Join Date: Feb 2009
Device: Vx->Handera->Clie-> Axim->505->650->KPW/Aura ->L2->iOS/CBW
Regex search and replace in bulk metadata help

I am trying to remove all HTML formatting from the comments in my library metadata so that when I make catalogs, the leftover span, div, and font tags don't make the resulting catalog EPUB look messy.

I tried the "remove formatting" button on individual books, but that doesn't remove the HTML code from the comments section.

Then, after searching the forum, I found a pretty good but not perfect solution: using the S&R under the bulk metadata edit function, I searched with the regex:
Code:
<[^<>]*>
and replaced it with nothing. This successfully removed 99% of the HTML formatting littering my comments metadata. Unfortunately, the <br> tags were not removed, and since these often show up within sentences, the comments still get messed up. I have tried S&R with <br> and with <\s*?[^>(br)]+\s*?>, but the <br> tags still show up.

My regex skills are not the best nor the worst, but I was wondering if anyone has any ideas that could help me remove the <br> tags in bulk?
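
For anyone checking the two patterns outside calibre, here is a minimal sketch using Python's standard `re` module. It assumes calibre's S&R follows Python regex syntax; the sample comment string and the variable names are made up for illustration. It suggests that `<[^<>]*>` does match `<br>`, while in the second pattern `[^>(br)]` is a character class that rejects the single characters `>`, `(`, `b`, `r` and `)`, so it skips any tag containing a `b` or an `r` instead of skipping only `<br>`.

```python
import re

# Hypothetical comment text, not taken from a real library.
sample = 'A <span class="x">short</span> line<br>break and <strong>bold</strong>.'

# First pattern from the post: any run of non-angle-bracket characters inside < >.
strip_all = re.compile(r"<[^<>]*>")

# Second pattern from the post. [^>(br)] is a character class, not a group:
# it rejects the individual characters '>', '(', 'b', 'r' and ')'.
attempt = re.compile(r"<\s*?[^>(br)]+\s*?>")

all_removed = strip_all.sub("", sample)
some_removed = attempt.sub("", sample)

print(all_removed)   # every tag gone, including <br>
print(some_removed)  # <br> and the <strong> pair are still there
```

In plain Python the first pattern removes every tag, `<br>` included, so the pattern itself is probably not the reason the `<br>` tags survive in calibre.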
Old 08-20-2015, 12:15 PM   #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
I think your regex is fine...

calibre reformats the comments, putting them into a final div IIRC, and if you have multi-paragraph comments those need to be divided up somehow.

Paragraph == anytime there is a line-break.
In HTML mode line-breaks are collapsed into whitespace and paragraphs respect the <p> tags, but after nuking the tags, if you have line-breaks calibre will do its best to recreate a basic structure, and it probably inserted hard line-breaks where it thought (:rolleyes:) you wanted them.
Old 08-20-2015, 01:49 PM   #3
slantybard
eschwartz, your reply makes total sense. However, the "line breaks" not having a <br> tag in the html comments are not <br>'d by calibre during the S&R process. This only occurs with the comments that already contain <br> tags interspersed within the sentences from the downloaded metadata. This means that calibre must be ignoring the <br> tags when searched from the edit metadata dialog.

I'm super happy that the S&R removes all the other html junk and will live with manually editing out the <br> as necessary, I just thought that I was doing something wrong or there was an easy answer I was missing.
Old 08-20-2015, 03:58 PM   #4
theducks
Well trained by Cats
Posts: 31,241
Karma: 61360164
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by slantybard View Post
eschwartz, your reply makes total sense. However, the "line breaks" not having a <br> tag in the html comments are not <br>'d by calibre during the S&R process. This only occurs with the comments that already contain <br> tags interspersed within the sentences from the downloaded metadata. This means that calibre must be ignoring the <br> tags when searched from the edit metadata dialog.

I'm super happy that the S&R removes all the other html junk and will live with manually editing out the <br> as necessary, I just thought that I was doing something wrong or there was an easy answer I was missing.
<br /> removal needs care.
The choices are to replace it with:
a space (gives one long line)
the parent tag pair reversed: </p> <p>, </h3> <h3> ... (gives multiple similarly styled lines)
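
The two choices above can be sketched in Python as follows. This is only an illustration with the standard `re` module, not calibre code: the helper names `br_to_space` and `br_to_parent_break`, the `parent` parameter, and the sample comment are all hypothetical.

```python
import re

# Matches <br>, <br/>, <br /> in any letter case.
BR = re.compile(r"<br\s*/?>", re.IGNORECASE)


def br_to_space(html: str) -> str:
    """Choice 1: replace each line break with a space (one long line)."""
    return BR.sub(" ", html)


def br_to_parent_break(html: str, parent: str = "p") -> str:
    """Choice 2: close and reopen the parent tag at each line break."""
    return BR.sub(f"</{parent}> <{parent}>", html)


comment = "<p>First line<br />Second line</p>"
print(br_to_space(comment))
print(br_to_parent_break(comment))
```

The second helper assumes the parent tag has no attributes; if it carries a class or style, the reopened tag would need the same attributes to keep the lines similarly styled.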
Old 08-20-2015, 04:04 PM   #5
theducks
You might avoid this in the future by changing a metadata download setting.
Attached thumbnail: Save as Text.jpg
Old 08-20-2015, 04:20 PM   #6
eschwartz
First: I *thought* calibre would turn hard line breaks into <br>'s, but now I am not so sure...

Anyway. So I entered this into the comments column in an empty book (test library):

Code:
<p>I think your regex is fine...
<br><br>
calibre reformats the comments, putting them into a final div IIRC, and if you have multi-paragraph comments those need to be divided up somehow.
<br><br>
:bulb2: Paragraph == anytime there is a line-break.
<br><br>
In HTML mode line-breaks are collapsed into whitespace and paragraphs respect the &lt;p&gt; tags, but after nuking the tags if you have line-breaks calibre will do it's best to recreate a basic structure, and probably inserted hard line-breaks where it thought (:rolleyes:) you wanted them.</p>
Entered Bulk S&R
Search:
Code:
<br>
Replace: one standard space

Result:
Code:
<p>I think your regex is fine...
  
calibre reformats the comments, putting them into a final div IIRC, and if you have multi-paragraph comments those need to be divided up somehow.
  
:bulb2: Paragraph == anytime there is a line-break.
  
In HTML mode line-breaks are collapsed into whitespace and paragraphs respect the &lt;p&gt; tags, but after nuking the tags if you have line-breaks calibre will do it's best to recreate a basic structure, and probably inserted hard line-breaks where it thought (:rolleyes:) you wanted them.</p>
No <br>'s in sight.
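
The test above can be approximated outside calibre with a short Python sketch. The sample is an abridged version of the comment text, and the `<BR />` variant in it is a hypothetical addition that was not part of the original test. It compares a literal `<br>` replacement with a more tolerant pattern. Whether tag variants have anything to do with the differing results in this thread is not established.

```python
import re

# Abridged test comment; the <BR /> line is a hypothetical variant.
sample = (
    "<p>I think your regex is fine...\n"
    "<br><br>\n"
    "calibre reformats the comments.\n"
    "<BR />\n"
    "Paragraph == anytime there is a line-break.</p>"
)

# Literal search for <br>, replaced by one standard space.
literal = sample.replace("<br>", " ")

# Tolerant search: also catches <br/>, <br /> and upper-case forms.
tolerant = re.sub(r"<br\s*/?>", " ", sample, flags=re.IGNORECASE)

print(literal)
print(tolerant)
```

The literal replacement leaves the `<BR />` variant behind, while the tolerant pattern removes every form of the tag.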
Old 08-20-2015, 06:22 PM   #7
slantybard
Quote:
Originally Posted by theducks View Post
You might avoid this in the future by changing a metadata download setting.
Yes, I have indeed made that change to prevent this in the future. I may end up re-downloading the comments as needed to fix this issue. Thanks.
Old 08-20-2015, 06:30 PM   #8
slantybard
Quote:
Originally Posted by eschwartz View Post

No <br>'s in sight.
Interesting, since when I try the same S&R, the <br>'s are left alone. I am going to update my calibre version from 2.24 to the latest and see if the behaviour changes.