Old 03-01-2020, 07:32 AM   #1
MerlinMama
Evangelist
Posts: 498
Karma: 32554
Join Date: May 2014
Location: Canada
Device: Kobo Sage
Help creating possible Regex-Function

**If this shouldn't be here, please move or delete, and you have my apologies**



I have been trying to understand Python to create my own regex-functions, but even after a year, I'm clueless. I hope that someone can help me create one...or tell me if what I want is even possible.

I have a very long, created text, where the author included a lot of sections where each paragraph is wrapped in tags for italics. That's fine. But they also wrapped all the sections in a tag which automatically makes those paragraphs italic.

I have been trying regular search and replace expressions, but I can't get anything that works whether there is one paragraph or eight paragraphs to remove the italics tags from. It's either too greedy, or not greedy enough.

I would like help to do the following:
  1. select and mark all text between the two tags <form></form>
  2. delete the italics tags from the beginnings and endings of each paragraph
  3. if possible (probably not, but...), change the italics tags INSIDE each paragraph to bold tags

In any case, I'd appreciate being directed to an online tutorial that would be easy enough for me to understand (maybe easier than "Python for Dummies" at the rate I'm going) so I can eventually learn to do it myself.

If someone would prefer to help off-forum - messages - I don't mind that either.
Old 03-01-2020, 08:10 AM   #2
stumped
Wizard
Posts: 3,305
Karma: 10259306
Join Date: May 2016
Device: kobo forma, Kobo Libra, Huawei media Tab, fire HD10, PW3 HDX8.9,
do it in two passes.
1 remove italic tags from paragraphs
then
2 remove the section tags

if you post a sample section it will be easier to help

e.g. to remove the form tags
find <form>(.*)</form>
replace with \1
Old 03-01-2020, 08:43 AM   #3
MerlinMama
Quote:
Originally Posted by stumped
do it in two passes.
1 remove italic tags from paragraphs
then
2 remove the section tags

if you post a sample section it will be easier to help

e.g. to remove the form tags
find <form>(.*)</form>
replace with \1
Oh, I want to keep the section tags, just remove the italics from within the section tags (I have over 100 different sections in the text, and they can have either just one paragraph or many paragraphs).

Here's an example:
Spoiler:

Start with:

<form>
<p><em>Blah, blah, blah.</em></p>
<p><em>"Blah, blah-blah, blaaah."</em></p>
</form>

End with:

<form>
<p>Blah, blah, blah.</p>
<p>"Blah, blah-blah, blaaah."</p>
</form>


And more difficult, but if possible:
Spoiler:

Start with:

<form>
<p><em>Blah, blah, blah.</em></p>
<p><em>"Blah, </em>blah-blah<em>, blaaah."</em></p>
</form>

End with:

<form>
<p>Blah, blah, blah.</p>
<p>"Blah, <strong>blah-blah</strong>, blaaah."</p>
</form>
Old 03-01-2020, 09:04 AM   #4
stumped
ok for the 1st one
find
<p><em>(.*)</em></p>
replace <p>\1</p>
for the 2nd one, use 2 passes - first remove the not-needed inner bits
which start with a close em tag followed by an open em tag:

find </em>(.*)<em>
replace \1
then use a 2nd pass to change em to strong
the trick is to use several simple expressions not one very complicated one, and review results after each stage.
make a backup before risking a replace all
if you have the patience, step through using find/replace to do single operations, then move on to the next candidate; that way you can skip past any you want to leave unchanged

NB I do all this using sigil - syntax may be different for other tools
Old 03-01-2020, 09:45 AM   #5
theducks
Well trained by Cats
Posts: 31,241
Karma: 61360164
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
If you were doing ALL italics, I would suggest the Edit Spans and Divs plugin.
Don't believe the name: it handles many more tag types, ONE type at a time. (It is in Diaps toolbag in Sigil.)

Stop trying to do it all in ONE PASS. All you do is give Murphy a leg up.

(.*?) reduces the greediness of (.*)

<p><i>What!</i> will happen if this <i>code appears?</i></p>
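
To see the difference on that line, here is a small Python sketch using the standard `re` module. It is only an illustration; the variable names are made up, and other regex engines may differ in details, though greedy versus non-greedy matching works the same way in this case.

```python
import re

sample = "<p><i>What!</i> will happen if this <i>code appears?</i></p>"

# Greedy: .* runs on to the LAST </i>, swallowing the text between the two spans.
greedy = re.findall(r"<i>(.*)</i>", sample)

# Non-greedy: .*? stops at the FIRST </i>, so each span is matched on its own.
lazy = re.findall(r"<i>(.*?)</i>", sample)

print(greedy)  # ['What!</i> will happen if this <i>code appears?']
print(lazy)    # ['What!', 'code appears?']
```

With the greedy version, a replace of `\1` would strip only the outermost pair of tags and leave the inner `</i>` and `<i>` behind in the wrong order.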
Old 03-01-2020, 11:13 AM   #6
MerlinMama
I think I am misunderstanding you. From what I understand, your suggestions would remove the <em></em> tags regardless of whether they are found between the <form></form> tags, which is NOT what I want. Anything outside the <form></form> tags should stay as is. The <form> tag carries formatting that already includes italics, so the additional <em> tags are unnecessary.

I'll insert another, maybe better, example for you to comment on so I can understand. Although I'm starting to think it can't be done, or I'm just missing something obvious.

I'll mark tags I want to keep in blue, tags I don't want to keep in red (only those I'm asking about; <p> tags I won't touch.)

Spoiler:

<p>What if he was <em>right outside</em> the door? What if he came back into the room?</p>

<p class="centered">oOOo</p>

<form>
<p><em>The creature crept around the room, either missing or ignoring the small boy huddled beneath the covers on the bed. His regular blankets were placed normally, while Kevin was wrapped in the ratty black blanket he found in the treehouse in the woods.</em></p>
<p><em>He watched as the large form ambled to the door and slipped out, allowing the door to ease shut behind it. Kevin had </em>never<em> felt so relieved in his short life.</em></p>
</form>

<p class="centered">oOOo</p>

<p><em>He felt a tear slip from his eye as he came out of the memory. </em></p>

<p class="centered">oOOo</p>

<form>
<p><em>"You'd best go right to sleep, Kevin," Nana scolded. "If you don't, the Great Wolf will come in and eat us all up."</em></p>
<p><em>Poppy frowned. "Don't scare the boy, you old crone." He ushered her out and shut the door behind them.</em>
<p><em>Nana was so silly, there was no such thing as a Great Wolf that ate people, Daddy said so, Kevin thought.</em>
</form>

<p class="centered">oOOo</p>

I don't mind doing multiple passes, but as it is, I haven't been able to do anything except check each one almost individually. That's why I thought that creating a Regex-Function was the way to go.

Ideally, I had hoped to be able to have something that says: "change <p><em></em></p> to just <p></p> when between <form></form> tags". I wouldn't even mind if it was "remove all <em></em> tags when between <form></form> tags".
Old 03-01-2020, 11:33 AM   #7
stumped
Quote:
Originally Posted by MerlinMama
I think I am misunderstanding you. From what I understand, your suggestions would remove the <em></em> tags regardless of whether they are found between the <form></form> tags, which is NOT what I want. Anything outside the <form></form> tags should stay as is. The <form> tag carries formatting that already includes italics, so the additional <em> tags are unnecessary.

I'll insert another, maybe better, example for you to comment on so I can understand. Although I'm starting to think it can't be done, or I'm just missing something obvious.

I'll mark tags I want to keep in blue, tags I don't want to keep in red (only those I'm asking about; <p> tags I won't touch.

Ideally, I had hoped to be able to have something that says: "change <p><em></em></p> to just <p></p> when between <form></form> tags". I wouldn't even mind if it was "remove all <em></em> tags when between <form></form> tags".
Well, that is doable. Just extend the previous suggestions so the pattern matches only inside a form: an opening form tag, followed by a non-greedy .*, followed by the previous example, followed by .*, followed by the closing form tag. I am typing on a tablet and I lack the characters to show sample code.
Old 03-01-2020, 11:46 AM   #8
stumped
PS: you bracket the text fragments you want to keep, so you can refer to them as \1 \2 \3 in the replace formula.
So (back on a PC keyboard now) find
<form>(.*)<em>(.*)</em>(.*)</form>

that finds stuff that is in em tags which are within form tags, and you have 3 text fragments which will be preserved
now assemble how you want it to look without the em tags so replace with
<form>\1\2\3</form>
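
For anyone who wants the Regex-Function route the thread title asks about, the same scoping can be sketched as a calibre function-mode replacement. This is an illustration, not something posted in the thread. It assumes a search expression of `(?s)<form>.*?</form>` (non-greedy, with the dot matching newlines) so that each match is exactly one form block, and it uses the `replace(...)` signature given in the calibre manual's function-mode page. `FORM_BLOCK`, `EM_TAG`, and the abbreviated sample text are made up for the demo.

```python
import re

# Assumed search expression for the editor's Find box: (?s)<form>.*?</form>
# Non-greedy, so a match ends at the FIRST </form> and never spans two sections.
FORM_BLOCK = re.compile(r"(?s)<form>.*?</form>")
EM_TAG = re.compile(r"</?em>")


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    """Return the matched <form> block with every <em> and </em> removed."""
    return EM_TAG.sub("", match.group())


# Stand-alone check outside calibre, on text abbreviated from the sample in post #6.
sample = (
    '<p>What if he was <em>right outside</em> the door?</p>\n'
    '<form>\n'
    '<p><em>Kevin had </em>never<em> felt so relieved.</em></p>\n'
    '</form>\n'
    '<p><em>He felt a tear slip from his eye.</em></p>\n'
)
# The extra arguments are dummies; calibre supplies real values when it calls replace().
result = FORM_BLOCK.sub(lambda m: replace(m, 1, "sample.html", None, {}, {}, {}), sample)
print(result)
```

Because each match stops at the first `</form>`, `<em>` tags between or outside the form blocks are never part of a match, and every `<em>` pair inside a block is handled in a single pass.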
Old 03-01-2020, 12:44 PM   #9
MerlinMama
THANK YOU


I knew I was missing something simple. Leave it to me to complicate everything. I was using something similar to that (<form>(.*?)<em>|</em>), but as you can see, I would need multiple passes, and it would also jump to different sections on occasion. I got very annoyed and frustrated.

Using your expression, even if I still have to check, it will be immensely easier. I can also tweak it to only remove at the beginning and end of paragraphs, and then go through and change the other <em> tags to <strong> tags.

I'm babbling. Sorry.

Thanks again!
Old 03-01-2020, 12:58 PM   #10
stumped
Instead of \1\2\3 for the replace you can optionally put new tags either side of the \2

E.g. \1<strong>\2</strong>\3
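
A rough Python sketch of the "inner italics become bold" variant, assuming the goal stated in post #3: inside a form, a paragraph wrapped whole in `<em>` loses the wrapper, and any text that sat between `</em>` and `<em>` (the bit that was deliberately not italic) gets `<strong>` instead. The names `FORM_BLOCK`, `WRAPPED_P`, `UPRIGHT_RUN`, `unwrap_paragraph`, and `fix_form` are hypothetical, and the sample is adapted from the thread.

```python
import re

# One <form>...</form> block per match: non-greedy, with . matching newlines.
FORM_BLOCK = re.compile(r"(?s)<form>.*?</form>")
# A paragraph wrapped whole in <em>...</em>; (?!</p>) keeps the match inside one <p>.
WRAPPED_P = re.compile(r"(?s)<p><em>((?:(?!</p>).)*)</em></p>")
# Inside a wrapped paragraph, "</em>text<em>" is text that was deliberately not italic.
UPRIGHT_RUN = re.compile(r"(?s)</em>(.*?)<em>")


def unwrap_paragraph(p_match):
    """Drop the wrapping <em> pair and mark the upright runs with <strong>."""
    inner = UPRIGHT_RUN.sub(r"<strong>\1</strong>", p_match.group(1))
    return "<p>" + inner + "</p>"


def fix_form(form_match):
    """Apply unwrap_paragraph to every wrapped paragraph in one form block."""
    return WRAPPED_P.sub(unwrap_paragraph, form_match.group())


sample = (
    '<p><em>He felt a tear slip from his eye.</em></p>\n'
    '<form>\n'
    '<p><em>Blah, blah, blah.</em></p>\n'
    '<p><em>"Blah, </em>blah-blah<em>, blaaah."</em></p>\n'
    '</form>\n'
)
result = FORM_BLOCK.sub(fix_form, sample)
print(result)
```

Paragraphs that are missing their closing `</p>`, like two of those in the post #6 sample, are left untouched by this sketch, so they would still need a manual check.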
Old 03-01-2020, 01:03 PM   #11
stumped
Ps I only know a small subset of what can be done with regex, mostly learned by asking here!
You got lucky in that you wanted something similar to what I had done in another book tweak.

I use the Sigil editor rather than the calibre one, and I think there is a thread of regex examples in the Sigil forum.
Old 03-02-2020, 03:47 PM   #12
JSWolf
Resident Curmudgeon
Posts: 80,677
Karma: 150249619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
However, the change from <em> to <strong> can be done with Diaps Editing Toolbag editor plugin. You do have to configure it to add in strong to what em can be replaced with. And once done, you won't need regex.
Old 03-02-2020, 06:40 PM   #13
Brett Merkey
Not Quite Dead
Posts: 195
Karma: 654170
Join Date: Jul 2015
Device: Paperwhite 4; Galaxy Tab
One thing to consider in similar situations such as presented by the OP is to use CSS contextual selectors rather than subject the text to regex.

As I understand, there was a problem with italicized text within forms. A style rule could deal with that instantly:

form em {font-style: normal}

This would un-italicize anything within em tags which are in a form—while ignoring all other em tag content.
Old 03-03-2020, 02:04 AM   #14
stumped
that's useful to know, and interesting.
I don't think I have ever seen a <form> tag when tweaking novels though.
what is the proper/normal use of <form> in book CSS?

Google tells me that <form> in HTML is used for user input forms, which I guessed would be the case, but that makes no sense in an EPUB?

Last edited by stumped; 03-03-2020 at 02:09 AM.
Old 03-03-2020, 05:53 AM   #15
Brett Merkey
Quote:
don't think I have ever seen a <form> tag when tweaking novels though
LOL. I deliberately bit my tongue and did not venture into that issue. However, we can encounter all sorts of beasts in the HTML jungle. I once corrected a book that was done entirely in classed and nested <blockquote> tags. That was a fun learning experience, since blockquote tags can be nested but <p> tags cannot...