Regex examples - Page 43

Mister L · 04-20-2020, 10:12 PM

Quote:

Originally Posted by Tex2002ans

Definitely don't ever do a Replace All with something like that though, you won't know what sort of rogue madness might happen.

(And I didn't test on an odd number of en dashes.)

Just the thought of the chaos if I did replace all makes my hair stand on end.

Quote:

Originally Posted by Tex2002ans

Find: – (\w*[^\.\?!]+?) –
Replace: – \1 –

Hopefully it works, and it will at least save you a lot of time. The rest can probably then be found with a simple:

Find: – <--- Put a space before or after the en dash

Yes, that is my method, although I am making epub3, so my replace is
– \1 –

Previously I was searching for
– (.*) –
but very often you can have several dashes in the same paragraph that are not in the same sentence. As I said, I frequently work on books with hundreds of dashes, so the more efficient and refined my search is, the better.

I search first for a complete set followed by a comma, then complete set no comma, then one dash + comma and then one dash alone. I might be able to combine some of those steps now that I have the more refined search, we'll see.
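To make the difference between the two searches concrete, here is a minimal sketch in Python's `re` module. The sample sentence is invented, and Python is only a stand-in for the editor's own regex engine, so behaviour there may differ slightly. The non-breaking space is written as the character U+00A0 instead of an entity.

```python
import re

# Invented sample: two dash pairs in one paragraph, in different sentences.
text = ("He came – late as usual – and sat down. "
        "She left – without a word – soon after.")

# Earlier search: greedy, so it runs from the first dash to the last one.
greedy = re.compile(r"– (.*) –")

# Refined search from the thread: the capture may not cross . ? or !
refined = re.compile(r"– (\w*[^\.\?!]+?) –")

greedy_hits = greedy.findall(text)
refined_hits = refined.findall(text)

# Replace the plain spaces inside each pair with non-breaking spaces.
fixed = refined.sub("–\u00a0\\1\u00a0–", text)

print(greedy_hits)
print(refined_hits)
print(fixed)
```

The greedy pattern returns a single match that swallows both sentences, while the refined one returns each pair separately.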

Mister L · 04-24-2020, 12:48 AM

Quote:

Originally Posted by Mister L

...I am making epub3 so my replace is
–*\1*–

Oops, just noticed that my code got eaten... my replace is –# 160 ;\1# 160 ;– only without the spaces, obviously.

Just did a book today with 2313 dashes in it (x_x) so the new and improved regex was greatly appreciated.

doubleshuffle · 04-24-2020, 02:48 AM

Quote:

Originally Posted by Mister L

Oops, just noticed that my code got eaten...

Code:

Hey, we've got code tags!!

Tex2002ans · 04-24-2020, 05:50 AM

Quote:

Originally Posted by Mister L

Just did a book today with 2313 dashes in it (x_x) so the new and improved regex was greatly appreciated.

Quote:

Originally Posted by Mister L

Oops, just noticed that my code got eaten... my replace is –# 160 ;\1# 160 ;– only without the spaces, obviously.

You can get around this restriction by using the [ noparse] [/noparse] tag, but you have to "break" the entity in the middle (I usually shove it right after the ampersand):

Code:

–&[ noparse]#160;[/noparse]\1&[ noparse]#160;[/noparse]–

which will get you this:

– \1 –

The forum "helpfully" decides to substitute characters, but in this case, we don't want them, so we tell it "not to parse".

doubleshuffle · 04-24-2020, 06:56 AM

Jeez, even the code tag eats stuff.

Wasn't aware of that. It shouldn't, should it?

DiapDealer · 04-24-2020, 10:18 AM

Quote:

Originally Posted by doubleshuffle

Jeez, even the code tag eats stuff.

Wasn't aware of that. It shouldn't, should it?

When it comes to certain entities, yes. They will always require noparse tags to properly render them as entities. Even within code tags.

estevam · 05-20-2020, 04:06 AM

Hi!

I have 200 ".xhtml" files inside an epub and I need to delete the line 14 of each one of them.

Is it possible to do this with Regex?

Doitsu · 05-20-2020, 05:26 AM

Quote:

Originally Posted by estevam

I have 200 ".xhtml" files inside an epub and I need to delete the line 14 of each one of them. Is it possible to do this with Regex?

That depends on the contents of line 14. If it follows a predictable pattern, someone might suggest a regex for it.

theducks · 05-20-2020, 11:00 AM

Quote:

Originally Posted by estevam

Hi!

I have 200 ".xhtml" files inside an epub and I need to delete the line 14 of each one of them.

Is it possible to do this with Regex?

REGEX is about Patterns, not line numbers.

The simplest approach is to select by the text and replace it with 'x' ('x' can be nothing).
Can you post the offending line?
Is it always the same? If not, what part is different?

wrCisco · 05-20-2020, 04:03 PM

If "line 14" means the fourteenth line of code (as opposed to the book's textual content), the regex could be:

Code:

Find: ((?:.*?\R){13}).*?\R
Replace: \1

(The search must be executed WITHOUT the DotAll option selected and must start at the beginning of each file).

That's a risky operation, since newlines can be easily added or removed by many automatic formatting processes, but if one is absolutely certain that the text between the thirteenth and the fourteenth newline in the code must be deleted (along with the fourteenth newline), that's one way to do it.

If, instead, "line 14" is not a reference to the lines of xhtml code, you can ignore this contribution and should provide some meaningful pattern as the others already said.
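For anyone who would rather test the idea outside the editor, the same technique can be sketched in Python. Note that Python's `re` module has no `\R`, so the `NEWLINE` group below is only a rough stand-in for it, and `delete_nth_line` is a hypothetical helper name. `\A` plays the role of "start at the beginning of the file", and DotAll is left off so that `.` stops at each newline.

```python
import re

# Rough stand-in for \R (assumption: only \r\n, \n and \r occur in the files).
NEWLINE = r"(?:\r\n|\n|\r)"


def delete_nth_line(text, n):
    """Remove line n (1-based) together with its trailing newline."""
    # Capture the first n-1 lines, match line n, then put the capture back.
    pattern = re.compile(r"\A((?:.*?%s){%d}).*?%s" % (NEWLINE, n - 1, NEWLINE))
    return pattern.sub(r"\1", text, count=1)


# Invented sample: a 20-line "file".
sample = "\n".join("line %d" % i for i in range(1, 21)) + "\n"
result = delete_nth_line(sample, 14)
print(result)
```

If a file has fewer than fourteen lines the pattern simply does not match and the text comes back unchanged.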

estevam · 05-20-2020, 06:49 PM

This is what I want to delete (marked as red):

As you can see I have a duplicate title in every file, so what I want is to delete that specific line (14) in all of my xhtml files.

theducks · 05-20-2020, 09:17 PM

you can delete the (</h1>)\s+<p>.+?</p>
replace just with the \1 <<I never perfected the use without capture. I just put back the trigger.

You can replace the .+? with precise text
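A minimal sketch of both variants in Python, on an invented sample with a duplicated title. The first puts the "trigger" back with `\1` as described above; the second is one possible way to do it without a capture, using a lookbehind so that `</h1>` is checked but never consumed. Whether the editor's engine accepts the lookbehind form is an assumption worth testing on a copy first.

```python
import re

# Invented sample: the heading text is repeated in the paragraph after it.
sample = ("<h1>Chapter One</h1>\n"
          "<p>Chapter One</p>\n"
          "<p>It was a dark night.</p>")

# Variant 1: capture the closing tag and put it back in the replacement.
with_capture = re.sub(r"(</h1>)\s+<p>.+?</p>", r"\1", sample)

# Variant 2: lookbehind, nothing to put back, so the replacement is empty.
with_lookbehind = re.sub(r"(?<=</h1>)\s+<p>.+?</p>", "", sample)

print(with_capture)
print(with_lookbehind)
```

Both variants remove only the first paragraph after each `</h1>`, because `.+?` is lazy and stops at the first `</p>`.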

estevam · 05-20-2020, 11:34 PM

Quote:

Originally Posted by theducks

you can delete the (</h1>)\s+<p>.+?</p>
replace just with the \1 <<I never perfected the use without capture. I just put back the trigger.

You can replace the .+? with precise text

Thank you!!! It worked really well

d351r3d · 06-05-2020, 07:31 AM

I want to be able to copy and replace with saved text, kind of like how (\d+) saves the number (or more than one number in a row) and then you can output it with \1, \2, etc. Is there a way to do this with all the text in between tags? An example would be:

Code:

<p><b>20</b> Words and stuff. Why are there words?<br/><b>20</b> Words and stuff. Why are there words?</p>

I realize that I can (\d+) the numbers and replace them elsewhere with \1.

Find

Code:

<p><b>(\d+)</b>\s … <br/><b>\d+</b>\s … </p>

Replace

Code:

<h4>\1</h4></br><p> … </p><p> … </p>

I figured it out.

Find

Code:

<p><b>(\d+)</b>\s(.*?)<br/><b>\d+</b>\s(.*?)</p>

Replace

Code:

<h4>50:\1</h4><p>\2</p><p>\3</p>
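As a quick check, the final find and replace can be run in Python against the sample paragraph from the post. This is only a sketch in Python's `re`; the "50:" prefix is copied literally from the replace string above.

```python
import re

sample = ("<p><b>20</b> Words and stuff. Why are there words?"
          "<br/><b>20</b> Words and stuff. Why are there words?</p>")

find = r"<p><b>(\d+)</b>\s(.*?)<br/><b>\d+</b>\s(.*?)</p>"
replace = r"<h4>50:\1</h4><p>\2</p><p>\3</p>"

# \1 is the number, \2 and \3 are the two lazily captured runs of text.
converted = re.sub(find, replace, sample)
print(converted)
```

The lazy `(.*?)` groups are what keep each capture from running past the next `<br/>` or `</p>`.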

Mister L · 06-22-2020, 08:19 AM

Is it possible to make a regex to turn a phrase with "fake small caps" into a sentence-case phrase, whilst also handling the occasional capitalised proper name in the middle? It must:
1. remove the spans;
2. put all the text between the spans into lower case, leaving the letters outside the spans in upper case;
3. (this is the tricky part) there may be one span on the whole phrase OR there may be several on different parts of the phrase, so it may be necessary to do a multi-part regex.

Example:
Find this:

Code:

<span class="Cap">F</span><span class="SmallCap">IRST WORD OF THE SENTENCE IS ALWAYS CAPITALISED,</span> <span class="Cap">O</span><span class="SmallCap">THER</span> <span class="Cap">W</span><span class="SmallCap">ORDS IN THE SENTENCE MAY OR MAY NOT BE CAPITALISED</span>

Turn it into this:
First word of the sentence is always capitalised, Other Words in the sentence may or may not be capitalised

If there is just one span I can manage it but since there can be two or three (or potentially more) spans I am not sure how to manage those possibilities.
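One possible approach, sketched in Python and not as a single find/replace: handle each span on its own, so the number of spans in the phrase stops mattering. The sketch assumes the capital letter is not repeated inside the following SmallCap span, that spans are not nested, and that the class names are exactly "Cap" and "SmallCap". `unfake_small_caps` is a hypothetical helper name.

```python
import re

SMALLCAP = re.compile(r'<span class="SmallCap">(.*?)</span>')
CAP = re.compile(r'<span class="Cap">(.*?)</span>')


def unfake_small_caps(html):
    """Drop both kinds of span; lower-case only the SmallCap contents."""
    # Pass 1: each SmallCap span becomes its own text in lower case.
    html = SMALLCAP.sub(lambda m: m.group(1).lower(), html)
    # Pass 2: each Cap span becomes its own text, case untouched.
    return CAP.sub(lambda m: m.group(1), html)


sample = ('<span class="Cap">F</span><span class="SmallCap">IRST WORD OF THE '
          'SENTENCE IS ALWAYS CAPITALISED,</span> '
          '<span class="Cap">O</span><span class="SmallCap">THER</span> '
          '<span class="Cap">W</span><span class="SmallCap">ORDS IN THE '
          'SENTENCE MAY OR MAY NOT BE CAPITALISED</span>')

print(unfake_small_caps(sample))
```

A proper name that sits entirely inside a SmallCap span would still come out in lower case, so those cases need a manual check afterwards.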



