MobileRead Forums

Sigil forum: Regex examples (https://www.mobileread.com/forums/showthread.php?t=167971)

Mister L 04-20-2020 10:12 PM

Quote:

Originally Posted by Tex2002ans (Post 3977320)
Definitely don't ever do a Replace All with something like that though, you won't know what sort of rogue madness might happen. :D (And I didn't test on an odd number of en dashes.)

Just the thought of the chaos if I did replace all makes my hair stand on end. :p

Quote:

Originally Posted by Tex2002ans (Post 3977320)
Find: – (\w*[^\.\?!]+?) –
Replace: – \1 –

Hopefully it works, and it will at least save you a lot of time. The rest can probably then be found with a simple:

Find: – <--- Put a space before or after the en dash

Yes, that is my method, although I am making epub3, so my replace is
– \1 –

Previously I was searching for
– (.*) –
but very often a paragraph has several dashes that are not in the same sentence, and that greedy search then runs from the first dash to the last one. As I said, I frequently work on books with hundreds of dashes, so the more efficient and refined my search is, the better.

I search first for a complete set followed by a comma, then a complete set with no comma, then one dash plus a comma, and then one dash alone. I might be able to combine some of those steps now that I have the more refined search; we'll see.
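
For anyone who wants to see the difference between the two searches outside Sigil, here is a minimal Python sketch. It assumes the spaces in the Find are ordinary spaces and that the replacement inserts `&#160;` entities; the sample sentence and the variable names are made up for illustration.

```python
import re

# Made-up sample paragraph with two dash pairs in different sentences.
text = "He said – as usual – nothing. Then – later – he left."

# Greedy search: a single match running from the first dash to the last.
greedy = re.search(r"– (.*) –", text)

# Refined search: the character class refuses to cross . ? or !
refined = re.compile(r"– (\w*[^.?!]+?) –")
fixed = refined.sub(r"–&#160;\1&#160;–", text)

print(greedy.group(1))
print(fixed)
```

The greedy version captures everything between the first and last dash of the paragraph, while the refined version treats each pair separately. Checking the matches one by one is still safer than Replace All.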

Mister L 04-24-2020 12:48 AM

Quote:

Originally Posted by Mister L (Post 3978831)
...I am making epub3 so my replace is
–*\1*–


Oops, just noticed that my code got eaten... my replace is –&# 160 ;\1&# 160 ;– only without the spaces, obviously. :p

Just did a book today with 2313 dashes in it (x_x) so the new and improved regex was greatly appreciated.

doubleshuffle 04-24-2020 02:48 AM

Quote:

Originally Posted by Mister L (Post 3980073)
Oops, just noticed that my code got eaten...

Code:

Hey, we've got code tags!!
:xmas::xmas::xmas::xmas:

Tex2002ans 04-24-2020 05:50 AM

Quote:

Originally Posted by Mister L (Post 3980073)
Just did a book today with 2313 dashes in it (x_x) so the new and improved regex was greatly appreciated.

:thumbsup:

Quote:

Originally Posted by Mister L (Post 3980073)
Oops, just noticed that my code got eaten... my replace is –&# 160 ;\1&# 160 ;– only without the spaces, obviously. :p

You can get around this restriction by using the [ noparse] [/noparse] tag, but you have to "break" the entity in the middle (I usually shove it right after the ampersand):

Code:

–&[ noparse]#160;[/noparse]\1&[ noparse]#160;[/noparse]–
which will get you this:

–&#160;\1&#160;–

The forum "helpfully" decides to substitute characters, but in this case, we don't want them, so we tell it "not to parse". :D

doubleshuffle 04-24-2020 06:56 AM

Jeez, even the code tag eats stuff. :eek: Wasn't aware of that. It shouldn't, should it?

DiapDealer 04-24-2020 10:18 AM

Quote:

Originally Posted by doubleshuffle (Post 3980159)
Jeez, even the code tag eats stuff. :eek: Wasn't aware of that. It shouldn't, should it?

When it comes to certain entities, yes. They will always require noparse tags to properly render them as entities. Even within code tags.

estevam 05-20-2020 04:06 AM

Hi!

I have 200 ".xhtml" files inside an epub and I need to delete the line 14 of each one of them.

Is it possible to do this with Regex?

Doitsu 05-20-2020 05:26 AM

Quote:

Originally Posted by estevam (Post 3990383)
I have 200 ".xhtml" files inside an epub and I need to delete the line 14 of each one of them. Is it possible to do this with Regex?

That depends on the contents of line 14. If it follows a predictable pattern, someone might suggest a regex for it.

theducks 05-20-2020 11:00 AM

Quote:

Originally Posted by estevam (Post 3990383)
Hi!

I have 200 ".xhtml" files inside an epub and I need to delete the line 14 of each one of them.

Is it possible to do this with Regex?

REGEX is about Patterns, not line numbers.

The simplest way is to select by the text and replace it with 'x' ('x' can be nil).
Can you post the offending :rofl: line?
Is it always the same? If not, what part is different?

wrCisco 05-20-2020 04:03 PM

If "line 14" means the fourteenth line of code (as opposed to the book's textual content), the regex could be:
Code:

Find: ((?:.*?\R){13}).*?\R
Replace: \1

(The search must be executed WITHOUT the DotAll option selected and must start at the beginning of each file).

That's a risky operation, since newlines can be easily added or removed by many automatic formatting processes, but if one is absolutely certain that the text between the thirteenth and the fourteenth newline in the code must be deleted (along with the fourteenth newline), that's one way to do it.

If, instead, "line 14" is not a reference to the lines of xhtml code, you can ignore this contribution and should provide some meaningful pattern as the others already said.
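
For comparison, the same idea can be sketched in Python. This is only an illustration under a few assumptions: Python's `re` module has no `\R`, so the newline alternatives are spelled out, and the helper name `drop_line_14` and the sample text are hypothetical.

```python
import re

# Python's re module has no \R, so spell out the newline alternatives.
NL = r"(?:\r\n|\n|\r)"

# \A pins the match to the start of the file; DOTALL is deliberately not set,
# so "." stops at "\n" and each ".*?" stays on one line.
LINE_14 = re.compile(r"\A((?:.*?" + NL + r"){13}).*?" + NL)

def drop_line_14(source: str) -> str:
    """Remove the fourteenth line (and its newline), keep the first thirteen."""
    return LINE_14.sub(r"\1", source, count=1)

# Made-up file contents: twenty numbered lines.
sample = "".join(f"line {n}\n" for n in range(1, 21))
result = drop_line_14(sample)
```

A file with fewer than fourteen lines does not match and comes back unchanged, which is a small safety net against the reformatting risk mentioned above.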

estevam 05-20-2020 06:49 PM

This is what I want to delete (marked as red):

https://i.imgur.com/5fcJ9mQ.gif

As you can see I have a duplicate title in every file, so what I want is to delete that specific line (14) in all of my xhtml files.

theducks 05-20-2020 09:17 PM

You can find (</h1>)\s+<p>.+?</p>
and replace it with just \1 <<I never perfected the use without capture. I just put back the trigger.

You can replace the .+? with precise text
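
As a rough illustration of that find and replace, here is a Python sketch. The page fragment is made up, and the second variant uses a lookbehind as one possible way to do it "without capture"; Sigil's regex engine should accept the same lookbehind, but that is an untested assumption.

```python
import re

# Made-up fragment: the heading text is repeated in the first paragraph.
page = "<h1>Chapter 1</h1>\n  <p>Chapter 1</p>\n  <p>Real text.</p>"

# Capture the trigger and put it back with \1.
with_capture = re.sub(r"(</h1>)\s+<p>.+?</p>", r"\1", page, count=1)

# Lookbehind variant: </h1> must precede the match but is not part of it,
# so the replacement can be empty.
with_lookbehind = re.sub(r"(?<=</h1>)\s+<p>.+?</p>", "", page, count=1)

print(with_capture)
```

Both variants remove only the first paragraph after the heading and leave the rest of the page alone.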

estevam 05-20-2020 11:34 PM

Quote:

Originally Posted by theducks (Post 3990726)
You can find (</h1>)\s+<p>.+?</p>
and replace it with just \1 <<I never perfected the use without capture. I just put back the trigger.

You can replace the .+? with precise text

Thank you!!! It worked really well :thumbsup:

d351r3d 06-05-2020 07:31 AM

I want to be able to copy and replace with saved text, kind of like how (\d+) saves a number (one or more digits in a row) and then you can output it with \1, \2, etc. Is there a way to do this with all the text in between tags? An example would be:

Code:

<p><b>20</b> Words and stuff. Why are there words?<br/><b>20</b> Words and stuff. Why are there words?</p>
I realize that I can (\d+) the numbers and replace them elsewhere with \1.

Find
Code:

<p><b>(\d+)</b>\s … <br/><b>\d+</b>\s … </p>
Replace
Code:

<h4>\1</h4></br><p> … </p><p> … </p>
I figured it out.

Find
Code:

<p><b>(\d+)</b>\s(.*?)<br/><b>\d+</b>\s(.*?)</p>
Replace
Code:

<h4>50:\1</h4><p>\2</p><p>\3</p>
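
To check what the three groups capture, the same find and replace can be run in Python on the sample line from above. This is just a sketch; the variable names are arbitrary.

```python
import re

sample = ("<p><b>20</b> Words and stuff. Why are there words?"
          "<br/><b>20</b> Words and stuff. Why are there words?</p>")

# \1 is the number, \2 the text before <br/>, \3 the text after it.
pattern = re.compile(r"<p><b>(\d+)</b>\s(.*?)<br/><b>\d+</b>\s(.*?)</p>")
converted = pattern.sub(r"<h4>50:\1</h4><p>\2</p><p>\3</p>", sample)

print(converted)
```

The lazy `(.*?)` groups stop at the first `<br/>` and the first `</p>`, so each half of the paragraph ends up in its own `<p>`.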

Mister L 06-22-2020 08:19 AM

Is it possible to make a regex to turn a phrase with "fake small caps" into a sentence-case phrase, whilst also handling the occasional capitalised proper name in the middle? It must:
1. remove the spans;
2. put all the text between the spans into lower case, leaving the letters outside the spans in upper case;
3. (this is the tricky part) there may be one span on the whole phrase OR there may be several on different parts of the phrase, so it may be necessary to do a multi-part regex.

Example:
Find this:
Code:

<span class="Cap">F</span><span class="SmallCap">IRST WORD OF THE SENTENCE IS ALWAYS CAPITALISED,</span> <span class="Cap">O</span><span class="SmallCap">THER</span> <span class="Cap">W</span><span class="SmallCap">ORDS IN THE SENTENCE MAY OR MAY NOT BE CAPITALISED</span>
Turn it into this:
First word of the sentence is always capitalised, Other Words in the sentence may or may not be capitalised


If there is just one span I can manage it but since there can be two or three (or potentially more) spans I am not sure how to manage those possibilities.
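
One possible way to look at point 3: if each span is handled on its own, the number of spans in a phrase stops mattering. The Python sketch below illustrates that two-pass idea; it is not a Sigil recipe, the function name `unfake_small_caps` is hypothetical, and it assumes each SmallCap span holds only the letters that follow the initial capital.

```python
import re

def unfake_small_caps(html: str) -> str:
    # Pass 1: drop each SmallCap span and lower-case what it wrapped.
    html = re.sub(r'<span class="SmallCap">(.*?)</span>',
                  lambda m: m.group(1).lower(), html)
    # Pass 2: drop each Cap span but keep its letter in upper case.
    return re.sub(r'<span class="Cap">(.*?)</span>', r"\1", html)

sample = (
    '<span class="Cap">F</span><span class="SmallCap">IRST WORD OF THE '
    'SENTENCE IS ALWAYS CAPITALISED,</span> <span class="Cap">O</span>'
    '<span class="SmallCap">THER</span> <span class="Cap">W</span>'
    '<span class="SmallCap">ORDS IN THE SENTENCE MAY OR MAY NOT BE '
    'CAPITALISED</span>'
)
result = unfake_small_caps(sample)
print(result)
```

Because each substitution repeats over every match, one, two or ten spans in the same phrase are all handled by the same two passes.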

