MobileRead Forums (https://www.mobileread.com/forums/index.php)
-   Sigil (https://www.mobileread.com/forums/forumdisplay.php?f=203)
-   -   Regex examples (https://www.mobileread.com/forums/showthread.php?t=167971)

Doitsu 08-19-2019 02:15 PM

Quote:

Originally Posted by roger64 (Post 3880478)
@Doitsu
I did not find any magic button to reverse the order of the files.

I was referring to the Name arrow in the Select Files dialog box.

roger64 08-19-2019 03:52 PM

:smack:

Thank you Doitsu. Indeed... You know me too well by now... :D

BillPearl 10-11-2019 02:37 PM

Here is a pile of 'code error' corrections I have accumulated over time. Few are mine; most are from generous people who have shared their efforts. Thanks to all of you.
I suggest you copy and paste them into a new text file.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~
FIND / REPLACE text (use with tags)
For a string of letters and numbers
([^>]+)(.*?)
eg.
<a name="Chapter_LIII" id="Chapter_LIII"></a>
<a([^>]+)(.*?)></a>
or
<body id="0-a5e9337bbdff40f4b38c8f20e5723a9a" class="calibre">
Find id="0-a5e9337bbdff40f4b38c8f20e5723a9a"
with id=([^>]+)(.*?) class

In general: some leading text such as id=, then ([^>]+)(.*?), and then something that ends the string of letters & numbers (here, the word class).
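
For anyone who wants to try the empty-anchor pattern outside Sigil, here is a minimal Python sketch. Python's re module is an assumption on my part (Sigil itself uses PCRE), and the sample strings are made up apart from the Chapter_LIII anchor. It also shows one thing to watch for: because (.*?) can run past a >, the posted pattern may swallow a real link when an empty anchor follows it on the same line. The tighter <a[^>]+></a> is only a suggested variant, not something from the post.

```python
import re

# Pattern from the post, plus a tighter variant (the variant is a suggestion).
POSTED = re.compile(r'<a([^>]+)(.*?)></a>')
TIGHT = re.compile(r'<a[^>]+></a>')

simple = '<p><a name="Chapter_LIII" id="Chapter_LIII"></a>Text</p>'
mixed = '<a href="x">link</a> and <a id="y"></a>'  # hypothetical sample line

# On the post's own example both patterns remove the empty anchor.
simple_posted = POSTED.sub('', simple)
simple_tight = TIGHT.sub('', simple)

# On the mixed line the posted pattern also removes the real link,
# because (.*?) keeps expanding until it reaches the later ></a>.
mixed_posted = POSTED.sub('', mixed)
mixed_tight = TIGHT.sub('', mixed)

print(simple_posted)
print(repr(mixed_posted))
print(mixed_tight)
```

So it is probably safest to step through the hits with Find rather than use Replace All when the looser pattern is in play.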

Find number in <b> in Regex mode
<b>[0-9]+</b>

Find Roman Numerals
lower or UPPER CASE
[xvi]+
[XVI]+ \>I[XVI]+

[1 space]
\[\s]

Find I, II, III
<p>[I]+</p>

Find Pg ### in Regex mode (?DotAll)
[P][g] (\d+)
[P][g] [xvi]+

Find Page_394 in Regex mode (?DotAll)
\Q"Page_\E(\d+)"

Find id="sigil_toc_id_3"
\Qid="sigil_toc_id_\E(\d+)"
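
\Q...\E is a PCRE feature that treats everything between the two markers as literal text, which is why it works in Sigil. Python's re module does not support it, so if you are scripting the same searches, a rough equivalent is re.escape. The sketch below is only an illustration, and the sample markup is made up.

```python
import re

# Python's re has no \Q...\E, so quote the literal part with re.escape instead.
toc_id = re.compile(re.escape('id="sigil_toc_id_') + r'(\d+)"')
page_id = re.compile(re.escape('"Page_') + r'(\d+)"')

sample = '<h2 id="sigil_toc_id_3">One</h2><a id="Page_394"></a>'  # hypothetical markup

# findall returns the captured digits for every hit.
toc_numbers = toc_id.findall(sample)
page_numbers = page_id.findall(sample)

print(toc_numbers, page_numbers)
```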


[^\.] will match anything but . eg [^\.>]</

[,;:] will match a comma, semicolon or colon (but not a period)
[^,;:] will match any character that is not a comma, semicolon or colon; the ^ stands for NOT in the character set.
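
A quick way to check what these character classes pick up is to run them over a scrap of text. The Python sketch below is only an illustration with made-up sample strings. The backslash in [^\.] is optional, since a dot inside square brackets is already literal.

```python
import re

html = '<p>One.</p><p>Two</p>'  # hypothetical sample
prose = 'a, b; c: d.'           # hypothetical sample

# A character that is neither . nor > directly before a closing tag,
# i.e. text that ends without a period.
no_period = re.findall(r'[^\.>]</', html)

# Comma, semicolon or colon only; the final period is not in the set.
listed = re.findall(r'[,;:]', prose)

# ^ negates the set: every character that is NOT , ; or :
others = re.findall(r'[^,;:]', prose)

print(no_period, listed, ''.join(others))
```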

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~

DiapDealer 10-11-2019 02:55 PM

Can you please remove all but the first few regex examples/solutions? This thread is for regex examples/tips/issues only.

doubleshuffle 10-11-2019 05:58 PM

But can you then post the rest in its own thread? Looks quite useful as well. Thanks.

DiapDealer 10-11-2019 07:16 PM

Quote:

Originally Posted by doubleshuffle (Post 3902240)
But can you then post the rest in its own thread? Looks quite useful as well. Thanks.

Yes. Sorry. I guess I should have mentioned that! :o

BillPearl 10-11-2019 10:34 PM

Sorry to have to ask, but where and how do I edit my post? You're right about my topic going 'off topic'.

DiapDealer 10-12-2019 06:54 AM

Quote:

Originally Posted by BillPearl (Post 3902312)
Sorry to have to ask, but where and how do I edit my post? You're right about my topic going 'off topic'.

If you don't see an Edit button at the bottom of your post, it may be due to a temporary restriction on new members. If you create a new thread with all the non-regex tips in it, I'll take care of editing the post in this thread for you.

Skydancer 04-13-2020 06:35 AM

I need some help... After OCR there are occasional punctuation errors like this in the text:

Wrong:
Quote:

»Lorem »ipsum dolor‘ sit amet, consectetur adipiscing elit.«
AND:
Lorem »ipsum dolor‘ sit amet, consectetur adipiscing elit.
Correct:
Quote:

»Lorem ,ipsum dolor‘ sit amet, consectetur adipiscing elit.«
AND:
Lorem ,ipsum dolor‘ sit amet, consectetur adipiscing elit.
The quotation marks used here (‘…’) are curvy/smart, not dumb.

I got this far:
Code:

Find:
».*?([a-z])\‘

... but it's probably too greedy, and I can't get the Replace to work properly.
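
One likely reason the attempt misbehaves (this is a reading of the pattern, not something stated in the thread): .*? is lazy rather than greedy, but the match still begins at the first » it finds, so it can run straight across a second ». A small Python sketch shows this on the first sample sentence. Python's re stands in for Sigil's PCRE engine here, and the backslash before ‘ is dropped because that character needs no escaping.

```python
import re

text = '»Lorem »ipsum dolor‘ sit amet, consectetur adipiscing elit.«'

# Skydancer's attempt, without the unneeded backslash before ‘.
attempt = re.search(r'».*?([a-z])‘', text)

matched = attempt.group(0)   # the whole matched span
captured = attempt.group(1)  # only the last letter before ‘

print(matched)
print(captured)
```

Since the group captures a single letter, a Replace that uses \1 cannot put the rest of the matched text back, which may be why the Replace side was hard to get right.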

Doitsu 04-13-2020 06:58 AM

@Skydancer:

The following should work for you:

Find:»([^»|‘]+)‘
Replace:,\1‘
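
To see what this Find/Replace pair does on the two sample sentences, here is a small Python sketch. Python's re stands in for Sigil's PCRE engine, which is an assumption, though the syntax used here is the same in both. Note that inside square brackets | is an ordinary character, so the class excludes », | and ‘.

```python
import re

FIND = re.compile(r'»([^»|‘]+)‘')
REPLACE = r',\1‘'

wrong_1 = '»Lorem »ipsum dolor‘ sit amet, consectetur adipiscing elit.«'
wrong_2 = 'Lorem »ipsum dolor‘ sit amet, consectetur adipiscing elit.'

# Only a » that reaches a ‘ without passing another » is replaced.
fixed_1 = FIND.sub(REPLACE, wrong_1)
fixed_2 = FIND.sub(REPLACE, wrong_2)

print(fixed_1)
print(fixed_2)
```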

Skydancer 04-13-2020 07:57 AM

Thank you, @Doitsu! :bow2:
That got me off to a good start. I modified your regex just a tiny bit so now it works perfectly:

Code:

»([^»,|‘]+)‘
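
The thread does not say why the comma was added, so the following is a guess: with , in the excluded set, a » is left alone whenever a comma sits between it and the next ‘. The Python sketch below compares the two patterns on a made-up sentence (Python's re is used here as a stand-in for Sigil's engine).

```python
import re

doitsu = re.compile(r'»([^»|‘]+)‘')
skydancer = re.compile(r'»([^»,|‘]+)‘')
replace = r',\1‘'

sample = '»Lorem, ipsum dolor‘ sit amet.«'  # hypothetical sentence

# Doitsu's pattern crosses the comma and rewrites the opening ».
loose = doitsu.sub(replace, sample)

# Skydancer's pattern stops at the comma, so nothing is changed.
strict = skydancer.sub(replace, sample)

print(loose)
print(strict)
```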

Mister L 04-16-2020 04:01 PM

I have a feeling the answer for this is going to seem simple once someone tells me but my brain is not working properly at the moment so... if anyone can help...

I want to find sets of em dashes with some text between them that are in the *same sentence*. So the text between the em dashes cannot include .!? but can include ,;: for example.

For example:

Match:

Sanctuaire – là encore situé à Eyralice – abritait

Don't match (the . in this example could be ? or !):

les Ténèbres – et de valoir la mort à qui le possédait. Or c’est dans ce livre à la fois oublié et maudit, que Jall devait lire que le Dernier Sanctuaire – là encore situé à Eyralice

Tex2002ans 04-16-2020 06:57 PM

Quote:

Originally Posted by Mister L (Post 3977177)
I want to find sets of em dashes with some text between them that are in the *same sentence*. So the text between the em dashes cannot include .!? but can include ,;: for example.

Did you mean sets of EN dashes instead? (Like your examples use?)

Something like this might work:

Find: –(\w*[^\.\?!]+?)–
Replace: —\1—

That would replace the en dashes with em dashes, and stick the captured "non-sentence" back in the middle.

I didn't do thorough testing though, so it probably would break in a lot of edge cases, but it did work correctly on your examples.
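
For reference, the two examples from the question can be checked with a short Python sketch. Python's re is an assumption here (Sigil uses PCRE), and the dashes are written as \u2013 and \u2014 escapes only so that they cannot be confused with one another.

```python
import re

EN, EM = '\u2013', '\u2014'  # en dash, em dash

# Same pattern as in the post, built from the EN constant.
find = re.compile(EN + r'(\w*[^\.\?!]+?)' + EN)
replace = EM + r'\1' + EM

match_me = f'Sanctuaire {EN} là encore situé à Eyralice {EN} abritait'
skip_me = (f'les Ténèbres {EN} et de valoir la mort à qui le possédait. '
           'Or c’est dans ce livre à la fois oublié et maudit, que Jall devait '
           f'lire que le Dernier Sanctuaire {EN} là encore situé à Eyralice')

# The pair inside one sentence is converted to em dashes.
converted = find.sub(replace, match_me)

# The period between the two dashes blocks the match, so nothing changes.
untouched = find.sub(replace, skip_me)

print(converted)
print(untouched == skip_me)
```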

Mister L 04-16-2020 10:14 PM

Quote:

Originally Posted by Tex2002ans (Post 3977245)
Did you mean sets of EN dashes instead? (Like your examples use?)

Something like this might work:

Find: –(\w*[^\.\?!]+?)–
Replace: —\1—

That would replace the en dashes with em dashes, and stick the captured "non-sentence" back in the middle.

I didn't do thorough testing though, so it probably would break in a lot of edge cases, but it did work correctly on your examples.

Thank you! Yes, sets of en dashes; sorry for the typo. I don't need to replace the dashes, I need to replace the spaces actually (in French we use non-breaking spaces with dashes but depending where the dash is the space goes either in front or in back so you can't just do "replace all"), but the "not a sentence" part is what was giving me trouble. Some authors like to sprinkle those things around like there was no other punctuation and you can have multiple dashes in one paragraph and not all of them are sets. Now that I see the result I understand it but I don't think I would have come up with it on my own so it's a good thing I asked for help. This should save me quite a bit of time. Today I was working on a book that had over 400 dashes in it and that's not even the record so you can see why I'd want to optimise my searches.

Tex2002ans 04-16-2020 11:02 PM

Quote:

Originally Posted by Mister L (Post 3977299)
Now that I see the result I understand it but I don't think I would have come up with it on my own so it's a good thing I asked for help. This should save me quite a bit of time. Today I was working on a book that had over 400 dashes in it and that's not even the record so you can see why I'd want to optimise my searches.

Definitely don't ever do a Replace All with something like that though, you won't know what sort of rogue madness might happen. :D (And I didn't test on an odd number of en dashes.)

Quote:

Originally Posted by Mister L (Post 3977299)
I don't need to replace the dashes, I need to replace the spaces actually (in French we use non-breaking spaces with dashes but depending where the dash is the space goes either in front or in back so you can't just do "replace all"), but the "not a sentence" part is what was giving me trouble. Some authors like to sprinkle those things around like there was no other punctuation and you can have multiple dashes in one paragraph and not all of them are sets.

Find: – (\w*[^\.\?!]+?) –
Replace: –&nbsp;\1&nbsp;–
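
Here is the spaced variant applied to the matching example from the question, as a rough Python sketch (Python's re is an assumption, standing in for Sigil's PCRE engine). Using subn instead of sub is just a convenience so the number of replacements can be checked against what you expected.

```python
import re

EN = '\u2013'  # en dash

# Spaced variant from the post: en dash, space, text, space, en dash.
find = re.compile(EN + r' (\w*[^\.\?!]+?) ' + EN)
replace = EN + r'&nbsp;\1&nbsp;' + EN

line = f'Sanctuaire {EN} là encore situé à Eyralice {EN} abritait'

# subn also returns the number of replacements, handy for reviewing
# the hits instead of trusting a blind Replace All.
fixed, count = find.subn(replace, line)

print(fixed)
print(count)
```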

Hopefully it works, and it will at least save you a lot of time. The rest can probably then be found with a simple:

Find: – <--- Put a space before or after the en dash

