05-30-2026, 11:31 AM   #16
icearch
Groupie
Posts: 157
Karma: 2000
Join Date: Nov 2025
Device: none
Quote:
Originally Posted by ElMiko
See the attached txt file. As I said before, I wasn't able to discern from your screenshots what text editor you were using, but that may be contributing to the issue if they are using a different regex engine (or different version of the same regex engine).
My screenshots all come from Sigil.

And your code does not work.

[Attachment 223602: 31.png]

The bottom right says "didn't find any matching to replace"

[Attachment 223603: 32.png]

You can see it here. In this one I replaced </i> with <\/i>, so there is no error.

Last edited by icearch; 05-30-2026 at 11:38 AM.
05-30-2026, 02:22 PM   #17
ElMiko
Fanatic
Posts: 570
Karma: 65460
Join Date: Jun 2011
Device: Kindle Voyage, Boox Go 7
Quote:
Originally Posted by icearch
My screenshots all come from Sigil.

And your code does not work.

Attachment 223602

The bottom right says "didn't find any matching to replace"

Attachment 223603

You can see it here. In this one I replaced </i> with <\/i>, so there is no error.
icearch... you're killing me, bud.

The reason the search didn't match anything is that you changed the text from plain text to HTML! You stated that your original example was plain text, so I modified my regex to accommodate that instead of an HTML doc.

All the lines in the new text you showed in the last post are within <p> tags. So of course the search that I modified to match your plain text example won't match anything.

Plain Text:
[Attachment 223607: PlainText.jpg]

HTML:
[Attachment 223606: HTML.jpg]

As you can see, when I use the version of the regex search that matches the sample's format, I get the same number of matches in both the plain text and HTML samples. And it doesn't require (nor should it include) a backslash before the "/i", because a forward slash isn't a special character in regex.
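
To illustrate the point about the slash, here is a minimal Python sketch. Python's re module is only a stand-in for Sigil's regex engine (an assumption made for illustration), and the sample line is made up. Testers that wrap a pattern in /.../ delimiters may ask for the slash to be escaped, but that is a delimiter rule rather than a regex one.

```python
import re

# Hypothetical sample line; not taken from the book discussed in this thread.
sample = "<p>He said <i>no</i> and left.</p>"

# "/" has no special meaning inside a regex, so the escaped and unescaped
# spellings match exactly the same text.
plain = re.findall(r"</i>", sample)
escaped = re.findall(r"<\/i>", sample)

print(plain, escaped)  # ['</i>'] ['</i>']
```

The backslash is tolerated, but it adds nothing to the pattern.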

---

P.S. What is the regex validation checker you're using?
Attached Files: JoinParagraphs.txt (235 Bytes)

Last edited by ElMiko; 05-30-2026 at 02:31 PM.
05-31-2026, 05:11 AM   #18
icearch
My bad. I saw that you tend to use HTML tags, so I assumed the regex you gave me was meant to work with tags. I'm not a competent regex user, so I didn't spot the difference.

And your code works great, thank you.

I use https://regex101.com/ and https://regexr.com/ to test regex.
05-31-2026, 02:45 PM   #19
ElMiko
You mentioned you're still polishing your regex skills (as we all are), so I thought I'd simplify the search and try to break down what all the elements of the plain text version of the search are doing.

(\p{L}(|[,;-])|,”|[MD][rs]\.|Mrs\.|(”|—)(?=\n\p{Ll}))\n

It will match anything that:

(
\p{L}(|[,;-])| -> is a letter followed by nothing, a comma, a semicolon, or a hyphen (FYI, I don't include colons in this search because I find that, more often than not, a colon in fiction is supposed to be followed by a paragraph break, but you can add it if you want, of course), or
,”| -> is a comma followed by a curly closing quote, or
[MD][rs]\.| -> is "Mr.", "Ms.", or "Dr." (and, strictly speaking, "Ds."), or
Mrs\.| -> is "Mrs.", or
(”|—)(?=\n\p{Ll}) -> is a closing curly quote or an em dash, provided it is followed by a line break AND a lowercase letter,
)

and

\n — is followed by a line break.
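
As a rough check of the breakdown above, here is a minimal Python sketch run against a made-up sample. Python's built-in re module has no \p{...} classes, so LETTER and LOWER below are my own approximate stand-ins for \p{L} and \p{Ll}. This is not how Sigil evaluates the search, and the sketch only lists what gets flagged, since the replacement string isn't part of this post.

```python
import re

# Stand-ins (assumptions), because Python's re module lacks \p{...}:
LETTER = r"[^\W\d_]"  # approximates \p{L}
LOWER = r"[a-z]"      # ASCII-only approximation of \p{Ll}

FALSE_BREAK = re.compile(
    rf"({LETTER}(|[,;-])|,”|[MD][rs]\.|Mrs\.|(”|—)(?=\n{LOWER}))\n"
)

# Made-up sample text, one broken "paragraph" per line.
sample = (
    "It was a dark\n"      # letter before the break              -> flagged
    "and stormy night,\n"  # letter + comma                       -> flagged
    "said Mr.\n"           # "Mr."                                -> flagged
    "Jones.\n"             # full stop                            -> kept
    "“Stop—\n"             # em dash, next line starts lowercase  -> flagged
    "now,” he said.\n"     # full stop                            -> kept
    "“Fine.”\n"            # closing quote, next line capitalised -> kept
    "The end.\n"           # full stop                            -> kept
)

# Group 1 is the text immediately before each line break the pattern flags.
flagged = [m.group(1) for m in FALSE_BREAK.finditer(sample)]
print(flagged)  # ['k', 't,', 'Mr.', '—']
```

Only the four line breaks that interrupt a sentence are flagged. The breaks after a full stop, and after a closing quote followed by a capital letter, are left alone.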

For the first instance of "\p{L}", you should be able to replace it with "\p{Han}" and add the Chinese equivalents of punctuation marks that shouldn't denote a paragraph break (e.g. the Chinese comma). But that won't work for the second-to-last bit of the regex, i.e. (”|—)(?=\n\p{Ll}), because Chinese has no lowercase characters, and the lowercase check is a critical limiter in how that search element functions.
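
A minimal sketch of that substitution, under several assumptions of mine: Python's re module has no \p{Han}, so the basic CJK Unified Ideographs block stands in for it (a narrower set), the punctuation list is only a guess at marks that should not end a paragraph, and the sample text is made up. Dropping the break without inserting a space is also my assumption, not a replacement given in this thread.

```python
import re

# Stand-in for \p{Han}: the basic CJK Unified Ideographs block only.
HAN = r"[\u4e00-\u9fff]"

# Fullwidth comma, enumeration comma, fullwidth semicolon (assumed list).
FALSE_BREAK_ZH = re.compile(rf"({HAN}(|[，、；]))\n")

# Made-up sample with two false breaks and two real ones.
sample = "他走进了\n房间，\n然后坐下。\n“好。”\n"

# Group 1 is the text immediately before each flagged line break.
flagged = [m.group(1) for m in FALSE_BREAK_ZH.finditer(sample)]
print(flagged)  # ['了', '间，']

# Join the flagged lines by keeping group 1 and dropping the line break.
joined = FALSE_BREAK_ZH.sub(r"\1", sample)
print(joined)
```

The lookahead branch is left out entirely, since there is no lowercase test to hang it on.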

Last edited by ElMiko; 05-31-2026 at 09:21 PM.
05-31-2026, 09:20 PM   #20
icearch
Thank you for the information, it helps a lot.