11-15-2023, 05:03 PM | #1 |
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2023
Device: none
|
Problems with regex and text flag
I was using this regex: "(.*?)"
(the quotes are part of the regex) on this paragraph:
<p class="msonormal3">“Ah,” Pen Rel said again, and inclined his head. "Mostly, it is a matter of temperature control.<br class="calibre13"/>How much simpler, after all, to let the wandering air take the heat away than to condition the dock entire.”</p>
and the found text was:
"Mostly, it is a matter of temperature control.<br class="
Also, ^ and $ don't work how I would expect. They seem to match only after and before newlines in the text rather than at the start and end of paragraph text.
I also find it very difficult to distinguish between “, ", and ” in the Find and Replace boxes.
Last edited by jwes; 11-15-2023 at 05:20 PM. |
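One way to tell the three quote characters apart outside the Find and Replace boxes is to print their code points. This is a minimal Python sketch and not a Sigil feature; `describe_quotes` is a hypothetical helper name.

```python
import unicodedata

# The three characters mentioned in the post: straight, left curly, right curly.
QUOTE_CHARS = {"\u0022", "\u201c", "\u201d"}

def describe_quotes(text):
    """Return (character, code point, Unicode name) for each double quote found."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if ch in QUOTE_CHARS
    ]

for item in describe_quotes('“Ah,” Pen Rel said again. "Mostly'):
    print(item)
```

In the sample paragraph, only the quote before "Mostly" is the straight U+0022 character, which is why the regex starts matching there.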
11-15-2023, 06:01 PM | #2 |
Bibliophagist
Posts: 35,464
Karma: 145525534
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
One issue is that you have a mix of curly quotes (around the “Ah,” for example) and straight quotes (around the "Mostly, it is a matter of temperature control.<br class=" for instance). If I modify the quotes to curly quotes, the content becomes:
Code:
<p class="msonormal3">“Ah,” Pen Rel said again, and inclined his head. “Mostly, it is a matter of temperature control.<br class="calibre13"/>How much simpler, after all, to let the wandering air take the heat away than to condition the dock entire.”</p>
You might want to try smartening the punctuation. BTW, using a <br> to break in the middle of a paragraph is a bad idea. Let the renderer break the lines where it needs to. |
11-15-2023, 06:31 PM | #3 | ||
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2023
Device: none
|
Quote:
Quote:
|
||
11-15-2023, 09:49 PM | #4 | ||
Bibliophagist
Posts: 35,464
Karma: 145525534
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
For the most part, my first pass at smartening punctuation is using the Modify Epub calibre plugin. Last edited by DNSB; 11-15-2023 at 09:52 PM. |
||
11-16-2023, 06:13 AM | #5 |
A Hairy Wizard
Posts: 3,095
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Or, since you are using Sigil, the Smarten Punctuation plug-in.
|
11-16-2023, 03:59 PM | #6 | |
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2023
Device: none
|
Quote:
|
|
11-16-2023, 04:46 PM | #7 |
Sigil Developer
Posts: 7,647
Karma: 5433388
Join Date: Nov 2009
Device: many
|
The Text flag is nothing more than the automatic prepending of the following regex:
static const QString REGEX_OPTION_TEXT_ONLY = "<[^<>]*>(*SKIP)(*F)|";
which can be overruled by later regex you add. It should only match text outside of < > chars unless overruled by later regex. Try turning off Text and prepending it yourself to see what is interfering. |
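To see what that prefix does, here is a rough emulation in Python. It is only a sketch: Python's standard `re` module has no `(*SKIP)(*F)` verbs, so the tag alternative is matched and then discarded instead, and `find_text_only` is a hypothetical helper, not Sigil code. Sigil's actual results with its own regex options may differ.

```python
import re

# Paragraph from the first post (one straight quote before "Mostly").
SAMPLE = (
    '<p class="msonormal3">“Ah,” Pen Rel said again, and inclined his head. '
    '"Mostly, it is a matter of temperature control.<br class="calibre13"/>'
    'How much simpler, after all, to let the wandering air take the heat away '
    'than to condition the dock entire.”</p>'
)

TAG = r"<[^<>]*>"  # same tag pattern as in REGEX_OPTION_TEXT_ONLY

def find_text_only(pattern, text, flags=0):
    """Emulate `<[^<>]*>(*SKIP)(*F)|pattern`: consume whole tags first and
    keep only the matches that came from the user's own pattern."""
    combined = re.compile(f"{TAG}|(?P<wanted>{pattern})", flags)
    return [
        m.group("wanted")
        for m in combined.finditer(text)
        if m.group("wanted") is not None
    ]

# Without the prefix, the first hit is an attribute value inside the <p> tag.
print([m.group(0) for m in re.finditer(r'"(.*?)"', SAMPLE)])

# With the prefix, matches can no longer start inside a tag ...
print(find_text_only(r'"(.*?)"', SAMPLE))
```

In this emulation the prefix only stops a match from starting inside a tag. A match that starts in text, at the straight quote before "Mostly", can still run into the following `<br class="` and stop at the quote there, which is the found text reported in the first post.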
11-16-2023, 06:13 PM | #8 |
Sigil Developer
Posts: 7,647
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Ah! Your use of ? to indicate Minimal Match (non-greedy) will actually *invert* the greediness set by the Minimal Match regex option. The Text flag needs its part to default to greedy.
Try replacing the smart quote at the end of "entire." with a normal quote, and then remove the ? that toggles the initial Text-only regex to be non-greedy, i.e. use "(.*)". Then make sure that Minimal Match and DotAll are both set in the Regex options and that the Text box is checked. That seems to work.
But using a real parser or the Smarten plugin is probably your best bet here, as corner cases will be found.
If you do decide to use Search and Replace with regex, you should first do a Dry Run by holding the Shift key while clicking the Count (#) button or, better yet, hold Shift while clicking the Replace All button to see a complete table of the potential replacements. That table lets you remove any corner cases (filter those changes out) before proceeding. Both Dry Run Replace All and Filtered Replace All are newer Find and Replace tools that really help in situations where unspecified corner cases may exist. Give them a try.
Additionally, making a Checkpoint would not hurt either.
Last edited by KevinH; 11-16-2023 at 06:39 PM. |
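As an illustration of the "real parser" route, the sketch below uses Python's standard `html.parser` to report text runs that still contain a straight double quote, without ever looking inside tags or attribute values. It is an assumption-laden example, not how Sigil or the Smarten plugin works; `StraightQuoteFinder` and `text_nodes_with_straight_quotes` are hypothetical names.

```python
from html.parser import HTMLParser

# Paragraph from the first post (one straight quote before "Mostly").
SAMPLE = (
    '<p class="msonormal3">“Ah,” Pen Rel said again, and inclined his head. '
    '"Mostly, it is a matter of temperature control.<br class="calibre13"/>'
    'How much simpler, after all, to let the wandering air take the heat away '
    'than to condition the dock entire.”</p>'
)

class StraightQuoteFinder(HTMLParser):
    """Collect text runs that contain a straight double quote (U+0022)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.hits = []

    def handle_data(self, data):
        # Only text between tags arrives here; attribute values never do.
        if '"' in data:
            self.hits.append(data)

def text_nodes_with_straight_quotes(markup):
    finder = StraightQuoteFinder()
    finder.feed(markup)
    finder.close()
    return finder.hits

for hit in text_nodes_with_straight_quotes(SAMPLE):
    print(repr(hit))
```

Because the parser hands over text and attributes separately, the quotes in class="calibre13" are never candidates, so there is no need for a tag-skipping prefix in the pattern.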
|