#1 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2010
Device: none
|
find dialogues with missing closing inverted commas
I would like to use regex to find the following error in my book: «dialogue (the closing » is missing at the end of the phrase). The correct string is: «dialogue»
I tried some patterns, but I have only been able to find text between « and ». Thanks for your help. |
#2 |
A Hairy Wizard
Posts: 3,336
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
You should be able to limit your FIND to a single line using the options drop-down. Then use a negative lookahead to limit it to lines that do NOT have the ».
For example:
Find: «(.*?)(?!»)
Replace: «\1»
Last edited by Turtle91; 04-21-2023 at 07:57 AM. |
#3 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2010
Device: none
|
In the Find and Replace options at the bottom center of the Sigil window, I cannot find the single-line option.
The «(.*?)(?!») pattern finds all phrases that start with the « character, with or without ». Where am I going wrong? |
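A minimal sketch of why that pattern matches every «, closed or not. It uses Python's `re` module as a stand-in for Sigil's regex engine (an assumption; for this pattern the two should behave the same way). The lazy group `(.*?)` is allowed to match nothing, and the lookahead `(?!»)` then only checks the single character right after the «. The sample strings are made up for illustration.

```python
import re

# The pattern from the post above, unchanged.
pattern = re.compile(r"«(.*?)(?!»)")

samples = [
    "«Hello» she said.",  # correctly closed
    "«Hello she said.",   # missing the closing »
]

matches = []
for text in samples:
    m = pattern.search(text)
    # The lazy group matches the empty string, and the lookahead passes
    # because the character right after « is "H", not ».
    matches.append((m.group(0), m.group(1)))
    print(repr(m.group(0)), repr(m.group(1)))
```

Both samples give the same one-character match with an empty group, so the pattern cannot tell a closed quote from an unclosed one.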
#4 | |
A Hairy Wizard
Posts: 3,336
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Sorry, I’m away from my computer, but IIRC, you want to un-check Dot-All.
From the Users guide: Quote:
|
|
#5 |
Sigil Developer
Posts: 8,651
Karma: 5703586
Join Date: Nov 2009
Device: many
|
Uncheck the Regex flag "Dot All". "Dot All" means that a "." will match all characters, including the newline character. By unchecking it, you limit wildcard matching to a single line.
FWIW, using that regex and Sigil's replacement table, you should be able to quickly scan all matches for the few problem cases. If not, try a different regex to limit things even further. Check out the Sigil Users Guide for more info. |
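As a rough illustration of what the "Dot All" flag changes, here is a small sketch using Python's `re.DOTALL` as a stand-in for Sigil's option (an assumption; Sigil uses its own regex engine, but the flag has the same meaning). The two-line sample text is invented.

```python
import re

text = "<p>«First line.</p>\n<p>Second line.»</p>"

# Without DOTALL, "." stops at the newline, so « ... » cannot span two lines.
single_line = re.search(r"«.*»", text)

# With DOTALL, "." also matches "\n" and the match runs across both lines.
dot_all = re.search(r"«.*»", text, flags=re.DOTALL)

print(single_line is None)
print(repr(dot_all.group(0)))
```

With the flag off, the unclosed « on the first line is not silently paired with a » from a later paragraph.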
#6 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2010
Device: none
|
Thank you both for your help, you have put me on the right track.
The «(.*?)(?!») pattern doesn't work even though I unchecked the "Dot All" flag, so my next step will be to study regex syntax more closely in the Sigil manual. Have a very nice day! |
#7 |
A Hairy Wizard
Posts: 3,336
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
That should work.
Can you post an example of the code you are searching through so we can see where the issue might be? |
#8 | ||
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Search: (“[^”\r\n]*)</p>
Instead of using LEFT/RIGHT DOUBLE QUOTES, substitute whatever quotations you need for your language. In your case, you'll use left/right guillemets:
Search: («[^»\r\n]*)</p>
This will catch lines 1+3:
Code:
<p>«This is a test.</p>
<p>And this is more «This is a test.» And more.</p>
<p>Testing. «This is a test.</p>
<p>«This is a test.»</p>
- - -
Side Note: Nowadays, I use Toxaris's EPUB Tools + "Dialogue Check", which is/was the ultimate way to catch all quotation mark errors. I wrote about it in detail back in:
While the pure regex method works in most cases, it will not work on heavily nested (or mismatched) quotation marks. To catch all left/right or outer/inner quotation mark errors, you definitely need something more smart that:
For a little more info on that, see my post in:
Sadly, Toxaris's EPUB Tools only works in Microsoft Word... and as of a few years ago, Toxaris stopped maintaining it + his site went down. I did save a copy + attach it to this 2022 post though.
- - -
Quote:
If you ever needed to catch/tag all dialogue in a book for some reason, there's your answer too.
Last edited by Tex2002ans; 04-22-2023 at 09:00 PM. |
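A quick way to check the guillemet search from this post against its own four sample lines. This sketch uses Python's `re` module rather than Sigil (an assumption about equivalence, though the pattern uses only basic syntax); the variable names are just for illustration.

```python
import re

# The guillemet search from the post above, unchanged.
pattern = re.compile(r"(«[^»\r\n]*)</p>")

lines = [
    "<p>«This is a test.</p>",
    "<p>And this is more «This is a test.» And more.</p>",
    "<p>Testing. «This is a test.</p>",
    "<p>«This is a test.»</p>",
]

# Collect the 1-based numbers of the lines where an opening « reaches
# </p> without meeting a closing » first.
flagged = [n for n, line in enumerate(lines, start=1) if pattern.search(line)]
print(flagged)
```

Only the lines whose last « is never closed before `</p>` are flagged, which matches the "lines 1+3" result described in the post.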
||
#9 | |
Resident Curmudgeon
Posts: 79,528
Karma: 145863177
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
You would end up with «Hello she said.» |
|
#10 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2010
Device: none
|
The solution proposed by Tex2002ans works perfectly and is a great help. However, to be complete, it should also detect errors when two sentences are in the same paragraph. In the following example, only the last missing closing guillemet is detected, but not the first one.
Code:
<p>«Have you been here a long time Terry?» ... «About five days» he replied.</p>
Code:
«[^»]*»
Perhaps making two passes over the entire text with the two solutions might eliminate errors altogether. |
#11 | |||
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Find: («[^»\r\n]*) This will find each of the open quote, all the way to a closing quote OR end of line:
and it will catch stuff like:
But now you'll have an absolute TON of false positives to look through... Quote:
This lets you create and save multiple Find/Replaces + organize them into Groups. As described below, you can come up with quite a few sets of "try to catch missing open/close quotes" Regular Expressions. Those regular expressions alone will carry you like 90% of the way there... but it's that final 10% that's really troublesome. ![]() Quote:
AND you have to also search forwards/backwards. Then, you'll want to expand it:
AND you have to deal with (or skip) false positives. (Is this an apostrophe or a single quotation mark?) AND then you have to deal with all the potential HTML mess in the way too. (<span>, <i>, <em>, class="", [...]) AND then you have to handle inner/outer quotes of all the different languages:
will not work for:
(That's not a complete list of steps though, but only what I can quickly think of off the top of my head!) I went into much more detail in the linked posts/threads... - - - Toxaris already solved all those issues in his "Dialogue Check" though. ![]() What you would need is a smart program—not just pure regex—that handles outer/inner quotes, and lets you select between them all as needed. The logic is effectively the same for all languages, just that the symbols switch. - - - And some, like English or Swedish, will have a ton more false positives to look through. Luckily though, you're using guillemets (French?), which is WAY easier to handle compared to English. In English, you have:
and that 2nd set is the worst, because ’ is used for all sorts of things.
1 + 4 are the actual inner quotes. 2 + 3 are actually apostrophes. If all you're doing is checking Left/Right inner quotes, a dumb algorithm will just think you have:
An algorithm that handles a lot of those edge-cases I mentioned above—plus searching forward/backwards—would catch different errors at different steps. ![]() PLUS it'll minimize the false positives, which is the real time-waster/time-killer. I explained a lot of this in: With the pure regex method, you're wasting so much time looking through thousands of correct quotation marks, only to catch that small minority of actual typos/errors/mistakes. Take it from me... I've probably corrected more quotation mark errors than anyone else on these boards—combined! lol. ![]() Last edited by Tex2002ans; 04-23-2023 at 05:28 PM. |
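To give a feel for the "program instead of pure regex" idea, here is a very small sketch of a left/right balance check for guillemets. It is not how Toxaris's "Dialogue Check" works (the thread does not describe its internals); it is a hypothetical helper, `check_guillemets`, that assumes a single level of « » with no inner quotes, and it handles none of the HTML or apostrophe edge cases listed above.

```python
from __future__ import annotations


def check_guillemets(paragraph: str) -> list[tuple[int, str]]:
    """Return (index, problem) pairs for unbalanced « » in one paragraph."""
    problems = []
    open_positions = []  # indices of « still waiting for a »
    for i, ch in enumerate(paragraph):
        if ch == "«":
            if open_positions:
                # A new « arrived before the previous one was closed.
                problems.append((open_positions.pop(), "missing closing »"))
            open_positions.append(i)
        elif ch == "»":
            if open_positions:
                open_positions.pop()
            else:
                problems.append((i, "missing opening «"))
    # Anything still open at the end of the paragraph was never closed.
    problems.extend((i, "missing closing »") for i in open_positions)
    return problems


print(check_guillemets("<p>«Hello she said. «Goodbye.»</p>"))
print(check_guillemets("<p>«Fine.» he said.</p>"))
```

Because it walks the paragraph character by character, it reports the position of each unbalanced mark instead of a whole matched span, which can cut down on the manual scanning that the regex searches require.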
|||
#12 |
Wizard
Posts: 3,049
Karma: 18821071
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
Can you not just search for two opening quotes without a closing quote between them?
|
#13 | |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
2 LEFTs in a row:
Find: («[^»\r\n]*)«
So, if you want the pure regex method, you'd create a big ol' collection of "Saved Searches", going from the easy-to-catch stuff (like 2 LEFTs or 2 RIGHTs in a row) all the way down to the hardest-to-catch.
Last edited by Tex2002ans; 04-23-2023 at 10:22 PM. |
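A small sketch of the "2 LEFTs" search next to a "2 RIGHTs" counterpart, run with Python's `re` module instead of Sigil. The first pattern is the one from this post; the second, `»[^«\r\n]*»`, is only an assumed mirror image and does not appear in the thread. The sample paragraphs are invented.

```python
import re

two_lefts = re.compile(r"(«[^»\r\n]*)«")   # from the post above
two_rights = re.compile(r"»[^«\r\n]*»")    # assumed mirror image, not from the thread

samples = [
    "<p>«Hello she said. «Hello she said.»</p>",   # first « never closed
    "<p>«Hello» she said. Hello she said.»</p>",   # second » never opened
    "<p>«Hello» she said. «Hello» she said.</p>",  # balanced
]

# For each sample: (two LEFTs found?, two RIGHTs found?)
results = [(bool(two_lefts.search(s)), bool(two_rights.search(s))) for s in samples]
print(results)
```

Each search flags only its own kind of error and leaves the balanced paragraph alone, which is what makes them good candidates for a group of saved searches run one after another.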
|
#14 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2010
Device: none
|
Thank you very much Tex2002ans for the valuable advice and clarity in expounding it.
As for me, I can only say that I have solved the problem. Thanks again to everyone. |
#15 | |
A Hairy Wizard
Posts: 3,336
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Edit: This is an answer to Jon's question...not trying to repeat what Tex already answered above...
Quote:
To answer your question, Jon: no. In general, the ? in the capture group forces it to be a minimal match, so it would stop capturing at the first legal stopping location. Your example has a closing », so it would not trigger a match. However, there are two points that I would change based on the discussion above (and my chance to test now that I'm home, which I couldn't do on my phone).
Code:
«(.[^»]*?)(?=</p>)
You are correct that the «\1» would place guillemets around the entire captured text, but this process was never intended to be completely automatic. There is no way for the Find/Replace to discern what is dialogue and what is not. If you use the "Replacements (Delete Unwanted Replacements)" table (see attached image), you would quickly be able to see which sections have an opening « with no closing » highlighted, along with what they would look like after the replace. As KevinH mentioned above, you could quickly remove any lines that didn't match the change you wanted...apply changes...then run the find again with different replace criteria.
i.e., the above would not find "<p>«Hello she said. «Hello she said.»</p>" until you ran a different Find with the « in place of the lookahead </p>, like:
«(.[^»</p>]*?)«
That would probably be the fastest (least # of iterations) way to search through the text to find those types of errors.
Last edited by Turtle91; 04-24-2023 at 10:08 PM. |
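One caveat worth checking with the second pattern in this post: a character class lists individual characters, so `[^»</p>]` excludes `<`, `/`, `p` and `>` one by one rather than the string `</p>`. The sketch below, in Python's `re` (assumed to behave like Sigil's engine here), compares it with a hypothetical alternative that excludes only » and line breaks, in the style of the `«[^»\r\n]*` searches earlier in the thread. The sample text containing the letter "p" is invented.

```python
import re

# The second pattern from the post above, unchanged.
char_class = re.compile(r"«(.[^»</p>]*?)«")

# Hypothetical alternative: exclude only » and line breaks.
alternative = re.compile(r"«([^»\r\n]*?)«")

no_p = "<p>«Hello she said. «Hello she said.»</p>"    # no letter "p" in the dialogue
with_p = "<p>«Stop, she said. «Hello she said.»</p>"  # "Stop" contains a "p"

# For each sample: (original pattern matched?, alternative matched?)
results = [
    (bool(char_class.search(text)), bool(alternative.search(text)))
    for text in (no_p, with_p)
]
print(results)
```

The original pattern finds the example from the post, but it can miss an unclosed quote as soon as the dialogue contains a lowercase "p", so it is worth testing against real text before relying on it.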
|