10-26-2012, 12:18 PM | #1 |
Zealot
Posts: 134
Karma: 10
Join Date: Nov 2009
Location: Okotoks, AB, Canada
Device: iPad V-3
|
Broken dialog regex
I have several cases where inadvertent paragraph breaks occur inside quoted text. I have a regex that finds every set of typographic (curly) quotes, but it matches all of them in the book. I have searched, but I can't think of or find a way to locate only the matched quote sets that also contain a </p> or a <p> (either one should do) in the quoted text. Being able to do that would considerably shorten the find-and-fix process.
|
10-26-2012, 05:49 PM | #2 |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
I would say post a lot of samples of what you want to fix, and the regex you currently are using.
I would probably use something along the lines of this, but unless we know specifics, the perfect regex cannot be built:
Find:
Code:
([“][^”<]+)</p> <p>([^”]+)
Replace:
Code:
\1 \2
|
10-26-2012, 06:19 PM | #3 |
Zealot
Posts: 134
Karma: 10
Join Date: Nov 2009
Location: Okotoks, AB, Canada
Device: iPad V-3
|
The type of thing I have is:
<p>Some words, “some more words.</p> <p>Some more words.”</p>
There are line ends involved as well. Everything between the quotes is one statement. I am using “(.*)” as a find-only search with dotall and minimal turned on, and I fix each hit by hand. I'll try the code you gave above.
|
10-26-2012, 07:01 PM | #4 |
Guru
Posts: 655
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
Will this do?
Search:
Code:
(“[^”]*?)</p>\s+<p>([^“]*?”)
Replace:
Code:
\1 \2
|
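As a rough check outside Sigil, the search and replace above can be tried in Python on the sample from post #3. This is only a sketch: Python's `re` module stands in for Sigil's regex engine, and the names `PATTERN`, `sample`, and `joined` are made up for illustration.

```python
import re

# Search pattern from this post: an opening curly quote, a paragraph break,
# then text up to the closing curly quote with no new opening quote in between.
PATTERN = re.compile(r'(“[^”]*?)</p>\s+<p>([^“]*?”)')

# Sample from post #3, with a line end between the two paragraphs.
sample = '<p>Some words, “some more words.</p>\n<p>Some more words.”</p>'

# \1 and \2 are the two captured halves, rejoined with a single space.
joined = PATTERN.sub(r'\1 \2', sample)
print(joined)
```

Note that the `<p>` in the pattern is literal, so paragraphs that carry attributes such as a class would not be matched by it.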
10-26-2012, 07:18 PM | #5 |
Grand Sorcerer
Posts: 27,465
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Even though you'll probably not run across it all that often, keep in mind that dialogue that spans multiple paragraphs (same speaker) might not have matching open/close quotes. And that's intentional.
<p>"Blah, blah blah.</p>
<p>"Blah, blah, blahblahblah.</p>
<p>"Blahblah, blah, blah, blah, blahblahblah."</p>
would be perfectly valid, typographically speaking, yet will likely cause any "hands-free" Find & Replace regex/script/routine (that's predicated on one-to-one matched open/close quotes) to barf pretty hard.
Last edited by DiapDealer; 10-26-2012 at 07:20 PM.
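To see how the two searches posted so far treat this kind of speech, here is a small Python sketch. It assumes the quotes have already been converted to curly quotes (the example in the post uses straight ones) and uses Python's `re` module rather than Sigil's engine. The variable names are illustrative only, and the result holds for this one sample, not for every book.

```python
import re

# Multi-paragraph speech written with curly quotes (an assumption; the
# example in the post uses straight quotes).
speech = (
    '<p>“Blah, blah blah.</p>\n'
    '<p>“Blah, blah, blahblahblah.</p>\n'
    '<p>“Blahblah, blah, blah, blah, blahblahblah.”</p>'
)

# Find-only pattern from post #3, with dotall and minimal matching.
simple = re.compile(r'“(.*?)”', re.DOTALL)
# Search pattern from post #4.
stricter = re.compile(r'(“[^”]*?)</p>\s+<p>([^“]*?”)')

simple_hit = simple.search(speech)
stricter_hit = stricter.search(speech)

# The simple pattern runs from the first opening quote to the only closing
# quote, so it flags this valid speech as containing a paragraph break.
print(simple_hit is not None and '</p>' in simple_hit.group(1))
# The stricter pattern refuses to cross a new opening quote after the break,
# so it finds nothing in this sample.
print(stricter_hit is None)
```

Both lines print `True` here. A speech whose continuation paragraphs lack their opening quote would still be caught, so checking hits by hand remains sensible.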
|
10-27-2012, 03:21 AM | #6 |
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
|
A good idea is to first make a regex that matches paragraphs without closing quotes; you usually don't find too many. Check these paragraphs to make sure that they are correct, then use a special character that is not used in the book to make a pseudo-closing mark. Otherwise, add in the missing quotation mark. E.g.:
Code:
<p>"Blarg something happened.</p>
Code:
<p>"Blarg something happened.~</p>
Now make your regex for replacing the quotes. The trick is to preserve your pseudo-closing.
Code:
<p>“Blarg something happened.”~</p>
Code:
<p>“Blarg something happened.</p>
|
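The marking and clean-up steps might look roughly like this in Python. This is a sketch under several assumptions: `~` does not occur anywhere in the book, paragraphs are plain `<p>` tags with no attributes or inline tags, and the helper names `mark_open_paragraphs` and `strip_pseudo_closings` are invented here. The quote conversion step in between is not shown.

```python
import re

MARK = "~"  # assumed to be absent from the book text

def mark_open_paragraphs(html):
    """Append MARK to paragraphs that open with a straight quote and never close it."""
    # [^"<]* stops at a second quote or at any tag, so a paragraph that
    # already has its closing quote does not match.
    return re.sub(r'(<p>"[^"<]*)(</p>)', r'\1' + MARK + r'\2', html)

def strip_pseudo_closings(html):
    """After quote conversion, drop the closing quote paired with MARK, then any stray MARK."""
    return html.replace("”" + MARK, "").replace(MARK, "")

marked = mark_open_paragraphs('<p>"Blarg something happened.</p>')
print(marked)

cleaned = strip_pseudo_closings('<p>“Blarg something happened.”~</p>')
print(cleaned)
```

The marked paragraphs should still be reviewed by hand before the marker is added, as the post suggests, since a missing closing quote can also be a genuine error.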
10-27-2012, 09:49 AM | #7 |
Zealot
Posts: 134
Karma: 10
Join Date: Nov 2009
Location: Okotoks, AB, Canada
Device: iPad V-3
|
I do understand the multi-paragraph dialog/quotation structure; that's why I only use a find-and-fix-manually approach to this problem. Also, since the new release of Sigil added the very easy special character insert and stored clips/searches, I ensure all books I edit use typographical quotes. I'll try these ideas and see how they work. I'll have to create test cases as I don't have a current text with the problem. Thanks all.
|
10-27-2012, 10:44 AM | #8 |
Zealot
Posts: 134
Karma: 10
Join Date: Nov 2009
Location: Okotoks, AB, Canada
Device: iPad V-3
|
SOLVED
I built a test file and tried the suggestions. The one from Perkin worked like a charm. I think I'm on my way. Thanks to all for the advice and to Perkin for the working code.
|
11-09-2012, 01:52 PM | #9 |
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
|
I'm having trouble with a regex I've been using. It's a slightly modified version of one I found in this thread.
Find:
Code:
(“[^”]*?)</p>\s+<p.+>([^“]*?”)
Replace:
Code:
\1 \2
Sample:
Code:
<p class="calibre1">“The cat</p> <p class="calibre1">sat on</p> <p class="calibre1">the mat”</p>
Result:
Code:
<p class="calibre1">“The cat</p> <p class="calibre1">the mat”</p>
|
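One likely cause is the greedy `.+` inside `<p.+>`, which can run past the end of the opening tag and swallow a whole paragraph. The Python sketch below reproduces the loss of "sat on" and tries a narrower tag match, `<p[^>]*>`. That change is a suggestion made here, not something posted in the thread, and Python's `re` module is only standing in for Sigil's engine. It also assumes each paragraph sits on its own line.

```python
import re

sample = (
    '<p class="calibre1">“The cat</p>\n'
    '<p class="calibre1">sat on</p>\n'
    '<p class="calibre1">the mat”</p>'
)

# Pattern as posted: .+ is greedy and may consume text beyond the tag.
GREEDY = re.compile(r'(“[^”]*?)</p>\s+<p.+>([^“]*?”)')
# Narrower variant: [^>]* cannot leave the opening <p ...> tag.
NARROW = re.compile(r'(“[^”]*?)</p>\s+<p[^>]*>([^“]*?”)')

greedy_result = GREEDY.sub(r'\1 \2', sample)
print('sat on' in greedy_result)  # the middle paragraph's text is gone

def join_all(html):
    """Repeat the replace until no paragraph break is left inside a quote."""
    while True:
        html, count = NARROW.subn(r'\1 \2', html)
        if count == 0:
            return html

narrow_result = join_all(sample)
print(narrow_result)
```

Each pass of the narrow pattern joins only one break per quotation, which is why the replace is repeated; in Sigil the equivalent would be running Replace All more than once.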
11-09-2012, 02:22 PM | #10 |
Zealot
Posts: 134
Karma: 10
Join Date: Nov 2009
Location: Okotoks, AB, Canada
Device: iPad V-3
|
I do not use this as a search and replace. I only use it as a search and then hand-fix the located parts. The replace part is just too variable to work well. For example, a long speech may contain multiple paragraphs, and that is correct. It will be caught by the search, because the proper grammar is to have an opening quote for each paragraph in the speech and a closing one only on the last paragraph. Use it to search and then hand-fix the areas.
|
11-09-2012, 03:08 PM | #11 |
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
|
11-09-2012, 04:01 PM | #12 | |
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
|
Quote:
Thanks |
|