Old 10-26-2012, 12:18 PM   #1
dicknskip
Zealot
Posts: 134
Karma: 10
Join Date: Nov 2009
Location: Okotoks, AB, Canada
Device: iPad V-3
Broken dialog regex

I have several cases where inadvertent paragraph breaks appear inside quoted text. I have a regex that finds every pair of typographic (curly) quotes, but it matches all of them in the book. I have searched, but I can't think of or find a way to locate only the matched quote pairs that also contain a </p> or a <p> (either one would do) inside the quoted text. Being able to do that would really shorten the find-and-fix process.
Old 10-26-2012, 05:49 PM   #2
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
I would say post a lot of samples of what you want to fix, and the regex you currently are using.

I would probably use something along the lines of this, but unless we know specifics, the perfect regex cannot be built:

Code:
([“][^”<]+)</p> <p>([^”]+)
Replace with:

Code:
\1 \2
It might be a little more complex if you want to handle "dumb quotes", since it is tough to tell which is left or right, although I believe you would be going through these manually one by one, and not pulling a "Replace All".
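For anyone who wants to try the pattern outside Sigil, here is a minimal Python sketch. It assumes Python's `re` module treats this pattern the same way Sigil's regex engine does (it only uses character classes and groups), and the sample strings are made up for illustration. The main thing it shows is that the pattern expects exactly one space between the two tags.

```python
import re

# The search pattern from the post above, unchanged. Note the single literal
# space between </p> and <p>.
TEX_PATTERN = re.compile(r'([“][^”<]+)</p> <p>([^”]+)')

# Made-up samples: one quoted sentence split across two paragraphs.
one_space = '<p>He said, “this sentence got</p> <p>split in two.”</p>'
line_break = '<p>He said, “this sentence got</p>\n<p>split in two.”</p>'

# Same replacement as in the post: first group, a space, second group.
fixed = TEX_PATTERN.sub(r'\1 \2', one_space)
print(fixed)

# With a line break instead of a space between the tags, nothing matches.
print(TEX_PATTERN.search(line_break))
```

If the source has line ends or indentation between the paragraphs, the literal space would need to become something like `\s+`.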
Old 10-26-2012, 06:19 PM   #3
dicknskip
The type of thing I have is <p>Some words, “some more words.</p> <p>Some more words.”</p> with line ends involved as well. Everything between the quotes is one statement. I am currently searching with “(.*)”. Yes, I do a single Find with dotall and minimal turned on, and fix each hit by hand. I'll try the code you gave above.
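The original question (find only the quote pairs that contain a paragraph break) can also be handled by filtering the matches outside the regex. A rough Python sketch, assuming that `re.DOTALL` plus the lazy `.*?` corresponds to Sigil's dotall and minimal options; `find_broken_quotes` is a hypothetical helper name, not anything built into Sigil:

```python
import re

# Lazy, dotall version of the search: each match runs from one opening curly
# quote to the nearest closing curly quote, even across line ends.
QUOTE_PAIR = re.compile(r'“(.*?)”', re.DOTALL)

def find_broken_quotes(html):
    """Return the quoted passages that contain a paragraph tag."""
    hits = []
    for match in QUOTE_PAIR.finditer(html):
        inner = match.group(1)
        if '</p>' in inner or '<p' in inner:
            hits.append(match.group(0))
    return hits

sample = (
    '<p>“Fine,” she said.</p>\n'
    '<p>Some words, “some more words.</p>\n'
    '<p>Some more words.”</p>'
)

# Only the second quote pair is reported; the first one is left alone.
for hit in find_broken_quotes(sample):
    print(repr(hit))
```

This only reports candidates; every hit still needs a human decision.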
Old 10-26-2012, 07:01 PM   #4
Perkin
Guru
Posts: 655
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
Will this do?
Search:
Code:
(“[^”]*?)</p>\s+<p>([^“]*?”)
Replace:
Code:
\1 \2
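A quick way to check this outside Sigil is a small Python test. This is only a sketch: it assumes Python's `re` handles the pattern like Sigil does, and the second sample (with a class attribute) is made up to show one limit of the pattern, which only matches a bare `<p>`.

```python
import re

# Search pattern from the post above, unchanged. \s+ allows any mix of
# spaces, line ends and indentation between the two tags.
PERKIN_PATTERN = re.compile(r'(“[^”]*?)</p>\s+<p>([^“]*?”)')

sample = '<p>Some words, “some more words.</p>\n\n  <p>Some more words.”</p>'
joined = PERKIN_PATTERN.sub(r'\1 \2', sample)
print(joined)

# Paragraph tags that carry attributes are not matched by the literal <p>.
with_class = '<p class="x">“The cat</p>\n<p class="x">sat.”</p>'
print(PERKIN_PATTERN.search(with_class))
```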
Old 10-26-2012, 07:18 PM   #5
DiapDealer
Grand Sorcerer
Posts: 27,465
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Even though you'll probably not run across it all that often, keep in mind that dialogue that spans multiple paragraphs (same speaker) might not have matching open/close quotes. And that's intentional.

<p>"Blah, blah blah.</p>
<p>"Blah, blah, blahblahblah.</p>
<p>"Blahblah, blah, blah, blah, blahblahblah."</p>

That would be perfectly valid, typographically speaking, yet it will likely cause any "hands-free" Find & Replace regex/script/routine (one that's predicated on one-to-one matched open/close quotes) to barf pretty hard.
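To make the caveat concrete, here is a hedged Python sketch using a curly-quote version of the sample above. A plain open-to-close search treats the whole three-paragraph speech as one quote with breaks in it. The `classify` helper and its labels are hypothetical; the rule it uses (every break inside the quote is immediately followed by a fresh opening quote) is just one possible heuristic, not an established method.

```python
import re

QUOTE_PAIR = re.compile(r'“(.*?)”', re.DOTALL)
# A paragraph break, and a paragraph break followed by a new opening quote.
PARA_BREAK = re.compile(r'</p>\s*<p\b[^>]*>')
CONTINUED = re.compile(r'</p>\s*<p\b[^>]*>“')

def classify(quoted):
    """Label one quoted passage (hypothetical labels, for manual review)."""
    breaks = len(PARA_BREAK.findall(quoted))
    if breaks == 0:
        return 'single paragraph'
    if breaks == len(CONTINUED.findall(quoted)):
        return 'continued speech'
    return 'suspect break'

speech = (
    '<p>“Blah, blah blah.</p>\n'
    '<p>“Blah, blah, blahblahblah.</p>\n'
    '<p>“Blahblah, blah, blah, blah, blahblahblah.”</p>'
)

# The open-to-close search returns the whole speech as a single hit.
for match in QUOTE_PAIR.finditer(speech):
    print(classify(match.group(0)))
```

A hit labelled 'continued speech' is probably intentional, but it is still worth a look before changing anything.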

Last edited by DiapDealer; 10-26-2012 at 07:20 PM.
Old 10-27-2012, 03:21 AM   #6
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
A good idea is to first make a regex that matches paragraphs without closing quotes; you usually don't find too many. Check each of these paragraphs. If the missing quote is a mistake, add the missing quotation mark. If it is intentional, append a special character that is not used anywhere in the book as a pseudo-closing mark, e.g.:
Code:
<p>"Blarg something happened.</p>
Becomes:
Code:
<p>"Blarg something happened.~</p>
Edit your regex so that it accepts either the closing quotation mark or your new mark as the close. You shouldn't hit anything this time round, meaning that every opening quote now has a closing partner.

Now make your regex for replacing the quotes. The trick is to preserve your pseudo-closing.
Code:
<p>“Blarg something happened.”~</p>
Finally, remove each closing quote that is followed by the pseudo-closing mark, along with the mark itself.
Code:
<p>“Blarg something happened.</p>
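The whole workflow can be sketched in Python. This is only an illustration under assumptions: `~` is the pseudo-closing mark and does not occur anywhere else in the text, paragraphs use a bare `<p>`, and the patterns are simplified stand-ins for whatever regexes you would really use in Sigil.

```python
import re

html = (
    '<p>"Blarg something happened.</p>\n'
    '<p>"It went on for a while."</p>'
)

# Step 1: a paragraph that opens with a quote and reaches </p> without a
# closing quote (or an existing ~ mark) gets the pseudo-closing mark.
UNCLOSED = re.compile(r'(<p>"[^"~<]*)(</p>)')
marked = UNCLOSED.sub(r'\1~\2', html)

# Step 2: the same search should now find nothing.
still_unclosed = UNCLOSED.findall(marked)

# Step 3: curl the quotes, accepting either " or ~ as the close and keeping
# the ~ so the intentional cases stay identifiable.
PAIR = re.compile(r'"([^"~]*)("|~)')

def curl(match):
    keep_mark = '~' if match.group(2) == '~' else ''
    return '“' + match.group(1) + '”' + keep_mark

curled = PAIR.sub(curl, marked)

# Step 4: drop each closing quote that is followed by the mark, and the mark.
final = curled.replace('”~', '')
print(final)
```

In a real book the unclosed paragraphs would be reviewed by hand before step 1 marks them, as described above.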
Old 10-27-2012, 09:49 AM   #7
dicknskip
I do understand the multi-paragraph dialog/quotation structure; that's why I only use a find-and-fix-manually approach to this problem. Also, since the new release of Sigil added the very easy special character insert and stored clips/searches, I ensure all books I edit use typographic quotes. I'll try these ideas and see how they work. I'll have to create test cases, as I don't have a current text with the problem. Thanks, all.
Old 10-27-2012, 10:44 AM   #8
dicknskip
SOLVED

I built a test file and tried the suggestions. The one from Perkin worked like a charm. I think I'm on my way. Thanks to all for the advice and to Perkin for the working code.
Old 11-09-2012, 01:52 PM   #9
Steadyhands
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
I'm having trouble with a regex I've been using. It's a slightly modified version of one I found in this thread.

Find
Code:
(“[^”]*?)</p>\s+<p.+>([^“]*?”)
Replace
Code:
\1 \2

Sample
Code:
<p class="calibre1">“The cat</p>

  <p class="calibre1">sat on</p>

  <p class="calibre1">the mat”</p>
Gives me

Code:
<p class="calibre1">“The cat</p>

  <p class="calibre1">the mat”</p>
i.e. the middle line gets deleted. Why? I thought it might have been Clean Source, but I turned that off and restarted, with the same results.
Old 11-09-2012, 02:22 PM   #10
dicknskip
I do not use this as a search and replace. I only use it as a search and then hand-fix the located parts. The replace part is just too variable to work well. For example, a long speech may contain multiple paragraphs, and that is correct. It will be caught by the search, as the proper grammar is to have an opening quote for each paragraph in the speech and a closing one only on the last paragraph. Use it to search and then hand-fix the areas.
Old 11-09-2012, 03:08 PM   #11
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Steadyhands View Post
ie the middle line gets deleted. Why?
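<p.+> matches an entire paragraph (e.g. <p class="calibre1">sat on</p>), because .+ is greedy and runs to the last > on the line. Since that part of the pattern isn't inside capturing parentheses, it has no back-reference in the replacement, so whatever it matches gets deleted.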
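The effect is easy to reproduce in Python. In the sketch below, `BROKEN` is the pattern from the question, while `FIXED` is only an assumed fix: `<p\b[^>]*>` stops at the first `>`, so it consumes just the opening tag. `join_all` is a hypothetical helper that repeats the replacement, because each pass only joins one break per quote. Python's raw output may differ in detail from what Sigil shows, since Sigil can tidy the markup afterwards.

```python
import re

BROKEN = re.compile(r'(“[^”]*?)</p>\s+<p.+>([^“]*?”)')
# Assumed fix: consume only the opening tag, attributes included.
FIXED = re.compile(r'(“[^”]*?)</p>\s+<p\b[^>]*>([^“]*?”)')

sample = (
    '<p class="calibre1">“The cat</p>\n\n'
    '  <p class="calibre1">sat on</p>\n\n'
    '  <p class="calibre1">the mat”</p>'
)

# With the original pattern the middle paragraph's text is lost.
broken_result = BROKEN.sub(r'\1 \2', sample)
print('sat on' in broken_result)

def join_all(html):
    """Repeat the replacement until no paragraph break is left inside a quote."""
    count = 1
    while count:
        html, count = FIXED.subn(r'\1 \2', html)
    return html

print(join_all(sample))
```

Whether a blanket join like this is wise is a separate question, given the multi-paragraph speech caveat earlier in the thread; stepping through the hits one at a time is safer.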
Old 11-09-2012, 04:01 PM   #12
Steadyhands
Quote:
Originally Posted by dicknskip View Post
I do not use this as a search and replace. I only use it as a search and then hand-fix the located parts. The replace part is just too variable to work well. For example, a long speech may contain multiple paragraphs, and that is correct. It will be caught by the search, as the proper grammar is to have an opening quote for each paragraph in the speech and a closing one only on the last paragraph. Use it to search and then hand-fix the areas.
I don't auto-replace; I make the decision each time. I agree about speeches that span many paragraphs. I flip between Find, Replace/Find and manual edits. It's good at finding orphaned words that sit in a paragraph of their own.
Quote:
Originally Posted by Doitsu View Post
<p.+> matches any paragraph (e.g. <p class="calibre1">sat on</p>) and since it doesn't contain any brackets for back-references whatever it matches gets deleted.
Thanks