
MobileRead Forums > E-Book Software > Sigil

04-03-2013, 02:23 AM   #1
CyanBC
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2013
Device: Kindle Touch
Regex or other method to find split quotations ""

I have some EPUBs that have improper paragraph breaks inside quotations.

Example:

<p class="calibre1">"He told me that she walked across the burnings sands</p>
<p class="calibre1">to find her people"</p>

Is there some way to find these simple errors and correct them en masse, by getting rid of the unnecessary paragraph break inside the quotation?
04-03-2013, 02:32 AM   #2
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Try searching for:

([a-z])</p>\s+<p class="calibre1">

Replace with (note the space after the 1):

\1

This does not handle lines that end with a comma or other punctuation. Adjust as needed.
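A quick way to check what a pattern does before touching a book is to run it on the sample from the first post. This is a minimal sketch using Python's `re` module rather than Sigil's own regex engine; for this pattern the syntax should behave the same, but treat that as an assumption. `join_split_paragraphs` is a hypothetical helper name, and the only markup assumed is the `calibre1` class from the example:

```python
import re

# Toxaris's search: a lowercase letter right before </p>, then whitespace,
# then the opening tag of the next calibre1 paragraph.
JOIN_LOWERCASE = re.compile(r'([a-z])</p>\s+<p class="calibre1">')


def join_split_paragraphs(html: str) -> str:
    # Put the captured letter back and add one space, so
    # 'sands</p>\n<p class="calibre1">to' becomes 'sands to'.
    return JOIN_LOWERCASE.sub(r'\1 ', html)


sample = ('<p class="calibre1">"He told me that she walked across the burnings sands</p>\n'
          '<p class="calibre1">to find her people"</p>')
print(join_split_paragraphs(sample))

# A paragraph ending in a comma is left alone, as noted above.
comma = '<p class="calibre1">He said,</p>\n<p class="calibre1">and left</p>'
print(join_split_paragraphs(comma))
```

The first print shows the two halves joined into one `calibre1` paragraph; the second shows the comma case unchanged, which is where the "adjust as needed" comes in (for example by adding punctuation to the character class).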
04-03-2013, 02:41 AM   #3
theducks
Well trained by Cats
Posts: 31,241
Karma: 61360164
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by CyanBC View Post
I have some epubs that have improper spaces between quotations marks.

example:

<p class="calibre1">"He told me that she walked across the burnings sands</p>
<p class="calibre1">to find her people"</p>

Is there some ... I don't know 'way' to find these simple grammatical errors and correct them `en-mass? By getting rid of the unnecessary paragraph between the quotation?
Use the sample saved search 'Join Paragraphs', which handles commas, but not - or ...
04-03-2013, 02:55 AM   #4
CyanBC
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2013
Device: Kindle Touch
Quote:
Originally Posted by theducks View Post
Use the sample saved search: 'Join Paragraphs' which includes commas, but not - or ...
OMG, THANK YOU! That's so awesome, I think I cried a little.
04-03-2013, 03:01 AM   #5
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
I would run the regexes I posted in this topic for cleaning up broken Calibre paragraphs:

https://www.mobileread.com/forums/sho...89#post2446589

As for the unclosed quote, this regex works with "smart quotes" when a paragraph has an opening quote but no closing quote (I forget which user I grabbed it from; thank you so much, it is very helpful):

Search:

Code:
(“[^”\r\n]*)</p>\s+<p>
Replace (note the trailing space in "\1 "):

Code:
\1
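The smart-quote pattern can be tried out the same way. A sketch in Python's `re` (the sample text is made up from later posts in this thread; `count=1` mimics replacing one match at a time instead of doing a Replace All):

```python
import re

# Opening smart quote, then any characters except a closing quote or a line
# break, up to the end of the paragraph and the next bare <p> tag.
UNCLOSED_SMART = re.compile(r'(“[^”\r\n]*)</p>\s+<p>')

html = ("<p>“So we ran toward the scarecrow. But we didn't find anything there.</p>\n"
        "<p>When we came back the next day there was nothing.”</p>")

# Show what would be joined before changing anything.
for match in UNCLOSED_SMART.finditer(html):
    print("found:", match.group(1))

# Join only the first match (the equivalent of a single Replace).
joined = UNCLOSED_SMART.sub(r'\1 ', html, count=1)
print(joined)

# A paragraph whose quote is already closed does not match.
closed = "<p>“Hi,” he said.</p>\n<p>Next</p>"
print(UNCLOSED_SMART.search(closed))  # None
```

Because `”` is excluded from the character class, a quotation that is already closed inside the paragraph can never reach the `</p>`, which is what keeps this version reasonably safe.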
If you want to adapt this for straight ("dumb") quotes (I have not tested this; it should work, but it might be VERY ugly):

Search:

Code:
("[^"\r\n]*)</p>\s+<p>
Use the same replacement as above. Make sure to run all of these ONE BY ONE, and never do a "Replace All".

If you want these to work with Calibre code, change the "<p>" at the end of each search into "<p class="calibre[0-9]+">".
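Why the straight-quote version can get ugly: a straight quote is the same character whether it opens or closes, so the pattern cannot tell them apart. A sketch in Python's `re`, with the trailing `<p>` widened to the Calibre form mentioned above (both sample paragraph pairs are made up for illustration):

```python
import re

# Straight-quote variant, ending in a Calibre-style paragraph tag.
UNCLOSED_STRAIGHT = re.compile(r'("[^"\r\n]*)</p>\s+<p class="calibre[0-9]+">')

# A real split quotation: joined correctly.
good = ('<p class="calibre1">"He told me that she walked across the burnings sands</p>\n'
        '<p class="calibre1">to find her people"</p>')

# Two unrelated paragraphs: the CLOSING quote after "hi" is taken for an
# opening one, so these get joined as well.
bad = ('<p class="calibre1">He said "hi" and left</p>\n'
       '<p class="calibre1">The end.</p>')

for html in (good, bad):
    print(UNCLOSED_STRAIGHT.sub(r'\1 ', html, count=1))
```

The second line of output is a false positive ("and left The end."), which is exactly why each match should be checked before replacing.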
04-03-2013, 04:13 AM   #6
Steadyhands
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
Here is another one from some thread here that I use, again on a case-by-case basis, never Replace All. It does fail on some lines if there are 3 lines of text to join, or a tag like <i> in the second section; I fix those manually first.
Find
Code:
(“[^”]*?)</p>\s+<p[^>]*>([^“]*?”)
Replace
Code:
\1 \2
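The same search as a runnable sketch in Python's `re`. It uses `<p[^>]*>` for the second opening tag rather than `<p.+>`: a greedy `.+` stops at the last `>` after which the rest of the pattern can still match, so with an `<i>` tag in the second paragraph it can swallow text up to `</i>`. The sample paragraph is made up:

```python
import re

# Opening smart quote with no closing quote before </p>, the next <p ...> tag,
# then the text of the following paragraph up to its closing quote.
SPLIT_QUOTE = re.compile(r'(“[^”]*?)</p>\s+<p[^>]*>([^“]*?”)')
# The same search with a greedy <p.+> for comparison.
GREEDY = re.compile(r'(“[^”]*?)</p>\s+<p.+>([^“]*?”)')

html = ('<p class="calibre2">“So we ran toward the scarecrow.</p>\n'
        '<p class="calibre2">When <i>we</i> came back, there was nothing.”</p>')

fixed = SPLIT_QUOTE.sub(r'\1 \2', html, count=1)
greedy = GREEDY.sub(r'\1 \2', html, count=1)
print(fixed)   # keeps "When <i>we</i>"
print(greedy)  # "When <i>we</i>" has been swallowed by <p.+>
```

The three-paragraph case is still not handled; running the search again after the first join would pick up the next break.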
Another one I use is for dialogue split from its tag, like:
"bla bla bla,"
said Tom

Code:
">
,”</p>\s+<p[^>]*>
or sometimes ([,?…!])”</p>\s+<p[^>]*>
Replace (note the space after the ”; with the second pattern, use \1” instead of ,”):
Code:
,”
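A sketch of the second (split dialogue) pattern in Python's `re`, using `<p[^>]*>` for the opening tag and putting the matched punctuation back with `\1`. `join_dialogue_tag` is a hypothetical helper name and the sample is made up:

```python
import re

# Comma, question mark, ellipsis or exclamation mark before a closing ”,
# end of paragraph, then the next opening <p ...> tag.
SPLIT_TAG = re.compile(r'([,?…!])”</p>\s+<p[^>]*>')


def join_dialogue_tag(html: str, count: int = 0) -> str:
    # \1 restores the punctuation; the space separates the quote from the tag.
    # count=0 replaces every match; pass count=1 to go one at a time.
    return SPLIT_TAG.sub(r'\1” ', html, count=count)


html = ('<p class="calibre2">“bla bla bla,”</p>\n'
        '<p class="calibre2">said Tom.</p>')
print(join_dialogue_tag(html))
```

A question or exclamation that really does end a paragraph (“Really?” followed by another speaker's line) matches too, so this one also needs a case-by-case check.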
05-13-2013, 04:01 PM   #7
CyanBC
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2013
Device: Kindle Touch
I've tried all the regex patterns given so far, and I've also fiddled with them a bit to see whether a small change would make them fit my case.

I've come across a new book with an annoying issue: it sometimes starts a new paragraph in the middle of a long quotation, like this:

Code:
Jacob began to speak, "So we ran toward the still moving silhouette of the scarecrow. But we didn't find anything there.

When we came back the next day there was nothing."
I've been trying to build a regex that finds these errors by highlighting any opening quote that has text after it but no closing quote.
In the code it looks like this:

Code:
<p class="calibre2">At this point, James found himself shaking, "So we ran toward the still moving silhouette of the scarecrow. But we didn't find anything there.</p>

    <p class="calibre2">When we came back the next day there was nothing." And so on</p>
All my attempts either fail to find anything or start highlighting the markup, such as the "calibre2" classes.
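When the goal is only to find these spots and fix them by hand, an alternative to a single regex is to strip the tags from each paragraph and flag any paragraph whose text has an odd number of straight quotes. This is a different technique from the regexes in this thread, sketched in Python; the helper name and the third sample paragraph are made up, and the tag-stripping regex assumes simple, well-formed markup:

```python
import re

# An opening <p> or <p ...> tag (not <pre>), the paragraph body, and </p>.
PARAGRAPH = re.compile(r'<p(?:\s[^>]*)?>(.*?)</p>', re.S)
TAG = re.compile(r'<[^>]+>')


def unbalanced_quote_paragraphs(html):
    """Yield (index, text) for paragraphs with an odd number of straight quotes."""
    for index, match in enumerate(PARAGRAPH.finditer(html)):
        # Strip tags first so quotes inside class="calibre2" are not counted.
        text = TAG.sub('', match.group(1))
        if text.count('"') % 2 == 1:
            yield index, text


html = '''<p class="calibre2">At this point, James found himself shaking, "So we ran toward the still moving silhouette of the scarecrow. But we didn't find anything there.</p>

    <p class="calibre2">When we came back the next day there was nothing." And so on</p>
    <p class="calibre2">"Fine," he said.</p>'''

for index, text in unbalanced_quote_paragraphs(html):
    print(index, text[:50])
```

Both halves of the split quotation are reported (paragraphs 0 and 1), while the balanced third paragraph is not. A legitimate multi-paragraph quotation, where the first paragraph traditionally has no closing quote, will be reported too, so treat the output as a list of candidates.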
05-13-2013, 06:20 PM   #8
mzmm
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
Are you sure they're not Calibre's smart quotes? It would be easier if they were, because then you wouldn't have the conflict with the straight quotes in the classes/ids.

You could try this, but it may just end up causing more problems than it's worth. Maybe it saves you a bit of time?

Code:
find:
(?s)(<p[^>]*>)([^"]+".+)</p>\s+?<p[^>]*>([^"]+")

repl:
\1\2 \3 <----- note the space between \2 and \3
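Two things about this pattern are worth knowing. With `(?s)`, `.` also matches line breaks, so the greedy `.+` can run on through many paragraphs before it backtracks, which is one way to end up with whole sections highlighted. And `\3` has to be in the replacement, or the text of the second paragraph is deleted. Below is a Python sketch of a tighter variant; it is an assumption, not mzmm's exact pattern. `[^"]*` replaces `.+`, so the match cannot run past the next straight quote, such as the one in the next paragraph's `class="calibre2"`. That relies on the paragraphs having quoted attributes, as Calibre output does, and it only matches a first paragraph containing exactly one straight quote:

```python
import re

# Tighter variant (assumption): [^"]* instead of .+, so no (?s) is needed.
#   group 1: the opening <p ...> tag
#   group 2: text, the opening quote, and the rest of the first paragraph
#   group 3: text of the next paragraph up to and including the closing quote
SPLIT_STRAIGHT = re.compile(r'(<p[^>]*>)([^"]*"[^"]*)</p>\s+<p[^>]*>([^"]*")')

html = ('<p class="calibre2">At this point, James found himself shaking, '
        '"So we ran toward the still moving silhouette of the scarecrow. '
        "But we didn't find anything there.</p>\n\n"
        '    <p class="calibre2">When we came back the next day there was nothing." '
        'And so on</p>')

# Keep all three groups; the space goes between the two halves.
print(SPLIT_STRAIGHT.sub(r'\1\2 \3', html, count=1))
```

The output is a single `calibre2` paragraph running from "At this point" to "And so on". A paragraph with a balanced pair of quotes, such as `He said "hi" and left`, does not match at all.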
05-13-2013, 09:13 PM   #9
CyanBC
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2013
Device: Kindle Touch
Quote:
Originally Posted by mzmm View Post
are you sure they're not calibre's smart quotes? it would be easier if they were because then you wouldn't have the conflict with the classes/ids.

you could try this, but it may just end up causing more problems than it's worth. maybe saves you a bit of time?

Code:
find:
(?s)(<p[^>]*>)([^"]+".+)</p>\s+?<p[^>]*>([^"]+")

repl:
\1\2 \3 <----- note the space between \2 and \3
Awesome, thanks!
With the original code you provided I was getting entire sections highlighted, but if I take out the . it works, in a passive-aggressive way. Good enough for me to find the sections and fix them by hand.

Code:
(?s)(<p[^>]*>)([^"]+"+)</p>\s+?<p[^>]*>([^"]+")
This highlights the second sentence of an unfinished paragraph but not the first. I'm not sure how to fix it so that it selects the whole section, but it does find the errors so I can fix them. Thank you so much!
05-14-2013, 02:52 PM   #10
mzmm
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
Great, glad it's working for you.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.