04-03-2013, 02:23 AM | #1 |
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2013
Device: Kindle Touch
|
Regex or other method to find split quotations ""
I have some epubs that have improper breaks between quotation marks.
Example:
<p class="calibre1">"He told me that she walked across the burnings sands</p> <p class="calibre1">to find her people"</p>
Is there some way (I don't know what) to find these simple errors and correct them en masse, by getting rid of the unnecessary paragraph break inside the quotation? |
04-03-2013, 02:32 AM | #2 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Try:
Search: ([a-z])</p>\s+<p class="calibre1">
Replace: \1 (note the space behind the 1).
This does not take into account lines that end with a comma or other punctuation. Adjust as needed. |
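For anyone who wants to try this outside the editor, here is a minimal sketch in Python, assuming the `re` module behaves like the editor's search and replace for this pattern. The helper name `join_split_paragraphs` is made up for illustration.

```python
import re

# Pattern from the post above: a lowercase letter directly before </p>,
# some whitespace, then the opening tag of the next paragraph.
SPLIT_PARAGRAPH = re.compile(r'([a-z])</p>\s+<p class="calibre1">')


def join_split_paragraphs(html: str) -> str:
    """Join paragraphs that break right after a lowercase letter.

    Hypothetical helper; the replacement is the captured letter plus a space.
    """
    return SPLIT_PARAGRAPH.sub(r'\1 ', html)


sample = (
    '<p class="calibre1">"He told me that she walked across the burnings sands</p> '
    '<p class="calibre1">to find her people"</p>'
)
print(join_split_paragraphs(sample))
```

A paragraph that ends in a full stop is left alone, because the character before `</p>` is then not a lowercase letter.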
04-03-2013, 02:41 AM | #3 | |
Well trained by Cats
Posts: 29,768
Karma: 54401244
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
04-03-2013, 02:55 AM | #4 |
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2013
Device: Kindle Touch
|
|
04-03-2013, 03:01 AM | #5 |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
I would run the regexes I mentioned in this topic for cleaning up some broken Calibre paragraphs:
https://www.mobileread.com/forums/sho...89#post2446589
As to the unclosed quote, this regex works with "smart quotes" that have no closing quote (I forget which user I grabbed this one from; thank you so much, this one is very helpful):
Search: Code:
(“[^”\r\n]*)</p>\s+<p>
Replace: Code:
\1 
Search (straight-quote version): Code:
("[^"\r\n]*)</p>\s+<p>
If you want these to work for calibre code, change the "<p>" at the end of the search into "<p class="calibre[0-9]+">" |
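A minimal sketch of the smart-quote search with the calibre class swapped in, assuming Python's `re` module stands in for the editor and that the replacement is the captured text plus one space. The name `join_open_quotes` and both sample strings are made up for illustration.

```python
import re

# Opening smart quote, then anything except a closing smart quote or a line
# break, running up to </p> and the next calibre paragraph tag.
OPEN_SMART_QUOTE = re.compile(r'(“[^”\r\n]*)</p>\s+<p class="calibre[0-9]+">')


def join_open_quotes(html: str) -> str:
    """Join a paragraph whose smart quote is still open with the next one."""
    return OPEN_SMART_QUOTE.sub(r'\1 ', html)


broken = (
    '<p class="calibre1">“He told me that she walked</p>\n'
    '<p class="calibre1">to find her people”</p>'
)
closed = (
    '<p class="calibre1">“Hi,” he said.</p>\n'
    '<p class="calibre1">Next paragraph.</p>'
)

print(join_open_quotes(broken))
# A quote that is already closed before </p> does not match.
print(join_open_quotes(closed) == closed)
```

The straight-quote version cannot tell an opening quote from a closing one, so it is probably best used to find candidates rather than with Replace All.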
04-03-2013, 04:13 AM | #6 |
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
|
Another one from some thread here that I use, again on a case-by-case basis, never Replace All. It does fail on some lines, though, if there are 3 lines of text to join or a tag like <i> in the second section. I fix these manually first.
Find: Code:
(“[^”]*?)</p>\s+<p.+>([^“]*?”)
Replace: Code:
\1 \2
For "bla bla bla," said Tom:
Find: Code:
,”</p>\s+<p.+">
or sometimes
([\,,\?,…,!])”</p>\s+<p.+">
Replace: Code:
,” |
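A minimal sketch of the first find/replace in Python, assuming the `re` module matches the same way as the editor. It also shows one likely reason for the failure with a tag like <i>: the greedy `<p.+>` can run past the opening tag up to the last `>` before the closing quote, and that text is then dropped by the replacement. The `VARIANT` pattern, which uses `<p[^>]*>` instead, is a swapped-in suggestion and not part of the posted regex; all names and sample strings are made up for illustration.

```python
import re

# The posted pattern, unchanged.
POSTED = re.compile(r'(“[^”]*?)</p>\s+<p.+>([^“]*?”)')
# Variant (an assumption, not from the post): stop the tag at its first ">".
VARIANT = re.compile(r'(“[^”]*?)</p>\s+<p[^>]*>([^“]*?”)')

simple = (
    '<p class="calibre1">“He told me that she walked</p>\n'
    '<p class="calibre1">to find her people”</p>'
)
with_italic = (
    '<p class="calibre1">“He told me that she walked</p>\n'
    '<p class="calibre1">to find <i>her</i> people”</p>'
)

# Plain text in the second paragraph: the posted pattern joins it cleanly.
print(POSTED.sub(r'\1 \2', simple))
# With <i> in the second paragraph, "to find <i>her</i>" is swallowed.
print(POSTED.sub(r'\1 \2', with_italic))
# The variant keeps the italic text.
print(VARIANT.sub(r'\1 \2', with_italic))
```

Either way, stepping through matches one at a time, as described above, stays the safer habit.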
05-13-2013, 04:01 PM | #7 |
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2013
Device: Kindle Touch
|
I've tried all the regex patterns given so far. I've also tried fiddling with them a bit to see if there was a small error for my application.
I've come to a new book which has an annoying issue. At times it creates a new paragraph in the middle of long quoted text. Like this: Code:
Jacob began to speak, "So we ran toward the still moving silhouette of the scarecrow. But we didn't find anything there. When we came back the next day there was nothing."
Code-wise it looks like this: Code:
<p class="calibre2">At this point, James found himself shaking, "So we ran toward the still moving silhouette of the scarecrow. But we didn't find anything there.</p> <p class="calibre2">When we came back the next day there was nothing." And so on</p> |
05-13-2013, 06:20 PM | #8 |
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
|
are you sure they're not calibre's smart quotes? it would be easier if they were because then you wouldn't have the conflict with the classes/ids.
you could try this, but it may just end up causing more problems than it's worth. maybe saves you a bit of time? Code:
find: (?s)(<p[^>]*>)([^"]+".+)</p>\s+?<p[^>]*>([^"]+")
repl: \1\2 \3 <----- space between \2 and \3 |
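A minimal sketch of this find in Python, run against the two-paragraph sample from earlier in the thread. It assumes the `re` module behaves like the editor here, and it uses `\1\2 \3` as the replacement so that the text captured by the third group is kept. `count=1` is an added precaution, since the greedy `.+` under `(?s)` can stretch across many paragraphs in a full book.

```python
import re

# (?s) lets "." match line breaks, so ".+" may span several paragraphs.
PATTERN = re.compile(r'(?s)(<p[^>]*>)([^"]+".+)</p>\s+?<p[^>]*>([^"]+")')

sample = (
    '<p class="calibre2">At this point, James found himself shaking, '
    '"So we ran toward the still moving silhouette of the scarecrow. '
    "But we didn't find anything there.</p> "
    '<p class="calibre2">When we came back the next day there was nothing." '
    'And so on</p>'
)

# Replace one match at a time so each join can be checked by eye.
joined = PATTERN.sub(r'\1\2 \3', sample, count=1)
print(joined)
```

On a whole file it is probably wiser to step through matches than to replace everything in one go.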
05-13-2013, 09:13 PM | #9 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2013
Device: Kindle Touch
|
From the original code you provided I was getting entire sections highlighted, but if I take out the . it works in a passive aggressive way. Good enough for me to find the sections and hand change them. Code:
(?s)(<p[^>]*>)([^"]+"+)</p>\s+?<p[^>]*>([^"]+") |
|
05-14-2013, 02:52 PM | #10 |
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
|
great, glad it's working for you
|