#1 |
Enthusiast
Posts: 30
Karma: 1000
Join Date: Nov 2012
Device: none
|
Need regex help, please
I have an ePub file in which single smart quotes are used to open and close every quotation. I would like to use Sigil to change all the single smart quotes to double smart quotes.
Changing the open quotes is a simple normal search and replace, but a problem arises in changing the end quotes. I'm sure there must be a regex expression that would fix things but the problem is that every apostrophe seemingly would also be affected. For example, in code view:
‘I need help,’ O’Malley answered.</span></p>
Any expression that would find the single quote marks throughout the file would also find the apostrophe in all the words like O’Malley. And, of course, checking each find/replace to see if it's a quote or an apostrophe would take forever.
I wrote this regex
[^A-Za-z]’[^A-Za-z]
but when I do a search, the expression finds the smart single end quote but it also captures the end-of-sentence punctuation mark and the opening of the span tag, thus:
.’<
So when I use the smart double end quote in the replacement field, the end quote is correctly replaced but the period (or any other punctuation) and the open tag < are deleted.
I'm barely literate in regex, so I hope this makes sense. Any help in writing a regex that accomplishes what I'd like to do would be greatly appreciated. Thanks!
#2 |
Well trained by Cats
Posts: 30,899
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Anything the search matches is normally deleted, so whatever you want to keep simply needs to be put back as part of the replace term.
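To illustrate that point, here is a minimal sketch in Python's re module rather than Sigil itself (the sample markup and variable names are made up for the demonstration). It shows the original pattern eating the neighbouring characters, then two ways of keeping them: capturing them and putting them back, or checking them with lookarounds so they are never part of the match.

```python
import re

# Hypothetical sample: a closing quote right before a tag, plus an apostrophe.
text = "<p>‘Ask O’Malley.’</p>"

# The original pattern matches the characters on both sides of the quote,
# so a plain replacement deletes them along with the quote.
lossy = re.sub(r"[^A-Za-z]’[^A-Za-z]", "”", text)

# Option 1: capture the neighbours and put them back in the replacement.
kept = re.sub(r"([^A-Za-z])’([^A-Za-z])", r"\1”\2", text)

# Option 2: lookarounds test the neighbours without consuming them.
looked = re.sub(r"(?<=[^A-Za-z])’(?=[^A-Za-z])", "”", text)

print(lossy)   # the period and the "<" are gone
print(kept)    # the period and the "<" survive
print(looked)  # same result as option 1
```

The apostrophe in O’Malley is left alone in all three cases because it has a letter on each side.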
|
#3 |
null operator (he/him)
Posts: 21,621
Karma: 29710338
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
I would have assumed Sigil would have had a button/tool to do that, but it appears not.
FWIW 1 - The calibre book editor has a button (Smart Punctuation) to do it. And DiapDealer is developing an Even Smarter Punctuation plugin for the calibre book editor. You do not have to use the calibre library manager to use the book editor; it can be used standalone.
FWIW 2 - I use both the Sigil and the calibre editors.
BR
#4 |
Grand Sorcerer
Posts: 28,357
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
But if they're already single smart quotes, neither calibre's smartening routines nor my editor plugin will help the OP convert single smart quotes to double smart quotes. Neither will regex quite frankly--not a one-size fits all Replace All solution, anyway. It would be multiple passes and stepping through stuff one-by-one to make sure everything went right. An algorithm of some kind would be better suited for that kind of wholesale conversion (and even that would probably never be 100%).
If I was forced to do this using regex only, I'd try to change all apostrophes to some weird string with something like (\pL)’(\pL) replaced with \1~apos~\2 (I'd still need to look for plural possessive apostrophes, and words like ’tis and such). Then once I was satisfied that I'd protected all apostrophes by mangling them into a unique string, it should be relatively simple to replace the opening and closing single smart-quotes with their double smart-quote counterparts. With that done, I could go back and unmangle my apostrophes: replacing ~apos~ with ’ (or an entity). Mostly though, I probably wouldn't bother.
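The three passes described above could be sketched like this in Python, purely as an illustration and not as something Sigil runs. Two assumptions to note: Python's re module has no \pL, so [^\W\d_] stands in for "any letter", and the sketch uses lookarounds instead of the capture groups so that back-to-back cases such as a’b’c are all caught in one pass. The function name is hypothetical.

```python
import re

# Stand-in for \pL (any Unicode letter), which Python's re module lacks.
LETTER = r"[^\W\d_]"

def convert_quotes(text: str, token: str = "~apos~") -> str:
    """Protect apostrophes, convert single smart quotes, then restore."""
    if token in text:
        raise ValueError("placeholder already occurs in the text")
    # Pass 1: mangle apostrophes that sit between two letters.
    protected = re.sub(rf"(?<={LETTER})’(?={LETTER})", token, text)
    # Pass 2: every remaining single smart quote is treated as a dialog quote.
    converted = protected.replace("‘", "“").replace("’", "”")
    # Pass 3: unmangle the apostrophes.
    return converted.replace(token, "’")

print(convert_quotes("‘I need help,’ O’Malley answered."))
```

As the post says, plural possessives and leading apostrophes (’tis, ’em) are not between two letters, so pass 1 misses them and pass 2 would turn them into closing double quotes. Those still need a manual check before pass 2.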
#5 |
null operator (he/him)
Posts: 21,621
Karma: 29710338
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
|
#6 |
Grand Sorcerer
Posts: 28,357
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
|
#7 |
Enthusiast
Posts: 30
Karma: 1000
Join Date: Nov 2012
Device: none
|
Thanks for all the help, everyone. I think I'm leaning toward the path of least resistance, the "don't bother" solution, though I'm tempted to try DiapDealer's suggestion of protecting all the apostrophes by changing them to some weird string. Hmm, another way to do it might be normal searches that replace close quotes with all punctuation permutations:
.' with ."
!' with !"
,' with ,"
and so on.
#8 |
Grand Sorcerer
Posts: 28,357
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Definitely more than one way to tackle it. But I would think that eliminating ’ used as an apostrophe might be less tedious if done first. Once the ’ preceded and followed by a letter were eliminated, I would think a search for single closing smart quotes that weren't followed by punctuation would cover the bulk of the special-case plural possessives, and ’Tis, and argot-like ’em and ’im (for them and him). There's always going to be the possibility that <span> tags might interfere with the detection of ’ followed by punctuation, but that's always going to be the case no matter how you tackle it. I would think, though, that getting a document into a state where the ‘ and ’ represented only opening and closing dialog quotations wouldn't be an impossibly daunting task--if I was motivated enough to want to do it.
#9 |
frumious Bandersnatch
Posts: 7,543
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
I have been known to go through all quote marks and apostrophes in a 1000-page book, one by one, replacing them with the appropriate curly variant, and distinguishing between right single quote and apostrophe. With some preparatory regexp and a couple of single-key macros, it's not too hard.
#10 | |
Grand Sorcerer
Posts: 13,297
Karma: 78876004
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
|
Quote:
|
|
#11 |
frumious Bandersnatch
Posts: 7,543
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
If it's a single quote followed by a letter, not at all: it may be an apostrophe. In other cases, it depends on how much you trust the source; I have often found quotes on the wrong side of a space...
|
#12 |
Enthusiast
Posts: 30
Karma: 1000
Join Date: Nov 2012
Device: none
|
Well, I made the smart quote fixes using normal searches that replaced close quotes with all punctuation permutations:
.' with ."
!' with !"
,' with ," and so on.
I also searched for punctuation like ellipses and hyphens and em dashes that were followed by a single smart close quote (with and without preceding and/or following spaces), replacing them with a double smart end quote. It really didn't take very long to run all the different searches and I'm quite pleased with the result. My thanks to all who offered their help!
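For anyone who would rather fold those separate searches into one pass, here is a rough Python sketch of the same idea. It is not what was actually run in Sigil. The punctuation set, the optional single space before the quote, and the function name are all assumptions to adjust for a given book. The lookahead skips a ’ that is directly followed by a letter or digit, so words like ’em after a period are not touched.

```python
import re

# Assumed set of punctuation that can end a quotation:
# . , ! ? ; : ellipsis (U+2026), hyphen, en dash (U+2013), em dash (U+2014).
CLOSERS = r"[.,!?;:\u2026\-\u2013\u2014]"

def close_after_punctuation(text: str) -> str:
    """Turn ’ into ” when it follows closing punctuation (optionally one space)."""
    # (?!\w) leaves alone a ’ that starts a word or number, e.g. ’em or ’90s.
    return re.sub(rf"({CLOSERS}\s?)’(?!\w)", r"\1”", text)

print(close_after_punctuation("‘I need help,’ O’Malley answered."))
print(close_after_punctuation("‘Wait...’ he said. ’Tis late."))
```

Only the closing quotes are handled here. The opening quotes still need the plain search and replace mentioned at the start of the thread.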
#13 |
Not who you think I am...
Posts: 374
Karma: 30283
Join Date: Jan 2010
Location: Honolulu
Device: PocketBook 360 -- Ivory
|
Search:
Code:
(\s)‘(.*?[^a-z\s])’(\s)
Replace:
Code:
$1“$2”$3
Search:
Code:
(\s)‘(.*?[^a-zA-Z\s])’(\s)
Test carefully. Aloha.
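A quick way to try that search outside the editor is a few lines of Python. This is only a test harness under assumptions: Python's re module writes the replacement groups as \1, \2, \3 rather than $1, $2, $3, and the sample sentences and function name are invented.

```python
import re

# The second search from the post: whitespace, opening quote, lazy run that
# ends in a non-letter, non-space character, closing quote, whitespace.
PATTERN = re.compile(r"(\s)‘(.*?[^a-zA-Z\s])’(\s)")

def convert_dialog(text: str) -> str:
    # \1 and \3 put the surrounding whitespace back; \2 is the quoted text.
    return PATTERN.sub(r"\1“\2”\3", text)

print(convert_dialog("He said, ‘I need help,’ and O’Malley nodded.\n"))
# A quote with no whitespace in front of it is not matched at all:
print(convert_dialog("‘I need help,’ he said."))
```

Because the pattern requires whitespace on both sides, a quotation that starts right after a tag or at the very beginning of the text is skipped. When two quotations are separated by a single space, the first match consumes that space and the second quotation is missed in the same pass. That is one more reason to test carefully.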
#14 |
frumious Bandersnatch
Posts: 7,543
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
In any case you cannot have a single infallible regex. You need to understand the words in order to know which "’" is the right closing quote in things like this:
They played ‘Stompin’ at the Savoy’ at Vinicius’ yesterday.
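A small Python check makes the ambiguity concrete (a sketch only, with made-up variable names). Every ’ in that sentence sits in the same context, a letter before it and a space after it, so a pattern has nothing to tell the three apart.

```python
import re

sentence = "They played ‘Stompin’ at the Savoy’ at Vinicius’ yesterday."

# Positions of every ’ that is followed by whitespace.
candidates = [m.start() for m in re.finditer(r"’(?=\s)", sentence)]

# The word each candidate is attached to.
words = [sentence[:i].split()[-1] for i in candidates]
print(words)

# A blanket conversion closes the quotation three times instead of once.
blanket = sentence.replace("‘", "“").replace("’", "”")
print(blanket)
```

Only the ’ after Savoy is a closing quote. The other two are apostrophes, and deciding that takes a reader, not a regex.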