Old 05-20-2009, 09:15 AM   #3
pepak
Guru
Description: Replace apostrophes with curly quotes where appropriate.
Example-before: <p>'Wha' d'you mean,' he said, 'what's this regexp thing?'</p>
Example-after: <p>&lsquo;Wha' d'you mean,&rsquo; he said, &lsquo;what's this regexp thing?&rsquo;</p>
Requirements: This regular expression expects HTML and proper punctuation. The HTML requirement can be lifted by rewriting the regexp so that it works on plain text as well, but proper punctuation is always required: it is what tells a real apostrophe apart from a quote disguised as an apostrophe.
Faults:
- With improper punctuation, anything can happen.
- Even with proper punctuation, some false positives can occur; specifically, if a word starting with an apostrophe (e.g. 'tis) precedes actual apostrophe-quotes, the opening quote is placed at that word.
- Will fail if an apostrophe immediately follows a non-paragraph tag. The regexp could be modified to work even then, but it would be a lot more difficult to read.
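The 'tis false positive can be reproduced with a short Python sketch. It assumes the find pattern from this post written with straight apostrophes and \s in place of the underscore; the sample sentence and the function name `curl_quotes_demo` are made up for illustration.

```python
import re

# Assumed translation of the post's find pattern: "_" -> \s, straight apostrophes.
FAULT_FIND = re.compile(r"([>\s])'(.*?[^a-z\s])'([<\s])", re.IGNORECASE | re.DOTALL)

def curl_quotes_demo(html: str) -> str:
    """Apply the post's replacement to every match of the find pattern."""
    return FAULT_FIND.sub(r"\1&lsquo;\2&rsquo;\3", html)

# 'Tis starts with a real apostrophe, but the pattern cannot know that,
# so the opening quote lands on it instead of on 'Go.
sample = "<p>'Tis late. 'Go home,' she said.</p>"
print(curl_quotes_demo(sample))
# -> <p>&lsquo;Tis late. 'Go home,&rsquo; she said.</p>
```

The closing quote is still placed correctly; only the opening one drifts back to the earlier word, so such cases are easiest to catch by reviewing each replacement.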
Regexp-find:
Code:
([>_])'(.*?[^a-z_])'([<_])
(note: use a space instead of underscore. If your editor supports that syntax, you can use \s [any blank character] instead of _)
Regexp-find-translation:
- space or end-of-tag. For plain text, you can use (^|_) (start-of-line or space), but then you will need to modify the replacement string
- apostrophe
- any character string, un-greedy (take as few as possible while maintaining match)
- any character except letters and space
- apostrophe
- space or begin-of-tag. For plaintext, you can use ($|_) (end-of-line or space)
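For plain text, the variant described in the list above might look like the following Python sketch. Two things here are assumptions rather than part of the original recipe: the "modified replacement string" is read as swapping the HTML entities for literal curly quotes (U+2018 and U+2019), and the multi-line flag is added so that ^ and $ match at every line boundary. The name `curl_quotes_plain` is hypothetical.

```python
import re

# (^|\s) and ($|\s) replace the tag-aware groups of the HTML version.
PLAIN_FIND = re.compile(
    r"(^|\s)'(.*?[^a-z\s])'($|\s)",
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)

# Literal curly quotes instead of &lsquo; / &rsquo; (assumed adaptation).
PLAIN_REPLACE = "\\1\u2018\\2\u2019\\3"

def curl_quotes_plain(text: str) -> str:
    """Replace apostrophe-quotes in plain text with curly quotes."""
    return PLAIN_FIND.sub(PLAIN_REPLACE, text)

line = "'Wha' d'you mean,' he said, 'what's this regexp thing?'"
print(curl_quotes_plain(line))
```

The apostrophes inside Wha', d'you and what's are left alone because none of them is preceded by a non-letter and followed by a space or line end at the same time.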
Regexp-replace:
Code:
$1&lsquo;$2&rsquo;$3
Regexp-replace-translation:
- first parenthesis (character just preceding the quote)
- opening quote
- second parenthesis (content of the quote)
- closing quote
- third parenthesis (character following the quote)
Regexp-modifiers: case-insensitive, single-line, un-greedy
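Regexp-syntax: FAR Manager's "Regular Expression Search and Replace" plugin. PHP's ereg/eregi functions need \\1 instead of $1 in the replacement string. PHP's preg functions need a delimiter and modifiers around the pattern ("~" regexp "~is"). Note that preg_replace already replaces every match, so no "g" modifier is needed (PHP does not accept one), and the "U" modifier should be left out because it would flip the explicit un-greedy .*? back to greedy.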
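To try the recipe outside FAR Manager, here is a minimal Python sketch of the find and replace strings above. It assumes \s in place of the underscore, straight apostrophes in the pattern, and Python's \1 group syntax instead of $1; `re.DOTALL` stands in for the single-line modifier, and the function name `curl_quotes` is made up for illustration.

```python
import re

# Find pattern from the post: boundary, apostrophe, lazy content ending in a
# non-letter/non-space character, apostrophe, boundary.
FIND = re.compile(r"([>\s])'(.*?[^a-z\s])'([<\s])", re.IGNORECASE | re.DOTALL)

# Replacement from the post, with $1..$3 written as \1..\3 for Python.
REPLACE = r"\1&lsquo;\2&rsquo;\3"

def curl_quotes(html: str) -> str:
    """Turn apostrophe-quotes into &lsquo;...&rsquo; pairs."""
    return FIND.sub(REPLACE, html)

before = "<p>'Wha' d'you mean,' he said, 'what's this regexp thing?'</p>"
print(curl_quotes(before))
# -> <p>&lsquo;Wha' d'you mean,&rsquo; he said, &lsquo;what's this regexp thing?&rsquo;</p>
```

The lazy .*? is already written into the pattern, so no separate un-greedy flag is needed here.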