Regex - replace only part of a string - how?
I often have problems with books that were auto-converted by Calibre. Here is one issue:
The text often has wrong line breaks. Example:
Code:
<p class="calibre2">This is just a sample</p>
Code:
[a-z]</p>
Is there a way?
DOH! I found it!
Search string: (\w+)</p> <p class="calibre2">
Replace with: \1 |
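For anyone doing the same cleanup outside Sigil, here is a minimal Python sketch of that search and replace using the standard `re` module, which also writes the back-reference as `\1`. It assumes, hypothetically, that the converter split one sentence into two `calibre2` paragraphs with a newline between them, so it allows whitespace between the tags (`\s*`) and puts a space back after the captured word. Adjust both if your markup differs. The function name `join_split_paragraphs` is illustrative only.

```python
import re

# Hypothetical input: one sentence split across two paragraphs by the converter.
BROKEN = '<p class="calibre2">This is just a</p>\n<p class="calibre2">sample</p>'

# A word character right before </p> suggests the paragraph ended mid-sentence.
# \s* tolerates a newline between the tags (an assumption about the markup).
SPLIT_PARAGRAPH = re.compile(r'(\w+)</p>\s*<p class="calibre2">')


def join_split_paragraphs(html):
    # Keep the captured word and restore the space between the two halves.
    return SPLIT_PARAGRAPH.sub(r'\1 ', html)


print(join_split_paragraphs(BROKEN))
# <p class="calibre2">This is just a sample</p>
```

Because the pattern needs a word character directly before `</p>`, a paragraph that ends with a full stop is left alone.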
Let me add one more common problem: wrong hyphens (probably carried over from the PDF).
Search: (\w)-(\w) Replace with: \1\2
How can I make it case-sensitive? I'd like to correct 'ele-phant' but not 'John-Bob'. |
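Reproducing the hyphen pattern with Python's `re` module shows why it is too broad: `\w` matches upper- and lowercase letters alike, so both hyphens disappear. The sample sentence is made up for illustration.

```python
import re

# The pattern from the post: a hyphen between any two word characters.
HYPHEN = re.compile(r'(\w)-(\w)')

text = "an ele-phant met John-Bob"
fixed = HYPHEN.sub(r'\1\2', text)
print(fixed)  # an elephant met JohnBob
```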
Quote:
([a-z]\w)-(\w) |
Quote:
It seems '([a-z])-([a-z])' with 'Match Cases' ticked works too. |
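A Python sketch comparing the two suggestions on the same made-up sentence. Python's `re` is case-sensitive unless `re.IGNORECASE` is passed, which roughly corresponds to leaving 'Match Cases' unticked. Note that `([a-z]\w)-(\w)` still matches `hn-B` inside 'John-Bob', because only the characters before the hyphen are restricted, while `([a-z])-([a-z])` leaves it alone as long as the search is case-sensitive.

```python
import re

text = "an ele-phant met John-Bob"

# First suggestion: only the two characters before the hyphen are constrained.
first = re.sub(r'([a-z]\w)-(\w)', r'\1\2', text)

# Lowercase letters on both sides, matched case-sensitively (Python's default).
case_sensitive = re.sub(r'([a-z])-([a-z])', r'\1\2', text)

# The same pattern matched case-insensitively also joins 'n-B'.
case_insensitive = re.sub(r'([a-z])-([a-z])', r'\1\2', text, flags=re.IGNORECASE)

print(first)             # an elephant met JohnBob
print(case_sensitive)    # an elephant met John-Bob
print(case_insensitive)  # an elephant met JohnBob
```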
Rather, use the POSIX character classes. For your example: ([[:lower:]])-([[:lower:]])
List of POSIX character classes |
[[:lower:]]?
I guess you mean [:lower:]? But honestly, I find [a-z] much easier to write, especially since I often have to add German letters, like [a-zäöüß]. |
I used to write [a-z] for lowercase letters too, but since I discovered that the Unicode properties flag (*UCP) works in Sigil 0.5, I simply use character classes with it to cover non-ASCII letters.
\w, \W, [:lower:], [:upper:], [:alpha:], [:alnum:] are all affected by (*UCP). |
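Python's `re` module supports neither POSIX classes nor `(*UCP)`, so the following is only a rough analogue, not a translation. Python `str` patterns are Unicode-aware by default, so `\w` already matches 'ä', 'ö', 'ü' and 'ß'; the lowercase test is then done in a replacement function with `str.islower()`. The German sample sentence and the helper name `join_if_lowercase` are illustrative.

```python
import re

TEXT = "die grö-ßere Stra-ße von Müller-Lüdenscheidt"

# ASCII-only class: misses ö, ß and every other non-ASCII letter.
ASCII_ONLY = re.compile(r'([a-z])-([a-z])')

# Unicode-aware \w finds the letters; the lowercase check happens in Python.
WORD_HYPHEN = re.compile(r'(\w)-(\w)')


def join_if_lowercase(m):
    left, right = m.group(1), m.group(2)
    if left.islower() and right.islower():
        return left + right  # e.g. 'ö-ß' -> 'öß'
    return m.group(0)        # keep hyphens next to capitals, e.g. 'r-L'


print(ASCII_ONLY.sub(r'\1\2', TEXT))             # unchanged
print(WORD_HYPHEN.sub(join_if_lowercase, TEXT))  # die größere Straße von Müller-Lüdenscheidt
```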
Quote:
You are (of course) right, I was wrong. I never tried those since I never saw a reason to use them... |
Quote:
With (*UCP), the letters in [a-zäöüß] will all be matched by [[:lower:]], along with plenty of other edge cases you might not have thought of. That saves you time and makes edits more complete. The [:punct:] class is also especially useful, and very often overlooked. The more you know! |
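Python's `re` has no `[:punct:]` class either. As a loose stand-in, this sketch checks each character's Unicode general category with `unicodedata.category` and treats anything in a P* category as punctuation; PCRE's `[:punct:]` under `(*UCP)` may not cover exactly the same set. It contrasts that with `string.punctuation`, which holds ASCII characters only, so typographic quotes such as „ “ » « and ’ slip through. The helper `is_unicode_punct` and the sample sentence are hypothetical.

```python
import string
import unicodedata

SAMPLE = 'Er sagte: „Hallo!“ »Wie geht’s?«'


def is_unicode_punct(ch):
    # Unicode punctuation categories (Pc, Pd, Ps, Pe, Pi, Pf, Po) all start with 'P'.
    return unicodedata.category(ch).startswith('P')


ascii_punct = [c for c in SAMPLE if c in string.punctuation]
unicode_punct = [c for c in SAMPLE if is_unicode_punct(c)]

print(ascii_punct)    # [':', '!', '?']
print(unicode_punct)  # [':', '„', '!', '“', '»', '’', '?', '«']
```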
@WS64: add (*UCP) to the front of your pattern, like this:
Code:
(*UCP)\b[[:lower:]]+ |
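Outside Sigil, a rough Python approximation of `(*UCP)\b[[:lower:]]+` (a run of lowercase letters starting at a word boundary) can combine the common Unicode-letter idiom `[^\W\d_]` with `str.islower()`. The helper name `lowercase_runs_at_word_start` and the sample sentence are hypothetical; the ASCII-only `\b[a-z]+` is included to show what it misses.

```python
import re
from itertools import takewhile

TEXT = "Der Bär läuft über die Brücke"

# [^\W\d_] means "a word character that is not a digit or underscore", i.e. a letter.
LETTERS_AT_WORD_START = re.compile(r'\b[^\W\d_]+')


def lowercase_runs_at_word_start(text):
    runs = []
    for m in LETTERS_AT_WORD_START.finditer(text):
        # Keep only the leading lowercase letters, as [[:lower:]]+ would.
        run = ''.join(takewhile(str.islower, m.group()))
        if run:
            runs.append(run)
    return runs


print(lowercase_runs_at_word_start(TEXT))  # ['läuft', 'über', 'die']
print(re.findall(r'\b[a-z]+', TEXT))       # ['l', 'die']
```

The ASCII-only version does not just skip 'über'; it also returns a fragment ('l' from 'läuft'), which is easy to miss in a long book.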
MobileRead.com is a privately owned, operated and funded community.