02-20-2012, 08:34 AM | #1 |
Member
Posts: 17
Karma: 10
Join Date: Dec 2011
Device: Sony PRS-T1
|
Regex - replace only part of a string - how?
I often have problems with books that were auto-converted by Calibre. Here is one issue:
The text often has wrong line breaks. Example:
Code:
<p class="calibre2">This is just a sample</p> <p class="calibre2">text with no meaning.</p>
Code:
[a-z]</p> <p class="calibre2">
Is there a way?
DOH! I found it!
Search string: (\w+)</p> <p class="calibre2">
Replace with: \1
Last edited by flameproof; 02-20-2012 at 08:45 AM. |
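For readers who want to try the search/replace pair from this post outside Sigil, here is a minimal sketch in Python's standard `re` module. The variable names are illustrative only, and the trailing space after `\1` in the replacement is an assumption added here so that the two words do not run together; it is not part of the post as it survives.

```python
import re

# Sample markup from the post: one sentence split across two paragraphs.
html = ('<p class="calibre2">This is just a sample</p> '
        '<p class="calibre2">text with no meaning.</p>')

# Capture the last word before the break, then match the closing and opening tags.
pattern = re.compile(r'(\w+)</p> <p class="calibre2">')

# Assumption: a space follows \1 so that "sample" and "text" stay separate words.
joined = pattern.sub(r'\1 ', html)
print(joined)
```

Because `\w` does not match a full stop, paragraphs that end with sentence punctuation are left alone. A paragraph that genuinely ends on a bare word (a heading, for instance) would still be merged, so it is worth reviewing matches before replacing all.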
02-20-2012, 09:21 AM | #2 |
Member
Posts: 17
Karma: 10
Join Date: Dec 2011
Device: Sony PRS-T1
|
Please let me add one more common problem: wrong hyphens (probably from the PDF).
Search: (\w)-(\w)
Replace with: \1\2
How can I make it case sensitive? I'd like to correct 'ele-phant' but not 'John-Bob'. |
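The difference between the two patterns can be sketched in Python's `re` module. This is only an illustration under the assumption that the text is plain ASCII; the sample sentence and variable names are made up for the example.

```python
import re

text = "An ele-phant met John-Bob."

# \w matches upper- and lowercase letters alike, so both hyphens are removed.
loose = re.sub(r'(\w)-(\w)', r'\1\2', text)

# [a-z] on both sides requires lowercase ASCII neighbours, so "n-B" is skipped.
strict = re.sub(r'([a-z])-([a-z])', r'\1\2', text)

print(loose)   # An elephant met JohnBob.
print(strict)  # An elephant met John-Bob.
```

Neither pattern can tell a stray hyphen from a legitimate lowercase compound such as "well-known", so stepping through the matches is safer than a blind replace-all.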
02-20-2012, 09:29 AM | #3 |
Grand Sorcerer
Posts: 12,166
Karma: 73448616
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
|
|
02-20-2012, 09:37 AM | #4 |
Member
Posts: 17
Karma: 10
Join Date: Dec 2011
Device: Sony PRS-T1
|
|
02-21-2012, 07:24 PM | #5 |
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
|
Use the POSIX character classes instead. For your example: ([[:lower:]])-([[:lower:]])
List of POSIX char classes |
02-22-2012, 11:11 AM | #6 |
♫
Posts: 660
Karma: 506380
Join Date: Aug 2010
Location: Germany
Device: Kobo Aura / PB Lux 2 / Bookeen Frontlight / Kobo Mini / Nook Color
|
[[:lower:]]?
I guess you mean [:lower:]? But honestly, I find [a-z] way easier to write, especially since I often have to add some German letters, like [a-zäöüß]. |
02-22-2012, 11:37 AM | #7 | |
Grand Sorcerer
Posts: 27,548
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
|
|
02-22-2012, 11:50 AM | #8 |
Connoisseur
Posts: 54
Karma: 37363
Join Date: Aug 2011
Location: Istanbul
Device: EBW1150, Nook STR
|
I used to write [a-z] for lowercase letters too, but since I discovered that the Unicode properties flag (*UCP) works in Sigil 0.5, I simply use character classes with it to cover non-ASCII letters.
\w, \W, [:lower:], [:upper:], [:alpha:], [:alnum:] are all affected by (*UCP). |
02-22-2012, 01:36 PM | #9 |
♫
Posts: 660
Karma: 506380
Join Date: Aug 2010
Location: Germany
Device: Kobo Aura / PB Lux 2 / Bookeen Frontlight / Kobo Mini / Nook Color
|
|
02-22-2012, 07:50 PM | #10 |
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
|
They're great, since they cover your Unicode characters too. For example, you don't have to add ä, ß, etc.; they are already understood to be lowercase.
[a-zäöüß] will all be captured by [[:lower:]], as well as a load of other edge cases you might not have thought of. That saves you time and makes edits more complete. The punct class is also especially useful, and very often overlooked. The more you know! |
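The idea that ä and ß count as lowercase without being listed can be tried in Python, with one caveat: Python's standard `re` module has no `[[:lower:]]` class, so this sketch swaps in `str.islower()` inside a replacement function instead. The function name `join_lowercase_hyphens` is hypothetical, and this is not how Sigil evaluates the pattern.

```python
import re

def join_lowercase_hyphens(text: str) -> str:
    """Remove a hyphen only when both neighbours are lowercase letters."""
    def fix(match: re.Match) -> str:
        left, right = match.group(1), match.group(2)
        # str.islower() is Unicode-aware, so ö and ß qualify without being listed.
        if left.islower() and right.islower():
            return left + right
        return match.group(0)  # leave every other hyphen untouched
    return re.sub(r'(\w)-(\w)', fix, text)

result = join_lowercase_hyphens("Grö-ße und Müller-Schmidt")
print(result)
```

Here "Grö-ße" is joined because both ö and ß are lowercase, while "Müller-Schmidt" keeps its hyphen because the S is uppercase.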
02-23-2012, 02:51 AM | #11 |
♫
Posts: 660
Karma: 506380
Join Date: Aug 2010
Location: Germany
Device: Kobo Aura / PB Lux 2 / Bookeen Frontlight / Kobo Mini / Nook Color
|
|
02-23-2012, 04:43 AM | #12 |
Connoisseur
Posts: 54
Karma: 37363
Join Date: Aug 2011
Location: Istanbul
Device: EBW1150, Nook STR
|
@WS64: add (*UCP) in front of your pattern, like:
Code:
(*UCP)\b[[:lower:]]+
Last edited by Timur; 02-23-2012 at 04:43 AM. Reason: typo |
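The effect of switching Unicode properties on or off can be illustrated by analogy in Python. Python's standard `re` module accepts neither `(*UCP)` nor POSIX classes, so the sketch below uses `\w` with and without the `re.ASCII` flag as a rough stand-in; it shows the general behaviour, not Sigil's engine.

```python
import re

text = "größe"

# Default for str patterns: \w and \b are Unicode-aware, so the whole word matches.
unicode_words = re.findall(r'\b\w+', text)

# re.ASCII limits \w and \b to ASCII, so the word breaks apart at ö and ß.
ascii_words = re.findall(r'\b\w+', text, flags=re.ASCII)

print(unicode_words)  # ['größe']
print(ascii_words)    # ['gr', 'e']
```

The ASCII-only result resembles what happens when a pattern is run without Unicode properties: non-ASCII letters are not treated as word characters, and matches are cut short.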