MobileRead Forums > E-Book Software > Sigil
02-20-2012, 09:34 AM #1
flameproof
Regex - replace only part of a string - how?

I often have problems with books that were auto-converted by Calibre. Here is one issue:

The text often has wrong line breaks.

Example:

Code:
  <p class="calibre2">This is just a sample</p>

  <p class="calibre2">text with no meaning.</p>
I can find it with the string:

Code:
 [a-z]</p>

  <p class="calibre2">
But when I then replace it, the last letter is (of course) lost. Without the [a-z], I would also catch the normal line breaks at the end of sentences.

Is there a way?


DOH! I found it!


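Search string:

Code:
(\w+)</p>

<p class="calibre2">
Replace with: \1 followed by a space (otherwise the last word of the first paragraph and the first word of the next one run together)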
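For comparison, here is a minimal sketch of the same fix using Python's `re` module. This is an illustration only: Sigil itself uses PCRE, not Python, and the sample HTML is hypothetical. The `\s*` between `</p>` and the next `<p>` is an assumption that replaces the literal blank line, so the pattern also works when the amount of whitespace varies.

```python
import re

# Hypothetical sample: one sentence wrongly split across two paragraphs.
html = (
    '<p class="calibre2">This is just a sample</p>\n'
    '\n'
    '<p class="calibre2">text with no meaning.</p>'
)

# Capture the word before </p> so it can be put back in the replacement.
# \s* matches the newlines/indentation between the two paragraphs.
pattern = re.compile(r'(\w+)</p>\s*<p class="calibre2">')

# \1 restores the captured word; the trailing space keeps the two words apart.
fixed = pattern.sub(r'\1 ', html)

print(fixed)  # <p class="calibre2">This is just a sample text with no meaning.</p>
```

The closing `</p>` after "meaning." is not touched, because it is preceded by a period rather than a word character.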

02-20-2012, 10:21 AM #2
flameproof
Let me add one more common problem: wrong hyphens (probably left over from the PDF).

Search: (\w)-(\w)
replace with: \1\2

How can I make it case-sensitive? I'd like to correct 'ele-phant' but not 'John-Bob'.
02-20-2012, 10:29 AM #3
PeterT
Quote:
Originally Posted by flameproof
Let me add one more common problem: wrong hyphens (probably left over from the PDF).

Search: (\w)-(\w)
replace with: \1\2

How can I make it case-sensitive? I'd like to correct 'ele-phant' but not 'John-Bob'.
I think a brute force approach would be
([a-z]\w)-(\w)
02-20-2012, 10:37 AM #4
flameproof
Quote:
Originally Posted by PeterT
I think a brute force approach would be
([a-z]\w)-(\w)
Thanks.

It seems '([a-z])-([a-z])' with 'Match Cases' ticked works too.
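To see why the case setting matters, here is a small sketch in Python's `re`, which (unlike a case-insensitive search) is case-sensitive by default, so it behaves roughly like ticking the case option. The sample sentence is hypothetical. It also shows that the brute-force pattern from the previous post still joins "John-Bob", because `[a-z]\w` can match the "hn" inside "John" and the `\w` after the hyphen accepts the capital "B".

```python
import re

text = "An ele-phant met John-Bob."

# Word characters on both sides: also joins proper names.
loose = re.sub(r'(\w)-(\w)', r'\1\2', text)

# Lowercase ASCII letters on both sides (case-sensitive): names stay intact.
strict = re.sub(r'([a-z])-([a-z])', r'\1\2', text)

# Brute-force variant: "hn" in "John" satisfies [a-z]\w, "B" satisfies \w.
brute = re.sub(r'([a-z]\w)-(\w)', r'\1\2', text)

print(loose)   # An elephant met JohnBob.
print(strict)  # An elephant met John-Bob.
print(brute)   # An elephant met JohnBob.
```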
02-21-2012, 08:24 PM #5
Serpentine
Rather use the POSIX character classes; for your example: ([[:lower:]])-([[:lower:]])

List of POSIX character classes
02-22-2012, 12:11 PM #6
WS64
[[:lower:]]?

I guess you mean [:lower:]
But honestly, I find [a-z] much easier to write, especially since I often have to add German letters, like [a-zäöüß].
02-22-2012, 12:37 PM #7
DiapDealer
Quote:
I guess you mean [:lower:]
Nope. [[:lower:]] is the correct usage.
02-22-2012, 12:50 PM #8
Timur
I used to write [a-z] for lowercase letters too, but since I discovered that the Unicode properties flag (*UCP) works in Sigil 0.5, I simply use character classes with it to cover non-ASCII letters.
\w, \W, [:lower:], [:upper:], [:alpha:], [:alnum:] are all affected by (*UCP).
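As a rough analogy only (Python's `re` is not PCRE and does not support `[[:lower:]]`-style POSIX classes), Python's `re.ASCII` flag shows the same switch for `\w`: with it, `\w` covers only `[a-zA-Z0-9_]`, much like PCRE without `(*UCP)`; without it, `\w` is Unicode-aware, much like PCRE with `(*UCP)`. The sample text is hypothetical.

```python
import re

text = "Straßenbahn über Größe"

# ASCII-only \w: the German letters split the words apart.
ascii_words = re.findall(r'\w+', text, flags=re.ASCII)

# Default for str patterns in Python 3: Unicode-aware \w keeps whole words.
unicode_words = re.findall(r'\w+', text)

print(ascii_words)    # ['Stra', 'enbahn', 'ber', 'Gr', 'e']
print(unicode_words)  # ['Straßenbahn', 'über', 'Größe']
```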
02-22-2012, 02:36 PM #9
WS64
Quote:
Originally Posted by DiapDealer
Nope. [[:lower:]] is the correct usage.
I really had to check these...
You are (of course) right, I was wrong.
I never tried those since I never saw a reason to use them...
02-22-2012, 08:50 PM #10
Serpentine
Quote:
Originally Posted by WS64
I never tried those since I never saw a reason to use them...
They're great, since they cover your Unicode characters too; for example, you don't have to add ä, ß, etc., since they are already understood to be lowercase.

[a-zäöüß] will all be captured by [[:lower:]], as well as many more edge cases you might not have thought of. That saves you time and makes your edits more complete. The [:punct:] class is also especially useful, and very often overlooked.

The more you know!
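Python's `re` has no `[[:lower:]]`, so a comparable Unicode-aware version of the hyphen fix needs a different technique. The sketch below (a hypothetical helper, not a Sigil feature) matches any word characters around the hyphen and lets a replacement callback decide with `str.islower()`, which also treats ä, ö, ü and ß as lowercase.

```python
import re


def join_lowercase_hyphens(text):
    """Drop a hyphen only when the characters on both sides are lowercase.

    Hypothetical helper: the regex finds candidates with \\w, and the
    callback keeps or removes the hyphen using str.islower(), which is
    Unicode-aware.
    """
    def repl(m):
        left, right = m.group(1), m.group(2)
        if left.islower() and right.islower():
            return left + right
        return m.group(0)  # e.g. "n-B" in "John-Bob": leave unchanged

    return re.sub(r'(\w)-(\w)', repl, text)


print(join_lowercase_hyphens("Stra-ße, grö-ßer, John-Bob"))
# Straße, größer, John-Bob
```

Because a match consumes the letter after the hyphen, a chain like "a-b-c" is only partly joined in one pass.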
02-23-2012, 03:51 AM #11
WS64
Quote:
Originally Posted by Serpentine
They're great, since they cover your Unicode characters too; for example, you don't have to add ä, ß, etc., since they are already understood to be lowercase.
I just checked. [[:lower:]] does NOT find äöüß.
02-23-2012, 05:43 AM #12
Timur
@WS64: add (*UCP) at the start of your pattern, like this:

Code:
(*UCP)\b[[:lower:]]+

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.