Old 03-12-2015, 11:31 AM   #1
phossler
Wizard
Posts: 1,087
Karma: 447222
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
RegEx question about repeating

I have a simple stored search regex that replaces em dashes, en dashes, and hyphens in Hx headings (mostly for consistent formatting).

Find : <([Hh][1-6])>(.*?)\s*[-—–]{1,}\s*(.*?)</\1>

Replace: <\1>\2 \3</\1>

The once-in-a-while problem occurs when there are two or more em dashes, en dashes, or hyphens in the same Hx.


Code:
<h1>fasfasdsadf –    asdfsdfsd — sdafasdasd - asasdf - asdsadf - asdasdf</h1>
Is there a way to have the regex do them all, or do I still have to do [Replace All] until 0 are found?
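
To make the problem concrete, here is a minimal sketch using Python's standard `re` module as a stand-in for the editor's search (an assumption; `replace_until_stable` is a hypothetical helper, not something calibre provides). Each heading can match only once per pass, so every pass removes a single dash:

```python
import re

# The Find/Replace pair from the post, written for Python's re module.
FIND = re.compile(r'<([Hh][1-6])>(.*?)\s*[-—–]{1,}\s*(.*?)</\1>')
REPLACE = r'<\1>\2 \3</\1>'


def replace_until_stable(text):
    """Hypothetical helper: repeat the replacement until nothing matches.

    Returns the final text and the number of passes that changed something.
    """
    passes = 0
    while True:
        text, count = FIND.subn(REPLACE, text)
        if count == 0:
            return text, passes
        passes += 1


sample = '<h1>fasfasdsadf –    asdfsdfsd — sdafasdasd - asasdf - asdsadf - asdasdf</h1>'

# A single pass only removes the first dash in the heading.
one_pass = FIND.sub(REPLACE, sample)

# Repeating the pass is the scripted equivalent of clicking [Replace All] until 0 are found.
final, passes = replace_until_stable(sample)
```

The sample heading contains five dashes, so the loop needs five passes before the pattern stops matching.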

Last edited by phossler; 03-12-2015 at 03:16 PM. Reason: Supposed to be Hx and not just H1s
Old 03-12-2015, 10:12 PM   #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Code:
<([Hh][1-6])>(.*?)\s*(?:[-—–]{1,}\s*(.*?))+</\1>
Repeating a capturing group
Old 03-13-2015, 09:29 AM   #3
phossler
Wizard
Did I do something wrong? I see how the repeated capture group works (or at least I think I do), but the Replace result is not what I was expecting.

It seems to replace too much; all I was trying to do was end up with the fourth line in the picture. The F&R generated the first line.
[Attached image: Capture.JPG]
Old 03-13-2015, 10:04 AM   #4
phossler
Wizard
I also tried a Find & Replace function:

Code:
import regex
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    return match.group().replace('–',' ').replace('—',' ').replace('-',' ').replace(' {2,}',' ')
This does seem to replace the dashes, but the remove-multiple-spaces piece on the end doesn't seem to do anything.

So I suspect that I'm missing something fundamental here.
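
A likely explanation, sketched below with Python's standard `re` module: `str.replace` treats its first argument as literal text, so `' {2,}'` only matches a space followed by the characters `{2,}`, never a run of spaces. A regular-expression substitution is needed for the quantifier to mean anything. The variable names here are illustrative only.

```python
import re

text = 'a  b   c'

# str.replace looks for the literal five characters ' {2,}', which never occur here,
# so the string comes back unchanged.
literal_attempt = text.replace(' {2,}', ' ')

# re.sub interprets ' {2,}' as "two or more spaces" and collapses each run to one space.
collapsed = re.sub(r' {2,}', ' ', text)
```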
Old 03-13-2015, 10:05 AM   #5
eschwartz
Ex-Helpdesk Junkie
Err, good point. A repeated capturing group only keeps its last match.

You could repeat that sub-pattern several times in the search, though, and use a "?" to make every copy but the first optional.
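
The "last match only" behaviour can be seen in a small sketch with Python's standard `re` module (used here as a stand-in for the editor's engine, which is an assumption; the shortened sample heading is made up for illustration). The group inside `(?: ... )+` runs three times, but only the final iteration survives in group 3, so the middle words are lost on replacement:

```python
import re

# The repeated-group pattern suggested earlier in the thread.
pattern = re.compile(r'<([Hh][1-6])>(.*?)\s*(?:[-—–]{1,}\s*(.*?))+</\1>')

sample = '<h1>one – two — three - four</h1>'
match = pattern.search(sample)

first_part = match.group(2)   # text before the first dash
last_part = match.group(3)    # only the LAST iteration of the repeated group

# Using the original replacement string drops everything between the two.
result = pattern.sub(r'<\1>\2 \3</\1>', sample)
```

Here `result` keeps only "one" and "four", which is consistent with the replacement seeming to remove too much.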
Old 03-13-2015, 12:07 PM   #6
jbacelar
Interested in the matter
Posts: 421
Karma: 426094
Join Date: Dec 2011
Location: Spain, south coast
Device: Pocketbook InkPad 3
Regex-function:

Search:
<([Hh][1-6])>.+?<

Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    return match.group().replace("@-@","@").replace("@–@","@").replace("@—@","@")

Naturally, you must replace each @ with a space.

Last edited by jbacelar; 03-13-2015 at 12:49 PM.
Old 03-15-2015, 09:27 PM   #7
phossler
Wizard
This is what I have now. It works, but it just looks ugly.

It replaces the em dashes, en dashes, and hyphens in Hx's, even multiples, and then shrinks multiple spaces to a single space (up to 10).

Is there a way to make it a little more elegant (and maintainable)?

Find:

<([Hh][1-6])>(.*?)</\1>


Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    return match.group().replace("-"," ").replace("–"," ").replace("—"," ").replace ("  "," ").replace ("  "," ").replace ("  "," ").replace ("  "," ").replace ("  "," ").replace ("  "," ").replace ("  "," ").replace ("  "," ").replace ("  "," ").replace ("  "," ").replace ("  "," ")
Old 03-16-2015, 03:14 AM   #8
jbacelar
Interested in the matter
I don't know how the dashes and spaces are laid out in your text (or how many there are), but I think something like the following, or similar, should work (for up to 10 spaces).

Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    return match.group().replace('@@','@').replace('-','@').replace('–','@').replace('—','@').replace(' @@@','@').replace('@@','@')
Old 03-16-2015, 07:27 AM   #9
DiapDealer
Grand Sorcerer
Posts: 28,559
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Find:
Code:
<([Hh][1-6])([^>]*)>(.*?)</\1>
Code:
import regex
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    text_str = regex.sub(r'''[-–—]''', ' ', match.group(3))
    text_str = regex.sub(r''' {2,}''', ' ', text_str)
    return '<{0}{1}>{2}</{0}>'.format(match.group(1), match.group(2), text_str)
I don't know about graceful (I've always been strangely comforted by the aesthetics of the Python one-liner, myself), but the above won't have the ten-space-or-less limitation.

** Added the possibility to work with header tags that may have attributes.
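
For anyone who wants to try the approach outside the editor, here is a rough standalone sketch. It assumes Python's standard `re` module in place of `regex`, and a simplified one-argument callback instead of the full calibre signature; `clean_headings` is a hypothetical wrapper added only for this illustration.

```python
import re

# Same Find expression as above: tag name, optional attributes, heading text.
HEADING = re.compile(r'<([Hh][1-6])([^>]*)>(.*?)</\1>')


def replace(match):
    # Simplified signature: calibre passes extra arguments that this sketch ignores.
    text_str = re.sub(r'[-–—]', ' ', match.group(3))   # dashes become spaces
    text_str = re.sub(r' {2,}', ' ', text_str)          # runs of spaces collapse to one
    return '<{0}{1}>{2}</{0}>'.format(match.group(1), match.group(2), text_str)


def clean_headings(html):
    """Hypothetical wrapper: apply the callback to every heading in the text."""
    return HEADING.sub(replace, html)


plain = clean_headings(
    '<h1>fasfasdsadf –    asdfsdfsd — sdafasdasd - asasdf - asdsadf - asdasdf</h1>')
with_attrs = clean_headings('<h2 class="chapter-title">A — B</h2>')
untouched = clean_headings('<p>well-known</p>')
```

Because only group 3 is rewritten, a hyphen inside an attribute value such as `class="chapter-title"` is left alone, and text outside headings is not touched at all.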

Last edited by DiapDealer; 03-16-2015 at 07:50 AM.
Old 03-16-2015, 01:35 PM   #10
jbacelar
Interested in the matter
@DiapDealer

Definitive.
Old 03-16-2015, 08:39 PM   #11
phossler
Wizard
@DiapDealer

I will study the technique, since I can see many more places where I can save myself some tedious work.
Old 03-17-2015, 08:51 PM   #12
phossler
Wizard
@DiapDealer --

Could you please explain the syntax, grammar, and punctuation of the function?

I read this ...

https://docs.python.org/2.7/library/...ght=sub#re.sub

but still don't get it

I recognize the match.group(3) and the space{2,}, but then there are things like:

r'''something''': why the r and the three single quotes?

return ...: I can figure out the {0}, etc., but why is it inside ' ... ', and what is the .format for?

In case you haven't realized, my understanding of Python is zilch.

I am trying to figure out enough to cookbook some other functions.

Thanks
Old 03-17-2015, 09:25 PM   #13
eschwartz
Ex-Helpdesk Junkie
r'''something''' -- see: https://docs.python.org/2.0/ref/strings.html

.format() is called on a string and takes any number of arguments. It returns a new string in which each {n} placeholder is replaced by the value of the argument at position n (counting from 0).
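
A short sketch of what the r prefix and the triple quotes each buy you (variable names are illustrative only). The r prefix keeps backslashes as written, and triple quotes let the pattern contain both kinds of quote character without escaping:

```python
import re

# Without the r prefix, the backslash has to be doubled to survive as a backslash.
plain = '\\s+'
raw = r'\s+'
triple_raw = r'''\s+'''

# All three spellings produce the same three-character pattern string.
same = plain == raw == triple_raw

# Triple quotes allow both ' and " inside the pattern with no escaping.
quote_pattern = r'''["']'''
found = re.findall(quote_pattern, '''He said "it's"''')
```

For a pattern with no backslashes or quotes, such as `[-–—]`, none of this matters and ordinary quotes behave identically.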
Old 03-17-2015, 10:00 PM   #14
DiapDealer
Grand Sorcerer
The r''' ''' is probably overkill in this situation, but I've gotten into the habit of using it all the time for regular expressions in Python. '[-–—]' or "[-–—]" would achieve the same thing in this particular instance. It's still just a string representation of the regular expression.

Code:
text_str = regex.sub(r'''[-–—]''', ' ', match.group(3))
In words: regex.sub('everything matching this expression', 'with this', 'in this string').
Find all occurrences of - or – or — and replace them with a space in the string contained in the third matching group. Store the result in text_str.

Code:
text_str = regex.sub(r''' {2,}''', ' ', text_str)
Find all occurrences of two or more consecutive spaces and replace them with a single space in the text_str string. Store the results in text_str.

Code:
return '<{0}{1}>{2}</{0}>'.format(match.group(1), match.group(2), text_str)
String formatting/substitution.
Code:
'Hello {0}'.format('there')
Substitute {0} with 'there'
Code:
'Hello {0} {1} {2}, {0}'.format('there', 'you', 10)
Becomes 'Hello there you 10, there'.

You don't even need to use numbers if you're not going to repeat anything:
Code:
'Hello {} {} {}, {}'.format('there', 'you', 10, 'you')
You could also use string concatenation:
Code:
return '<' + match.group(1) + match.group(2) + '>' + text_str + '</' + match.group(1) + '>'
But then you have to make sure everything is already a string beforehand. That's probably not necessary in this case, but again, it's just a habit I've gotten into to avoid type mismatches (plus I just like it better than the %s %d string substitution method).

Code:
return '<%s%s>%s</%s>' % (match.group(1), match.group(2), text_str, match.group(1))
In this particular case:
Code:
return '<{0}{1}>{2}</{0}>'.format(match.group(1), match.group(2), text_str)
match.group(1) will be the tag name (h1, h2, h3, etc) and gets plugged into both {0}s.
match.group(2) will be any (optional) attributes (class="foo") and gets plugged into {1}.
text_str is our manipulated content from between the h-tags and gets plugged into {2}
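
The three ways of building the result can be compared side by side in a small sketch. The values for `tag`, `attrs`, and `text` are made up for illustration; in the real function they would come from match.group(1), match.group(2), and text_str.

```python
# Hypothetical stand-ins for match.group(1), match.group(2) and text_str.
tag, attrs, text = 'h2', ' class="foo"', 'A B'

via_format = '<{0}{1}>{2}</{0}>'.format(tag, attrs, text)
via_percent = '<%s%s>%s</%s>' % (tag, attrs, text, tag)
via_concat = '<' + tag + attrs + '>' + text + '</' + tag + '>'

# .format() converts non-string arguments for you...
greeting = 'Hello {0} {1} {2}, {0}'.format('there', 'you', 10)

# ...whereas concatenating a string and an integer raises TypeError.
try:
    'count: ' + 10
    concat_failed = False
except TypeError:
    concat_failed = True
```

All three spellings give the same heading string here; the difference only shows up once a non-string value is involved.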

Last edited by DiapDealer; 03-17-2015 at 10:11 PM.
Old 03-17-2015, 11:29 PM   #15
eschwartz
Ex-Helpdesk Junkie
Also because, going forward, the format function is the recommended one -- for that very reason, of course.