#1 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Understanding the sample add books regex
I want to understand exactly how this works:
(?P<author>.+) - (?P<title>[^_]+)
I have followed the link to the tutorial and looked at other regex syntax summaries, but I am still not getting it. I don't see a definition of ?P anywhere, for example. Could someone kindly break down how the above code works, please?
#2 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
See here. The term you're looking for is "named backreferences". It was explained in passing in the tutorial, back when the tutorial still covered how to add books. I may have to write a paragraph about backreferences to add to it.
#3 |
Zealot
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
|
Wow! Sorry, Manichean, but the Python documentation you pointed to is for experienced programmers. I can hardly understand anything on that page, and I read it several times.
It is definitely not a beginner's guide to regular expressions, and I think what cybmole was looking for was a simple explanation of how the regular expression he posted works. Something like:
(?P<author>.+) - (?P<title>[^_]+)
The first part of the expression, up through the hyphen, takes a book's filename (any characters) and stores the part of the filename up to the space before the hyphen as the "author" when it adds the book. The second part does the same thing for the "title". The "[^_]" in the second part tells it to exclude underscores from the title.
I am not entirely sure of my explanation, but the Python manual makes too many references to things you don't need to know at the beginning level of understanding regular expressions (octal numbers and nulls?). And as far as I have learned so far, the "?P" is specific to Python's "flavor" of regex and is not found in other regexes, so that is why you could not find it, cybmole.
It appears that Calibre's Python uses (?P<foo>) to pass the value matched inside the parentheses to the variable "foo", much as other regexes use \1 to pass the value matched by the first set of parentheses to the replace field for use in a replacement expression.
Check out chapter 8 of the TextWrangler manual. Even if you don't use TextWrangler, it is a good beginner's guide to regex. You can download it here: http://www.barebones.com/support/textwrangler/ (about 3/4 of the way down the page).
Happy Monday
Archon
Last edited by Archon; 02-14-2011 at 09:47 AM.
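A quick way to see what the two named groups capture is to run the pattern in plain Python. This is only a minimal sketch: the filename and the helper name `split_filename` are made up for illustration, and it does not show how calibre itself applies the pattern when adding books.

```python
import re

# The sample pattern from the thread: two named groups separated by " - ".
PATTERN = re.compile(r"(?P<author>.+) - (?P<title>[^_]+)")


def split_filename(name):
    """Return {'author': ..., 'title': ...} for a matching name, else None.

    Hypothetical helper, not part of calibre.
    """
    match = PATTERN.match(name)
    return match.groupdict() if match else None


# Hypothetical filename (extension already removed), for illustration only.
parts = split_filename("Some Author - Some Title")
print(parts)
```

Each `(?P<name>...)` group shows up under its own key in the dictionary, which is what makes the names "author" and "title" useful to whatever code consumes the match.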
#4 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Quote:
This is the info that I was missing beforehand (taken from your link):
(?P<name>...)
Similar to regular parentheses, but the substring matched by the group is accessible within the rest of the regular expression via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named. So the group named id in the example below can also be referenced as the numbered group 1.
For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in the regular expression itself (using (?P=id)) and replacement text given to .sub() (using \g<id>).
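The four ways of referring to a named group mentioned in that excerpt can be tried directly. The sketch below uses the `(?P<id>[a-zA-Z_]\w*)` pattern from the quoted documentation; the input strings are invented for illustration.

```python
import re

# Pattern taken from the quoted documentation: one named group called "id".
pattern = re.compile(r"(?P<id>[a-zA-Z_]\w*)")

m = pattern.search("total = 42")   # hypothetical input string
by_name = m.group("id")            # access the group by name
by_number = m.group(1)             # the same group, accessed by number
end_pos = m.end("id")              # index just past the matched text

# \g<id> refers to the named group inside replacement text given to sub().
wrapped = pattern.sub(r"<\g<id>>", "total = 42", count=1)

# (?P=id) refers back to the named group inside the pattern itself:
# this finds an identifier that is repeated after a single space.
doubled = re.search(r"(?P<id>[a-zA-Z_]\w*) (?P=id)", "say hello hello world")

print(by_name, by_number, end_pos, wrapped, doubled.group(0))
```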
#5 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
I mostly ignored the whole (named) backreference thing before because it wasn't really applicable. In the search & replace, it's very convenient. Like I said, I'll probably add a paragraph on that sometime soon.
#6 |
Zealot
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
|
Wow again!
Apparently cybmole is a lot more experienced at this than I presumed from his question. I stand corrected. I still think the Python manual is really hard to decipher unless you have been programming for years.
Happy Monday Again!
Archon
#7 | |
Well trained by Cats
Posts: 30,882
Karma: 59840450
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
#8 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
I did work as a programmer when I was much younger, though never in Python or with regex. My HTML, before getting into ebooks, was very sketchy: I could do a home page with the aid of FrontPage, but that was about it!
I actually worked on programming the typesetting of complex books (shipping registers and stuff) at a computer bureau for a while, but it was done in Fortran subroutines back then, and we had to join some bolshie closed-shop union of newspaper and print workers to be allowed to work on books! (It may have been the NUJ, the National Union of Journalists. This was pre-Murdoch times in the UK, when all newspapers were still printed in Fleet Street and closed-shop unions ruled the printing trade!)
I found regex very hard at first, but am now OK with basic find & replace via Sigil or calibre. I can edit line feeds, chapter tags and stuff OK. I'm just learning it one command at a time, on a needs-driven basis: something in an ebook has to annoy me enough to want to figure out how to fix it! Ditto with CSS: I am slowly getting the hang of it, and have enough OCD to want to, e.g., make all books in a series use the same formatting. Very sad :-)
I learnt last week (driven by some dictionary hassles) how to de-DRM my purchased Amazon books, so I can now reformat those if I care to. In fact, "owning" an e-book that I could not actually edit would really annoy me. It took half a day, but it's done now, with a calibre plug-in installed and working.
The Teach Yourself... in 10 Minutes book is very good also.
#9 | |
Member
Posts: 11
Karma: 10
Join Date: Feb 2011
Device: none
|
Quote:
The ( ) create a regex "group". A group can be referenced in a replacement string by \1 or \2, in other words by the order in which the original ( ) define each group. ALSO, if you put a "name" in the ( ), like ?P<author>, you can use that name in the replacement string!
So (?P<author>.+) - (?P<title>[^_]+) has 2 groups that are named.
The first group is named "author". The period means any character can be part of the author group, and the + means there must be at least ONE character for the group to be built. The author group stops building up when the " - " separator (space, dash, space) is encountered; because .+ is greedy, that is the last such separator in the filename.
The second group is named "title". The characters allowed are defined by the brackets [ ]. In this case, all characters are allowed in the group EXCEPT the underscore character _: the ^ says NO to whatever follows it inside the brackets, so [^_] means the group will NOT include any underscores and stops at the first one. Again, the + means at least ONE character is required.
The P in (?P<group name>) is a special Python requirement to let Python know that a group is to have a name.
Two references I am using are:
Regular Expressions in 10 Minutes by Ben Forta
Regular Expression Pocket Reference by Tony Stubblebine
Other books on regex:
Mastering Regular Expressions by Jeffrey E. F. Friedl
Regular Expressions Cookbook by Jan Goyvaerts & Steven Levithan
Beginning Regular Expressions by Andrew Watt
Hope that helps.
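The points in this breakdown can be checked in a few lines of Python. This is a rough sketch with made-up filenames: it shows where each group stops, and that a numbered reference (\1, \2) and a named reference (\g<author>, \g<title>) give the same replacement in Python's own `re.sub`. Replacement syntax in other tools may differ.

```python
import re

pattern = re.compile(r"(?P<author>.+) - (?P<title>[^_]+)")

# Hypothetical filename with an underscore-separated suffix.
m = pattern.match("Some Author - Some Title_v2")
author = m.group("author")   # everything before " - "
title = m.group("title")     # stops at the first underscore

# Numbered and named references produce the same replacement text.
by_number = pattern.sub(r"\2 by \1", "Some Author - Some Title")
by_name = pattern.sub(r"\g<title> by \g<author>", "Some Author - Some Title")

# ".+" is greedy: with two " - " separators, the author group
# runs up to the LAST one.
greedy = pattern.match("A - B - C")

print(author, "|", title)
print(by_number)
print(greedy.group("author"), "|", greedy.group("title"))
```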
#10 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
That's still no good if you want to ignore the middle text, which is why some of the regex patterns get very complicated: they have to optionally skip things and still match correctly.
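The text being quoted here was not preserved, so the following is only one possible illustration of "optionally skipping" a middle segment. It assumes a hypothetical naming convention in which an optional bracketed series tag sits between author and title; the pattern and filenames are invented for this sketch and are not taken from calibre.

```python
import re

# Assumed convention (hypothetical): "Author - [Series 01] - Title",
# where the bracketed middle part may be missing.
# (?: ... )? is a non-capturing group made optional, so the middle text
# is matched and discarded when present, and skipped when absent.
pattern = re.compile(
    r"(?P<author>.+?) - (?:\[[^\]]*\] - )?(?P<title>[^_]+)"
)

with_series = pattern.match("Some Author - [Some Series 01] - Some Title")
without_series = pattern.match("Some Author - Some Title")

print(with_series.groupdict())
print(without_series.groupdict())
```

The author group uses the lazy `.+?` here so that it stops at the first " - " rather than the last. That small change is part of why such patterns grow complicated quickly.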
#11 | |
Zealot
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
|
Quote:
I use grep-based regex in BBEdit and TextWrangler, and have looked at a little Ruby-based regex through TextMate. I never saw the "?P" thing in any of those flavors of regex, but grep stores back-references in local variables that can be accessed with \1 for the first capture group, and Ruby uses $1. So, the "(?P<group name>)" with the named variable makes the result available to the rest of the program for creating the filename and the metadata for the interface?
Happy Humpday
#12 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|