Old 02-14-2011, 06:02 AM   #1
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
understanding the sample add books regex

I am wanting to understand how exactly this works:
(?P<author>.+) - (?P<title>[^_]+)

I have followed the link to the tutorial & looked at other regex syntax summaries but am still not getting it

I don't see a definition of ?P anywhere, for example

could someone kindly break down how the above code works, please.
Old 02-14-2011, 06:50 AM   #2
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
See here. The term you're looking for is "named backreferences". It was explained in passing back when the tutorial still covered how to add books; I may have to write a paragraph about backreferences to add to it.
Old 02-14-2011, 09:35 AM   #3
Archon
Zealot
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
Wow! Sorry Manichean, but the Python documentation you pointed to is for experienced programmers. I can hardly understand anything on that page, and I read it several times.

It is definitely not a beginners guide to regular expressions and I think what cybmole was looking for was a simple explanation of how the regular expression that he posted works.

Something like:
(?P<author>.+) - (?P<title>[^_]+)

The first part of the expression, up through the hyphen, takes a book's filename (the ".+" matches any characters) and puts the part of the filename up to the space before the hyphen into "author" when it adds the book.
The second part does the same thing for the "title". The "[^_]+" in the second part matches any characters except an underscore, so the title stops at the first underscore.

I am not entirely sure of my explanation, but the Python manual makes too many references to things you don't need to know at the beginning level of understanding regular expressions (octal numbers and nulls?). And as far as I have learned so far, the "?P" comes from Python's "flavor" of regex, and many other flavors lack it or spell named groups differently, so that is why you could not find it, cybmole.

It appears that Calibre's Python uses (?P<foo>...) to pass the text matched inside the parentheses to the name "foo", much as other regexes use \1 to pass the text matched by the first set of parentheses to the replace field for use in a replacement expression.
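To make that concrete, here is a minimal Python sketch of the sample expression applied to a filename. The filename is made up for illustration, and this is only how Python's own re module behaves, not calibre's actual import code.

```python
import re

# Hypothetical filename, used only for illustration.
filename = "Jane Austen - Pride and Prejudice"

pattern = re.compile(r"(?P<author>.+) - (?P<title>[^_]+)")
m = pattern.match(filename)

# A named group is also a numbered group, so both lookups agree.
author_by_name = m.group("author")
author_by_number = m.group(1)
title_by_name = m.group("title")
title_by_number = m.group(2)

print(author_by_name, "|", title_by_name)
```

The names only change how you ask for the captured text; the matching itself is the same as with plain parentheses.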

Check out chapter 8 of the TextWrangler manual. Even if you don't use TextWrangler it is a good beginners guide to regex.
You can download it here:
http://www.barebones.com/support/textwrangler/
About 3/4 of the way down the page.

Happy Monday
Archon

Last edited by Archon; 02-14-2011 at 09:47 AM.
Old 02-14-2011, 11:10 AM   #4
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
Originally Posted by Manichean View Post
See here. The term you're looking for is "named backreferences". It was explained in passing back when the tutorial still covered how to add books; I may have to write a paragraph about backreferences to add to it.
Thanks, I understand it now. Including this in the calibre tutorial, or as a specific link from the add books preferences page, would be good.

this is the info that I was missing beforehand ( taken from your link):

(?P<name>...)
Similar to regular parentheses, but the substring matched by the group is accessible within the rest of the regular expression via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named. So the group named id in the example below can also be referenced as the numbered group 1.

For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in the regular expression itself (using (?P=id)) and replacement text given to .sub() (using \g<id>).
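The three ways of referring to a named group mentioned in that excerpt can be tried out with a short Python sketch. The sample strings are invented for illustration; only the (?P<id>[a-zA-Z_]\w*) pattern comes from the quoted documentation.

```python
import re

# The documentation's example pattern: a named group "id" matching an identifier.
ident = re.compile(r"(?P<id>[a-zA-Z_]\w*)")
m = ident.search("  total_1 = 5")
first_id = m.group("id")   # look the group up by name
same_id = m.group(1)       # the same group, looked up by number
id_end = m.end("id")       # index just past the end of the group's match

# (?P=id) inside the pattern matches the same text the group already matched,
# so this finds a word that is immediately repeated.
doubled = re.compile(r"\b(?P<id>[a-zA-Z_]\w*) (?P=id)\b")
repeated_word = doubled.search("this is is a test").group("id")

# \g<id> in replacement text inserts whatever the group matched.
wrapped = ident.sub(r"<\g<id>>", "x = y", count=1)
```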
Old 02-14-2011, 11:42 AM   #5
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
I mostly ignored the whole (named) backreference thing before because it wasn't really applicable. In the search & replace, it's very convenient. Like I said, I'll probably add a paragraph on that sometime soon.
Old 02-14-2011, 11:44 AM   #6
Archon
Zealot
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
Wow again!
Apparently cybmole is a lot more experienced at this than I presumed from his question.

I stand corrected.

I still think the Python manual is really hard to decipher unless you have been programming for years.

Happy Monday Again!
Archon
Old 02-14-2011, 12:08 PM   #7
theducks
Well trained by Cats
Posts: 29,794
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by Archon View Post
Wow again!
Apparently cybmole is a lot more experienced at this than I presumed from his question.

I stand corrected.

I still think the Python manual is really hard to decipher unless you have been programming for years.

Happy Monday Again!
Archon
I look at the Python manual and my eyes glaze. Manichean's tutorial got me using REGEX (ironically, I use it in Sigil, not here).
Old 02-14-2011, 12:51 PM   #8
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
I did work as a programmer when I was much younger, though never in Python or with regex. My HTML, before getting into ebooks, was very sketchy (I could do a home page with the aid of FrontPage, but that was about it!).

I actually worked on programming the typesetting of complex books (shipping registers n stuff) at a computer bureau for a while, but it was done in Fortran subroutines back then, and we had to join some bolshie newspaper/print workers' closed-shop union to be allowed to work on books! (It may have been the NUJ, the National Union of Journalists. This was pre-Murdoch times in the UK, when all newspapers were still printed in Fleet Street and closed-shop unions ruled the printing trade!)

I found regex very hard at first, but am now OK with basic find & replace via Sigil or calibre. I can edit line feeds, chapter tags n stuff OK. I'm just learning it one command at a time, on a needs-driven basis: something in an ebook has to annoy me enough to want to figure out how to fix it! Ditto with CSS stuff; I am slowly getting the hang of it and have enough OCD to want to, e.g., make all books in a series use the same formatting. Very sad :-)

I learnt last week (driven by some dictionary hassles) how to de-DRM my purchased Amazon books, so I can now reformat those if I care to. In fact, "owning" an e-book that I could not actually edit would really annoy me. It took half a day, but it's done now, with a calibre plug-in installed & working.

The teach yourself... in 10 mins book is very good also.
Old 02-22-2011, 03:19 PM   #9
meads
Member
meads began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Feb 2011
Device: none
Quote:
Originally Posted by cybmole View Post
I am wanting to understand how exactly this works:
(?P<author>.+) - (?P<title>[^_]+)

I have followed the link to the tutorial & looked at other regex syntax summaries but am still not getting it

I don't see a definition of ?P anywhere, for example

could someone kindly break down how the above code works, please.

The ( ) create a regex "group". A group can be referenced in a replacement string by \1 or \2, in other words by the order in which the original ( ) define each group. ALSO, if you put a "name" in the ( ), like ?P<author>, you can use the name in the replacement string!

So (?P<author>.+) - (?P<title>[^_]+) has 2 groups that are named. The first group is named "author"; the period means any character will be part of the author group, and the + means that there must be at least ONE character for the group to be built. The author group stops building up when the dash - is encountered. The second group is named "title". The characters allowed are defined by the brackets [ ]. In this case, all characters are allowed in the group EXCEPT the underscore character _. The ^ says NO to whatever follows it inside the brackets, so [^_] means the group will NOT include any underscores. The + again means that there must be at least ONE character for the group to be built.

The P in the (?P<group name>) is a special Python requirement to let Python know that a group is to have a name.
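A small Python sketch of the [^_]+ part of that breakdown, using invented filenames. It assumes the pattern is applied from the start of the name with re.match; whether calibre does exactly that is not stated in this thread.

```python
import re

pattern = re.compile(r"(?P<author>.+) - (?P<title>[^_]+)")

# [^_]+ matches one or more characters that are not underscores,
# so the title group stops at the first underscore it meets.
m = pattern.match("Some Author - Some Title_extra stuff")
title = m.group("title")

# The + needs at least one allowed character, so a name whose title part
# begins with an underscore does not match at all.
no_title = pattern.match("Some Author - _extra stuff")
```

Here title comes out as "Some Title" and no_title is None, which is the "at least ONE character" rule in action.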

Two references I am using are:
Regular Expressions in 10 Minutes by Ben Forta
Regular Expression Pocket Reference by Tony Stubblebine

other books on RegEx:
Mastering Regular Expressions by Jeffrey E. F. Friedl
Regular Expressions Cookbook by Jan Goyvaerts & Steven Levithan
Beginning Regular Expressions by Andrew Watt

hope that helps.
Old 02-22-2011, 03:55 PM   #10
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by meads View Post
The author group stops building up when the dash - is encountered.
It's subtle, but to be perfectly accurate, for (?P<author>.+) - (?P<title>[^_]+) the author group keeps building up until it encounters the last space - hyphen - space sequence, not the first one. The matching is said to be "greedy", so in a filename like "author - something in the middle - title.txt", where there are two space - hyphen - space sequences, the first group (author) greedily builds up to be "author - something in the middle" and the last group is left with only the stuff after the last space - hyphen - space sequence. To reverse that, you can make the first group match non-greedily by writing .+? instead of .+.

That's still no good if you want to ignore the middle text, which is why some of the regex groups get very complicated to optionally skip things and still match correctly.
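The greedy versus non-greedy difference described above can be checked with a quick Python sketch. The filename is the one from the example, with the extension left off for simplicity.

```python
import re

name = "author - something in the middle - title"

# Greedy: .+ grabs as much as it can, so the split happens at the LAST " - ".
greedy = re.match(r"(?P<author>.+) - (?P<title>[^_]+)", name)

# Non-greedy: .+? grabs as little as it can, so the split happens at the FIRST " - ".
lazy = re.match(r"(?P<author>.+?) - (?P<title>[^_]+)", name)

print(greedy.groupdict())
print(lazy.groupdict())
```

Neither version drops the middle text; it just ends up in the author group (greedy) or in the title group (non-greedy).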
Old 03-02-2011, 05:43 AM   #11
Archon
Zealot
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
Quote:
The P in the (?P<group name>) is a special Python requirement to let Python know that a group is to have a name.
Ahh yes that is what I thought.

I use grep-based regex in BBEdit and TextWrangler, and have looked at a little Ruby-based regex through TextMate. I never saw the "?P" thing in any of those flavors, but grep stores captures for back-references, which can be accessed with \1 for the first capture group; Ruby uses $1.

So, the "(?P<group name>)" with the named variable makes the result available to the rest of the program for creating the filename and the metadata for the interface?

Happy Humpday
Old 03-02-2011, 06:08 AM   #12
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by Archon View Post
So, the "(?P<group name>)" with the named variable makes the result available to the rest of the program for creating the filename and the metadata for the interface?
Yeah. It's basically the same thing as backreferencing a group with \number, except with names.
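As a rough illustration of both points, here is a Python sketch of how a program could pick up the named groups, and of the same replacement written once with numbers and once with names. The filename and variable names are made up, and this is only an assumption about the general approach, not calibre's actual code.

```python
import re

pattern = re.compile(r"(?P<author>.+) - (?P<title>[^_]+)")
filename = "Mary Shelley - Frankenstein"  # hypothetical filename

# groupdict() returns every named group as a name -> text mapping,
# which the rest of a program can read like any other dictionary.
metadata = pattern.match(filename).groupdict()

# The same reordering, first with numbered and then with named backreferences.
by_number = pattern.sub(r"\2 by \1", filename)
by_name = pattern.sub(r"\g<title> by \g<author>", filename)
```

Both substitutions give the same result; the named form is just easier to read once a pattern has more than a couple of groups.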