Old 06-23-2014, 04:39 AM   #1
myki
Zealot
Posts: 126
Karma: 20236
Join Date: May 2014
Device: Kindle PW v1, Kobo H2O, Onyx Boox T68
Change case in a regular expression

Greetings,

Is there a way to capitalize some text with a regex in Calibre?
I think I read somewhere that it is possible in Python, but not in Calibre.

My goal:
I am trying to improve my "save as" template by extracting the last name from author_sort and capitalizing it, then extracting the first name.
For example: from Christie, Agatha to CHRISTIE Agatha
For now, I have built 2 custom fields. It works, but it would be better to do it with a regex...

Thanks for your help
Old 06-24-2014, 06:45 AM   #2
Sabardeyn
Guru
Posts: 644
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
OK, I'm not the best person to answer, because I definitely cannot build the regex for you. But let's get the ball rolling with some general Q&A that might inspire others to offer some regex examples.

I think the regex operation you want is possible, but it may require a complex expression, depending on which kinds of last names you want to account for. That is, do you have last names that include:
  • hyphenated names (Smith-Johnson)
  • multi-word names (Michelle Van de Camp)
  • empty names (e.g., Cher, who has no last name)
  • names written in leetspeak (Thompson = +h0|\/|p50|\|)
Are there any other word-boundary issues (finding the beginning/end of a name)? Things like this will be the major issue in determining the correct expression to use.

Another issue is where in calibre do you plan to use this expression? In Add Books? Or in Edit Metadata? Somewhere else?

Does the ebook source follow a distinct, rigidly enforced pattern? For instance a filename like: Title - Firstname Lastname.Extension. Consistent source material helps during input.



PS: Take a look at the topic Tyranosaurus Regex, including my post (msg #17) where the efforts of others came together in a wonderful regex expression which was particularly efficient. I didn't create it, I just stuck it all together (and I got lucky!).

There also used to be a Regex example topic, but I'm not seeing it ATM.

Last edited by Sabardeyn; 06-24-2014 at 06:47 AM.
Old 06-24-2014, 10:51 PM   #3
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Use this, in general program mode. Or tweak into another mode, but general program mode is awesome for complex stuff, like combining multiple fields in one using complex regexes and function calls.

It will only parse the first author.

Code:
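program:

# FN: everything before the first comma (the last name)
	FN=re(
		field('author_sort'),
		'([^,]+),.+',
		'\1'
	);

# LN: everything after the first comma (the first name, with its leading space)
	LN=re(
		field('author_sort'),
		'[^,]+,(.+)',
		'\1'
	);

# Uppercase the last name and append the first name
	fixed_author_sort=strcat(
		uppercase(FN),
		LN
	)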
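
For anyone more at home in Python, the two substitutions can be sketched roughly as follows. This is only an illustration of the regex logic, not calibre code; `fix_author_sort` is a made-up name, and the sketch assumes a single "Last, First" value.

```python
import re

def fix_author_sort(author_sort: str) -> str:
    """Mimic the two re() calls for a single 'Last, First' value."""
    # Everything before the first comma (the last name).
    before_comma = re.sub(r'([^,]+),.+', r'\1', author_sort)
    # Everything after the first comma, leading space included.
    after_comma = re.sub(r'[^,]+,(.+)', r'\1', author_sort)
    return before_comma.upper() + after_comma

print(fix_author_sort("Christie, Agatha"))  # CHRISTIE Agatha
```

One thing the sketch makes visible: a value with no comma (such as "Cher") matches neither pattern, so both substitutions return it unchanged and the result is the name twice.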

Last edited by eschwartz; 06-25-2014 at 03:31 AM.
Old 06-25-2014, 02:43 AM   #4
chaley
Grand Sorcerer
Posts: 12,336
Karma: 8012652
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
This is an interesting case. In effect, it wants a function that applies a template to each group matched by the search regular expression, providing functionality similar to python's match groups. This function, having a variable number of arguments, would look something like:
Code:
re_groups(field, search_expr, template_for_group_1, t_for_g_2, ...)
The group would be named '$' when the template for that group is called. The result is the field with each match of 'search_expr' replaced by the result of evaluating the template for each group and concatenating the groups.

For the situation being discussed in this thread and using eschwartz's example, one would use something like
Code:
re_groups(field('author_sort'), '([^,]+), (.+)', 'program: uppercase($)', 'program: $')
or
Code:
re_groups(field('author_sort'), '([^,]+), (.+)', "[[$:uppercase()]]", '[[$]]')
I will look at whether such a function is buildable.
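
For readers who know Python, the proposal can be sketched roughly as below. This is only an illustration of the idea, not calibre's implementation: `re_group_sketch` is a hypothetical name, and plain Python callables stand in for the per-group templates.

```python
import re

def re_group_sketch(value, pattern, *group_funcs):
    """Replace each match of `pattern` with its groups, each passed through its own function."""
    def rebuild(match):
        parts = []
        for index, func in enumerate(group_funcs, start=1):
            text = match.group(index) or ""  # an unmatched optional group counts as empty
            parts.append(func(text))
        return "".join(parts)
    # Text outside the matches is left untouched by re.sub.
    return re.sub(pattern, rebuild, value)

print(re_group_sketch(
    "Christie, Agatha",
    r"([^,]+), (.+)",
    str.upper,             # group 1: last name, uppercased
    lambda s: " " + s,     # group 2: first name, with a space in front
))
# CHRISTIE Agatha
```

The sketch expects one function per group in the pattern; passing more functions than there are groups raises an IndexError.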

Last edited by chaley; 06-25-2014 at 03:19 AM. Reason: changed syntax
Old 06-25-2014, 03:32 AM   #5
eschwartz
Ex-Helpdesk Junkie
Sounds awesome, chaley!

I gave up pretty quickly looking at that and took the easy way.
Old 06-25-2014, 04:08 AM   #6
chaley
Grand Sorcerer
I have it working and will submit it for the next release.

Examples using your suggested approach:
Code:
program: re_group(field('authors'), '(\S*), (\S*)', '[[$:uppercase()]] ', "[[$]]")
or
Code:
{authors:'re_group(field('authors'), '(\S*), (\S*)', '[[$:uppercase()]] ', "[[$]]")'}
Now I need to add another function like list_re that uses re_group instead of re. Something like
Code:
list_re_group(src_list, separator, search_re, group_1, group_2, ...)
that would apply the re_group to each item in the list.
Old 06-25-2014, 05:39 AM   #7
myki
Zealot
Thanks for your help, guys... But you lost me, lol.
I don't know anything about program mode.
I looked at the documentation and it frightened me!

I don't want to waste your time, but if you're still motivated to help me, I would need more explanations, beginner-oriented please.

Thank you!

Off-topic: Calibre can do so many things, it's really impressive.

EDIT: Argh! Because of my English, I confused "capitalize" with "change to uppercase": from Christie, Agatha to CHRISTIE Agatha.
Sorry for that!

Last edited by myki; 06-25-2014 at 05:48 AM.
Old 06-25-2014, 05:49 AM   #8
eschwartz
Ex-Helpdesk Junkie
Currently you can use my program: block to "fix" the author_sort for your use. I would recommend using it inside {:'program-goes-here'} e.g.

Code:
{:'program:FN=re(field("author_sort"),"([^,]+),.+","\1");LN=re(field("author_sort"),"[^,]+,(.+)","\1");fixed_author_sort=strcat(uppercase(FN),LN)'}
can be used as a template to produce fixed_author_sort.

Either code block works, but:
  • This new one is stuck into a template, allowing you to follow it with more standard templates.
  • The previous one was meant for integrating into program mode, where any further sections have to be called with the field function and strcat'ed onto the last function call. If you don't understand this, you don't have to -- just use the new one.
Using general program mode is only required for people who want to spend time making it look nice. And are probably a little obsessed about it. As you can see, the code I just gave is a long incomprehensible blob without the neatly-laid-out whitespace and stuff. But it works just as well once written.

Also, you can use chaley's new function if you are willing to wait until Friday (and the next calibre release).

Last edited by eschwartz; 06-25-2014 at 06:19 AM. Reason: grammar
Old 06-25-2014, 05:58 AM   #9
eschwartz
Ex-Helpdesk Junkie
Quote:
Originally Posted by myki View Post
EDIT: Argh! Because of my English, I confused "capitalize" with "change to uppercase": from Christie, Agatha to CHRISTIE Agatha.
Sorry for that !
Your intent was clear, though. Kudos for giving an example of what you want.

Also, technically you are capitalizing all letters.
Old 06-25-2014, 06:37 AM   #10
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Capitalizing only the first letter of each word is often referred to as title case.
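
Since Python has come up in this thread, its built-in string methods happen to show the three terms side by side. This is plain Python, not calibre template syntax, and the sample name is just an illustration.

```python
name = "agatha christie"

print(name.upper())       # AGATHA CHRISTIE  (uppercase: every letter)
print(name.title())       # Agatha Christie  (title case: first letter of each word)
print(name.capitalize())  # Agatha christie  (first letter of the string only)
```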

If it makes you feel any better, I don't know anything about Python (the language the program is written in) either.
Old 06-25-2014, 07:12 AM   #11
myki
Zealot
Thank you very much, it is very instructive !

I was hesitant to change the way "author_sort" works, so in the end I found a solution:
I built 2 custom columns, each derived from another column:
Code:
#author_sort_lastname with {:'uppercase('{author_sort:list_item(0,\,)}')'}
#author_sort_firstname with {author_sort:list_item(1,\,)}
And my "save as" template:
Code:
{#author_sort_lastname} {#author_sort_firstname:re([.],)}/{series:'test($,'{series}/[{series}-{series_index:0>2s}] ','{#author_sort_lastname} {#author_sort_firstname} - ')'}{title}
I kept the first and last names separate in case I want the first name first in the future.

In the tweaks, I changed:
Code:
save_template_title_series_sorting = 'strictly_alphabetic'
Finally, I tried to replace {#author_sort_lastname} and {#author_sort_firstname} with their definitions, but I get an error.

It was very interesting, thanx again
Old 06-25-2014, 09:26 AM   #12
chaley
Grand Sorcerer
Quote:
Originally Posted by myki View Post
Thank you very much, it is very instructive !
...
Code:
#author_sort_lastname with {:'uppercase('{author_sort:list_item(0,\,)}')'}
And my "save as" template:
Code:
{#author_sort_lastname} {#author_sort_firstname:re([.],)}/{series:'test($,'{series}/[{series}-{series_index:0>2s}] ','{#author_sort_lastname} {#author_sort_firstname} - ')'}{title}
I note that you are using subtemplates (nested templates). You should not do that. There are several ways that evaluating templates can break by using subtemplates.

Your first template is better written as something like
Code:
{author_sort:'uppercase(list_item($, 0,\,))'}
and your second one should really be written in general program mode because of its complexity, something like:
Code:
program:
	as = field('author_sort');
# list item removes the comma. Note that this doesn't work if there 
# are multiple authors or if the author name doesn't contain a comma
	asfn = list_item(as, 1, ',');
	asln = uppercase(list_item(as, 0, ','));

# Use the template function as a convenience to avoid calling format_number
	has_series = template('{series}/[{series}-{series_index:0>2s}]');
	no_series = strcat(asln, ' ', asfn, ' -');
	series_val = test(field('series'), has_series, no_series);

	strcat(asln, ' ', asfn,  '/', series_val, ' ', field('title'))
Note that this template does not use the #author_sort_lastname and #author_sort_firstname columns, so perhaps they can go away.
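
As a rough cross-check of what the template is meant to produce, here is the same logic sketched in Python. It is not calibre code: `save_path` is a hypothetical name, the series index is assumed to be shown as a zero-padded two-digit integer, the sample series values are made up, and the layout follows myki's original template (one space on each side of the dash).

```python
def save_path(author_sort, title, series="", series_index=1):
    """Build 'LAST First/<series part or author> Title' from one 'Last, First' value."""
    parts = [part.strip() for part in author_sort.split(",")]
    last_name = parts[0].upper()
    first_name = parts[1] if len(parts) > 1 else ""
    author = f"{last_name} {first_name}".strip()
    if series:
        # Assumption: the index is shown as a zero-padded two-digit integer.
        middle = f"{series}/[{series}-{int(series_index):02d}]"
    else:
        middle = f"{author} -"
    return f"{author}/{middle} {title}"

print(save_path("Christie, Agatha", "Some Title", "Poirot", 3))
# CHRISTIE Agatha/Poirot/[Poirot-03] Some Title
print(save_path("Christie, Agatha", "Some Title"))
# CHRISTIE Agatha/CHRISTIE Agatha - Some Title
```

Like the template, the sketch only handles a single author in "Last, First" form.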
Old 06-25-2014, 09:29 AM   #13
chaley
Grand Sorcerer
Quote:
Originally Posted by eschwartz View Post
Using general program mode is only required for people who want to spend time making it look nice. And are probably a little obsessed about it.
Not quite true. GPM (general program mode) is close to required when using subtemplates. One *can* use a form of subtemplates in TPM (template program mode), but it is very tricky and error-prone.
Old 06-25-2014, 09:39 AM   #14
chaley
Grand Sorcerer
Quote:
Originally Posted by chaley View Post
Now I need to add another function like list_re that uses re_group instead of re. Something like
Code:
list_re_group(src_list, separator, search_re, group_1, group_2, ...)
that would apply the re_group to each item in the list.
I submitted this function as well. The prototype is different from what I suggested earlier:
Code:
list_re_group(src_list, separator, include_re, search_re, group_1, group_2, ...)
The function first builds a result list by applying 'include_re' against each list element, then applies re_group(search_re, g1, ...) to each resulting element.

This thread provides an example of this function's use, to uppercase the last name of each author for a book:
Code:
program: list_re_group(field('authors'), ' & ', '.', '([^,]*), (.*)', '{$:uppercase()}, ', '{$}')
or
Code:
{authors:'list_re_group($, ' & ', '.', '([^,]*), (.*)', '[[$:uppercase()]], ', '[[$]]')'}
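
The described behaviour (filter the list with 'include_re', then rewrite each remaining item group by group) can be sketched in Python roughly as follows. This is an illustration only, not calibre's implementation: `list_re_group_sketch` is a hypothetical name, Python callables stand in for the group templates, joining the results with the same separator is an assumption, and the sample input assumes author names stored as "Last, First".

```python
import re

def list_re_group_sketch(src_list, separator, include_re, search_re, *group_funcs):
    """Apply per-group functions to every list item that matches include_re."""
    def rebuild(match):
        # Pass group 1 to the first function, group 2 to the second, and so on.
        return "".join(
            func(match.group(index) or "")
            for index, func in enumerate(group_funcs, start=1)
        )

    items = [item.strip() for item in src_list.split(separator) if item.strip()]
    kept = [item for item in items if re.search(include_re, item)]
    return separator.join(re.sub(search_re, rebuild, item) for item in kept)

authors = "Christie, Agatha & Doyle, Arthur Conan"
print(list_re_group_sketch(
    authors, " & ", ".", r"([^,]*), (.*)",
    lambda last: last.upper() + ", ",   # group 1: uppercase the last name
    lambda first: first,                # group 2: keep the first name as is
))
# CHRISTIE, Agatha & DOYLE, Arthur Conan
```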
Old 06-25-2014, 04:59 PM   #15
myki
Zealot
As I said, I didn't know anything about GPM.
I feel more comfortable with program mode, thanks to the use of variables.
And the writing is more vertical, whereas normal mode is more horizontal.

I'm curious:
Is it possible to use variables in normal mode, and how?
Is it possible to increase the font size in the template editor?

Thanks for your help...

@chaley:
If I had had your function, it would have been so much easier for me to harmonize my authors when I first built my database.