Old 10-17-2010, 01:16 AM   #54
chaley
Quote:
Originally Posted by Gary_M_Mugford
The best of the initialisms versus shorten factions would be a regex that would check for spaces>0 and then use the shorten(4,~,3) versus the full regex contraction.
What you are asking for can be done, but not directly with regexps.

First, while working through this example I found that my implementation of the lookup function wasn't optimal. It suffered from the same problems that sweetpea found with test etc. I changed lookup, and the changes will appear in the next release (tomorrow, possibly). The examples use the new functionality, so they won't work for you until you upgrade to that release.

OK, how to do it. The full solution requires creating three composite columns. The first column is used to remove the leading articles. The second is used to compute the 'shorten' form. The third is to compute the 'initials' form. Once you have these columns, the plugboard selects between them.

First column:
Code:
Name: #stripped_series. Template: {series:re(^(A|The|An)\s+,)||}
Second column (the shortened form):
Code:
Name: #shortened. Template: {#stripped_series:shorten(4,-,4)}
Third column (the initials form):
Code:
Name: #initials. Template: {#stripped_series:re(([^\s])[^\s]+(\s|$),\1)}
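For anyone who wants to see what the three columns compute before building them, here is a rough Python sketch of the same transformations. The function names are mine, not calibre's. The shorten behaviour for short values (leave them unchanged) and the use of plain case-sensitive re.sub are assumptions about how the template functions behave, so treat this as an illustration rather than the actual implementation.

```python
import re

def strip_article(series: str) -> str:
    # Mirrors {series:re(^(A|The|An)\s+,)}: drop one leading article.
    return re.sub(r'^(A|The|An)\s+', '', series)

def shorten(value: str, left: int, middle: str, right: int) -> str:
    # Assumption: values too short to benefit are returned unchanged.
    if len(value) > left + len(middle) + right:
        return value[:left] + middle + value[len(value) - right:]
    return value

def initials(value: str) -> str:
    # Mirrors re(([^\s])[^\s]+(\s|$),\1): keep the first character of
    # every word that is at least two characters long.
    return re.sub(r'([^\s])[^\s]+(\s|$)', r'\1', value)

stripped = strip_article("The Lord of the Rings")
print(stripped)                          # Lord of the Rings
print(initials(stripped))                # LotR
print(shorten("Berserkers", 4, "-", 4))  # Bers-kers
```

Note that the initials pattern only matches words of two or more characters, so a one-character word is left as it is.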
Plugboard expression:
Code:
Template: {#stripped_series:lookup(.\s,#initials,.,#shortened,series)}{series_index:0>2s| [|] }{title}
Destination field: title
This set of fields and plugboard produces:
Series: The Lord of the Rings
Series Index: 2
Title: The Two Towers
Output: LotR [02] The Two Towers

Series: Dahak
Series Index: 1
Title: Mutineers Moon
Output: Dahak [01] Mutineers Moon

Series: Berserkers
Series Index: 4
Title: Berserker Throne
Output: Bers-kers [04] Berserker Throne

Series: Meg Langslow Mysteries
Series Index: 3
Title: Revenge of the Wrought-Iron Flamingos
Output: MLM [03] Revenge of the Wrought-Iron Flamingos
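
The selection step can be mimicked in Python as well. This is only a sketch: the lookup and plugboard_title functions are hypothetical stand-ins, the column values are typed in by hand rather than computed, and I am assuming the pattern is tested against the article-stripped series and that the first matching pattern wins.

```python
import re

def lookup(value, pairs, else_field, fields):
    # Return the field paired with the first pattern found in value,
    # or else_field when no pattern matches (e.g. an empty series).
    for pattern, field in pairs:
        if re.search(pattern, value):
            return fields[field]
    return fields[else_field]

def plugboard_title(fields):
    series_part = lookup(
        fields["#stripped_series"],  # assumed to be the tested column
        [(r".\s", "#initials"), (r".", "#shortened")],
        "series",
        fields,
    )
    # Mirrors {series_index:0>2s| [|] }: two digits, wrapped in " [" and "] ".
    index = fields.get("series_index")
    index_part = "" if index is None else f" [{int(index):02d}] "
    return series_part + index_part + fields["title"]

lotr = {
    "series": "The Lord of the Rings",
    "#stripped_series": "Lord of the Rings",
    "#initials": "LotR",
    "#shortened": "Lord-ings",
    "series_index": 2,
    "title": "The Two Towers",
}
berserkers = {
    "series": "Berserkers",
    "#stripped_series": "Berserkers",
    "#initials": "B",
    "#shortened": "Bers-kers",
    "series_index": 4,
    "title": "Berserker Throne",
}
print(plugboard_title(lotr))        # LotR [02] The Two Towers
print(plugboard_title(berserkers))  # Bers-kers [04] Berserker Throne
```

The order of the pairs matters: the catch-all "." pattern has to come last, otherwise it would win for every non-empty series.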

Of course, the plugboard template can easily be changed to require 2, 3, or however-many-you-want spaces before initials are used. For example, the following requires at least 3 spaces.
Plugboard expression:
Code:
Template: {#stripped_series:lookup(\s.*\s.*\s,#initials,.,#shortened,series)}{series_index:0>2s| [|] }{title}
Destination field: title
Using this template would change the Meg Langslow example, whose series name contains only two spaces, to:

Series: Meg Langslow Mysteries
Series Index: 3
Title: Revenge of the Wrought-Iron Flamingos
Output: Meg -ries [03] Revenge of the Wrought-Iron Flamingos

All the others would remain the same.
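
If you want a different threshold, the pattern follows a simple rule: one \s per required space, joined by .* so anything may sit between them. A small Python sketch (min_spaces_pattern is a made-up helper, not something calibre provides) shows how the pattern grows and why the Meg Langslow series stops qualifying at 3:

```python
import re

def min_spaces_pattern(n: int) -> str:
    # Build a regex that matches only when the text has at least n
    # whitespace characters, with anything allowed between them.
    if n < 1:
        raise ValueError("n must be at least 1")
    return r"\s" + r".*\s" * (n - 1)

pattern = min_spaces_pattern(3)
print(pattern)                                              # \s.*\s.*\s
print(bool(re.search(pattern, "Lord of the Rings")))        # True
print(bool(re.search(pattern, "Meg Langslow Mysteries")))   # False
```

The generated string is what you would paste in as the first argument of lookup in the plugboard template.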

Last edited by chaley; 10-17-2010 at 04:22 PM. Reason: Fix wrong names in template