Question: Here's a template I use to select, split, and sort hierarchical tags for a #subjects column:
```
program:
    ## Splitting tags
    if
        '^(Fiction|Nonfiction|Magazines & Periodicals)' in $#booktype
    then
        split_tags = re($tags, '\.', ',')
    else
        ## empty for other booktypes with more specific columns
        split_tags = ''
    fi;
    ## Removing a few unwanteds and sorting
    cleaned_tags = list_sort(
        list_difference(
            split_tags,
            'Fiction, Nonfiction, Magazines & Periodicals, Cultures & Regions, Social Issues',
            ','),
        0,
        ',')
```
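To make the template's steps concrete, here is a rough Python model of what it does. It is a sketch under assumptions, not calibre's own code: it assumes the `in` test behaves like a case-insensitive `re.search`, `re()` like Python's `re.sub`, `list_difference()` like a case-insensitive difference over comma-split, whitespace-stripped items, and `list_sort(..., 0, ',')` like a case-insensitive ascending sort. The names `subjects` and `calibre_list` are hypothetical.

```python
import re

def calibre_list(value, sep=","):
    """Split a separator-delimited string into stripped, non-empty items."""
    return [item.strip() for item in value.split(sep) if item.strip()]

UNWANTED = "Fiction, Nonfiction, Magazines & Periodicals, Cultures & Regions, Social Issues"

def subjects(tags, booktype):
    """Hypothetical model of the #subjects template: split, drop unwanted, sort."""
    # Mirrors: '^(Fiction|Nonfiction|Magazines & Periodicals)' in $#booktype
    # (case-insensitive matching is an assumption)
    if not re.search(r"^(Fiction|Nonfiction|Magazines & Periodicals)", booktype, re.I):
        return []  # the template sets split_tags = '' here
    # Mirrors: re($tags, '\.', ','): every period becomes a list separator
    split_tags = re.sub(r"\.", ",", tags)
    # Mirrors list_difference (case-insensitive comparison assumed)
    unwanted = {u.casefold() for u in calibre_list(UNWANTED)}
    kept = [t for t in calibre_list(split_tags) if t.casefold() not in unwanted]
    # Mirrors list_sort(..., 0, ','): ascending, case-insensitive assumed
    return sorted(kept, key=str.casefold)
```

For example, `subjects("Nonfiction.Biographies and Memoirs, Nonfiction.Music", "Nonfiction")` splits into four items, discards both `Nonfiction` entries, and returns the two subcategories in sorted order.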
Since my books are always tagged hierarchically, e.g.:
Fiction.Science Fiction.Space Opera
Nonfiction.Biographies and Memoirs, Nonfiction.Music
I thought it might make sense to change re($tags, '\.', ',') so that it drops the topmost level of each tag (anything to the left of the first period) instead of removing those tags after the fact.
a) Would this improve performance? I also have other tags to remove, so I wouldn't be dropping the list_difference call entirely.
b) What regex would I use?
re($tags, '(.*)\.', ',') only semi-worked; I can see why, but I'm not sure how to capture it properly.
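The partial result comes from the greedy `(.*)`: it matches as much as it can, so one replacement swallows everything up to the *last* period in the whole `$tags` string, not the first period of each tag. The Python sketch below contrasts that with one possible pattern that removes only the first segment of each comma-separated tag and then turns the remaining periods into commas. It assumes calibre's `re()` passes the pattern to Python's `re.sub` unchanged and that `$tags` arrives as a single comma-separated string; `greedy`, `strip_top_level`, and `TOP_LEVEL` are illustrative names only.

```python
import re

tags = "Fiction.Science Fiction.Space Opera"
multi = "Nonfiction.Biographies and Memoirs, Nonfiction.Music"

def greedy(value):
    # The pattern from the question: (.*) runs to the LAST period in the string.
    return re.sub(r"(.*)\.", ",", value)

# One possible alternative (an assumption, not a calibre-documented recipe):
# at the start of the string or right after a comma, skip optional spaces,
# then consume one period-free, comma-free segment and its period.
TOP_LEVEL = r"(?:^|(?<=,))\s*[^.,]*\."

def strip_top_level(value):
    without_top = re.sub(TOP_LEVEL, "", value)   # drop the first level of each tag
    return re.sub(r"\.", ",", without_top)       # split the remaining levels

print(greedy(tags))            # ,Space Opera
print(greedy(multi))           # ,Music
print(strip_top_level(tags))   # Science Fiction,Space Opera
print(strip_top_level(multi))  # Biographies and Memoirs,Music
```

In template terms this would be a nested call along the lines of re(re($tags, '(?:^|(?<=,))\s*[^.,]*\.', ''), '\.', ','), which has not been tested in calibre itself. A top-level tag with no period at all (a bare Fiction, say) is left in place by this pattern, so list_difference still has a job to do.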