Pruning redundant and partially redundant tags

Sidetrack · 03-01-2013, 05:05 PM

I've been using the Goodreads metadata download plugin to map tags to a hierarchy I like, and now I'd like to prune the redundant information out of the rest of my tags. I'm getting there with regex search and replace, but any more elegant solution would be appreciated. I'm also a little stumped on how to search for books that have redundant tags on something better than a case-by-case basis.

example:

foo.fie, foo.fie.fum, foo, fum fie would become simply: foo.fie.fum
or
fiction, genre.crime, genre.mystery, genre.mystery.hard-boiled, crime, mystery, mystery & detective, hardboiled mystery
would become
genre.crime, genre.mystery.hardboiled
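One way to make the two examples above concrete is a small Python sketch run outside calibre on a plain list of tags. The function name `prune_tags` and both rules are assumptions made for illustration, not anything calibre provides: rule 1 drops a hierarchical tag that is a strict ancestor of another tag, and rule 2 drops a flat tag when every one of its words already appears as a component of some hierarchical tag. Hyphens are ignored when comparing, so "hardboiled" matches "hard-boiled".

```python
import re


def _words(text):
    # Lowercase, drop hyphens ("hard-boiled" -> "hardboiled"), split into words.
    return set(re.findall(r"[a-z0-9]+", text.lower().replace("-", "")))


def prune_tags(tags):
    """Return tags with redundant entries removed (hypothetical helper)."""
    tags = [t.strip() for t in tags if t.strip()]
    hier = [t for t in tags if "." in t]

    # Rule 1: keep only the deepest hierarchical tags,
    # e.g. drop "foo.fie" when "foo.fie.fum" is present.
    kept_hier = [t for t in hier if not any(o.startswith(t + ".") for o in hier)]

    # Every word that occurs as part of some hierarchical tag.
    covered = set()
    for t in hier:
        for part in t.split("."):
            covered |= _words(part)

    kept = []
    for t in tags:
        if "." in t:
            if t in kept_hier and t not in kept:
                kept.append(t)
        else:
            w = _words(t)
            # Rule 2: drop a flat tag only if all of its words are covered.
            if not w or not w <= covered:
                kept.append(t)
    return kept


print(prune_tags(["foo.fie", "foo.fie.fum", "foo", "fum fie"]))
```

Because the comparison works on sets rather than on the sorted tag string, it does not matter whether the flat tags sort before or after the hierarchical ones. Note that this sketch does not fully reproduce the second example: "fiction" and "mystery & detective" survive, since "fiction" and "detective" are not components of any hierarchical tag, and "genre.mystery.hard-boiled" keeps its hyphen. Those cases would still need an explicit mapping.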

My regex is similar to this, though I've got a bit of a mishmash going with special cases:
template {tags} (\.[^\.,]+)(.*, )?([^,\.]*)\1; \1\2

I have to use separate search terms if the offending tags sort alphabetically before the genre tags.

Any ideas on how to search for or otherwise identify books with partially redundant tags? Maybe a calculated column? How about some cleaner, more robust replacement terms?

One other thing that bugs me is when I get info like the author's name or publisher mixed in as a tag when I've already got that information in its appropriate column.
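For the author/publisher case, a minimal Python sketch could compare each tag against the values already stored in the other columns. The function `drop_metadata_tags` and its parameters are hypothetical names for illustration, and the sketch assumes the tags, authors, and publisher have already been pulled out as plain strings. Names are compared as case-insensitive sets of words, so "Chandler, Raymond" and "Raymond Chandler" count as the same name.

```python
import re


def _norm(name):
    # Case-insensitive, order-insensitive key: "Last, First" == "First Last".
    return frozenset(re.findall(r"\w+", name.lower()))


def drop_metadata_tags(tags, authors, publisher=None):
    """Remove tags that merely repeat an author or the publisher (hypothetical helper)."""
    known = {_norm(a) for a in authors}
    if publisher:
        known.add(_norm(publisher))
    known.discard(frozenset())  # never match on an empty name
    return [t for t in tags if _norm(t) not in known]


print(drop_metadata_tags(
    ["Chandler, Raymond", "genre.crime", "Vintage Books", "noir"],
    authors=["Raymond Chandler"],
    publisher="Vintage Books",
))
```

Only exact name matches are removed here; a tag such as "Raymond Chandler novels" would be left alone.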
