Old 11-10-2011, 06:03 AM   #1
smoothrolla
Member
smoothrolla began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Nov 2011
Device: kindle
using templates/python and custom columns to extract specific data from tags

Hi guys,

I recently found out how to copy all data from one column (tags) to another custom column by using search and replace, thanks Chaley!

However, I only want to copy specific data from the tags, using a template/python function so I don't have to do it manually.

I started to learn about templates and python last night and got pretty far. For example, I created a column called #testcomposite.

First I tried a template to extract only known genres from the tags column:

Code:
{#testcomposite:'list_intersection(field('tags'),'Adult, Adventure, Anthologies, Biography, Childrens, Classics, Drugs, Fantasy, Food, Football, Health, History, Historical, Horror, Humour, Inspirational, Modern, Music, Mystery, Non-Fiction, Poetry, Political, Philosophy, Psychological, Reference, Religion, Romance, Science, Science Fiction, Self Help, Short Stories, Sociology, Spirituality, Suspense, Thriller, Travel, Vampires, War, Western, Writing, Young Adult',',')'}
That worked, but it was slow, so I figured out how to create a python function by adapting the list_intersection function:

Function: getgenre, 1 param
Code:
def evaluate(self, formatter, kwargs, mi, locals, val):
    list1 = val
    list2 = 'Adult, Adventure, Anthologies, Biography, Childrens, Classics, Drugs, Fantasy, Food, Football, Health, History, Historical, Horror, Humour, Inspirational, Modern, Music, Mystery, Non-Fiction, Poetry, Political, Philosophy, Psychological, Reference, Religion, Romance, Science, Science Fiction, Self Help, Short Stories, Sociology, Spirituality, Suspense, Thriller, Travel, Vampires, War, Western, Writing, Young Adult'
    separator = ','
    l1 = [l.strip() for l in list1.split(separator) if l.strip()]
    l2 = [icu_lower(l.strip()) for l in list2.split(separator) if l.strip()]
    res = []
    for i in l1:
        if icu_lower(i) in l2:
            res.append(i)
    return ', '.join(res)
called with template:
Code:
{#testcomposite:'getgenre(field('tags'))'}
So that's great, it runs really quickly, and I was quite amazed I got this far.

However, what I would like is something like this:
Extract all the known genres from the tags (like above),
but also, if I come across a tag which contains *mystery* (like Mystery & Detective), then add the genre "Mystery" to the #testcomposite column.

So something like this:
if tag item like '*horror*' or tag item='Scary' or tag item='Spooky' then add 'Horror' etc

Any help is appreciated, in either template or python (or both!)

PS, I am a programmer, but python and calibre are all very new to me and a little lower-level than I'm used to.

PPS, I'm amazed at how flexible this program is, hats off to the creator(s)!

Thanks very much!

Last edited by smoothrolla; 11-10-2011 at 11:38 AM.
Old 11-10-2011, 12:27 PM   #2
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 11,691
Karma: 6240117
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
First, you can make the python function faster by changing the code as follows:
Code:
def evaluate(self, formatter, kwargs, mi, locals, val):
    list1 = val
    l2 = ['adult', 'adventure', 'anthologies', 'biography', ..., 'young adult']
    l1 = [l.strip() for l in list1.split(',') if l.strip()]
    l1lcase = [icu_lower(l) for l in l1]
    res = set()
    for idx,item in enumerate(l1lcase):
        if item in l2:
            res.add(l1[idx])
    return ', '.join(res)
The reason to use a set for res is to avoid having the same entry in the result more than once. This will matter in the code below.
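
To make the point about the set concrete, here is a minimal sketch that runs in plain Python outside calibre. The sample tags are made up, and `str.lower()` is only a stand-in for calibre's `icu_lower`:

```python
# Hypothetical sample tags; in calibre these would come from the tags column.
tags = ['Horror', 'Scary', 'Spooky Stories']

as_list = []
as_set = set()
for tag in tags:
    lowered = tag.lower()  # stand-in for calibre's icu_lower
    if 'horror' in lowered or 'scary' in lowered or 'spooky' in lowered:
        as_list.append('Horror')  # appended once per matching tag
        as_set.add('Horror')      # a set keeps each value only once

print(as_list)  # ['Horror', 'Horror', 'Horror']
print(as_set)   # {'Horror'}
```

With a list, the joined result would contain "Horror" three times; with a set it appears once.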

You can do the 'like' examples using something like:
Code:
    for item in l1lcase:
        if 'horror' in item or item in ['scary', 'spooky']:
            res.add('Horror')
            break
    for item in l1lcase:
        if 'mystery' in item or 'detective' in item:
            res.add('Mystery')
            break
When the "in" operator is applied as "string in string", it is a "contains" operation.
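
A short sketch of the difference between the two uses of `in`, runnable in plain Python (the tag values are just examples):

```python
tag = 'mystery & detective'

# string in string: True when the left side occurs anywhere in the right side
substring_hit = 'mystery' in tag

# string in list: True only when the left side equals a whole element
element_hit = 'mystery' in [tag, 'horror']
exact_hit = 'scary' in ['scary', 'spooky']

print(substring_hit, element_hit, exact_hit)  # True False True
```

This is why the snippet above uses `'horror' in item` for the partial match but `item in ['scary', 'spooky']` for the exact ones.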

The set is necessary here because the added item might already be in the result; with a list it would end up in the result more than once.
Old 11-10-2011, 03:22 PM   #3
smoothrolla
Thanks Charley!

I got the sped-up script working great, thanks for that.
I half understand it, slowly getting there.

I decided the code needed reworking so that it looks for partial matches for all the genres I have provided (41 of them), followed by the new code to map "scary" to Horror etc. (rather than having 41 for loops).

Here is the new code:
Code:
def evaluate(self, formatter, kwargs, mi, locals, val):
    list1 = val
    l2 = ['adult', 'adventure', 'anthologies', 'biography', 'childrens', 'classics', 'drugs', 'fantasy', 'food', 'football', 'health', 'history', 'historical', 'horror', 'humour', 'inspirational', 'modern', 'music', 'mystery', 'non-fiction', 'poetry', 'political', 'philosophy', 'psychological', 'reference', 'religion', 'romance', 'science', 'science fiction', 'self help', 'short stories', 'sociology', 'spirituality', 'suspense', 'thriller', 'travel', 'vampires', 'war', 'western', 'writing', 'young adult']
    l1 = [l.strip() for l in list1.split(',') if l.strip()]
    l1lcase = [icu_lower(l) for l in l1]
    res = set()
    for idx,item in enumerate(l1lcase):
        if item in l2:
            res.add(l1[idx])

    for item in l1lcase:
        for item2 in l2:
            if item2 in item:
                res.add(item2)
                break

    for item in l1lcase:
        if 'scary' in item or 'spooky' in item:
            res.add('Horror')
            break

    return ', '.join(res)
But this bit of that code adds tags in lowercase:
Code:
    
for item in l1lcase:
        for item2 in l2:
            if item2 in item:
                res.add(item2)
                break
I tried using
res.add(titlecase(item2))

but that throws an error.

Maybe I need to keep list2 in title case and lowercase it as I go. I'll try to figure it out, but if you can put me on the right path I would really appreciate it.

Thanks!

Last edited by smoothrolla; 11-10-2011 at 03:53 PM.
Old 11-10-2011, 04:03 PM   #4
smoothrolla
OK, I got a solution, probably inelegant though.

I create another list, in title case, of the tags I want to do a partial search for, as I decided I didn't want to search for them all (for example, "science" is in "science fiction", so I got both tags, which I didn't really want).

Code:
def evaluate(self, formatter, kwargs, mi, locals, val):
    list1 = val
    l2 = ['adult', 'adventure', 'anthologies', 'biography', 'childrens', 'classics', 'drugs', 'fantasy', 'food', 'football', 'health', 'history', 'historical', 'horror', 'humour', 'inspirational', 'modern', 'music', 'mystery', 'non-fiction', 'poetry', 'political', 'philosophy', 'psychological', 'reference', 'religion', 'romance', 'science', 'science fiction', 'self help', 'short stories', 'sociology', 'spirituality', 'suspense', 'thriller', 'travel', 'vampires', 'war', 'western', 'writing', 'young adult']
    l1 = [l.strip() for l in list1.split(',') if l.strip()]
    l1lcase = [icu_lower(l) for l in l1]
    res = set()
    for idx,item in enumerate(l1lcase):
        if item in l2:
            res.add(l1[idx])

    l3 = ['Adult', 'Adventure', 'Anthologies', 'Biography', 'Childrens', 'Classics', 'Drugs', 'Fantasy', 'Food', 'Football', 'Health', 'History', 'Historical', 'Horror', 'Humour', 'Inspirational', 'Modern', 'Music', 'Mystery', 'Non-Fiction', 'Poetry', 'Political', 'Philosophy', 'Psychological', 'Reference', 'Religion', 'Romance', 'Science fiction', 'Self Help', 'Short Stories', 'Sociology', 'Spirituality', 'Suspense', 'Thriller', 'Travel', 'Vampires', 'War', 'Western', 'Writing', 'Young Adult']

    for item in l1lcase:
        for item2 in l3:
            check = item2.lower()
            if check in item:
                res.add(item2)
                break

    for item in l1lcase:
        if 'scary' in item or 'spooky' in item:
            res.add('Horror')
            break

    return ', '.join(res)
I need to do some more tests, but it's looking good.

Thanks again for your help!
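
On the "science" inside "science fiction" overlap mentioned above, one possible alternative to maintaining a second list is to drop any matched genre that sits inside a longer genre matched in the same tag. This is only a sketch under assumptions: `genres_in_tag` is a hypothetical helper name, the genre list is shortened, and `str.lower()` stands in for calibre's `icu_lower` so it runs outside calibre.

```python
def genres_in_tag(tag, genres):
    """Return the genres found in one tag, dropping any genre that is
    contained in a longer genre matched in the same tag."""
    lowered = tag.lower()  # stand-in for calibre's icu_lower
    hits = [g for g in genres if g.lower() in lowered]
    return [g for g in hits
            if not any(g != other and g.lower() in other.lower()
                       for other in hits)]

genres = ['Science', 'Science Fiction', 'Adult', 'Young Adult', 'Mystery']
print(genres_in_tag('Science Fiction & Fantasy', genres))  # ['Science Fiction']
print(genres_in_tag('Popular Science', genres))            # ['Science']
print(genres_in_tag('Young Adult Mystery', genres))        # ['Young Adult', 'Mystery']
```

One limitation: a single tag that really means both, such as "Science and Science Fiction", would only yield the longer genre.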
Old 11-10-2011, 04:04 PM   #5
chaley
Quote:
Originally Posted by smoothrolla
Thanks Charley!
chaley, not charley.
Quote:
But this bit of that code adds tags in lowercase:
Code:
    for item in l1lcase:
        for item2 in l2:
            if item2 in item:
                res.add(item2)
                break
That is the point of the 'enumerate'. The arrays l1 and l1lcase are ordered and indexed the same, so
Code:
    for idx,item in enumerate(l1lcase):
        for item2 in l2:
            if item2 in item:
                res.add(l1[idx])
                break
will add the cased version of item (the original tag that matched) to the result.

The enumerate operator returns the index and the value (a tuple in python terms), which in this case is the index and the lowercase version of the value. Because l1lcase and l1 are parallel arrays, the l1[idx] gets the equivalent item for the one in l1lcase, which is the cased version.
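
A small illustration of the parallel-list idea, runnable in plain Python. The sample tags are made up and `str.lower()` is a stand-in for calibre's `icu_lower`; the `zip()` variant at the end is an alternative, not something from the posts above.

```python
l1 = ['Mystery & Detective', 'Horror', 'Cooking']
l1lcase = [t.lower() for t in l1]  # stand-in for calibre's icu_lower

# enumerate yields (index, value) tuples
pairs = list(enumerate(l1lcase))
print(pairs)
# [(0, 'mystery & detective'), (1, 'horror'), (2, 'cooking')]

res = set()
for idx, item in enumerate(l1lcase):
    if 'mystery' in item or 'horror' in item:
        res.add(l1[idx])  # same position in l1, so the original casing

print(sorted(res))  # ['Horror', 'Mystery & Detective']

# zip() pairs the two lists directly, without an index
res_zip = {orig for orig, low in zip(l1, l1lcase)
           if 'mystery' in low or 'horror' in low}
```

Note that what ends up in the result is the original tag ("Mystery & Detective"), not the genre name that matched.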
Old 11-10-2011, 04:18 PM   #6
smoothrolla
Quote:
chaley, not charley.
Sorry, I wasn't sure.

Quote:
The enumerate operator returns the index and the value (a tuple in python terms), which in this case is the index and the lowercase version of the value. Because l1lcase and l1 are parallel arrays, the l1[idx] gets the equivalent item for the one in l1lcase, which is the cased version.
Great, thanks. I wanted to code something like that but was unsure how to in this language, which was very frustrating.

I came up with a slightly different way of doing it, posted a minute before your reply, so I'm not sure if you saw it. It's probably laughable, mind you.

Thanks
Old 11-10-2011, 05:12 PM   #7
smoothrolla
I thought I would post my final approach here in case someone else finds it useful in the future.

I need to add some more tag->genre mappings (like football->sports etc.), but you get the idea.

Code:
def evaluate(self, formatter, kwargs, mi, locals, val):
    # turn the tags into an array and create a lowercase version
    tagslist      = [l.strip() for l in val.split(',') if l.strip()]
    tagslistlcase = [icu_lower(l) for l in tagslist]

    # my list of genres i want, and create a lowercase version
    genrelist      = ['Adult', 'Adventure', 'Anthologies', 'Biography', 'Childrens', 'Classics', 'Drugs', 'Fantasy', 'Food', 'Football', 'Health', 'History', 'Historical', 'Horror', 'Humour', 'Inspirational', 'Modern', 'Music', 'Mystery', 'Non-Fiction', 'Poetry', 'Political', 'Philosophy', 'Psychological', 'Reference', 'Religion', 'Romance', 'Science', 'Science Fiction', 'Self Help', 'Short Stories', 'Sociology', 'Spirituality', 'Suspense', 'Thriller', 'Travel', 'Vampires', 'War', 'Western', 'Writing', 'Young Adult']
    genrelistlcase = [icu_lower(l) for l in genrelist]

    res = set()

    # loop through the genres
    for idx,genre in enumerate(genrelistlcase):
        # loop through the tags and see if the genre is contained in a tag
        for tag in tagslistlcase:
            if genre in tag:
                # dont add science if it was found in science fiction
                if genre != 'science' or 'science fiction' not in tag:
                    # add to array
                    res.add(genrelist[idx])
                    break

    # final loop through the tags to look for specific tags i want to map to a genre
    for tag in tagslistlcase:
        if 'religious' in tag or 'christian' in tag:
            res.add('Religion')
        if 'children' in tag:
            res.add('Childrens')

    # join the array into a string and return
    return ', '.join(res)
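
For the extra tag->genre mappings, one option is to keep them in a table rather than adding an `if` per mapping. The sketch below is an assumption-laden illustration, not part of the function above: `KEYWORD_TO_GENRE` and `mapped_genres` are hypothetical names, the entries are examples, and `str.lower()` stands in for calibre's `icu_lower` so it runs outside calibre. It also sorts the set before joining, since set iteration order is not guaranteed.

```python
# Hypothetical keyword -> genre table; extend it with your own mappings.
KEYWORD_TO_GENRE = {
    'religious': 'Religion',
    'christian': 'Religion',
    'children': 'Childrens',
    'football': 'Sports',
}

def mapped_genres(tags):
    """Return a sorted, comma-separated string of genres whose keyword
    appears in any of the given tags."""
    res = set()
    for tag in tags:
        lowered = tag.lower()  # stand-in for calibre's icu_lower
        for keyword, genre in KEYWORD_TO_GENRE.items():
            if keyword in lowered:
                res.add(genre)
    return ', '.join(sorted(res))

print(mapped_genres(['Christian Fiction', "Children's Books", 'Football']))
# Childrens, Religion, Sports
```

Inside the calibre function, the same loop could replace the final "specific tags" loop, with `res` being the existing set.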

Last edited by smoothrolla; 11-10-2011 at 06:13 PM.
All times are GMT -4.