[Metadata Source Plugin] Goodreads - Page 34

davidfor · 05-11-2020, 01:07 AM

Quote:

Originally Posted by DictatorAlly

Ok, thank you. Has it been proposed before?

What are you suggesting using the API for? The plugin uses the API that the search pages use for autocomplete, which doesn't need a key. Otherwise it is parsing the pages. Using the GR API might be simpler, but the search doesn't return all the books.

DictatorAlly · 05-15-2020, 07:08 AM

I was thinking it could speed up requests. It doesn't sound fit for purpose when it doesn't return all the books though.

I would like to suggest some different feature additions to this GR plugin.

I am currently working on them, but any help would be appreciated.

I would like to add 4 options to the parse_tags function.

The first option would be to add all found GR genres to the filter list; this would be a user option: ON/OFF. They would be added with a key and value equal to the GR genre.

Then I would like the user to choose between 3 methods when adding tags. The current and default method: Exclusive, plus two new methods: No Filter and Inclusive.

The current method will only add tags if they are found in the user filter list. We keep this method the same and call this option: Method to add found Tags: Exclusive.

The next option, No Filter, would add all GR genres found as tags without passing them through the user filter. This option is the easiest to code, and currently works in the code below.

The third option, Inclusive, is the one I am stuck on coding. It would work the same as Exclusive (passing all GR genres through the filter) at first, but then it would also include all GR genres found that are not on the user filter list.

One problem not yet solved with these new features: how do we ignore a GR genre? Perhaps with some logic where we put the genre in the user filter with the value "IGNORED". This doesn't sound great, though, as it would break the Exclusive function; I think there would be a better way. Perhaps a new list for only ignored terms.

This is what I have so far. I'm having trouble iterating over a list I am removing items from:

Spoiler:
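The problem mentioned above (removing items from a list while looping over that same list) is usually avoided by building a new list instead of mutating the one being iterated. A minimal sketch, assuming the mapping is a plain dict keyed by genre name; `remove_mapped_genres` and the sample data are hypothetical and not part of the plugin:

```python
def remove_mapped_genres(genre_tags, genre_mappings):
    """Return the genres that have no entry in the mapping (case-insensitive)."""
    mapped_keys = {key.lower() for key in genre_mappings}
    # Build a new list rather than calling genre_tags.remove() inside the loop,
    # so the list being iterated is never modified.
    return [tag for tag in genre_tags if tag.lower() not in mapped_keys]


# Hypothetical sample data, for illustration only.
genres = ["Fantasy", "Fiction", "Audiobook"]
mappings = {"fantasy": ["Fantasy"], "Fiction": ["Fiction"]}
unmapped = remove_mapped_genres(genres, mappings)
```

Because the input list is left untouched, the same `genres` list can still be passed to the existing mapping step afterwards.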

davidfor · 05-15-2020, 09:12 AM

Quote:

Originally Posted by DictatorAlly

I was thinking it could speed up requests. It doesn't sound fit for purpose when it doesn't return all the books though.

The search API doesn't return Amazon exclusive books. And there could be an issue if everyone didn't use their own key.

Quote:

I would like to suggest some different feature additions to this GR plugin.

I am currently working on them, but any help would be appreciated.

I would like to add 4 options to the parse_tags function.

The first option would be to add all found GR genres to the filter list; this would be a user option: ON/OFF. They would be added with a key and value equal to the GR genre.

To the best of my knowledge, there isn't a GR genre list. My understanding is that the genres are actually just the shelves people have created. If you look at a book, it shows the number of people who have used the shelf. And that seems to be the same for the Genre page.

Quote:

Then I would like the user to choose between 3 methods when adding tags. The current and default method: Exclusive, plus two new methods: No Filter and Inclusive.

The current method will only add tags if they are found in the user filter list. We keep this method the same and call this option: Method to add found Tags: Exclusive.

The next option, No Filter, would add all GR genres found as tags without passing them through the user filter. This option is the easiest to code, and currently works in the code below.

The third option, Inclusive, is the one I am stuck on coding. It would work the same as Exclusive (passing all GR genres through the filter) at first, but then it would also include all GR genres found that are not on the user filter list.

Sorry, I had to read that a few times to understand what you meant. Which is lucky as I completely disagreed the first time through.

You are overcomplicating it. I would add two options:

- Above the list I would add "Map tags". Selecting this would enable the list and the second option. This would default to on for backwards compatibility.
- Under the list, "Add unmapped tags". If selected (and "Map tags" is selected), anything that wasn't mapped in the list would be added as well.

Quote:

One problem not yet solved with these new features: how do we ignore a GR genre? Perhaps with some logic where we put the genre in the user filter with the value "IGNORED". This doesn't sound great, though, as it would break the Exclusive function; I think there would be a better way. Perhaps a new list for only ignored terms.

This is already handled. If the mapping has no tags, then the genre is dropped. But, I'll admit, I hadn't noticed this before. It was only on looking at the code that I realised this.
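As a rough illustration of that behaviour together with the two options suggested above, here is a standalone sketch that runs outside calibre. It assumes the mapping is a dict from a genre name to a list of calibre tags; `map_genres`, its `add_unmapped` parameter and the sample data are hypothetical names for this example, not the plugin's actual code:

```python
def map_genres(genre_tags, genre_mappings, add_unmapped=False):
    """Map GR genres to calibre tags; an empty mapping drops the genre."""
    lookup = {key.lower(): value for key, value in genre_mappings.items()}
    result = []
    for genre in genre_tags:
        key = genre.lower()
        if key in lookup:
            tags = lookup[key]  # an empty list here means "ignore this genre"
        elif add_unmapped:
            tags = [genre]  # let the unmapped genre through unchanged
        else:
            tags = []
        for tag in tags:
            if tag not in result:  # keep order, skip duplicates
                result.append(tag)
    return result


# Hypothetical sample data, for illustration only.
mappings = {"Fantasy": ["Fantasy"], "Audiobook": [], "Sci Fi": ["Science Fiction", "SF"]}
genres = ["Fantasy", "Audiobook", "Sci Fi", "Canada"]
mapped_only = map_genres(genres, mappings)
with_unmapped = map_genres(genres, mappings, add_unmapped=True)
```

In this sketch "Audiobook" is dropped in both calls because its mapping is empty, while "Canada" only appears when `add_unmapped` is set.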

Quote:

This is what I have so far. I'm having trouble iterating over a list I am removing items from:

Spoiler:

The following should work with my suggestions:

Code:

    def parse_tags(self, root):
        # Goodreads does not have "tags", but it does have Genres (wrapper around popular shelves)
        # We will use those as tags (with a bit of massaging)
        genres_node = root.xpath('//div[@class="stacked"]/div/div/div[contains(@class, "bigBoxContent")]/div/div[@class="left"]')
        #self.log.info("Parsing tags")
        if genres_node:
            #self.log.info("Found genres_node")
            genre_tags = list()
            for genre_node in genres_node:
                sub_genre_nodes = genre_node.xpath('a')
                genre_tags_list = [sgn.text_content().strip() for sgn in sub_genre_nodes]
                #self.log.info("Found genres_tags list:", genre_tags_list)
                if genre_tags_list:
                    genre_tags.append(' > '.join(genre_tags_list))
            if cfg.plugin_prefs[cfg.STORE_NAME][cfg.KEY_MAP_GENRES]:
                calibre_tags = self._convert_genres_to_calibre_tags(genre_tags)
            else:
                calibre_tags = genre_tags
            if len(calibre_tags) > 0:
                return calibre_tags

    def _convert_genres_to_calibre_tags(self, genre_tags):
        # for each tag, add if we have a dictionary lookup
        calibre_tag_lookup = cfg.plugin_prefs[cfg.STORE_NAME][cfg.KEY_GENRE_MAPPINGS]
        add_unmapped_tags = cfg.plugin_prefs[cfg.STORE_NAME][cfg.KEY_ADD_UNMAPPED_GENRES]
        calibre_tag_map = dict((k.lower(),v) for (k,v) in calibre_tag_lookup.items())
        tags_to_add = set()
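        for genre_tag in genre_tags:
            tags = None # Reset so a value from the previous genre is never reused.
            if genre_tag.lower() in calibre_tag_map:
                tags = calibre_tag_map.get(genre_tag.lower(), None)
            elif add_unmapped_tags:
                tags = [genre_tag] # Need to handle tags with > in them.
            if tags:
                # set.union() returns a new set; update() changes tags_to_add in place.
                tags_to_add.update(tags)
        return list(tags_to_add)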

That should work, but I haven't tested it. There is an issue with genres that have multiple levels. They could be split, converted to dots for hierarchical tags, or just left like that. I have options for this in the Kobo metadata source plugin, but the configuration is starting to get complicated if these are added. I'd probably just let them through as they are.
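The three ways of handling a multi-level genre mentioned above could look roughly like this. It is only a sketch: `format_genre` and its `mode` values are hypothetical names, and it assumes levels are joined with " > " as in parse_tags:

```python
def format_genre(genre, mode="keep"):
    """Return the tag(s) for one GR genre such as 'Cultural > Canada'."""
    levels = [part.strip() for part in genre.split(">") if part.strip()]
    if mode == "split":
        return levels  # one tag per level
    if mode == "dots":
        return [".".join(levels)]  # dotted form for hierarchical tags
    if mode == "keep":
        return [genre]  # leave the genre exactly as parsed
    raise ValueError("unknown mode: %s" % mode)


example = "Cultural > Canada"
as_split = format_genre(example, "split")
as_dots = format_genre(example, "dots")
as_is = format_genre(example)
```

Each mode would need its own configuration option, which is the complication mentioned above.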

Honestly, I'm not sure if all this is needed. And if I was doing it now, I might just let them all through and rely on the Tag Mapper. This didn't exist at the time the plugin was written. It has the advantage of working on all tags from all sources, not just one. The disadvantage is that it has to be run separately.

DictatorAlly · 05-15-2020, 11:53 AM

Thank you for taking the time to understand my suggestion. I've been thinking about it for a while now.

Quote:

To the best of my knowledge, there isn't a GR genre list. My understanding is that the genres are actually just the shelves people have created. If you look at a book, it shows the number of people who have used the shelf. And that seems to be the same for the Genre page.

I think there is a GR Genre list; it is the result of GR's processing of their shelves. Whereas the shelves display everyone's chosen tags, rubbish or not, GR's Genres are their processing of that list, which results in only high quality genres/shelves/tags remaining.

We could process the list ourselves but I think GR's Genre processing is already good enough, barring some exceptions like not always displaying Fiction/Non-Fiction, and including terms like Audiobook/eBooks. Combining the plugins GR and GR more tags would give you access to the first page of shelves which you could process yourself with tag mapper.

I didn't realise the tag mapper exists. I think your solution of allowing all tags through and using the tag mapper instead is the best solution.

Quote:

There is an issue with genres that have multiple levels. They could be split, converted to dots for hierarchical tags, or just left like that. I have options for this in the Kobo metadata source plugin, but the configuration is starting to get complicated if these are added. I'd probably just let them through as they are.

I see the tag mapper would be able to split them, but I'm not sure it could convert to dots. That is something I would be interested in if I was using a program that could take advantage of it. Otherwise I want to keep them the way they are (or with dots, whichever looks better), and create another tag for the top levels, so I can use it like a hierarchy for systems that don't support hierarchies.
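To make the "another tag for the top levels" idea concrete, here is a small sketch that expands one multi-level genre into itself plus each of its ancestors. `with_ancestors` is a hypothetical helper, and it assumes the " > " separator produced by parse_tags:

```python
def with_ancestors(genre):
    """Expand 'A > B > C' into ['A', 'A > B', 'A > B > C']."""
    levels = [part.strip() for part in genre.split(">") if part.strip()]
    return [" > ".join(levels[:depth]) for depth in range(1, len(levels) + 1)]


expanded = with_ancestors("Cultural > Canada")
single = with_ancestors("Fantasy")
```

A flat tag browser could then filter on "Cultural" and still find books tagged "Cultural > Canada".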

The Kobo metadata source plugin is also new to me. If I re-evaluate whether GR provides a genre list in comparison to Kobo, I would agree with you that GR doesn't.

My goal is to use one source for metadata tags and have them represent hierarchical categories, similar to Kobo (probably Kobo itself if I can get it working). Or do you have any suggestions?

I thought that by removing the GR plugin's filter, the genres provided by GR would be good enough for this, and I wouldn't need to find out what they are and attempt to create a filter list for them like someone did in this thread years ago. But the more I look at how GR processes their list, the less I think it is adequate. I want a genre list that is more refined.

I hope that relying on only one metadata source that already provides such a list would be 80% of the way there.

The alternative would be to use the tag mapper to output your own hierarchical list, which, judging by GR's attempt, is quite difficult.

I've installed Kobo and I like that it first searches for only an ISBN match and only searches by title if an ISBN isn't present in the calibre metadata. This completely avoids getting mismatches for known ISBNs (if only the Extract ISBN plugin worked well; I think I'll take a look at that). The Amazon/Goodreads/Google plugins, frustratingly, will overwrite a known ISBN. Without a user-specified flag, the Kobo behaviour is better. I think each plugin should have an option to disable overwriting the ISBN though; that way it can search by ISBN first, not find a match, and then follow up with a Title/Author search and still find relevant information. I guess that would be another useful flag: 1) Match ISBN only 2) Match ISBN or Title/Author.
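The flag I have in mind could behave something like the sketch below. Everything here is hypothetical: `lookup`, the `isbn_only` flag and the two search callables are stand-ins for illustration and are not how any of the existing plugins are written:

```python
def lookup(book, search_isbn, search_title_author, isbn_only=False):
    """Search by ISBN first; optionally fall back to title/author."""
    known_isbn = book.get("isbn")
    result = search_isbn(known_isbn) if known_isbn else None
    if result is None and not isbn_only:
        result = search_title_author(book.get("title"), book.get("author"))
    if result is not None and known_isbn:
        # Never overwrite an ISBN the user already has.
        result = dict(result, isbn=known_isbn)
    return result


# Hypothetical in-memory "catalogue" standing in for a metadata source.
catalogue = {"111": {"title": "A", "author": "X", "isbn": "111"}}

def by_isbn(isbn):
    return catalogue.get(isbn)

def by_title_author(title, author):
    for record in catalogue.values():
        if record["title"] == title and record["author"] == author:
            return record
    return None

book = {"isbn": "999", "title": "A", "author": "X"}
strict = lookup(book, by_isbn, by_title_author, isbn_only=True)
relaxed = lookup(book, by_isbn, by_title_author)
```

With `isbn_only` set, the unknown ISBN gives no match at all; without it, the title/author fallback still finds the record while the book keeps its own ISBN.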

Quote:

Honestly, I'm not sure if all this is needed. And if I was doing it now, I might just let them all through and rely on the Tag Mapper. This didn't exist at the time the plugin was written. It has the advantage of working on all tags from all sources, not just one. The disadvantage is that it has to be run separately.

I agree, this is the best solution, and your changes in parse_tags should be enough to achieve it. Perhaps you could modify the code to not generate the default genreMappings key on new installs. This way you're not impacting users who rely on the GR plugin's mapping configuration. Also, perhaps add an explanation in the GR plugin's configuration page that it would be better to use the tag mapper and empty this filter to allow all GR genres through, since it is not immediately obvious.

I'm trying to use the Kobo plugin now and it is failing. It also appears to take 30 seconds to time out, which is far too long for bulk searching. I just freshly installed the plugin; is it working for you?

This is the error:

Spoiler:

DictatorAlly · 05-15-2020, 12:08 PM

Sorry I misread your code earlier, I thought it said:

Code:

    def parse_tags(self, root):
        # Goodreads does not have "tags", but it does have Genres (wrapper around popular shelves)
        # We will use those as tags (with a bit of massaging)
        genres_node = root.xpath('//div[@class="stacked"]/div/div/div[contains(@class, "bigBoxContent")]/div/div[@class="left"]')
        #self.log.info("Parsing tags")
        if genres_node:
            #self.log.info("Found genres_node")
            genre_tags = list()
            for genre_node in genres_node:
                sub_genre_nodes = genre_node.xpath('a')
                genre_tags_list = [sgn.text_content().strip() for sgn in sub_genre_nodes]
                #self.log.info("Found genres_tags list:", genre_tags_list)
                if genre_tags_list:
                    genre_tags.append(' > '.join(genre_tags_list))
            if cfg.plugin_prefs[cfg.STORE_NAME][cfg.KEY_GENRE_MAPPINGS]:
                calibre_tags = self._convert_genres_to_calibre_tags(genre_tags)
            else:
                calibre_tags = genre_tags
            if len(calibre_tags) > 0:
                return calibre_tags

Which I have just tested, and it works well for removing (or not removing) the GR plugin's filter while being backwards compatible.

DictatorAlly · 05-15-2020, 01:58 PM

I realise I shouldn't have posted about the Kobo error in this thread. I've moved the post to the thread [Metadata Source Plugin] Kobo Books.

https://www.mobileread.com/forums/sh...30#post3988630

Terisa de morgan · 05-15-2020, 04:25 PM

Quote:

Originally Posted by DictatorAlly

I think there is a GR Genre list; it is the result of GR's processing of their shelves. Whereas the shelves display everyone's chosen tags, rubbish or not, GR's Genres are their processing of that list, which results in only high quality genres/shelves/tags remaining.

Not high quality, it's related to the number of times a book has been shelved like that, and no "hand picking for rightness" is done.

DictatorAlly · 05-15-2020, 04:44 PM

Quote:

Originally Posted by Terisa de morgan

Not high quality, it's related to the number of times a book has been shelved like that, and no "hand picking for rightness" is done.

While "high quality" is subjective, the rest of your statement is false.

Compare any book's listed Genres and Top Shelves and the difference is obvious. They're hand picking the Top Shelves that can be displayed as Genres and how they will be displayed.

Here's a link for you:

https://www.goodreads.com/book/show/...edemption-prep

https://www.goodreads.com/work/shelves/68506618

You'll see the top shelf to-read is not included as a Genre. I have also seen examples of words you'd expect to be a Genre being ignored.

Here is another example:

https://www.goodreads.com/work/shelves/66802198

Note how they map the Top Shelf canadian to the Genre Cultural > Canada.

These are clear examples of how GR processes their Top Shelves to generate Genres: whatever nonsense is in Top Shelves doesn't make it into GR Genres unless it is hand picked via their process.

Rellwood · 06-19-2020, 02:51 PM

Does the plugin only show the tags that are configured, or does it use all that are available?

I configured a lot of tags, but I haven't seen any other than those being offered. Meaning, do I have to pre-configure a tag for it to be available?

davidfor · 06-19-2020, 11:52 PM

Quote:

Originally Posted by Rellwood

Does the plugin only show the tags that are configured, or does it use all that are available?

I configured a lot of tags, but I haven't seen any other than those being offered. Meaning, do I have to pre-configure a tag for it to be available?

The plugin drops any GR tags that aren't in the mapping. That is part of the discussion that took place last month. And I haven't had a chance to do anything about it.

davidfor · 09-20-2020, 09:21 AM

The plugin has been updated to version 1.5.0. This is mainly for support of the upcoming calibre version 5. The changes are:

Update: Changes for Python 3 support in calibre.
Fix: Small error in handling editions.

Calibre will announce the availability of the update. If there are any issues, please report them here.

Marlobo · 09-21-2020, 08:30 PM

With the update to 1.50 it has stopped downloading the series.

davidfor · 09-21-2020, 11:10 PM

Quote:

Originally Posted by Marlobo

With the update to 1.50 it has stopped downloading the series.

I have no idea how I missed that. But, there were no changes in that code or the page, so I don't see why it is not working. I'll look at it as soon as I can.

Marlobo · 09-26-2020, 09:52 PM

Quote:

Originally Posted by davidfor

I have no idea how I missed that. But, there were no changes in that code or the page, so I don't see why it is not working. I'll look at it as soon as I can.

Many thanks for the quick fix.

jindroush · 09-27-2020, 09:39 AM

Guys, I'm a little bit lost. When I look up metadata for the Czech edition of Ray Bradbury's Pampeliškové víno (isbn:9780671037703), I can see that lots of data was downloaded. From the log it's clear that comments, series and publisher were correctly downloaded and parsed, but after the "OK", it seems that only the "ids" column is filled.
Does anybody else have this weird problem?
(Running Calibre 5.0.1, x64 on Windows 10 x64bit, English).

DictatorAlly · 05-15-2020, 07:08 AM (#497)

This is what I have so far. I'm having trouble iterating over a list I am removing items from:

Spoiler:

Code:

    def parse_tags(self, root):
        # Goodreads does not have "tags", but it does have Genres (wrapper around popular shelves)
        # We will use those as tags (with a bit of massaging)
        genres_node = root.xpath('//div[@class="stacked"]/div/div/div[contains(@class, "bigBoxContent")]/div/div[@class="left"]')
        #self.log.info("Parsing tags")
        if genres_node:
            #self.log.info("Found genres_node")
            genre_tags = list()
            for genre_node in genres_node:
                sub_genre_nodes = genre_node.xpath('a')
                genre_tags_list = [sgn.text_content().strip() for sgn in sub_genre_nodes]
                #self.log.info("Found genres_tags list:", genre_tags_list)
                if genre_tags_list:
                    genre_tags.append(' > '.join(genre_tags_list))
            no_filter_tags = genre_tags
            exclusive_tags = self._convert_genres_to_calibre_tags(genre_tags)
            inclusive_tags = exclusive_tags + self._remove_filtered_genre_tags(genre_tags)
            calibre_tags = inclusive_tags
            if len(calibre_tags) > 0:
                return calibre_tags

    def _remove_filtered_genre_tags(self, genre_tags):
        # for each tag, remove if we have a dictionary lookup
        calibre_tag_lookup = cfg.plugin_prefs[cfg.STORE_NAME][cfg.KEY_GENRE_MAPPINGS]
        calibre_tag_map = dict((k.lower(), k) for k in calibre_tag_lookup.keys())
        self.log.info("Keys - Found calibre_tag_map keys result:", calibre_tag_map)
        for genre_tag in reversed(genre_tags):
            tags = calibre_tag_map.get(genre_tag.lower(), None)
            self.log.info("Keys - tags result::", tags)
            if tags:
                for tag in tags:
                    if tag in tags_to_remove:
                        genre_tags.remove(tag)
                        self.log.info("Keys - Removing from genre_tags keys result:", tag)
        return list(tags_to_remove)



