![]() |
#1 |
Junior Member
![]() Posts: 5
Karma: 10
Join Date: Nov 2024
Device: multiple devices
|
LLM created tags
I've modded the Goodreads plugin (as a good base of code, thank you KiwiDude) to (after getting description and tags and title) ask Ollama [using OpenAI api, so it doesn't HAVE to be a local LLM, if you want to use some other service/api free or paid] given a list of 'tags' and the title and description, and getting back results. Getting good results... if you give the list of tags, and the description (or even just the title in some cases), a basic LLM (I'm using Mistral, but you could use something smaller/faster I bet) can return tags it thinks match. Python does the rest (as usual)
I'd love to not spend the time asking Goodreads, but not sure there is a better code base to use, that can access the existing tags, title and comments (aka Description) So I'm posting to ask if someone has a good idea for a better code base. Would need to GET those 3, and then update tags. I'll be sharing the code once I get the settings more generalized (and adjustable via the plugin settings. If I keep the 'goodreads' code base, I'll mod it to be labeled goodreads+llm or something. |
![]() |
![]() |
![]() |
#2 |
creator of calibre
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 45,142
Karma: 27110892
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If this a large language model you could just ask it to generate tags directly from book title/comments or even book text. Dont bother going through goodreads.
|
![]() |
![]() |
Advert | |
|
![]() |
#3 |
Calibre Plugins Developer
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 4,718
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
I would second the "dont bother with goodreads involvement" when it comes to tags/genres.
I'm also confused as to what your LLM is actually doing - you give it tags and it gives tags back? That seems a bit of a strange thing to try to be doing - surely that is just garbage in, garbage out? Kovid's suggestions seem more like what I too would think an LLM would be used for - a case of "tell me something I don't know", not "something I know already". Glad you found some plugin code useful as a starting point though ![]() |
![]() |
![]() |
![]() |
#4 |
Weirdo
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 811
Karma: 11003000
Join Date: Nov 2019
Location: Wuppertal, Germany
Device: Tolino Shine Color, Tolino Vision 6, Kobo Clara 2E, Boox Note Air 2+
|
LLM created tags
If you ask ChatGPT for a list of the best songs from the 70s you can be sure that about a quarter of the songs is from a different time period. Why do you think it can generate tags with better accuracy?
|
![]() |
![]() |
![]() |
#5 |
Junior Member
![]() Posts: 5
Karma: 10
Join Date: Nov 2024
Device: multiple devices
|
|
![]() |
![]() |
Advert | |
|
![]() |
#6 | ||
Junior Member
![]() Posts: 5
Karma: 10
Join Date: Nov 2024
Device: multiple devices
|
Quote:
Quote:
currently (and I am refining improving this based on results from my use of the current): I prompt the LLM with "Classify the following book with one or more of the following tags: [desired 'broad' tag list here, mostly major genres from GR, but modified as desired by me, to sort appropriately]. Title: [title] Description: [Description if any from GR]" I then take the results, which might be more of less formatted but depending on the LLM to be consistent is unreliable, and while I could force it to a json result, it's just easier: python: given [tag list], parse results looking for any/all tags returned. Use that set plus the GR tags, and return the lot of them as result. Next gen (and this will be settings-able), the prompt will be adjustable, along with url/etc. I expect my refined prompt will be something along the lines of: "Given the following book information, respond with the following answers: 1) Fiction or Non-Fiction 2) Reader Audience: Childrens, Young Adult, Normal Adult, Reference, Textbook 3) one or more of the following genres: Science Fiction Fantasy Romance Paranormal Western Mystery Thriller ... etc... " Not one of the existing metadata sources (and as I said GR is perhaps the best to date, Amazon and others are poor seconds at best) does this well enough for my purposes. LLM have lots of limits, but given Text Info, and asking it to classify into various buckets? That's usually a good fit. Not asking it to read the book, just scan the 'back cover' and sort. Basically robotic bookshelf clerking. |
||
![]() |
![]() |
![]() |
#7 | |
Junior Member
![]() Posts: 5
Karma: 10
Join Date: Nov 2024
Device: multiple devices
|
Quote:
That's well within the realm of LLMs to do well without hallucination. I do a LOT of LLM related stuff, this is my bailiwick. I'm very aware of why ChatGPT gets such song info wrong (reason is more complex and off topic to spend much time on here, but short answer: LLMs are not good solid references, they are at best slow learners and mostly they are clever parrots), it's also true about book info. I'm not asking it that sort of question. I'm also not giving it book text and asking it to decide, but solely the Description which should be enough to gather the gist of what genre it is. |
|
![]() |
![]() |
![]() |
#8 |
creator of calibre
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 45,142
Karma: 27110892
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Dont write a metadata fetching plugin since your plugin is basically a kind of tag correcting/expanding/filtering plugin write it as a UI plugin. Iterate over the selected books get their metadata from the database, feed it to llm, and update the tags in the database from the result. There are many plugins that work with database data for example, clean metadata/clean comments.
|
![]() |
![]() |
![]() |
#9 | |
Junior Member
![]() Posts: 5
Karma: 10
Join Date: Nov 2024
Device: multiple devices
|
Quote:
Yes, a proper LLM plugin would be better, not questioning that, but I don't see enough of the needed pieces in the clean comments (in other words, if I want to adjust tags, there is zero updating tags code in there... I'd have to write all of that.) I didn't see any tag-related plugins that worked for my purposes. ("English Noun Frequency" perhaps being the only one, and too old for my taste) added: PLUS the advantage of a 'metadata' plugin is it specifically adds 'review', batching, and so on. UI plugins lack all of that. Last edited by scruffynerf; 11-05-2024 at 11:40 AM. Reason: added why metadata is better than UI for plugin. |
|
![]() |
![]() |
![]() |
Thread Tools | Search this Thread |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Can I print a list of tags and custom columns created in Calibre? | KimbreLee | Calibre | 7 | 03-15-2019 04:07 PM |
Sony PRS-T1 - Prevent collections from being created from the tags of news items | IorekB | Devices | 1 | 01-10-2012 10:14 PM |
Amazon Tags - Popular tags vs Unique tags. | chrisanthropic | Writers' Corner | 6 | 09-19-2011 11:18 PM |
Patch: Calibre adds tags to identify ebook formats created by calibre. | siebert | Calibre | 1 | 07-18-2011 02:07 PM |
Suggestion: User created sub-folders for Tags, Authors, Etc. | Daemon | Calibre | 0 | 08-23-2010 12:47 AM |