Register Guidelines E-Books Search Today's Posts Mark Forums Read

Go Back   MobileRead Forums > E-Book Software > Calibre > Plugins

Notices

Reply
 
Thread Tools Search this Thread
Old 11-04-2024, 11:38 PM   #1
scruffynerf
Junior Member
scruffynerf began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Nov 2024
Device: multiple devices
LLM created tags

I've modded the Goodreads plugin (as a good base of code, thank you KiwiDude) to (after getting description and tags and title) ask Ollama [using OpenAI api, so it doesn't HAVE to be a local LLM, if you want to use some other service/api free or paid] given a list of 'tags' and the title and description, and getting back results. Getting good results... if you give the list of tags, and the description (or even just the title in some cases), a basic LLM (I'm using Mistral, but you could use something smaller/faster I bet) can return tags it thinks match. Python does the rest (as usual)

I'd love to not spend the time asking Goodreads, but not sure there is a better code base to use, that can access the existing tags, title and comments (aka Description) So I'm posting to ask if someone has a good idea for a better code base. Would need to GET those 3, and then update tags.

I'll be sharing the code once I get the settings more generalized (and adjustable via the plugin settings. If I keep the 'goodreads' code base, I'll mod it to be labeled goodreads+llm or something.
scruffynerf is offline   Reply With Quote
Old 11-05-2024, 12:32 AM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.
 
kovidgoyal's Avatar
 
Posts: 45,142
Karma: 27110892
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
If this a large language model you could just ask it to generate tags directly from book title/comments or even book text. Dont bother going through goodreads.
kovidgoyal is offline   Reply With Quote
Advert
Old 11-05-2024, 01:11 AM   #3
kiwidude
Calibre Plugins Developer
kiwidude ought to be getting tired of karma fortunes by now.kiwidude ought to be getting tired of karma fortunes by now.kiwidude ought to be getting tired of karma fortunes by now.kiwidude ought to be getting tired of karma fortunes by now.kiwidude ought to be getting tired of karma fortunes by now.kiwidude ought to be getting tired of karma fortunes by now.kiwidude ought to be getting tired of karma fortunes by now.kiwidude ought to be getting tired of karma fortunes by now.kiwidude ought to be getting tired of karma fortunes by now.kiwidude ought to be getting tired of karma fortunes by now.kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,718
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
I would second the "dont bother with goodreads involvement" when it comes to tags/genres.

I'm also confused as to what your LLM is actually doing - you give it tags and it gives tags back? That seems a bit of a strange thing to try to be doing - surely that is just garbage in, garbage out? Kovid's suggestions seem more like what I too would think an LLM would be used for - a case of "tell me something I don't know", not "something I know already".

Glad you found some plugin code useful as a starting point though
kiwidude is offline   Reply With Quote
Old 11-05-2024, 08:23 AM   #4
rantanplan
Weirdo
rantanplan ought to be getting tired of karma fortunes by now.rantanplan ought to be getting tired of karma fortunes by now.rantanplan ought to be getting tired of karma fortunes by now.rantanplan ought to be getting tired of karma fortunes by now.rantanplan ought to be getting tired of karma fortunes by now.rantanplan ought to be getting tired of karma fortunes by now.rantanplan ought to be getting tired of karma fortunes by now.rantanplan ought to be getting tired of karma fortunes by now.rantanplan ought to be getting tired of karma fortunes by now.rantanplan ought to be getting tired of karma fortunes by now.rantanplan ought to be getting tired of karma fortunes by now.
 
Posts: 811
Karma: 11003000
Join Date: Nov 2019
Location: Wuppertal, Germany
Device: Tolino Shine Color, Tolino Vision 6, Kobo Clara 2E, Boox Note Air 2+
LLM created tags

If you ask ChatGPT for a list of the best songs from the 70s you can be sure that about a quarter of the songs is from a different time period. Why do you think it can generate tags with better accuracy?
rantanplan is offline   Reply With Quote
Old 11-05-2024, 10:45 AM   #5
scruffynerf
Junior Member
scruffynerf began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Nov 2024
Device: multiple devices
Quote:
Originally Posted by kovidgoyal View Post
If this a large language model you could just ask it to generate tags directly from book title/comments or even book text. Dont bother going through goodreads.
I'd like to, but I didn't see a good plugin doing anything like that to use as a model... ideas?
scruffynerf is offline   Reply With Quote
Advert
Old 11-05-2024, 11:00 AM   #6
scruffynerf
Junior Member
scruffynerf began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Nov 2024
Device: multiple devices
Quote:
Originally Posted by kiwidude View Post
I would second the "dont bother with goodreads involvement" when it comes to tags/genres.
answered above. I used something I felt was clean code (kudos again), since it was in the midst of doing the 'right things'. I didn't have to figure out how to get the data, then process the data, then save the data, I could just 'slide in' and add my desired tags, and use the results from goodreads as my base info

Quote:
I'm also confused as to what your LLM is actually doing - you give it tags and it gives tags back? That seems a bit of a strange thing to try to be doing - surely that is just garbage in, garbage out? Kovid's suggestions seem more like what I too would think an LLM would be used for - a case of "tell me something I don't know", not "something I know already".
No, to clarify (it was late when I wrote this)...

currently (and I am refining improving this based on results from my use of the current):
I prompt the LLM with

"Classify the following book with one or more of the following tags:
[desired 'broad' tag list here, mostly major genres from GR, but modified as desired by me, to sort appropriately].

Title: [title]
Description: [Description if any from GR]"

I then take the results, which might be more of less formatted but depending on the LLM to be consistent is unreliable, and while I could force it to a json result, it's just easier:

python: given [tag list], parse results looking for any/all tags returned.
Use that set plus the GR tags, and return the lot of them as result.

Next gen (and this will be settings-able), the prompt will be adjustable, along with url/etc.

I expect my refined prompt will be something along the lines of:

"Given the following book information, respond with the following answers:
1) Fiction or Non-Fiction
2) Reader Audience: Childrens, Young Adult, Normal Adult, Reference, Textbook
3) one or more of the following genres:
Science Fiction
Fantasy
Romance
Paranormal
Western
Mystery
Thriller
... etc...
"

Not one of the existing metadata sources (and as I said GR is perhaps the best to date, Amazon and others are poor seconds at best) does this well enough for my purposes.

LLM have lots of limits, but given Text Info, and asking it to classify into various buckets? That's usually a good fit. Not asking it to read the book, just scan the 'back cover' and sort. Basically robotic bookshelf clerking.
scruffynerf is offline   Reply With Quote
Old 11-05-2024, 11:06 AM   #7
scruffynerf
Junior Member
scruffynerf began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Nov 2024
Device: multiple devices
Quote:
Originally Posted by rantanplan View Post
If you ask ChatGPT for a list of the best songs from the 70s you can be sure that about a quarter of the songs is from a different time period. Why do you think it can generate tags with better accuracy?
I don't, I'm not asking it for concrete info, or to generate tags. I'm saying "From this list, which makes the most sense, based on description"

That's well within the realm of LLMs to do well without hallucination. I do a LOT of LLM related stuff, this is my bailiwick. I'm very aware of why ChatGPT gets such song info wrong (reason is more complex and off topic to spend much time on here, but short answer: LLMs are not good solid references, they are at best slow learners and mostly they are clever parrots), it's also true about book info. I'm not asking it that sort of question. I'm also not giving it book text and asking it to decide, but solely the Description which should be enough to gather the gist of what genre it is.
scruffynerf is offline   Reply With Quote
Old 11-05-2024, 11:30 AM   #8
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.
 
kovidgoyal's Avatar
 
Posts: 45,142
Karma: 27110892
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by scruffynerf View Post
I'd like to, but I didn't see a good plugin doing anything like that to use as a model... ideas?
Dont write a metadata fetching plugin since your plugin is basically a kind of tag correcting/expanding/filtering plugin write it as a UI plugin. Iterate over the selected books get their metadata from the database, feed it to llm, and update the tags in the database from the result. There are many plugins that work with database data for example, clean metadata/clean comments.
kovidgoyal is offline   Reply With Quote
Old 11-05-2024, 11:39 AM   #9
scruffynerf
Junior Member
scruffynerf began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Nov 2024
Device: multiple devices
Quote:
Originally Posted by kovidgoyal View Post
Dont write a metadata fetching plugin since your plugin is basically a kind of tag correcting/expanding/filtering plugin write it as a UI plugin. Iterate over the selected books get their metadata from the database, feed it to llm, and update the tags in the database from the result. There are many plugins that work with database data for example, clean metadata/clean comments.
Yeah, clean plugin was my second choice, but it really didn't include all of the pieces I needed. As I said, I was looking for minimal 'plugin knowledge' coding, as I really didn't want/need to figure out all of the pieces from scratch. The GR plugin, I basically just caught the one python function parsing tags, and end up returning the 'improved results'

Yes, a proper LLM plugin would be better, not questioning that, but I don't see enough of the needed pieces in the clean comments (in other words, if I want to adjust tags, there is zero updating tags code in there... I'd have to write all of that.) I didn't see any tag-related plugins that worked for my purposes. ("English Noun Frequency" perhaps being the only one, and too old for my taste)

added: PLUS the advantage of a 'metadata' plugin is it specifically adds 'review', batching, and so on. UI plugins lack all of that.

Last edited by scruffynerf; 11-05-2024 at 11:40 AM. Reason: added why metadata is better than UI for plugin.
scruffynerf is offline   Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Can I print a list of tags and custom columns created in Calibre? KimbreLee Calibre 7 03-15-2019 04:07 PM
Sony PRS-T1 - Prevent collections from being created from the tags of news items IorekB Devices 1 01-10-2012 10:14 PM
Amazon Tags - Popular tags vs Unique tags. chrisanthropic Writers' Corner 6 09-19-2011 11:18 PM
Patch: Calibre adds tags to identify ebook formats created by calibre. siebert Calibre 1 07-18-2011 02:07 PM
Suggestion: User created sub-folders for Tags, Authors, Etc. Daemon Calibre 0 08-23-2010 12:47 AM


All times are GMT -4. The time now is 10:24 AM.


MobileRead.com is a privately owned, operated and funded community.