MobileRead Forums - View Single Post - Best strategy for metadata management for Kobo using Calibre?

ceridwen · 05-13-2019, 02:51 AM

For the curious: I found everything I needed for my particular use case in the database API. (I'm happy running a command-line script periodically to update my tags; I don't need a GUI.) To get the IDs of all the books, I call:

Code:

db.all_book_ids()

Fetch the metadata for each book with:

Code:

db.get_metadata(book_id)

Do all the tag processing, build my set of collections, and pass the result to:

Code:

db.set_field('#collections', book_id_to_val_map)
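Put together, the three calls above amount to a read, process, write loop. Here's a minimal sketch of that loop, not my actual script: `update_collections` and `build_collections` are hypothetical names, and it assumes `db` is a calibre database object exposing the three methods above, and that each metadata object has a `tags` attribute.

```python
def update_collections(db, build_collections, field="#collections"):
    """Read every book's tags, compute collections, and write them back.

    Assumptions: `db` exposes all_book_ids(), get_metadata(book_id) and
    set_field(name, book_id_to_val_map), as in the calls quoted above.
    `build_collections` is a hypothetical callable that maps
    {book_id: [tags]} to {book_id: [collections]}.
    """
    tags_by_book = {}
    for book_id in db.all_book_ids():
        mi = db.get_metadata(book_id)
        # Treat a missing or empty tag list as "no tags".
        tags_by_book[book_id] = list(mi.tags or ())

    book_id_to_val_map = build_collections(tags_by_book)

    # One bulk write for the whole library instead of one call per book.
    db.set_field(field, book_id_to_val_map)
    return book_id_to_val_map


# Under calibre-debug, the db object would typically come from something like:
#   from calibre.library import db
#   cache = db('/path/to/library').new_api
# (the path is a placeholder; adjust for your setup)
```

Because the function only relies on those three methods, it can be tried against a stub object before pointing it at a real library.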

To filter the tags down to a subset small enough for collections, I excluded every tag found on only one work or on only one author's works, normalized tags by singularizing the last word and folding upper/lower case, kept only the fandom and freeform tags from AO3, and finally kept only tags with 5 or more occurrences in my database.
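A rough sketch of that kind of filtering, under some loud assumptions: the input shape (`works` mapping a book ID to an author and a list of `(kind, tag)` pairs) is made up for illustration, `singularize` is a deliberately naive stand-in for a real inflection routine, and all function names are hypothetical.

```python
from collections import defaultdict


def singularize(word):
    # Naive on purpose: strips common English plural endings only.
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def normalize(tag):
    # Lower-case the tag and singularize its last word.
    words = tag.lower().split()
    if words:
        words[-1] = singularize(words[-1])
    return " ".join(words)


def filter_tags(works, min_count=5, kinds=("fandom", "freeform")):
    """works: {book_id: (author, [(kind, tag), ...])} (assumed shape).

    Returns {book_id: sorted list of normalized tags that survive}.
    """
    books_by_tag = defaultdict(set)
    authors_by_tag = defaultdict(set)
    for book_id, (author, tagged) in works.items():
        for kind, tag in tagged:
            if kind not in kinds:
                continue
            norm = normalize(tag)
            books_by_tag[norm].add(book_id)
            authors_by_tag[norm].add(author)

    # Keep tags on at least `min_count` works by more than one author.
    keep = {
        tag
        for tag, books in books_by_tag.items()
        if len(books) >= min_count and len(authors_by_tag[tag]) > 1
    }

    result = {}
    for book_id, (author, tagged) in works.items():
        norms = {normalize(tag) for kind, tag in tagged if kind in kinds}
        result[book_id] = sorted(norms & keep)
    return result
```

The returned mapping has the same book-ID-to-values shape as the `book_id_to_val_map` passed to `db.set_field`.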

At this point, I think I have more to gain by improving the metadata I'm using as input than by refining the collections processing. In particular, most fanfiction sites provide additional context about what a tag represents, like the Category, Relationship, Fandom, and Freeform classifications on AO3. For AO3 I wrote my own scraper because I also needed it to handle canonicalization, which FanFicFare definitely doesn't do. I took a look at FanFicFare's capabilities for other sites. Internally, it calls a method, self.story.setMetadata(), that takes two arguments: a type of metadata and its actual value. However, even after looking through the config settings, it's not clear to me how to make use of that. What I really want is a way to just dump key-value pairs into a custom column in Calibre, or into a file, with FanFicFare's command-line tool. FanFicFare also doesn't do complete metadata processing for some sites, so I'll have to either extend it or write my own scraper, again.
