Old 03-10-2024, 03:31 AM   #1
Firehose
Device: Kobo Libra 2
Automatically create clean tags AND genres for your library

## Notes

The goal of this post is to describe what is arguably the best method of organizing your Calibre library without doing any manual classification. It assigns each book a set of tags, and also a single hierarchical genre (the way physical bookstores classify books).

**IMPORTANT: This method will modify your library irreversibly. You're strongly encouraged to back up your library before proceeding.**
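If you'd rather script the backup, here is a minimal sketch in Python. It assumes your library is an ordinary Calibre library folder (one containing `metadata.db`) and that Calibre is closed while it runs. The function name, the archive name and the example paths are made up for illustration.

```python
import shutil
from pathlib import Path


def backup_library(library_dir: str, backup_dir: str,
                   name: str = "calibre-library-backup") -> Path:
    """Zip the whole library folder (books plus metadata.db) into backup_dir."""
    library = Path(library_dir).expanduser()
    # A Calibre library keeps its database in metadata.db at the top level.
    if not (library / "metadata.db").is_file():
        raise FileNotFoundError(f"{library} does not look like a Calibre library")
    target = Path(backup_dir).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    # make_archive appends ".zip" to the base name and returns the full path.
    archive = shutil.make_archive(str(target / name), "zip", root_dir=str(library))
    return Path(archive)
```

For example, `backup_library("~/Calibre Library", "~/Backups")` (hypothetical paths). Keep the backup folder outside the library folder so the archive doesn't end up inside itself.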

### Tags & Genres

This approach should result in your books having relevant tags, and genres that a decent bookstore would place them in. Isaacson's "Benjamin Franklin" would automatically get placed into Nonfiction.Biography, Frank Herbert's "Dune" would get placed into Fiction.Science Fiction, and so on.

### Background

There are two general methods of categorizing information: tags and a hierarchical genre structure. Both have benefits and drawbacks. Tags provide a comprehensive way of assigning all relevant topics to a book, but they often become disorganized, duplicated and unwieldy. It's not uncommon for people with large libraries to have thousands of inconsistently named tags (sciencefiction, Science Fiction, science-fiction) with many duplicates. Genres, on the other hand, aim to find the single most relevant classification. While the book "Dune" could be accurately tagged "Adventure, Fantasy, Science Fiction, Novels, Space Opera", I think we'd all agree that if you had to place it in a single aisle in a bookstore, it would be best placed in the "Science Fiction" aisle.

This method gives the best of both worlds, multiple tags and a single canonical hierarchical genre.

### Tagging

On to the good stuff. First, we'll be using Goodreads as the sole source of tags for books. Goodreads provides the best tagging classification I've seen so far. There is less noise than in publisher-provided tags, and the tags seem to have been de-duplicated, which gives a good starting point for classification. To get started with Goodreads, make sure you have the Goodreads plugin by Grant Drake installed.

[Image: Screenshot 2024-03-09 at 11.45.34 PM.png]

**Warning: This method requires you to wipe all your existing tags and rely on Goodreads provided tags.** First we'll be removing all existing tags on your books.

#### Removing tags

To remove all your existing tags, select all the books you'd like to update, then right-click and choose Edit metadata -> Edit metadata in bulk. On the Basic metadata tab, tick the "Clear all" checkbox to the right of the "Remove tags" input field. Finally, click "OK" to clear the tags from the selected books.

#### Adding Goodreads Tags

Select all the books you'd like to update, then right-click and choose Edit metadata -> Download metadata and covers. Then click the "Configure Download" button. Make sure that ONLY the Goodreads source is checked under "Metadata sources". This next step is optional, but I prefer to have Goodreads' original tags, so I click "Configure selected source" for Goodreads and uncheck the "Filter and map genres to calibre tags" option. While on this screen, make sure that the "Tags" checkbox is selected (it should be by default). You can leave the other metadata fields checked if you want to overwrite your existing library metadata for those fields. Finally, click "Save".

I also set the maximum number of tags to 10 (any more than that for a single book will likely be noise). Here's my configuration:

[Image: Screenshot 2024-03-10 at 12.14.44 AM.png]

Click "Apply" to update the configuration. Then, on the Schedule Download page, click either "Download only metadata" if you only want metadata, or "Download both" if you'd like to download metadata and cover images from Goodreads. I already had cover images in my library, so I clicked "Download only metadata" to speed up the process. If you're missing cover images, you might want to grab both.

This process will take several minutes for larger libraries as Calibre updates your metadata with tags from Goodreads. Once this is complete, Calibre will ask you if you want to update your library with the metadata. Click "Yes" to agree. This will also take several minutes if you have a larger library.

At the end of this process you should have most of your books that could be matched with updated tags from Goodreads.

If all you want are clean tags in your library, you can stop the guide here. But if you'd like to classify books by Genre as well, then proceed.

### Auto Genres

Now that you have a single source for your tags, the next step is to classify each book into a single hierarchical genre. I prefer to keep genres simple and have only a single level deep for the hierarchy. The top level genre for books will only be one of "Fiction" or "Nonfiction". The next level would be the single most relevant tag.

Fortunately, Goodreads tags are returned with an important property: they are pre-sorted in order of relevance! That means that to determine a book's genre, we first extract either the "Fiction" or "Nonfiction" tag from the book, then take the next most relevant tag.

To do this, I'm going to start by creating an intermediate column called "Autogenre" which is automatically generated from the imported tags. Go to "Preferences" -> "Add your own columns". Make the "Lookup name" field "autogenre" and make the Column heading "Auto Genre". The "Column type" field should be "Column built from other columns".

Finally under the "Template" field paste the following code:

```
program:
tags = field('tags');
unsorted_tags = raw_list('tags', ',');

top_tag = str_in_list(tags, ',', 'Fiction', 'Fiction', 'Nonfiction');
subtags = list_difference(unsorted_tags, 'Nonfiction,Fiction,Audiobooks,Audiobook', ',');

first_subtag = list_item(subtags, 0, ',');
genre = list_union(first_subtag, top_tag, '.');
```


This code ensures we keep the default sorting provided by Goodreads (Calibre normally sorts tags alphabetically), then extracts the top-level tag by checking whether a book is tagged Fiction or not. If you'd like additional top-level tags, you will need to modify this code, but keeping it simple with two top-level tags works for me.

The most relevant genre (e.g. "Science Fiction" for Dune) is determined by extracting the first tag that doesn't match any of the categories: 'Nonfiction,Fiction,Audiobooks,Audiobook'.
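To make the logic easier to follow outside Calibre, here is a rough Python sketch of what the template is meant to compute. It is not Calibre code. The function name `auto_genre` and the sample tag lists are made up for illustration, and it assumes the tags arrive in Goodreads' relevance order and are compared case-insensitively.

```python
EXCLUDED = {"nonfiction", "fiction", "audiobooks", "audiobook"}


def auto_genre(tags: list[str]) -> str:
    """Top-level Fiction/Nonfiction plus the first remaining tag, joined by '.'."""
    # Like str_in_list: 'Fiction' if the book has a Fiction tag, else 'Nonfiction'.
    has_fiction = any(t.strip().lower() == "fiction" for t in tags)
    top = "Fiction" if has_fiction else "Nonfiction"
    # Like list_difference: drop the excluded tags but keep the original order.
    subtags = [t.strip() for t in tags if t.strip().lower() not in EXCLUDED]
    # Like list_item(subtags, 0, ','): the most relevant remaining tag, if any.
    return f"{top}.{subtags[0]}" if subtags else top


print(auto_genre(["Science Fiction", "Fiction", "Fantasy", "Classics"]))
print(auto_genre(["Biography", "Nonfiction", "History"]))
```

With these sample lists the two calls print `Fiction.Science Fiction` and `Nonfiction.Biography`. A book with no usable tags falls back to the top level alone.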

Here's what it should look like before creating the column:

[Image: Screenshot 2024-03-10 at 1.15.16 AM.png]

Click "OK" to generate the "Auto Genre" field.

### Genres

*You can skip this step if you don't want the genre in the tag browser. However, I find it useful for quickly filtering my Calibre library, so I recommend this step.*

You should now have tags and the "Auto Genre" fields populated. Unfortunately you cannot use the "Auto Genre" field in the tag browser because it's a computed field and Calibre needs a field with fixed data for the tag browser.

To do this, we'll need to copy the values from the "Auto Genre" column into a fixed "Genre" column. The "Auto Genre" column updates every time tags are modified. But the "Genre" column will remain fixed after the copy operation.

First, if you don't already have it, create a new column, with the "Lookup name" as "genre" and the "Column heading" as "Genre". The column type should be "Text, column shown in the tag browser".

Once the Genre column is created you can now copy values from the Auto Genre column. First select all of your tagged books. Then go to "Edit metadata" -> "Edit metadata in bulk". Go to the "Search and replace" tab.

Set up your search and replace settings as follows. Be sure that the "Search mode" field is set to "Regular expression". The "Search field" should be set to "#autogenre" and the "Destination field" should be set to "#genre". Both the "Search for" and "Replace with" fields should contain the exact same value, so the values get copied with no changes (Calibre requires some text here, so the choice is arbitrary; I use the letter "z").

[Image: Screenshot 2024-03-10 at 1.25.31 AM.png]

You can save these search/replace rules as "AutoGenre to Genre" in case you need to update your "Genre" column in the future after adding more books.

Now click "OK", and this should do a one-time generation of your "Genre" column.
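Why identical "Search for" and "Replace with" values copy the field unchanged can be shown with a small Python sketch. It only models the substitution step with Python's `re.sub`, on the assumption that Calibre's regular expression mode does the same kind of replacement on the source value before writing it to the destination. The function name is made up for illustration.

```python
import re


def copy_via_search_replace(source_value: str, pattern: str = "z",
                            replacement: str = "z") -> str:
    """Return the text that would be written to the destination field."""
    # Every match is replaced by identical text, so the value is unchanged.
    return re.sub(pattern, replacement, source_value)


print(copy_via_search_replace("Fiction.Science Fiction"))
```

This prints `Fiction.Science Fiction`. Values that do contain a "z" also come through unchanged, because each "z" is replaced by another "z".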

### Wrap up

Now you should have tags from Goodreads, an "Auto Genre" column that updates automatically as tags change, and a "Genre" column that holds static text.

The "Genre" column should be showing in the tag browser, where you can filter your library by Fiction, Nonfiction, or any of the respective subgenres.

You will also be able to use the Genre or Auto Genre field to create a directory structure on your e-reader based on genres.

https://manual.calibre-ebook.com/sub...late-functions

I didn't use the genre directory structure, but if you wanted it because you had a very large library, you could use a save to disk template like:

```
{#genre:subitems(0,1)||/}{#genre:subitems(1,2)||/}{title} - {authors}
```
This would create a directory structure like:

Fiction/Science Fiction/Dune - Frank Herbert.epub

Nonfiction/Biography/Benjamin Franklin - Walter Isaacson.epub
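As a rough illustration of how that template turns a genre into folders, here is a Python sketch. The `subitems` function below is a simplified stand-in for Calibre's template function of the same name. It handles only a single period-separated value and the non-negative indices used above. `save_path` is a made-up helper name, and Calibre itself adds the file extension.

```python
def subitems(value: str, start: int, end: int) -> str:
    """Return components start..end (end exclusive) of a period-separated value."""
    return ".".join(value.split(".")[start:end])


def save_path(genre: str, title: str, authors: str) -> str:
    """Mimic the save-to-disk template for a single book."""
    path = ""
    # In the template, the '/' suffix is only added when the piece is non-empty,
    # so a missing genre level leaves no stray separator.
    for piece in (subitems(genre, 0, 1), subitems(genre, 1, 2)):
        if piece:
            path += piece + "/"
    return f"{path}{title} - {authors}"


print(save_path("Fiction.Science Fiction", "Dune", "Frank Herbert"))
```

This prints `Fiction/Science Fiction/Dune - Frank Herbert`. A book whose genre has only a top level, such as "Nonfiction", would be placed directly in the `Nonfiction/` folder.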

I have a Kobo device, so I use the Auto Genre field to create collections on it by setting the "Collections columns" field to "#autogenre" in the KoboTouch plugin. This process will vary if you're on another type of device.

Last edited by pdurrant; 03-10-2024 at 05:27 AM. Reason: put images in their place in the text