Old 04-20-2019, 12:24 AM   #1
ceridwen
Enthusiast
ceridwen began at the beginning.
 
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
Best strategy for metadata management for Kobo using Calibre?

I'm looking to automate the metadata handling for books on my Kobo H2O to make searching and organizing my library easier. I think the easiest way to do this is to insert metadata into some field in the epub, import it using Calibre, and then use Calibre to set up collections and put the metadata into another field like subtitle. Setting up a plugboard template to push data into subtitle and having Calibre create collections based on a column are both straightforward, thanks to the existing work some great people have done on the Kobo Calibre plugin. I have some questions about automating the earlier steps in the process.

What columns will Calibre automatically populate for me from metadata in an epub? Is there anything beyond Tags? I know that for instance epubs downloaded from AO3 will show up with entries in the Tags column in Calibre. I'd like to be able to add metadata to another custom column, though, because I expect I'm going to have to do a lot of filtering to get the set of possible tags down to something small enough so that the resulting collections won't overwhelm the Kobo. It would be nice if I could leave the existing AO3 metadata field intact so that I can use it later if I need to regenerate tags. I also have epubs I'm generating myself from my own tools and from other sources that I want to tag. Is there a good way of adding metadata to cbzs? What kind of problems am I likely to encounter when automatically converting tags (which may contain Unicode, for instance) to collections?

Just for laughs, I tried earlier with the full set of AO3 tags in my library and confirmed that the Kobo will crash trying to load the DB, probably because it runs out of memory. Has anyone played around with how many collections it's possible to put into the Kobo DB before the UI starts to become too slow or crashes?

My goal is to automate as much of this pipeline as possible and to avoid making it too slow. I know there's a calibredb CLI, but writing code that calls a CLI is going to involve a lot of indirection that will make the script slower and harder to write. Is there a programmatic interface I should be looking at? Has anyone else tried to do something like this and open-sourced tools I don't know about? Are there pitfalls I should be aware of?

This is a follow-up to a previous question I asked about what metadata Kobo's software will read, where davidfor kindly established that it won't pick up useful metadata for sideloaded epubs.
Old 04-20-2019, 12:38 AM   #2
ilovejedd
hopeless n00b
ilovejedd ought to be getting tired of karma fortunes by now.

Posts: 5,126
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
If you know python, you can always write your own plugin.
https://manual.calibre-ebook.com/creating_plugins.html

I'm not quite up for a crash course in programming and python so I just use a combination of Job Spy plugin (scrub tags) and Calibre Bulk Metadata Editor search & replace (regex mode). I have a #freeform custom column which I use to hold a copy of the original tags from the AO3 EPUB.

Note, I believe you can also use FanFicFare to download AO3 metadata and do the filtering via its personal.ini file.

Calibre is also capable of importing custom column data from the EPUB's OPF file (assuming the custom column exists in the library and uses the correct type and format), so another approach, in case you're writing your tools in a language other than Python, is to edit the OPF files inside the EPUB to add custom columns populated with your data.

Last edited by ilovejedd; 04-20-2019 at 12:41 AM.
Old 04-20-2019, 03:03 PM   #3
ceridwen
Enthusiast
ceridwen began at the beginning.
 
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
Quote:
Originally Posted by ilovejedd View Post
If you know python, you can always write your own plugin.
https://manual.calibre-ebook.com/creating_plugins.html
I do know Python so yes, I can write a Calibre plugin if it's the best way to do it.

Quote:
Originally Posted by ilovejedd View Post
Note, I believe you can also use FanFicFare to download AO3 metadata and do the filtering via its personal.ini file.
I looked at this. For my uses, there are two problems: I'm dealing with metadata from sources that FanFicFare doesn't cover, and FanFicFare's metadata handling is too manual for what I want from it. One of the obvious ways to reduce the number of collections for my particular library is to exclude tags with only one example from being made into a collection. I don't think FanFicFare can do this without my modifying it pretty heavily. I'm still considering using it as one part of my toolchain.

Quote:
Originally Posted by ilovejedd View Post
Calibre is also capable of importing custom column data from the EPUB's OPF file (assuming custom column exists in the library and uses correct type and format) so another approach is to edit the OPF files inside the EPUB to add custom columns populated with your data in case you're writing in a different language.
That sounds like what I need. Does it do this automatically for any custom column?
Old 04-20-2019, 04:56 PM   #4
ilovejedd
hopeless n00b
ilovejedd ought to be getting tired of karma fortunes by now.

Posts: 5,126
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
Quote:
Originally Posted by ceridwen View Post
That sounds like what I need. Does it do this automatically for any custom column?
For fixed-data custom columns, yes. You need to look at how exactly the custom column is formatted in the OPF. The best way is to create the custom columns in the Calibre library and populate them with your data, then study the metadata.opf inside the book folder.

I haven't tried it with columns built from other columns but that shouldn't really matter.
Old 04-20-2019, 09:18 PM   #5
davidfor
Grand Sorcerer
davidfor ought to be getting tired of karma fortunes by now.
 
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by ceridwen View Post
I looked at this. For my uses, there are two problems: I'm dealing with metadata from sources that FanFicFare doesn't cover, and FanFicFare's metadata handling is too manual for what I want from it. One of the obvious ways to reduce the number of collections for my particular library is to exclude tags with only one example from being made into a collection. I don't think FanFicFare can do this without my modifying it pretty heavily. I'm still considering using it as one part of my toolchain.
FFF uses regex to map tags and most other metadata from the sites to whatever you prefer. Setting it up is going to be time-consuming, but any mechanism like that will be. Calibre also has a tag mapping tool.

Extending FFF to cover other sites isn't that hard. At least, it isn't if the site is well structured. If you are downloading from sites such as AO3, it is worth looking at. It is intended for story posting sites. It isn't for stores or more traditionally published books. For those, there are a myriad of metadata source plugins.

For collections on a Kobo device, any column, or columns, can be used. I've always found tags to be useless as there are too many. But, you can populate another column by some method, or use a "column built from other columns" with an appropriate template.
Old 04-23-2019, 12:09 AM   #6
ceridwen
Enthusiast
ceridwen began at the beginning.
 
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
Quote:
Originally Posted by ilovejedd View Post
For fixed data custom columns yes. You need to look at how exactly the custom column is formatted in the OPF. Best way is to create the custom columns in the Calibre library and populate them with your data. Then study the metadata.opf inside the book folder.

I haven't tried it with columns built from other columns but that shouldn't really matter.
Experimentally, populating the .opf file inside an .epub doesn't cause Calibre to read the metadata contained in that file, regardless of whether it's created as a generic tag using
Code:
<meta name="foo" content="bar"/>
or using the tag construction Calibre does in the metadata.opf attached to an epub file:
Code:
<meta name="calibre:user_metadata:#test" content="{&quot;display&quot;: {&quot;description&quot;: &quot;&quot;, &quot;is_names&quot;: true}, &quot;colnum&quot;: 1, &quot;column&quot;: &quot;value&quot;, &quot;#extra#&quot;: null, &quot;table&quot;: &quot;custom_column_1&quot;, &quot;is_category&quot;: true, &quot;rec_index&quot;: 22, &quot;is_csp&quot;: false, &quot;#value#&quot;: [&quot;foo, bar&quot;], &quot;category_sort&quot;: &quot;value&quot;, &quot;is_multiple&quot;: &quot;|&quot;, &quot;is_multiple2&quot;: {&quot;cache_to_list&quot;: &quot;|&quot;, &quot;ui_to_list&quot;: &quot;&amp;&quot;, &quot;list_to_ui&quot;: &quot; &amp; &quot;}, &quot;datatype&quot;: &quot;text&quot;, &quot;name&quot;: &quot;Test&quot;, &quot;is_editable&quot;: true, &quot;kind&quot;: &quot;field&quot;, &quot;label&quot;: &quot;test&quot;, &quot;search_terms&quot;: [&quot;#test&quot;], &quot;link_column&quot;: &quot;value&quot;, &quot;is_custom&quot;: true}"/>
For ebooks I'm more or less generating myself, I can create the dc:subject tags when I build the rest of it. For AO3, since dc:subject tags are already there, I probably have to copy the existing epub and rewrite the content.opf file. Because I'm going to want to store the results of tag processing locally (because it requires calling out to the Internet), there's really no way around keeping two epubs. The only other thing I could do is use calibredb to do the insertion in the Calibre database (which I think does allow me to set custom fields) and then have a separate file that caches the results of tag processing.
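A minimal sketch of the calibredb route, assuming the custom column already exists in the library and that `calibredb set_metadata --field name:value` accepts a custom column by its `#label`. The function name `set_custom_column`, its parameters, and the `run` switch are made up for illustration; by default it only builds the command so it can be inspected before anything touches the library.

```python
import subprocess


def set_custom_column(book_id, column, tags, library=None, run=False):
    """Build (and optionally run) a calibredb call that sets a tag-like column.

    Assumption: tags are joined with commas, so a tag that itself contains
    a comma would be split into two values by calibre.
    """
    cmd = ["calibredb", "set_metadata"]
    if library:
        # Point calibredb at a specific library folder.
        cmd += ["--with-library", library]
    cmd += ["--field", "%s:%s" % (column, ",".join(tags)), str(book_id)]
    if run:
        # check=True raises CalledProcessError if calibredb reports failure.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical book id and library path, shown only to illustrate the call.
    print(set_custom_column(22, "#test", ["foo", "bar"], library="/path/to/library"))
```

Each call starts a new calibredb process, which is part of why this approach is slow for a large library.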
Old 04-23-2019, 01:41 AM   #7
ilovejedd
hopeless n00b
ilovejedd ought to be getting tired of karma fortunes by now.

Posts: 5,126
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
Quote:
Originally Posted by ceridwen View Post
Experimentally, populating the .opf file inside an .epub doesn't cause Calibre to read the metadata contained in that file, regardless of whether it's created as a generic tag using
Code:
<meta name="foo" content="bar"/>
or using the tag construction Calibre does in the metadata.opf attached to an epub file:
Code:
<meta name="calibre:user_metadata:#test" content="{&quot;display&quot;: {&quot;description&quot;: &quot;&quot;, &quot;is_names&quot;: true}, &quot;colnum&quot;: 1, &quot;column&quot;: &quot;value&quot;, &quot;#extra#&quot;: null, &quot;table&quot;: &quot;custom_column_1&quot;, &quot;is_category&quot;: true, &quot;rec_index&quot;: 22, &quot;is_csp&quot;: false, &quot;#value#&quot;: [&quot;foo, bar&quot;], &quot;category_sort&quot;: &quot;value&quot;, &quot;is_multiple&quot;: &quot;|&quot;, &quot;is_multiple2&quot;: {&quot;cache_to_list&quot;: &quot;|&quot;, &quot;ui_to_list&quot;: &quot;&amp;&quot;, &quot;list_to_ui&quot;: &quot; &amp; &quot;}, &quot;datatype&quot;: &quot;text&quot;, &quot;name&quot;: &quot;Test&quot;, &quot;is_editable&quot;: true, &quot;kind&quot;: &quot;field&quot;, &quot;label&quot;: &quot;test&quot;, &quot;search_terms&quot;: [&quot;#test&quot;], &quot;link_column&quot;: &quot;value&quot;, &quot;is_custom&quot;: true}"/>
For ebooks I'm more or less generating myself, I can create the dc:subject tags when I build the rest of it. For AO3, since dc:subject tags are already there, I probably have to copy the existing epub and rewrite the content.opf file. Because I'm going to want to store the results of tag processing locally (because it requires calling out to the Internet), there's really no way around keeping two epubs. The only other thing I could do is use calibredb to do the insertion in the Calibre database (which I think does allow me to set custom fields) and then have a separate file that caches the results of tag processing.
I just tested exporting EPUB with "Update metadata in saved copies" enabled then imported those EPUB files to an empty library with matching custom columns. All custom columns seem to have been populated just fine. The #test custom column does exist in your library, right, and you just copied that formatting from a Calibre-generated OPF?

Note, I think the comma may be a hard coded special character for tag-like custom columns (even if it's set to contain names) so avoid using it in the value.

Assuming "foo, bar" is meant to be a single tag in custom column #test, just try removing the comma.

Assuming "foo, bar" is supposed to be two separate tags in custom column #test, that should be "foo", "bar".
Old 04-23-2019, 09:32 PM   #8
ceridwen
Enthusiast
ceridwen began at the beginning.
 
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
Quote:
Originally Posted by ilovejedd View Post
I just tested exporting EPUB with "Update metadata in saved copies" enabled then imported those EPUB files to an empty library with matching custom columns. All custom columns seem to have been populated just fine. The #test custom column does exist in your library, right, and you just copied that formatting from a Calibre-generated OPF?
You're right that if I save the file somewhere else and reimport it, it keeps the custom column data. I did some experimentation on this and it left me more confused than I was before. If I unzip the test epub that contains the custom column data, copy selected tags from its content.opf into the content.opf of another test epub, and re-zip it, Calibre doesn't pick up the metadata in the custom column. Copying the entire content.opf doesn't work, either. Most confusingly, if I unzip the known-to-work epub, don't change anything, and then zip the same files into a new epub, that doesn't get the custom column metadata when I import it. I even checked the extra field of the original zipfile, which seems to be empty.

Does anyone know how this works?

Quote:
Originally Posted by ilovejedd View Post
Note, I think the comma may be a hard coded special character for tag-like custom columns (even if it's set to contain names) so avoid using it in the value.

Assuming "foo, bar" is meant to be a single tag in custom column #test, just try removing the comma.

Assuming "foo, bar" is supposed to be two separate tags in custom column #test, that should be "foo", "bar".
In this case, it's meant to be two separate tags. I entered those tags into the Calibre GUI manually to create the files I've been using for testing. Do you mean it has to be entered "foo", "bar" in the Calibre GUI?
Old 04-23-2019, 11:03 PM   #9
ilovejedd
hopeless n00b
ilovejedd ought to be getting tired of karma fortunes by now.

Posts: 5,126
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
Quote:
Originally Posted by ceridwen View Post
You're right that if I save the file somewhere else and reimport it, it keeps the custom column data. I did some experimentation on this and it left me more confused than I was before. If I unzip the test epub that contains the custom column data, copy selected tags from its content.opf into the content.opf of another test epub, and re-zip it, Calibre doesn't pick up the metadata in the custom column. Copying the entire content.opf doesn't work, either. Most confusingly, if I unzip the known-to-work epub, don't change anything, and then zip the same files into a new epub, that doesn't get the custom column metadata when I import it. I even checked the extra field of the original zipfile, which seems to be empty.

Does anyone know how this works?
EPUB has specific rules when you zip it. Iirc, the first file added to the ZIP must be the manifest and it must be stored without compression. Can't remember the rules exactly right now.

Alternatively, if you want to avoid EPUB surgery, you can keep the modified OPF file alongside the EPUB using the same filename as the EPUB (or, I think, even metadata.opf if there is one title per folder). I've noticed Calibre always respects external OPF files during import if they exist.


Quote:
In this case, it's meant to be two separate tags. I entered those tags into the Calibre GUI manually to create the files I've been using for testing. Do you mean it has to be entered "foo", "bar" in the Calibre GUI?
Sorry I wasn't clearer. I meant in the OPF.
Code:
&quot;#value#&quot;: [&quot;foo&quot;, &quot;bar&quot;]
At least that's how my tag-like custom columns were stored when I inspected the OPFs.
Old 04-23-2019, 11:34 PM   #10
ceridwen
Enthusiast
ceridwen began at the beginning.
 
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
For my personal AO3 library, after I stripped all the rating, warning, category, and character tags, there are more than 6000 distinct tags. Canonicalizing them reduces this to more than 5000. Dropping tags that only appear once reduces this to 1277 tags, which is still probably too many collections for a Kobo to handle, I'm assuming? I'm going to look at how much I lose by starting to drop less-frequent tags, though I'm curious if anyone has other ideas for reducing the number of tags with minimal loss of information.
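The frequency cut described here can be sketched roughly as follows. This is only an illustration: `canonical` is a stand-in that normalizes case and whitespace, not real AO3 canonicalization (which needs a lookup against the site), and the names `frequent_tags` and `min_books` are made up for the example.

```python
from collections import Counter


def canonical(tag):
    """Stand-in canonicalization: collapse whitespace and ignore case."""
    return " ".join(tag.split()).casefold()


def frequent_tags(tags_by_book, min_books=2):
    """Return the canonicalized tags used by at least `min_books` books.

    `tags_by_book` maps a book id to an iterable of raw tag strings.
    """
    counts = Counter()
    for tags in tags_by_book.values():
        # A set ensures each tag is counted once per book, even if the
        # book lists several spellings that canonicalize to the same tag.
        counts.update({canonical(t) for t in tags})
    return {tag for tag, n in counts.items() if n >= min_books}


if __name__ == "__main__":
    library = {
        1: ["Fluff", "Angst"],
        2: ["fluff", "Time  Travel"],
        3: ["time travel"],
    }
    print(sorted(frequent_tags(library)))
```

Raising `min_books` is the simplest knob for trading information against the number of collections.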
Old 04-23-2019, 11:45 PM   #11
ceridwen
Enthusiast
ceridwen began at the beginning.
 
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
Quote:
Originally Posted by ilovejedd View Post
EPUB has specific rules when you zip it. Iirc, the first file added to the ZIP must be the manifest and it must be stored without compression. Can't remember the rules exactly right now.
The official spec only requires the mimetype file to be first:
http://www.idpf.org/epub/301/spec/ep...tainer-zipreqs
I tried rebuilding my test epubs (with the content.opf edits) with the mimetype file first and this fixed the issue. Thanks!
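For anyone re-zipping an unpacked EPUB from a script, a minimal sketch using Python's standard `zipfile` module. The function name `zip_epub` and its arguments are made up for illustration; it writes `mimetype` as the first entry and stores it uncompressed, then adds everything else. The output path should be outside the source directory so the archive does not try to include itself.

```python
import zipfile
from pathlib import Path


def zip_epub(src_dir, epub_path):
    """Zip an unpacked EPUB with `mimetype` as the first, uncompressed entry."""
    src = Path(src_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # First entry: mimetype, stored without compression.
        zf.write(src / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(src.rglob("*")):
            rel = path.relative_to(src).as_posix()
            if not path.is_file() or rel == "mimetype":
                continue  # skip directories and the entry already written
            # Remaining files may be deflated; names use forward slashes.
            zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the call.
    zip_epub("unpacked_book", "rebuilt_book.epub")
```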

Quote:
Originally Posted by ilovejedd View Post
Sorry wasn't more clear. I meant in the OPF.
Code:
&quot;#value#&quot;: [&quot;foo&quot;, &quot;bar&quot;]
At least that's how my tag-like custom columns were stored when I inspected the OPFs.
I see. In this case, it's me misinterpreting what the Calibre GUI is doing. Thanks for the pointer; now I think I can construct the custom Calibre meta tags correctly.
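A rough sketch of generating such a tag, assuming the column definition is copied as JSON from a Calibre-generated OPF for the same column and only "#value#" is replaced. The function name `user_metadata_meta` and the abbreviated `column_def` in the usage example are illustrative; a real tag should start from the full definition Calibre wrote.

```python
import json
from xml.sax.saxutils import escape


def user_metadata_meta(label, column_def, values):
    """Build a calibre:user_metadata <meta> tag for a tag-like custom column.

    `column_def` is the decoded JSON taken from a Calibre-generated OPF;
    `values` is an iterable of tags, one list entry per tag.
    """
    data = dict(column_def)
    data["#value#"] = list(values)
    # Escape &, <, > and double quotes so the JSON fits in an XML attribute.
    content = escape(json.dumps(data), {'"': "&quot;"})
    return '<meta name="calibre:user_metadata:#%s" content="%s"/>' % (label, content)


if __name__ == "__main__":
    # Abbreviated definition for illustration only.
    column_def = {"datatype": "text", "is_multiple": "|", "label": "test", "name": "Test"}
    print(user_metadata_meta("test", column_def, ["foo", "bar"]))
```

Whether Calibre accepts a trimmed-down definition is untested here, which is why the sketch takes the definition as input instead of constructing one.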
Old 04-23-2019, 11:47 PM   #12
davidfor
Grand Sorcerer
davidfor ought to be getting tired of karma fortunes by now.
 
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by ceridwen View Post
You're right that if I save the file somewhere else and reimport it, it keeps the custom column data. I did some experimentation on this and it left me more confused than I was before. If I unzip the test epub that contains the custom column data, copy selected tags from its content.opf into the content.opf of another test epub, and re-zip it, Calibre doesn't pick up the metadata in the custom column. Copying the entire content.opf doesn't work, either. Most confusingly, if I unzip the known-to-work epub, don't change anything, and then zip the same files into a new epub, that doesn't get the custom column metadata when I import it. I even checked the extra field of the original zipfile, which seems to be empty.

Does anyone know how this works?
I'm not sure when you are doing this. I think the problem is the timing of what you are doing and when calibre reads the metadata.

Calibre reads the metadata when the book is first added (assuming appropriate options are set). It will read the custom columns from the OPF in the epub if matching columns exist in the library. Otherwise, they will be ignored. You can also read metadata from the file inside calibre from the Edit Metadata screen. Select the format you want to extract the metadata from in the top right corner and press the appropriate button.

Calibre doesn't automatically update the book when you change the metadata in calibre. It will update the metadata.opf file in the directory with the book. The actual book gets updated when you send it outside the library (save-to-disk and send-to-device - these do not update the copy of the book in the library), edit the book, convert it, use Polish book or use the Embed metadata function. Basically, you have to do something to have the book updated.

If you replace the copy of a book in the library, calibre doesn't change the metadata either in its database or the file. This includes dropping the new version on the details pane of the existing book, or adding the book and having duplicates merged automatically. If you want to update the metadata from the new version of the book, you need to do this manually as above. I would expect the same if you use the command-line to replace the book.

And as @ilovejedd said, if there is an external OPF file when you are adding a new book, calibre will use it for the metadata over the metadata in the book. Changing that is slightly simpler than changing the book.
Quote:
In this case, it's meant to be two separate tags. I entered those tags into the Calibre GUI manually to create the files I've been using for testing. Do you mean it has to be entered "foo", "bar" in the Calibre GUI?
No, in the calibre GUI, you enter "foo, bar". That will be treated as two tags; "foo" and "bar". When calibre saves this to the OPF file, it will be saved as:

Code:
&quot;#value#&quot;: [&quot;foo&quot;, &quot;bar&quot;]
It is stored in the OPF file as a comma-separated list of quoted strings. With what you had (a single string containing a comma), I would expect calibre to display "foo; bar" in the GUI. The comma will be changed to a semicolon.
Old 04-24-2019, 12:06 AM   #13
ilovejedd
hopeless n00b
ilovejedd ought to be getting tired of karma fortunes by now.

Posts: 5,126
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
Quote:
Originally Posted by ceridwen View Post
For my personal AO3 library, after I stripped all the rating, warning, category, and character tags, there are more than 6000 distinct tags. Canonicalizing them reduces this to more than 5000. Dropping tags that only appear once reduces this to 1277 tags, which is still probably too many collections for a Kobo to handle, I'm assuming? I'm going to look at how much I lose by starting to drop less-frequent tags, though I'm curious if anyone has other ideas for reducing the number of tags with minimal loss of information.
I keep a #freeform custom column where I copy all AO3 tags found in my downloaded fics. Then a #fictags column with scrubbed values (LibreOffice Calc spreadsheet+Job Spy scrub tags feature -> performs both canonicalizing and deletion of tags I'm not interested in). I also keep a separate #collections column for creating collections based only on a handful of most used tags (less than a hundred).
  • #collections: for offline on-device browsing
  • #fictags: for online on-device browsing (of Calibre mobile server or COPS) using built-in browser or via smartphone/tablet using Calibre Companion or content server
  • #freeform: for browsing via fast device with largish screen (tablet or PC) using Calibre, Calibre Companion or content server
Old 04-24-2019, 04:40 AM   #14
davidfor
Grand Sorcerer
davidfor ought to be getting tired of karma fortunes by now.
 
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by ceridwen View Post
For my personal AO3 library, after I stripped all the rating, warning, category, and character tags, there are more than 6000 distinct tags. Canonicalizing them reduces this to more than 5000. Dropping tags that only appear once reduces this to 1277 tags, which is still probably too many collections for a Kobo to handle, I'm assuming? I'm going to look at how much I lose by starting to drop less-frequent tags, though I'm curious if anyone has other ideas for reducing the number of tags with minimal loss of information.
1277 collections is probably too many. At the least, it is probably impractical. Looking at my Clara HD, it has 10 collections per page. That would mean 128 pages of collections. I would never page through that. Sorting by size or date created might help, but, again, there are too many to be practical.

Performance is a separate issue. I haven't loaded a lot of collections for a while (I'm at four pages at the moment). But, the performance tends to be related as much to the number of books in the collections, as to the number of collections. By that I mean adding a collection with 500 books in it, will have a bigger impact than adding ten collections with 10 books in each.

This is with recent firmware. There was a major fix to collection management some time last year. There was a point where I had about 40 pages of collections and it took over a minute to open the collections list. And I added a single collection with about 1000 books in it and it went to over 2 minutes to open the list. I supplied this to Kobo and they were able to find the issue. The latter came down to about 10 seconds.
Old 04-29-2019, 02:02 AM   #15
ceridwen
Enthusiast
ceridwen began at the beginning.
 
Posts: 36
Karma: 10
Join Date: Feb 2017
Device: Kobo Aura H2O
Yeah, 1277 is definitely too many. Actually, several hundred is probably too many, so I'm going to have to figure out some heuristics to cut down on the tags more. The note about large collections being slow is very good to know: I went back and chopped some tags that swept up too many books, so my largest collection is now ~150 books. I need to refine further, since my non-AO3 tags generate another 300+ (though some of those tags can be merged by normalizing capitalization).

At the moment, I'm using a very crude approach: I've created a custom column, and I have two scripts that literally invoke calibredb repeatedly. The AO3 script handles AO3 tags directly (bypassing the epub) and only calls calibredb to look up the calibre id and then to run calibredb set_metadata. The non-AO3 script looks up the Tags column and then runs calibredb set_metadata to add some of them to the custom column. This is slow and awkward, so I'd really like to write a plugin to do it instead. I spent some time looking through the API documentation for plugins, but I didn't see any discussion of how to write a plugin that does the kinds of things I want here. Does anyone know of a good example plugin that could get me started or some other resource that would help me?
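One possible starting point short of a full plugin, as a sketch only: calibre's manual describes a database API that a script can use when it runs under calibre's own Python (for example via calibre-debug). The calls `db(...).new_api`, `all_book_ids`, `field_for`, and `set_field` below reflect that API as I understand it and should be checked against the manual; the function names, the `#collections` column, and the library path are made up for illustration.

```python
def select_collection_tags(tags_by_book, allowed):
    """Map each book id to the subset of its tags that should become collections."""
    allowed = {t.casefold() for t in allowed}
    return {
        book_id: sorted(t for t in tags if t.casefold() in allowed)
        for book_id, tags in tags_by_book.items()
    }


def update_library(library_path, column, allowed):
    """Copy the allowed tags of every book into a tag-like custom column."""
    # Imported here because the calibre package is only importable when the
    # script runs inside calibre's bundled Python.
    from calibre.library import db

    cache = db(library_path).new_api
    tags_by_book = {
        book_id: cache.field_for("tags", book_id) or ()
        for book_id in cache.all_book_ids()
    }
    # One call updates all books, instead of one calibredb process per book.
    cache.set_field(column, select_collection_tags(tags_by_book, allowed))


if __name__ == "__main__":
    # Hypothetical library path, column label, and tag list.
    update_library("/path/to/library", "#collections", {"fluff", "time travel"})
```

The tag selection is kept separate from the calibre calls so it can be tested without a library. Running this while the calibre GUI has the same library open is probably best avoided.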
Tags
calibre, kobo, kobo calibre database, metadata

