View Poll Results: What features would you like added to this plugin?
  • Release existing bug fixes now: 4 votes (50.00%)
  • Add options to make search more flexible: 2 votes (25.00%)
  • Offer fewer options: 1 vote (12.50%)
  • Link all matching gutenberg IDs: 1 vote (12.50%)
  • Only link gutenberg ID for exact edition match: 1 vote (12.50%)
  • Import more wikidata fields: 2 votes (25.00%)
  • Handle wikidata entries with multiple book editions correctly: 5 votes (62.50%)
Multiple Choice Poll. Voters: 8.

Old 11-04-2018, 06:57 PM   #1
compurandom
Guru
Posts: 918
Karma: 417282
Join Date: Jun 2015
Device: kobo aura h2o, kobo forma
[Metadata Source Plugin] wikidata

This plugin attempts to find a book in Wikidata and download its metadata.

Once a number of books have wikidata identifiers, you might want to use the wikidata gui plugin to merge in additional metadata.

Note that this plugin is designed for low-volume searching: single books that the wikidata-gui plugin can't find. If you search for many books at once, it may put undue stress on the wikidata servers and you may end up being rate limited. The wikidata-gui plugin tries to be more efficient with bulk searches.

Features supported in version 2.0:
  • Python3 support for Calibre 5 (should work with Calibre 4 but untested)
  • Search the wikidata database by author, title, author and title combined, ISBN, gutenberg ID, or wikidata ID
  • Import first publication date, gutenberg ID, series data, ISBN
  • Import several properties (instance, genre) as tags
  • Automatic detection and conversion of Overdrive linked gutenberg IDs
  • Link wikidata ID to make browsing of additional wikidata info easy
  • Link gutenberg ID to Project Gutenberg website for easy book import
  • After finding a book, offer images linked in wikidata as covers
  • Integrate handling of multiple external identifiers with the Wikidata GUI plugin

Constructive criticism for this plugin would be greatly appreciated.
No new releases are planned for this plugin unless features are requested.

Examples of books that are in wikipedia but are not found by this plugin (or have metadata you want to import) are very welcome!

Version History:

Version 2.0.0 - 27 Nov 2020
Fixes for python3 / Calibre 5.4, hopefully this still works with python2
Upgrade to Sparql 1.9.0.dev0
Attempt at better error handling, at least better logging
Note: this version has not been tested with Calibre 4, reports welcome, but there's not much reason other than python3 to upgrade at this time.

Version 1.3.0 - 13 Jan 2019
refactored url / identifier code to work with wikidata gui plugin

maybe fixed bugs in the url paste translator

Version 1.2.0 - 19 Dec 2018
Add ISBN, series to metadata
improve ISBN search
add support for translating urls pasted into ID
Fixed a crash in fuzzy search for books with unknown author

Version 1.1.0 - 15 Nov 2018
Added cover download capability
Add option to save translated or found gutenberg IDs
Add option to use wikidata Q codes or descriptions for instance/genre tags
- existing calibre filtering of tags works well with this

bugs fixed:
fixed crash on book search with unknown author
Use all gutenberg IDs found in wikidata
removed internal commas from tags

bugs:
Only finds first overdrive gutenberg ID attached to a book in existing metadata
(no intent to fix this unless someone complains, it works well enough)

Version 1.0.0 - 12 Nov 2018

Features added:
Save wikidata "instance of" and "genre" properties as tags (optionally)
support gutenberg book IDs (finding, searching by, saving)
additional more accurate searches
inexact searches with better sorting (i.e., author words, title words)
support for languages other than english
use more inclusive wikidata literature and publication types
add support for keep_dup

Bugs fixed:
better formatting of exception errors
finds more books successfully

Version 0.5.0 - 9 Nov 2018
New features:
Add support for saving and using the wikidata ID
add options: slow search, ignore wikidata, language(default=en)
much improved book hit rate:
search for all subclasses of written works instead of just books
search for alternate titles too
find books without publication dates too

Bugs fixed:
Display the actual title and authors found
improved logging of search attempts
better selection of correct relevant book from fuzzy matches

Version 0.1.0 - 4 Nov 2018
Initial version, pubdate only, limited search options


Searches performed (stops on first success currently):

(optional) indicates this search can be disabled or enabled
Search by saved wikidata ID (optional)
Search by saved or found Project Gutenberg id (optional)
Search by ISBN
Search for exact author/title
Search for Title only (inexact)
Search for first author only, sort by closest title match (inexact)
Search by partial title match (inexact, slow)
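
For anyone curious how a "stop on first success" search order behaves, here is a minimal sketch. It is not the plugin's code: the function names, the toy catalogue, and the placeholder IDs are all made up for illustration, and the real searches would query Wikidata instead of a dict.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Book = Dict[str, str]
SearchFn = Callable[[Book], List[str]]

# Made-up catalogue standing in for Wikidata; the IDs are placeholders.
TOY_CATALOGUE = {("some author", "some title"): "Q-PLACEHOLDER-1"}


def by_saved_wikidata_id(book: Book) -> List[str]:
    # A saved ID needs no lookup in this toy version.
    return [book["wikidata"]] if book.get("wikidata") else []


def by_exact_author_title(book: Book) -> List[str]:
    key = (book.get("author", "").lower(), book.get("title", "").lower())
    hit = TOY_CATALOGUE.get(key)
    return [hit] if hit else []


def by_title_only(book: Book) -> List[str]:
    title = book.get("title", "").lower()
    return [qid for (_, t), qid in TOY_CATALOGUE.items() if title and t == title]


# Most exact search first; names are hypothetical option keys.
SEARCH_ORDER: Sequence[Tuple[str, SearchFn]] = (
    ("wikidata_id", by_saved_wikidata_id),
    ("exact_author_title", by_exact_author_title),
    ("title_only", by_title_only),
)


def run_cascade(book: Book, disabled=frozenset()) -> Tuple[Optional[str], List[str]]:
    """Try each enabled search in order and stop at the first one with hits."""
    for name, search in SEARCH_ORDER:
        if name in disabled:
            continue  # optional searches can be switched off
        hits = search(book)
        if hits:
            return name, hits
    return None, []
```

The early return is also where the "exact searches stop less exact searches even when they are wrong" bug comes from: once any search returns a hit, nothing further down the list gets a chance.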


Known bugs:
  • ISBN search is not well tested and does not handle multiple editions even though wikidata requires ISBNs to be attached to editions only
  • Exact searches stop less exact searches even when they are wrong (fix would allow individual selection of each search)
  • Calibre version 3.35 or later needed to keep more than the first book returned (bug #1802293 )
  • Only the first gutenberg ID from Overdrive is used for matching, but all of them found in wikidata will be imported.
  • Sometimes finds works that are not written works when using inexact searches (so examine results carefully if you turn this on!)
  • Does not always properly handle wikidata where a book has multiple editions; this is rare but may become less rare in the future; a fix is being considered
  • tries to use the oldest publication date found rather than the date of a specific edition (this is a feature really--complain if you want this to be an option)


Possible future features:
No work is planned on any of these unless someone asks for them.
  • better handling of book editions in wikidata
  • remove subtitles from titles during title search to try to find more titles successfully
  • Other wikidata properties (publisher, comments, wikidata appropriate for tags); GUI plugin does this already
  • further refinement of book search is possible (but is it necessary?)


Note: Some authors ask for payment or donations. I ask that if you use this plugin, drop a note in this thread so I know it's not just people downloading it and deleting it.

Let me know what features you use, and especially what features you'd like added!
Maybe let me know roughly how often you find books with this, what your success rate is, maybe examples of metadata from books you can't find in wikidata.

My motivation for development is driven purely by feedback!
Attached Files
File Type: zip Wikidata.zip (38.4 KB, 60000 views)

Last edited by compurandom; 11-29-2020 at 02:29 AM. Reason: update to 2.0
Old 11-04-2018, 11:06 PM   #2
kovidgoyal
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I have added it to the index; however, it will not show up in calibre because the plugin files are in a sub-directory in the zip file. They should be at the top level.
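
A quick way to check a zip for this before uploading, as a rough sketch. It assumes the plugin's entry point is a file named __init__.py, which is the usual layout for calibre plugins; the helper names are made up.

```python
import io
import zipfile


def plugin_init_at_top_level(zip_bytes: bytes) -> bool:
    """Return True if the archive has __init__.py at its root, not in a sub-directory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return "__init__.py" in zf.namelist()


def make_zip(names) -> bytes:
    """Build an in-memory zip with empty files, just to exercise the check."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "")
    return buf.getvalue()


good = make_zip(["__init__.py", "config.py"])
bad = make_zip(["Wikidata/__init__.py", "Wikidata/config.py"])
print(plugin_init_at_top_level(good), plugin_init_at_top_level(bad))
```

For a file on disk, reading it with open(path, "rb").read() and passing the bytes in should work the same way.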
Old 11-04-2018, 11:43 PM   #3
compurandom
Guru
Posts: 918
Karma: 417282
Join Date: Jun 2015
Device: kobo aura h2o, kobo forma
Repurposing this message to house old versions.

Version 1.3.0 released Jan 2019, this was tested with calibre 4 / python 2 and maybe earlier.
Attached Files
File Type: zip Wikidata.zip (38.7 KB, 256 views)

Last edited by compurandom; 11-27-2020 at 11:24 PM.
Old 11-05-2018, 02:22 AM   #4
kovidgoyal
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
the rest looks fine.
Old 11-06-2018, 09:45 PM   #5
compurandom
Guru
Posts: 918
Karma: 417282
Join Date: Jun 2015
Device: kobo aura h2o, kobo forma
I just ran this on the portion of my library that should be in wikipedia.
It found 118 titles, and didn't find 87. So I have some test cases for things it doesn't find.

I suspect a lot of those are not books -- maybe poems, short stories, etc.
This currently only finds books. I'll figure out how to add the others slowly.

Next version I release will have support for saving and using the wikidata identifier.

Covers are a possibility as well.
Old 11-09-2018, 07:25 AM   #6
compurandom
Guru
Posts: 918
Karma: 417282
Join Date: Jun 2015
Device: kobo aura h2o, kobo forma
Wikidata is a rich source of metadata, for example
https://www.wikidata.org/wiki/Q1219561

http://tinyurl.com/yd3m69x6

Would anyone be interested in having more of that metadata imported?

For example, wikidata tags works as novels, plays, poems, etc.
It tags these literary works with one or more genre tags as well.
I could import this metadata either into the existing tags column or a user defined column.

Is there other metadata that people would want?

Last edited by compurandom; 11-11-2018 at 12:01 PM.
Old 11-16-2018, 12:09 AM   #7
compurandom
Guru
Posts: 918
Karma: 417282
Join Date: Jun 2015
Device: kobo aura h2o, kobo forma
The latest version now supports covers and I've tweaked the options and refined some of the searches. It finds most books on the first try now. Books in my library that it still can't find are either not in wikidata or have issues with author and title spelling (i.e., I'll fix by editing my library.)

I consider this plugin "finished" and will not develop it further unless I get feedback.
Old 12-09-2018, 06:12 PM   #8
Jay Dugger
Junior Member
Posts: 3
Karma: 10
Join Date: Dec 2012
Device: none
Yes, I have interest in more metadata import. However, it will take me about a week to test the plugin.
Old 12-10-2018, 07:15 AM   #9
compurandom
Guru
Posts: 918
Karma: 417282
Join Date: Jun 2015
Device: kobo aura h2o, kobo forma
Quote:
Originally Posted by Jay Dugger
Yes, I have interest in more metadata import. However, it will take me about a week to test the plugin.
Let me know what fields you find you'd like imported and where you'd like them to go... I've found a couple more external database IDs that might be interesting, but most of the other fields I've seen I'm not sure what to do with.

I've also considered writing other plugins for other databases, and if I did that, I'd set it up so that this plugin would share overlapping database IDs.
Old 12-14-2018, 03:00 PM   #10
Jay Dugger
Junior Member
Posts: 3
Karma: 10
Join Date: Dec 2012
Device: none
I feel as if I ask for the sun, the moon, and the stars, but if I don't ask...

Working from the example here, https://www.wikidata.org/wiki/Q1219561,

The field series may map to the column Series; genre to a custom column "Genre;" and so on.

publisher to Publisher

language of work or name to Languages (though this should append, and not overwrite)

publication date to Published

illustrator to a custom column of the same name

narrative locations and characters also to a custom column of the same name, but these two wikidata properties might only apply to fiction...

The Commons category and the topic's main category as tags

All available identifiers as Ids.
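
To make the request concrete, the mapping above could be written down as a small table, sketched here in Python. The property numbers are the Wikidata IDs as I understand them for the fields named above, and the "#..." names are hypothetical custom columns (calibre custom column lookup names start with "#"). None of this is the plugin's actual code.

```python
# Wikidata property -> calibre destination (built-in field or custom column).
FIELD_MAP = {
    "P179": "series",               # part of the series
    "P136": "#genre",               # genre
    "P123": "publisher",            # publisher
    "P407": "languages",            # language of work or name
    "P577": "pubdate",              # publication date
    "P110": "#illustrator",         # illustrator
    "P840": "#narrative_location",  # narrative location
    "P674": "#characters",          # characters
}


def route_claims(claims):
    """Group property values by destination column, skipping unmapped properties."""
    routed = {}
    for prop, values in claims.items():
        column = FIELD_MAP.get(prop)
        if column is None:
            continue  # property not requested, ignore it
        routed.setdefault(column, []).extend(values)
    return routed
```

Anything not listed in the table is simply dropped, which keeps the import limited to fields someone has actually asked for.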
Old 12-16-2018, 10:10 AM   #11
compurandom
Guru
Posts: 918
Karma: 417282
Join Date: Jun 2015
Device: kobo aura h2o, kobo forma
I don't think there's a way to append to a column from a metadata plugin unless the column is already a tags column, in which case it always appends. (Languages is a tags field, so that works.)

I'm also not sure how to do a custom column, but worst case, we can try to get a feature added to calibre.

It already gets publication date.
I notice this book has both a publication date and an inception date. I was considering using inception date when publication date was not available.
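
Roughly what I have in mind, as a sketch only (the function name is made up and the dates would really come from the wikidata query): take the oldest publication date if there is one, otherwise fall back to the inception date.

```python
from datetime import date
from typing import Iterable, Optional


def first_known_date(publication_dates: Iterable[date],
                     inception: Optional[date] = None) -> Optional[date]:
    """Prefer the oldest publication date; fall back to inception if none exist."""
    dates = list(publication_dates)
    if dates:
        return min(dates)
    return inception
```

If neither is present the function returns None, and calibre's existing publication date would be left alone.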

Publisher is a field that typically has multiple values even for a single edition. How should that be handled?

"All available identifiers" is a tall order. Wikidata has a *lot* of them.
"All identifiers used by this list of books" is more tractable. Also, some identifiers have messy properties or may otherwise not be appropriate; for instance, with this book, the NNL work ID has 12 instances; do you really want all those added to calibre? If I add identifiers, I'd prefer to add them one at a time and give users options as to which ones they want imported, but this could quickly become a long list. This might be something appropriate for a "tweaks" style calibre preference.

Some of these fields are edition specific, and this plugin is currently specifically designed to not match to a specific volume; in fact, for the first publication date, I was considering traversing the "edition of" links to find the first edition printed. How would you propose selecting the correct edition for edition specific fields?
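
Going back to the identifier question earlier in this post, the opt-in approach could look something like the sketch below. The names and the per-type cap are assumptions for illustration, not anything the plugin does today.

```python
def select_identifiers(found, allowed, max_values=1):
    """Keep only identifier types the user opted into, capped per type.

    found:   dict of identifier type -> list of values from wikidata
    allowed: set of identifier types the user enabled in preferences
    """
    selected = {}
    for id_type, values in found.items():
        if id_type in allowed:
            # An identifier with many instances is trimmed, not imported wholesale.
            selected[id_type] = list(values)[:max_values]
    return selected
```

With a cap of 1, an identifier that has 12 instances on one book would contribute a single value instead of twelve.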

Last edited by compurandom; 12-16-2018 at 11:40 AM.
Old 12-16-2018, 01:40 PM   #12
compurandom
Guru
Posts: 918
Karma: 417282
Join Date: Jun 2015
Device: kobo aura h2o, kobo forma
Some books are in multiple series, for example:
https://www.wikidata.org/wiki/Q1122549
Calibre doesn't directly support this.

Do you have suggestions on how to handle that?

If I ignore this, it will show up as two separate books in the metadata search.
I'm not sure if calibre will try to merge them.

Last edited by compurandom; 12-16-2018 at 01:43 PM.
Old 12-16-2018, 02:12 PM   #13
theducks
Well trained by Cats
Posts: 29,782
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
I have a second Series_like column
I also created a search alternate that looks at both
Preferences Searching: Grouped Searches (tab): Series

If there are more than 2, the Main (calibre Series) gets set to "<various>" and I set the alternate to what I consider Primary_series (and not worry about the others)

Also, if the series is a subset of the Primary, I use Hierarchical notation in the regular series column and use the index for the branch
eg Pern.Dragon Riders of or Pern. Harper_Hall
Old 12-16-2018, 02:59 PM   #14
compurandom
Guru
Posts: 918
Karma: 417282
Join Date: Jun 2015
Device: kobo aura h2o, kobo forma
Quote:
Originally Posted by theducks
Also, if the series is a subset of the Primary, I use Hierarchical notation in the regular series column and use the index for the branch
eg Pern.Dragon Riders of or Pern. Harper_Hall
That's a great methodology. Unfortunately, it'd be really hard to code.
Like, which series goes first? How do I even detect that one is a subseries of the other?

Wikidata does not readily provide information to detect and make decisions on these things. Right now, pretty much a single query gets all the data for all the books. I'm not too keen on doing a second query for each book alternative unless I have to or unless it finds there is only a single book in the results.

I have preliminary series working. If wikidata doesn't have a series index, I set it to 0. If it is in two series, both are returned (and visible in the log) but calibre merges them, only keeping one set of series data, which presumably is somewhat random.
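
In sketch form (names are hypothetical, and treating a non-numeric ordinal as 0 is an extra assumption, not something I've described above), the series handling looks about like this:

```python
def series_entries(rows):
    """Turn (series_name, ordinal) rows into (series_name, index) pairs.

    A missing ordinal becomes index 0.0. A non-numeric ordinal is also
    mapped to 0.0 here, which is an assumption made for this sketch.
    """
    entries = []
    for name, ordinal in rows:
        try:
            index = float(ordinal) if ordinal is not None else 0.0
        except ValueError:
            index = 0.0
        entries.append((name, index))
    return entries
```

A book in two series yields two entries, one per series. Which one survives calibre's merge is outside this function's control.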

I'm open to suggestions on adjustments to this behavior; otherwise let me know if you'd like me to release it immediately, or wait for a few more features.

Last edited by compurandom; 12-17-2018 at 12:19 AM.
Old 12-17-2018, 12:26 AM   #15
compurandom
Guru
Posts: 918
Karma: 417282
Join Date: Jun 2015
Device: kobo aura h2o, kobo forma
I'm looking into how to add data to custom columns. At first pass, it looks like it isn't hard, as long as the custom column name is fixed ahead of time. ( @kovidgoyal let me know if there are gotchas or flexibility in this.)

I'm taking Jay Dugger's feature list as a priority list and will implement them in that order, but delaying harder ones unless discussion here indicates otherwise.

In the next 3-5 days I'll see if I can add (some of) illustrator, narrative locations, and characters.
All times are GMT -4.