11-04-2018, 06:57 PM
compurandom
Guru
Posts: 917
Karma: 417282
Join Date: Jun 2015
Device: kobo aura h2o, kobo forma
[Metadata Source Plugin] wikidata

This plugin attempts to find a book in Wikidata (Wikipedia's sister project for structured data) and download its metadata.

Once a number of your books have Wikidata identifiers, you might want to use the Wikidata GUI plugin to merge in additional metadata.

Note that this plugin is designed for low-volume searches for single books that the wikidata-gui plugin can't find. If you search for many books at once, it may put undue stress on the Wikidata servers, whereas the wikidata-gui plugin tries to be more efficient with bulk searches. You may also end up being rate limited.
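One way a client can keep its request volume low is to space out queries. The sketch below is hypothetical: `MinIntervalThrottle` is not part of this plugin or of calibre, and the post does not say how (or whether) the plugin throttles itself. The clock and sleep functions are injectable so the helper can be tested without actually waiting.

```python
import time


class MinIntervalThrottle:
    """Hypothetical helper: enforce a minimum delay between successive requests."""

    def __init__(self, min_interval_s=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until min_interval_s has passed since the last call; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now
        return slept
```

Calling `throttle.wait()` before each query keeps bursts of single-book searches from hitting the server back to back.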

Features supported in version 2.0:
  • Python 3 support for Calibre 5 (should work with Calibre 4, but untested)
  • Search the Wikidata database by author, title, author/title combinations, ISBN, Gutenberg ID, and Wikidata ID
  • Import first publication date, Gutenberg ID, series data, and ISBN
  • Import several properties (instance, genre) as tags
  • Automatic detection and conversion of Overdrive-linked Gutenberg IDs
  • Link the Wikidata ID so additional Wikidata info is easy to browse
  • Link the Gutenberg ID to the Project Gutenberg website for easy book import
  • After finding a book, offer images linked in Wikidata as covers
  • Integrated handling of multiple external identifiers with the Wikidata GUI plugin
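Linking IDs and offering covers comes down to building URLs from stored identifiers. A minimal sketch, assuming the identifiers are stored under the hypothetical keys `wikidata` and `gutenberg`, and that cover images are Wikimedia Commons file names such as those held in Wikidata's image property (P18). `book_url()` only mimics the `(identifier_type, identifier_value, url)` shape that calibre's `Source.get_book_url()` returns; it is not the plugin's own code.

```python
from urllib.parse import quote


def wikidata_item_url(qid):
    # Wikidata item pages live at /wiki/<QID>
    return f"https://www.wikidata.org/wiki/{qid}"


def gutenberg_ebook_url(gutenberg_id):
    # Project Gutenberg ebook pages are /ebooks/<numeric id>
    return f"https://www.gutenberg.org/ebooks/{int(gutenberg_id)}"


def commons_file_url(filename, width=None):
    # Special:FilePath redirects to the file itself; spaces become underscores
    url = ("https://commons.wikimedia.org/wiki/Special:FilePath/"
           + quote(filename.replace(" ", "_")))
    if width:
        url += f"?width={int(width)}"  # ask Commons for a scaled version
    return url


def book_url(identifiers):
    """Return (identifier_type, identifier_value, url) or None.

    'wikidata' and 'gutenberg' are assumed identifier keys.
    """
    qid = identifiers.get("wikidata")
    if qid:
        return ("wikidata", qid, wikidata_item_url(qid))
    gid = identifiers.get("gutenberg")
    if gid:
        return ("gutenberg", gid, gutenberg_ebook_url(gid))
    return None
```

Preferring the Wikidata link when both IDs exist is a design choice of this sketch, not something the post specifies.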

Constructive criticism for this plugin would be greatly appreciated.
No new releases are planned for this plugin unless features are requested.

Examples of books that are in Wikipedia but are not found by this plugin (or that have metadata you want to import) are very welcome!

Version History:

Version 2.0.0 - 27 Nov 2020
Fixes for Python 3 / Calibre 5.4; hopefully this still works with Python 2
Upgrade to Sparql 1.9.0.dev0
Attempt at better error handling, or at least better logging
Note: this version has not been tested with Calibre 4 (reports welcome), but there's not much reason to upgrade at this time other than Python 3.

Version 1.3.0 - 13 Jan 2019
Refactored URL / identifier code to work with the Wikidata GUI plugin

Maybe fixed bugs in the URL paste translator
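A URL paste translator maps a pasted web address to an identifier type and value. The sketch below is a hypothetical version covering two URL forms; calibre metadata sources can expose this through an `id_from_url()` hook that returns `(identifier_type, identifier_value)`, but the patterns and identifier names here are assumptions, not the plugin's code.

```python
import re

# Hypothetical URL forms; the plugin's real translator may accept others.
_URL_PATTERNS = [
    ("wikidata", re.compile(
        r"https?://(?:www\.|m\.)?wikidata\.org/(?:wiki|entity)/(Q\d+)\b", re.I)),
    ("gutenberg", re.compile(
        r"https?://(?:www\.)?gutenberg\.org/(?:ebooks|files)/(\d+)\b", re.I)),
]


def id_from_url(url):
    """Return (identifier_type, identifier_value) for a known URL, else None."""
    url = url.strip()
    for id_type, pattern in _URL_PATTERNS:
        match = pattern.match(url)
        if match:
            value = match.group(1)
            if id_type == "wikidata":
                value = value.upper()  # QIDs are conventionally written with a capital Q
            return id_type, value
    return None
```

For example, `id_from_url("https://www.gutenberg.org/files/2701/2701-h/2701-h.htm")` would yield `("gutenberg", "2701")`.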

Version 1.2.0 - 19 Dec 2018
Add ISBN, series to metadata
improve ISBN search
add support for translating urls pasted into ID
Fixed a crash in fuzzy search for books with unknown author
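Improving ISBN search usually starts with normalization: ISBNs arrive with hyphens, spaces, or in the older 10-digit form, and Wikidata keeps ISBN-10 and ISBN-13 in separate properties (P957 and P212). A minimal sketch, assuming matching is done on a bare 13-digit string; `normalize_isbn()` is a hypothetical helper, not the plugin's implementation.

```python
def _isbn10_ok(s):
    # Weighted sum 10*d1 + 9*d2 + ... + 1*d10 must be divisible by 11 ('X' = 10)
    if not (s[:9].isdigit() and (s[9].isdigit() or s[9] == "X")):
        return False
    digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0


def _isbn13_check(first12):
    # Alternating weights 1, 3, 1, 3, ...; check digit brings the total to a multiple of 10
    total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw):
    """Return a bare 13-digit ISBN, or None if raw is not a valid ISBN."""
    s = "".join(c for c in raw.upper().replace("ISBN", "") if c in "0123456789X")
    if len(s) == 10 and _isbn10_ok(s):
        body = "978" + s[:9]  # ISBN-10s map into the 978 prefix
        return body + _isbn13_check(body)
    if len(s) == 13 and s.isdigit() and _isbn13_check(s[:12]) == s[12]:
        return s
    return None
```

With this, `0-306-40615-2` and `ISBN 978-0-306-40615-7` both normalize to `9780306406157`, so either form can be compared against values from both Wikidata properties once they are normalized the same way.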

Version 1.1.0 - 15 Nov 2018
Added cover download capability
Add option to save translated or found gutenberg IDs
Add option to use wikidata Q codes or descriptions for instance/genre tags
- existing calibre filtering of tags works well with this

bugs fixed:
fixed crash on book search with unknown author
Use all gutenberg IDs found in wikidata
removed internal commas from tags
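Tags built from "instance of" and "genre" values can use either the Q code or its human-readable label. A hypothetical sketch of that option (`make_tags()` and its `(qid, label)` input shape are assumptions); it also removes internal commas, because calibre's tag editor treats commas as tag separators.

```python
def make_tags(values, use_qcodes=False):
    """Turn (qid, label) pairs into calibre-friendly tag strings.

    values: iterable of (qid, label) pairs; label may be None.
    use_qcodes: keep the bare Q code instead of the human-readable label.
    """
    tags = []
    for qid, label in values:
        tag = qid if (use_qcodes or not label) else label
        # Commas separate tags in calibre's tag editor, so drop them and tidy spaces
        tag = " ".join(tag.replace(",", " ").split())
        if tag and tag not in tags:  # de-duplicate, keeping first-seen order
            tags.append(tag)
    return tags
```

Falling back to the Q code when a label is missing means no value is silently dropped.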

Known bugs:
Only finds the first Overdrive Gutenberg ID attached to a book in existing metadata
(no intent to fix this unless someone complains; it works well enough)

Version 1.0.0 - 12 Nov 2018

Features added:
Save wikidata "instance of" and "genre" properties as tags (optionally)
support gutenberg book IDs (finding, searching by, saving)
additional more accurate searches
inexact searches with better sorting (i.e., by author words and title words)
support for languages other than English
use more inclusive wikidata literature and publication types
add support for keep_dup

Bugs fixed:
better formatting of exception errors
finds more books successfully
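Inexact searches return several candidates that need ranking. One simple approach, shown here as an assumption rather than the plugin's actual scoring, is word overlap: a Jaccard score on title words plus the fraction of the requested author's words found in the candidate's authors. The candidate dict shape is hypothetical.

```python
import re


def _words(text):
    # Lowercased word tokens; punctuation such as "-" or ";" splits words
    return set(re.findall(r"\w+", text.lower()))


def rank_candidates(candidates, title, author):
    """Sort candidate dicts ({'title': str, 'authors': [str]}) best match first.

    Ties keep their original order, because sorted() is stable even with reverse=True.
    """
    want_t, want_a = _words(title), _words(author)

    def score(c):
        got_t = _words(c["title"])
        got_a = _words(" ".join(c["authors"]))
        union = want_t | got_t
        title_score = len(want_t & got_t) / len(union) if union else 0.0
        author_score = len(want_a & got_a) / len(want_a) if want_a else 0.0
        return title_score + author_score

    return sorted(candidates, key=score, reverse=True)
```

Weighting title and author equally is arbitrary here; a real ranking might favor whichever field the user actually filled in.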

Version 0.5.0 - 9 Nov 2018
New features:
Add support for saving and using the wikidata ID
add options: slow search, ignore wikidata, language(default=en)
much improved book hit rate:
search for all subclasses of written works instead of just books
search for alternate titles too
find books without publication dates too
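Searching "all subclasses of written works" and "alternate titles" maps naturally onto SPARQL property paths and Wikidata aliases. The query below is a sketch for the public Wikidata Query Service: `wdt:P31/wdt:P279*` follows "instance of" and then any chain of "subclass of", and `rdfs:label|skos:altLabel` matches either the main label or an alias. The QID used for "written work" is an assumption to verify on wikidata.org, and this is not the plugin's actual query; no request is sent here.

```python
import re
from urllib.parse import urlencode

# Assumed QID for "written work"; verify on wikidata.org before relying on it.
WRITTEN_WORK = "Q47461344"
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def sparql_literal(text, lang):
    """Quote text as a language-tagged SPARQL string literal."""
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def build_title_query(title, lang="en", limit=20):
    """Find written works whose label or alias exactly equals title."""
    literal = sparql_literal(title, lang)
    return f"""SELECT DISTINCT ?work ?workLabel WHERE {{
  ?work rdfs:label|skos:altLabel {literal} .
  ?work wdt:P31/wdt:P279* wd:{WRITTEN_WORK} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}
LIMIT {int(limit)}"""


def query_url(query):
    """GET URL for the query service, asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

Matching the exact label first and only then walking the subclass path keeps the query selective; filtering over every written work instead would be far slower.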

Bugs fixed:
Display the actual title and authors found
improved logging of search attempts
better selection of correct relevant book from fuzzy matches

Version 0.1.0 - 4 Nov 2018
Initial version, pubdate only, limited search options


Searches performed (stops on first success currently):

(optional) indicates this search can be disabled or enabled
Search by saved wikidata ID (optional)
Search by saved or found Project Gutenberg id (optional)
Search by ISBN
Search for exact author/title
Search for Title only (inexact)
Search for first author only, sort by closest title match (inexact)
Search by partial title match (inexact, slow)
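The search order above, stopping at the first success, is a simple cascade. A minimal sketch with hypothetical search callables; each step carries an enabled flag for the searches marked optional. It also shows why a wrong exact match hides every inexact search listed after it.

```python
from typing import Callable, List, Sequence, Tuple

# (name, search function, enabled?)
SearchStep = Tuple[str, Callable[[dict], List[dict]], bool]


def run_searches(steps: Sequence[SearchStep], query: dict, log=print):
    """Run enabled searches in order; stop at the first one that returns results.

    Returns (name_of_successful_search, results), or (None, []) if all fail.
    """
    for name, search, enabled in steps:
        if not enabled:
            log(f"skipping {name} (disabled)")
            continue
        results = search(query)
        log(f"{name}: {len(results)} result(s)")
        if results:
            return name, results
    return None, []
```

Letting each step be toggled individually, as the known-bugs list suggests, would only require passing different `enabled` flags.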


Known bugs:
  • ISBN search is not well tested and does not handle multiple editions even though wikidata requires ISBNs to be attached to editions only
  • A successful exact search prevents the less exact searches from running, even when its result is wrong (a fix would allow each search to be selected individually)
  • Calibre version 3.35 or later is needed to keep more than the first book returned (bug #1802293)
  • Only the first gutenberg ID from Overdrive is used for matching, but all of them found in wikidata will be imported.
  • Sometimes finds works that are not written works when using inexact searches (so examine results carefully if you turn this on!)
  • Does not always properly handle wikidata where a book has multiple editions; this is rare but may become less rare in the future; a fix is being considered
  • tries to use the oldest publication date found rather than the date of a specific edition (this is a feature really--complain if you want this to be an option)
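Choosing the oldest publication date means comparing Wikidata time values. A sketch, assuming timestamps in the `YYYY-MM-DDThh:mm:ssZ` form the Query Service returns, with an optional sign for very old dates; `oldest_date()` is a hypothetical helper. It compares (year, month, day) numerically, because plain string comparison would sort a BCE year like `-0700` incorrectly.

```python
import re

# Optional sign, year, month, day; the time part is ignored
_WD_TIME = re.compile(r"^([+-]?\d+)-(\d{2})-(\d{2})T")


def oldest_date(values):
    """Return the earliest of several Wikidata-style timestamps, or None."""
    parsed = []
    for value in values:
        match = _WD_TIME.match(value)
        if match:
            key = tuple(int(g) for g in match.groups())  # int("-0700") == -700
            parsed.append((key, value))
    return min(parsed)[1] if parsed else None
```

Unparseable values are skipped rather than raising, so one malformed date does not hide the others.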


Possible future features:
No work is planned on any of these unless someone asks for them.
  • better handling of book editions in wikidata
  • remove subtitles from titles during title search to try to find more titles successfully
  • Other wikidata properties (publisher, comments, wikidata appropriate for tags); GUI plugin does this already
  • further refinement of book search is possible (but is it necessary?)


Note: Some authors ask for payment or donations. Instead, I ask that if you use this plugin, you drop a note in this thread so I know it's not just people downloading it and deleting it.

Let me know what features you use, and especially what features you'd like added!
Maybe let me know roughly how often you find books with this plugin, what your success rate is, and some examples of metadata from books you can't find in Wikidata.

My motivation for development is driven purely by feedback!
Attached file: Wikidata.zip (38.4 KB)

Last edited by compurandom; 11-29-2020 at 02:29 AM. Reason: update to 2.0