03-06-2011, 07:21 AM   #1
kiwidude
calibre/Sigil Developer
Metadata scraper plugin api

One of the plugins I have been considering writing is mentioned in this post.

I like the idea of a plugin into which you can add scripts that scrape data from websites to populate the book metadata in Calibre. I know Kovid is working on rewriting the metadata download API, but I doubt (correct me if wrong!) that he is considering going to this extent. The idea would be that users could right-click on a book and run one of their scripts, which would scrape the data and populate whatever metadata fields they chose, be it identifiers, standard metadata fields like series/tags, or custom columns.

I believe this is very similar to the news recipe system in terms of basic infrastructure (scripts written in Python, able to be added by users, etc.).

Assuming you are still reading and haven't decided this is a really bad idea, the first consideration I have is the API that users would have available in their scripts. I don't want to get too clever by restricting it to make it user friendly. At the end of the day, the scripts will be written by people who have to know Python, and any attempt to wrap things will invariably lead to restrictions that I will come to regret in future versions of Calibre. I like what Kovid did with the plugins API in not limiting the sandbox you can play in, even though this means you have to get a bit dirty poring through the Calibre source code to learn how to do things.

With that said, the plugin is focused on scraping data for a book. So getting the user to write code that inherits from a base class and overrides a function that is passed a populated Metadata object for the current row seems sensible. On that object the user can get/set all the standard metadata fields and identifiers as they please. What I don't believe they can get access to (Charles will correct me if wrong!) are the custom column fields, which are on the db object. So I could also pass in a db object, which would give users more flexibility by also letting them do things like scrape covers.
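To make the shape of that idea concrete, here is a rough sketch of what a user script might look like. Everything in it is hypothetical: `ScraperScript`, its `scrape(mi, db, book_id)` signature, and `FakeMetadata` are made-up names for illustration, and `FakeMetadata` is only a stand-in so the sketch runs outside Calibre, not Calibre's real Metadata class. The "scraped" values are hard-coded placeholders rather than real website data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeMetadata:
    """Stand-in for a populated metadata object (NOT Calibre's real class)."""
    title: str
    authors: list
    series: Optional[str] = None
    tags: list = field(default_factory=list)
    identifiers: dict = field(default_factory=dict)


class ScraperScript:
    """Hypothetical base class that the plugin would provide to user scripts."""
    name = 'Unnamed scraper'

    def scrape(self, mi, db, book_id):
        """Override this: read from mi, update it, and return it.

        db is the (assumed) database object, passed so that scripts could
        also reach things mi does not carry, such as custom columns.
        """
        raise NotImplementedError


class ExampleSeriesScraper(ScraperScript):
    """What a user-written script might look like."""
    name = 'Example series scraper'

    def scrape(self, mi, db, book_id):
        # A real script would download and parse a web page here.
        # Placeholder values keep the sketch offline and runnable.
        scraped = {'series': 'Example Series',
                   'tags': ['Fantasy'],
                   'example_id': '12345'}
        mi.series = scraped['series']
        for tag in scraped['tags']:
            if tag not in mi.tags:
                mi.tags.append(tag)
        mi.identifiers['example'] = scraped['example_id']
        return mi


# Simulate the plugin running the script for the selected row.
mi = FakeMetadata(title='Some Book', authors=['Some Author'], tags=['Humour'])
result = ExampleSeriesScraper().scrape(mi, db=None, book_id=1)
```

Passing `db` and `book_id` alongside the metadata object keeps the script free to use whatever Calibre offers, in line with not wanting to wrap or restrict the API.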

All thoughts welcome - good/bad idea, other fields I should consider passing etc. It's just vapourware at this point.

EDIT: A potential technical challenge - is it possible to write a script that inherits from a class that exists only in a Calibre plugin zip file? Or would it require the base class to sit in the Calibre codebase, which could rather scupper the whole idea without Kovid's support?
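As far as plain Python goes, importing a base class from a zip file is possible: a zip archive placed on `sys.path` is importable through the standard zipimport machinery. The sketch below only demonstrates that general behaviour. It says nothing about how Calibre's own plugin loader exposes plugin code, which may well work differently, and the names `hypothetical_plugin.zip`, `scraper_base`, `ScraperScript` and `MyScraper` are invented for the example.

```python
import os
import sys
import tempfile
import zipfile

# Source of the hypothetical base class that would live inside the plugin zip.
BASE_SOURCE = '''
class ScraperScript:
    def scrape(self, mi):
        raise NotImplementedError
'''

# Source of a hypothetical user script, stored outside the zip.
USER_SOURCE = '''
from scraper_base import ScraperScript

class MyScraper(ScraperScript):
    def scrape(self, mi):
        mi['series'] = 'Example Series'
        return mi
'''

# Build a throwaway zip containing only the base class module.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, 'hypothetical_plugin.zip')
with zipfile.ZipFile(zip_path, 'w') as zf:
    zf.writestr('scraper_base.py', BASE_SOURCE)

# Putting the zip on sys.path lets Python's zipimport machinery find modules in it.
sys.path.insert(0, zip_path)

# Execute the user script; its import resolves to the module inside the zip.
namespace = {'__name__': 'user_script'}
exec(compile(USER_SOURCE, 'user_script.py', 'exec'), namespace)

scraper = namespace['MyScraper']()
result = scraper.scrape({'title': 'Some Book'})
```

Whether the same trick works unchanged inside Calibre would depend on how plugin zips are loaded there, so treat this as a starting point for an experiment rather than an answer.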

I guess the other consideration is that, functionality-wise, this plugin arguably overlaps with the standard metadata and cover download plugins. The difference is that the user would be able to choose granularly which scripts to run and (perhaps) retrieve more data fields than is possible, at least with the current API. Perhaps the whole idea becomes redundant with Kovid's new API - I'm just guessing at this point.

Last edited by kiwidude; 03-06-2011 at 08:01 AM.