03-06-2011, 06:21 AM   #1
kiwidude
calibre/Sigil Developer
Posts: 4,601
Karma: 2092290
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Metadata scraper plugin API

One of the plugins I have been considering writing is mentioned in this post.

I like the idea of a plugin into which you can add scripts that scrape data from websites to populate the book metadata in Calibre. I know Kovid is working on rewriting the metadata download API, but I doubt (correct me if wrong!) that he is considering going to this extent. The idea would be that users could right-click on a book and run one of their scripts, which would scrape the data and populate whatever metadata fields they chose, be it identifiers, standard metadata fields like series/tags, or custom columns.

I believe this sounds very similar to the news recipe stuff in terms of basic infrastructure (scripts written in Python, able to be added by users, etc.).

Assuming you are still reading and haven't thought this a really bad idea, the first consideration I have is the API that users would have available in their scripts. I don't want to get too clever by restricting what scripts can do in the name of user-friendliness. At the end of the day the scripts will be written by people who have to know Python, and any attempt to wrap stuff will invariably lead to restrictions in future versions of Calibre that I will come to regret. I like what Kovid did with the plugins API in not limiting the sandbox you can play in, even though this means you have to get a bit dirty poring through Calibre source code to learn how to do stuff.

With that said, the plugin is focused around scraping data for a book. So getting the user to write code that inherits from a base class and overrides a function that is passed a populated Metadata object for the current row seems sensible. On that object the user can get/set as they please all the standard metadata fields and identifiers. What I don't believe they can get access to (Charles will correct me if wrong!) is the custom column fields, which are on the db object. So I could also pass in a db object (which would give users more flexibility by also letting them do things like scrape covers).
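A minimal sketch of the base-class idea described above. All names here (`ScraperScript`, `scrape`, `ExampleScraper`) are hypothetical, and a `SimpleNamespace` stands in for the populated Metadata object, so this only shows the shape of a user script rather than anything calibre actually provides.

```python
from types import SimpleNamespace


class ScraperScript:
    """Hypothetical base class the plugin would ship; not part of calibre."""

    name = "Unnamed scraper"

    def scrape(self, mi):
        """Receive the book's metadata and return it, possibly modified."""
        raise NotImplementedError


class ExampleScraper(ScraperScript):
    """Example user script: fills in an identifier and adds a tag."""

    name = "Example scraper"

    def scrape(self, mi):
        # A real script would fetch and parse a web page here; the values
        # below are hard-coded purely for illustration.
        mi.identifiers["example"] = "12345"
        if "scraped" not in mi.tags:
            mi.tags.append("scraped")
        return mi


# Stand-in for a populated Metadata object (assumption: the real object
# exposes comparable attributes).
mi = SimpleNamespace(title="Some Book", tags=["fiction"], identifiers={})
result = ExampleScraper().scrape(mi)
print(result.tags, result.identifiers)
```

The plugin would then only need to find subclasses of the base class among the user's scripts and list them in the right-click menu.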

All thoughts welcome - good/bad idea, other fields I should consider passing etc. It's just vapourware at this point.

EDIT: A potential technical challenge - is it possible to write a script that inherits from a class that exists only in a Calibre plugin zip file? Or would it require the base class to sit in the Calibre codebase, which could rather scupper the whole idea without Kovid's support.

I guess the other consideration is that, functionality-wise, this plugin arguably overlaps with the standard metadata and cover download plugins. The difference is that the user would be able to choose granularly which to run and (perhaps) retrieve more data fields than is possible with the current API at least. Perhaps the whole idea becomes redundant with Kovid's new API; I'm just guessing at this point.

Last edited by kiwidude; 03-06-2011 at 07:01 AM.
03-06-2011, 10:15 AM   #2
chaley
Grand Sorcerer
Posts: 11,703
Karma: 6658935
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by kiwidude
So getting the user to write code that inherits from a base class and overrides a function that is passed a populated Metadata object for the current row seems sensible. On that object the user can get/set as they please all the standard metadata fields and identifiers.
Setting a field in a Metadata object has no effect on the database. You would need to call set_metadata to save the change.
Quote:
What I don't believe they can get access to (Charles will correct me if wrong!) is the custom column fields, which are on the db object.
The Metadata object has all the metadata in it, including custom columns and user categories.
Quote:
So I could also pass in a db object (which would give users more flexibility by also letting them do things like scrape covers).
The metadata object also can have the cover, if get_metadata is called with get_cover=True.
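One way to picture the points above is the read, modify, write-back pattern sketched here. `FakeDB` and `FakeMetadata` are stand-ins invented for illustration; only the method names `get_metadata`, `set_metadata` and the `get_cover=True` argument come from this post, and the signatures and the `(format, bytes)` cover shape are assumptions, not calibre's real interface.

```python
import copy


class FakeMetadata:
    """Stand-in for calibre's Metadata object (illustrative only)."""

    def __init__(self, title, series=None, cover_data=None):
        self.title = title
        self.series = series
        self.cover_data = cover_data


class FakeDB:
    """Stand-in for the library database; NOT calibre's real interface."""

    def __init__(self):
        self._books = {1: FakeMetadata("Some Book")}
        self._covers = {1: b"fake-jpeg-bytes"}

    def get_metadata(self, book_id, get_cover=False):
        # Hand back a copy, so later changes to it do not touch the store.
        mi = copy.deepcopy(self._books[book_id])
        if get_cover:
            mi.cover_data = ("jpg", self._covers[book_id])
        return mi

    def set_metadata(self, book_id, mi):
        # Only this call writes the changes back.
        self._books[book_id] = copy.deepcopy(mi)


db = FakeDB()
mi = db.get_metadata(1, get_cover=True)
mi.series = "Example Series"
unsaved = db.get_metadata(1).series  # the stored record is unchanged so far
db.set_metadata(1, mi)
saved = db.get_metadata(1).series    # the change is visible after the write
print(unsaved, saved)
```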

I am not sure whether passing a db handle is a good idea or not. If you pass it, you are giving away the keys to the city. If you don't, then there are some things like plugin custom data that the subclass would be unable to get.
Quote:
EDIT: A potential technical challenge - is it possible to write a script that inherits from a class that exists only in a Calibre plugin zip file? Or would it require the base class to sit in the Calibre codebase, which could rather scupper the whole idea without Kovid's support.
It works both with the 'with self: import foo' scheme and with the zipimport scheme. As usual, the problem with the 'with xxx' scheme will be with name collisions. It would not be a good idea to have multiple plugins containing different versions of your base class all having the same name. For that reason I suggest that you use the zipimport technique, which will eliminate that problem.
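A minimal sketch in plain Python, outside calibre's own plugin loader, of a script inheriting from a class that exists only inside a zip file. The module name `scraper_plugin_base` and the class `ScraperBase` are hypothetical. Putting the zip on `sys.path` is an assumption made to keep the example self-contained; calibre's loading mechanism may differ.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Hypothetical module contents; a real plugin would ship its own base class.
BASE_SOURCE = '''
class ScraperBase:
    def scrape(self, mi):
        raise NotImplementedError
'''

# Build a throwaway zip that plays the role of the plugin zip file.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "scraper_plugin.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("scraper_plugin_base.py", BASE_SOURCE)

# With the zip on sys.path, Python's zip import machinery serves the module.
sys.path.insert(0, zip_path)
importlib.invalidate_caches()
base_module = importlib.import_module("scraper_plugin_base")


class UserScript(base_module.ScraperBase):
    """A script living outside the zip, inheriting from a class inside it."""

    def scrape(self, mi):
        mi["tags"] = ["scraped"]
        return mi


result = UserScript().scrape({})
print(base_module.__file__, result)
```

Giving the module a name unique to the plugin is what sidesteps the name-collision problem mentioned above.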
03-06-2011, 10:21 AM   #3
kovidgoyal
creator of calibre
Posts: 43,778
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The only use I can see for this functionality is to have plugins that populate custom columns. And a better solution for that, IMO, is to use a plugboards type functionality which allows the user to tell a standard metadata download plugin to copy/move some metadata from a standard field to a custom field.
03-06-2011, 10:56 AM   #4
kiwidude
Thanks chaley/Kovid for the replies.
Quote:
Originally Posted by kovidgoyal
The only use I can see for this functionality is to have plugins that populate custom columns. And a better solution for that, IMO, is to use a plugboards type functionality which allows the user to tell a standard metadata download plugin to copy/move some metadata from a standard field to a custom field.
Until chaley posted I wasn't aware that you could set custom columns on a Metadata object. I had looked at the print output and saw nothing. Looking again I believe the get_user_metadata and set_user_metadata methods are what I should have spotted.

If you can do that, then in theory you should be able to do most of what I want with a metadata download plugin (custom columns set on an mi returned by the API will get saved, right?).

So I think it requires a few things:

(1) User control over which plugin to run. By user control I don't mean the chore of drilling into the plugin preferences dialog and enabling/disabling plugins. Instead, a simple right-click would execute a particular metadata download or cover download plugin that you have installed.

(2) Enhanced versions of the metadata download plugins that retrieve more data than just the "standard" fields they do currently. For instance, the OP in the thread I referenced above wanted to get the price for a book. The enhanced metadata download plugin would need to scrape all the data it can that might be "interesting" for the user to choose from?

(3) Configuration to grab the data and assign it to columns for that plugin. As you have said a plugboard type approach could be used, but I don't think that just the "standard" metadata fields would be sufficient?

Any thoughts?
03-06-2011, 11:04 AM   #5
kovidgoyal
It's easy enough to stuff non-standard metadata into a dict that the plugboard system can use to populate user specified custom columns.
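A minimal sketch of that idea, assuming a download plugin returns its non-standard values in a dict under a hypothetical `extra` key, and that the user's plugboard is a simple source-to-destination mapping. The column names are made up; only the convention that custom column lookup names start with `#` is calibre's.

```python
# Hypothetical output of a download plugin: standard fields plus a dict of
# extra, non-standard values (the "extra" key is an assumption).
downloaded = {
    "title": "Some Book",
    "series": "Example Series",
    "extra": {"price": "9.99", "page_count": "320"},
}

# User-defined plugboard: source field -> destination custom column.
plugboard = {
    "price": "#price",
    "page_count": "#pages",
    "series": "#original_series",
}


def apply_plugboard(downloaded, plugboard):
    """Return {destination column: value} for every mapped source present."""
    sources = {k: v for k, v in downloaded.items() if k != "extra"}
    sources.update(downloaded.get("extra", {}))
    return {dest: sources[src] for src, dest in plugboard.items() if src in sources}


updates = apply_plugboard(downloaded, plugboard)
print(updates)
```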
03-06-2011, 11:58 AM   #6
chaley
Putting aside the finding of interesting data ...

If, for example, a plugin assigns a value to mi.price, then the template engine can retrieve that value. Currently doing so requires using the raw_field function, but there is no overriding reason that normal template references couldn't work. The only issue is field-specific formatting, but given that the template engine cannot know what the field means, the only thing the engine can choose to do is produce a string.

Given the above, it is easy to imagine 'plugboards' that massage raw data and store it into arbitrary metadata fields. We have something similar to that function in metadata search/replace, which handles type and is_multiple mismatches.
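A rough sketch of what handling type and is_multiple mismatches could look like. The `coerce` helper and its parameters are hypothetical and are not taken from calibre's search/replace code.

```python
def coerce(value, dest_type, is_multiple=False, separator=","):
    """Convert a raw scraped value to fit the destination field."""
    if is_multiple:
        # Destination holds many values: split a single string if needed.
        if isinstance(value, str):
            items = [part.strip() for part in value.split(separator)]
        else:
            items = list(value)
        return [dest_type(item) for item in items if item != ""]
    # Destination holds one value: collapse a list into one string first.
    if isinstance(value, (list, tuple)):
        value = (separator + " ").join(str(item) for item in value)
    return dest_type(value)


tags = coerce("sf, fantasy, ", str, is_multiple=True)
price = coerce("9.99", float)
joined = coerce(["Book One", "Book Two"], str)
print(tags, price, joined)
```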

As far as I can see, the creation and maintenance of the screen maps describing where 'interesting' information is and how it is to be scraped is the hard part. Especially maintenance, given that the page formats change on a regular basis. I did something like this for regression testing of an application. It turned out that maintaining the testing scraper was as hard as maintaining the application.
chaley is offline   Reply With Quote
All times are GMT -4.