MobileRead Forums - View Single Post

davidfor · 04-02-2021, 06:30 AM

Quote:

Originally Posted by Boilerplate4U

A cold pint of lager always helps!

Thanks, but base.py doesn't really give any direct hints about the workflow, which is currently the essential part for me.

base.py gives you the API that metadata source plugins need to implement. How you implement it is up to you. But identify() is called with the search parameters, and the results, as Metadata objects, are put on "result_queue". How that happens is also up to you, and depends on the site. If you are searching only by identifier, the search part is relatively simple. If the site offers a good API, it becomes even easier.
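A rough, calibre-free sketch of that contract, for illustration only. The parameter list roughly follows identify() in base.py (log, result_queue, abort, title, authors, identifiers, timeout), and it returns None on success or an error string on failure. The Metadata dataclass, _FAKE_SITE and lookup_site() are hypothetical stand-ins: a real plugin subclasses calibre's Source class, builds calibre's own Metadata objects, receives calibre's log object rather than a logging.Logger, and queries the site over the network.

```python
import logging
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical stand-in for calibre's Metadata class."""
    title: str
    authors: list
    identifiers: dict = field(default_factory=dict)


# Hypothetical in-memory "site"; a real plugin would query the site over HTTP.
_FAKE_SITE = [
    {"title": "Example Title", "authors": ["Example Author"], "isbn": "9780000000000"},
]


def lookup_site(title, authors, identifiers, timeout):
    """Hypothetical search helper: match on ISBN if given, else on title."""
    isbn = identifiers.get("isbn")
    if isbn:
        return [r for r in _FAKE_SITE if r["isbn"] == isbn]
    if title:
        return [r for r in _FAKE_SITE if title.lower() in r["title"].lower()]
    return []


def identify(log, result_queue, abort, title=None, authors=None,
             identifiers=None, timeout=30):
    """Search the site and put one Metadata object per match on result_queue.

    Returns None on success or an error message string on failure.
    """
    identifiers = identifiers or {}
    try:
        records = lookup_site(title, authors, identifiers, timeout)
    except OSError as e:  # e.g. network errors in a real plugin
        log.error("Search failed: %s", e)
        return f"Search failed: {e}"
    for rec in records:
        if abort.is_set():  # stop early if the caller has asked us to abort
            break
        mi = Metadata(rec["title"], rec["authors"], {"isbn": rec["isbn"]})
        log.info("Found %s", mi.title)
        result_queue.put(mi)
    return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    results = queue.Queue()
    print(identify(logging.getLogger("sketch"), results, threading.Event(),
                   title="example"))
    print(results.get_nowait())
```

The point is the split of responsibilities: the caller owns the queue and the abort event, and the plugin only searches and pushes results.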

Quote:

As for searching, I think it's quite straightforward: either use IDs like the ISBN, or otherwise title/author.

Yes, that's it. But exactly how you do it is up to you, and depends on the site. Still, most of the metadata plugins follow the pattern I outlined above:

Search in the following order, stopping as soon as one step finds matches:
1. Using the site's identifier if it is known
2. Using the ISBN if it is known and the site supports it.
3. Using the Title and Author
4. Using the Title.
For the close matches, get the full details and build Metadata objects to return.
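The search order above could look roughly like this. It is only a sketch: FakeSite, its by_id/by_isbn/by_title methods and the "fakesite" identifier key are hypothetical placeholders for whatever the real site offers. A real plugin would then fetch the full details for the returned matches and turn them into Metadata objects.

```python
class FakeSite:
    """Hypothetical in-memory site; a real plugin would call the site's API."""

    supports_isbn = True  # step 2 only applies if the site can search by ISBN

    def __init__(self, records):
        self.records = records

    def by_id(self, site_id):
        return [r for r in self.records if r["id"] == site_id]

    def by_isbn(self, isbn):
        return [r for r in self.records if r["isbn"] == isbn]

    def by_title(self, title, authors=None):
        hits = [r for r in self.records if title.lower() in r["title"].lower()]
        if authors:  # keep only records sharing at least one author
            wanted = {a.lower() for a in authors}
            hits = [r for r in hits if wanted & {a.lower() for a in r["authors"]}]
        return hits


def search_in_order(site, id_key, title=None, authors=None, identifiers=None):
    """Run the four searches in order; return (step name, matches) for the first hit."""
    identifiers = identifiers or {}
    steps = []
    if id_key in identifiers:                         # 1. the site's own identifier
        steps.append(("site id", lambda: site.by_id(identifiers[id_key])))
    if "isbn" in identifiers and site.supports_isbn:  # 2. ISBN, if supported
        steps.append(("isbn", lambda: site.by_isbn(identifiers["isbn"])))
    if title and authors:                             # 3. title and author
        steps.append(("title+author", lambda: site.by_title(title, authors)))
    if title:                                         # 4. title only
        steps.append(("title", lambda: site.by_title(title)))
    for name, run in steps:
        matches = run()
        if matches:  # stop at the first step that finds anything
            return name, matches
    return None, []


if __name__ == "__main__":
    site = FakeSite([{"id": "x1", "isbn": "9780000000000",
                      "title": "Example Title", "authors": ["Example Author"]}])
    # The author doesn't match, so step 3 finds nothing and step 4 (title only) is used.
    print(search_in_order(site, "fakesite", title="Example Title",
                          authors=["Someone Else"]))
```

Building the list of steps up front means a step is simply skipped when its input (site ID, ISBN, author) isn't known, and an empty result from one step falls through to the next.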

Quote:

Some of the scientific sites I plan to use offer an API key when you register an account, which allows free access in volume (though I'm quite sure there is some kind of limit anyway).

Which can be a problem. Kovid had to remove support for WorldCat because they changed the API rules so that the limits were just too low for the potential number of users.

Quote:

I believe source code samples really matter, in the spirit of the Specification by Example design philosophy (IMHO).

Thanks for the info about how the plugins execute within calibre.

Somewhat OT, but do you know anything about extracting metadata from an EPUB using EPUBMetadataReader? It seems that <dc:identifier> is not used to extract the ISBN from content.opf. Do you have any idea which source file handles this?

Snippet from content.opf:

Code:

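<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:identifier id="bookid">9781783984343</dc:identifier>
    <dc:title>Reactive Programming with Scala and Akka</dc:title>
    <dc:publisher>Packt Publishing</dc:publisher>
    <dc:language>en</dc:language>
    <meta name="cover" content="cover-image"/>
  </metadata>
  <!-- manifest, spine etc. omitted -->
</package>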

I haven't looked at it for a long time, but I think the EPUBMetadataReader plugin only reads the OPF file. That is where it gets the ISBN from, but your example doesn't mark the identifier as an ISBN. This line does:

Code:

    <dc:identifier opf:scheme="ISBN">9781927464243</dc:identifier>

The "opf:scheme" attribute says which identifier scheme the value belongs to (here, ISBN).
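To see why the scheme matters, here is a small standard-library sketch that collects dc:identifier values keyed by their opf:scheme. It is not calibre's reader code; it just assumes a reader that only recognises an identifier as an ISBN when opf:scheme says so. The two OPF strings are trimmed-down, hypothetical versions of the snippet from the question, with the usual OPF and Dublin Core namespaces declared.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core namespace
OPF = "http://www.idpf.org/2007/opf"      # OPF namespace (also used for opf:scheme)

# The identifier from the question, without and with opf:scheme.
WITHOUT_SCHEME = f"""<package xmlns="{OPF}" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="{DC}" xmlns:opf="{OPF}">
    <dc:identifier id="bookid">9781783984343</dc:identifier>
  </metadata>
</package>"""

WITH_SCHEME = f"""<package xmlns="{OPF}" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="{DC}" xmlns:opf="{OPF}">
    <dc:identifier id="bookid" opf:scheme="ISBN">9781783984343</dc:identifier>
  </metadata>
</package>"""


def identifiers_by_scheme(opf_text):
    """Return {scheme (lower-cased): value} for dc:identifier elements that declare opf:scheme."""
    root = ET.fromstring(opf_text)
    found = {}
    for ident in root.iter(f"{{{DC}}}identifier"):
        scheme = ident.get(f"{{{OPF}}}scheme")  # None when the attribute is missing
        if scheme and ident.text:
            found[scheme.lower()] = ident.text.strip()
    return found


if __name__ == "__main__":
    print(identifiers_by_scheme(WITHOUT_SCHEME))  # {}
    print(identifiers_by_scheme(WITH_SCHEME))     # {'isbn': '9781783984343'}
```

Same number, same element: without the attribute, a reader like this has nothing telling it the value is an ISBN. Lower-casing the scheme is a choice made for the sketch so "ISBN" and "isbn" compare equal.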