MobileRead Forums - View Single Post

kiwidude · 06-23-2011, 08:34 PM

@madinlisboa - as someone who has developed a number of metadata plugins and pored over Kovid's code, I can confirm that your assumptions about the behaviour are not correct.

The current metadata plugins do merge fields together from different sources. However, they will only do this under certain conditions. One of those conditions is that the ISBNs from each of those sources refer to the same edition, which is determined by a call to the XISBN service to identify an ISBN "pool". If results from your sources have ISBNs that fall in the same pool, then data will be selected from across those results, so it could well be series information from one website and comments from another.
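
To make the pooling idea concrete, here is a rough Python sketch. It is not calibre's actual code: the `ISBN_POOLS` table is a hypothetical stand-in for the XISBN lookup, the ISBNs are placeholders, and the "earlier source wins" rule is an assumption for illustration only.

```python
from collections import defaultdict

# Hypothetical stand-in for the XISBN lookup: maps an ISBN to a pool id.
# The ISBNs below are placeholders, not real books.
ISBN_POOLS = {
    "9780000000001": "pool-a",
    "9780000000002": "pool-a",
    "9780000000003": "pool-b",
}


def merge_by_pool(results, pools=ISBN_POOLS):
    """Merge results whose ISBNs fall in the same pool.

    results: list of dicts in source-priority order, each with an "isbn" key.
    Assumption: when two sources supply the same field, the earlier one wins.
    """
    grouped = defaultdict(list)
    for result in results:
        pool = pools.get(result.get("isbn"))
        # A result with no known pool is kept on its own, never merged.
        key = pool if pool is not None else ("single", id(result))
        grouped[key].append(result)

    merged = []
    for group in grouped.values():
        combined = {}
        for result in group:
            for field, value in result.items():
                if value and field not in combined:
                    combined[field] = value
        merged.append(combined)
    return merged


results = [
    {"source": "goodreads", "isbn": "9780000000001", "series": "Example Series"},
    {"source": "amazon", "isbn": "9780000000002", "comments": "A blurb."},
    {"source": "other", "isbn": "9780000000003", "comments": "Different book."},
]
merged = merge_by_pool(results)
```

Here the first two results share a pool, so the merged record carries the series from one and the comments from the other, while the third result stays separate.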

The only other exception to this I am aware of is something I created a request for in Launchpad. It is the rarer situation where you get multiple results, but only a subset of them have an ISBN. This most frequently happens with future book releases. The current behaviour is to take data only from the result with the ISBN, on the assumption that this is a "better" result. I want this changed so that it will also merge with ISBN-less results from another source. So if Goodreads gives you an ISBN, but FantasticFiction has series information but no ISBN, you would get both merged together.
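
The difference between the current and the requested behaviour can be sketched in Python as below. Both functions are hypothetical illustrations, not calibre code, and the second one assumes the ISBN-less result really is the same book.

```python
def pick_current(results):
    """Current behaviour as described above: if any result has an ISBN,
    results without one are ignored."""
    with_isbn = [r for r in results if r.get("isbn")]
    return with_isbn if with_isbn else list(results)


def merge_requested(results):
    """Requested behaviour: fold ISBN-less results into the ISBN-bearing one.

    Assumption: all results describe the same book, and fields from the
    result that has an ISBN take priority.
    """
    # sorted() is stable, so results with an ISBN simply move to the front.
    ordered = sorted(results, key=lambda r: not r.get("isbn"))
    combined = {}
    for result in ordered:
        for field, value in result.items():
            if value and field not in combined:
                combined[field] = value
    return combined


goodreads = {"source": "goodreads", "isbn": "9780000000001", "title": "Example"}
ff = {"source": "ff", "isbn": None, "series": "Example Series"}

current = pick_current([ff, goodreads])
requested = merge_requested([ff, goodreads])
```

With the current rule the series information from the ISBN-less result is dropped; with the requested rule it ends up alongside the ISBN.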

As to why you get different results when you repeatedly search, I presume you are talking about the same book? In which case the answer comes down to the ISBN. Every time you do a metadata download for a book, the ISBN for the book gets overwritten with one from the results. This means your ISBN may flip-flop a bit across multiple metadata retrieves. You may then find that one of the other metadata sources gives you a different result from the previous one, because it happens to have different metadata for that edition of the book.

If you want maximum population of data there are a number of steps you can take. Firstly, use plugins that populate the fields you are interested in for the books you have. For instance, if you want series information, the Goodreads or Fantastic Fiction plugins are the best. For tags as genres, use Goodreads. For covers, I prefer B&N with FF or Goodreads as a fallback option. Others have said they like Amazon, which is fine for some things (though it has no series or tags information, and its covers can be hit and miss and low resolution).

Secondly, a good ISBN will always give you the best match. The Extract ISBN plugin can help with this, provided the ISBN is in the book for it to find. Note this plugin is not infallible: not all books have an ISBN it can read, or worse, sometimes the ISBN it finds is from the publisher advertising some other book within it. However, a very high percentage of the time it gets it right, and it will give you the best chance of a quality edition match with most metadata sources.
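
If you want to sanity-check an ISBN yourself, the check digit rules are standard: ISBN-10 uses weights 10 down to 1 modulo 11 (with X meaning 10 in the last position), and ISBN-13 uses alternating weights 1 and 3 modulo 10. The Python sketch below applies them. It is not how the Extract ISBN plugin works internally; the function names and the simple hyphen-only search pattern are my own assumptions.

```python
import re


def is_valid_isbn(candidate):
    """Return True if candidate passes the ISBN-10 or ISBN-13 checksum."""
    digits = re.sub(r"[\s-]", "", candidate).upper()
    if re.fullmatch(r"\d{9}[\dX]", digits):
        # ISBN-10: weights 10..1, sum divisible by 11, X counts as 10.
        total = sum(
            (10 - i) * (10 if ch == "X" else int(ch))
            for i, ch in enumerate(digits)
        )
        return total % 11 == 0
    if re.fullmatch(r"\d{13}", digits):
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; sum divisible by 10.
        total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(digits))
        return total % 10 == 0
    return False


def find_isbns(text):
    """Return checksum-valid ISBN candidates in text, in order of appearance.

    Assumption: candidates are runs of digits and hyphens only. Every valid
    candidate is returned, so one from an advert inside the book will show
    up here too and still needs a human check.
    """
    candidates = re.findall(r"[0-9][0-9-]{8,15}[0-9Xx]", text)
    return [c for c in candidates if is_valid_isbn(c)]


found = find_isbns("ISBN 978-0-306-40615-7, also published as 0-306-40615-2.")
```

A checksum pass only tells you the number is well formed, not that it belongs to the book in hand.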

Thirdly, download metadata one book at a time if you want serious quality. Take care over which fields you have selected to overwrite - if you have title and authors checked, then make sure in the results it really is the book you expect. The metadata sources are not miracle workers, as they are at the mercy of the results returned by the website search engines. Sometimes the websites prioritise books they want to sell in the search results (like box sets) over the actual edition you want. Or you might have the wrong ISBN and get data for a different book. By reviewing the search results of the metadata download you can make sure you get a result for the right book.

Fourthly, it might be necessary to do multiple retrieves for a single book. Remember the ISBN may change with each retrieve. So if it fails to find a match on a site for your first ISBN (or perhaps you didn't even have an ISBN), it might find one with the second download. I make sure that where possible every single book in my library is linked to its FantasticFiction page, B&N and Goodreads. If I have the right ISBN I can get this in one download; sometimes it takes more.

And finally, in the case of my plugins at least, sometimes it is necessary to manually assign the identifier. All my metadata plugins work on the basis that if you have a website-specific identifier (ff: or barnesnoble: or goodreads:) then they will jump straight to the webpage for that book to pull metadata from it. This is both fast and means you are not at the mercy of the website search engine. These identifiers get populated automatically when the plugin finds a match for a book. However, if needed you can force such a match manually by typing in the id it needs (FF/B&N) or by using the Goodreads Sync plugin "Link book" feature to assign a Goodreads id. If you do this and then do a metadata retrieval, as I said above it will pull data from that specific edition page of the book.
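
The identifier shortcut can be pictured with the Python sketch below. It is only an illustration: `parse_identifiers` and `lookup_plan` are hypothetical names, the comma-separated `key:value` input format is assumed, and the fallback order (ISBN search, then title/author search) is my assumption rather than something any plugin is guaranteed to follow.

```python
def parse_identifiers(text):
    """Parse a string like 'goodreads:12345, isbn:9780306406157' into a dict."""
    identifiers = {}
    for part in text.split(","):
        key, sep, value = part.strip().partition(":")
        if sep and key.strip() and value.strip():
            identifiers[key.strip().lower()] = value.strip()
    return identifiers


def lookup_plan(identifiers, site_key):
    """Decide how a plugin for site_key (e.g. 'ff', 'barnesnoble',
    'goodreads') would look the book up.

    A site-specific identifier means going straight to that book's page;
    otherwise the plugin has to rely on the website's search engine.
    """
    if site_key in identifiers:
        return ("direct", identifiers[site_key])
    if "isbn" in identifiers:
        return ("search_isbn", identifiers["isbn"])
    return ("search_title_author", None)


ids = parse_identifiers("goodreads:12345, isbn:9780306406157")
goodreads_plan = lookup_plan(ids, "goodreads")
ff_plan = lookup_plan(ids, "ff")
```

In this example the Goodreads lookup goes direct, while the FF lookup still depends on a search because no ff: identifier has been assigned yet.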

Those are my tips, if you want seriously good data. It may sound like more work than hitting Ctrl+D on a bunch of books and expecting "magic" to happen. In reality you can do several books a minute, and you only have to do it once...

Last edited by kiwidude; 06-23-2011 at 08:41 PM. Reason: typos