MobileRead Forums - View Single Post - Merging two titles and keeping 'best cover'

kjdavies · 06-04-2020, 03:21 AM

Hi All,

I'm loading a fairly large number of files (>12k in this library) that have pretty wretched metadata. About all I can count on is the publisher name, and that's only because that's how the download tool stores the files.

I can fix them by hand if I have to, but... 12k.

I can get metadata (title, authors, artists, description, even a cover image) from the source, but I've got no good key to join them. Most of the PDFs have no title information, and file names are... iffy, at best. Even though I've got a publisher name, 'APV2006.pdf' is not terribly useful as a search term.

What I'm thinking is:

1. Create an empty entry (no files attached) for each record in the metadata I have, carrying author, publisher, title and cover image.
2. Show all entries for each publisher, with the cover view up.
3. Match and merge by cover image, being sure to pick the one with the good title first.
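To cut down the eyeballing in the match step, the covers could be pre-ranked by similarity before the manual merge. Below is a minimal sketch of an average-hash comparison, not anything built into the library tool. It assumes each cover has already been decoded into rows of 0-255 grayscale values by some image library (that loading step is not shown), and the names `shrink`, `average_hash`, `hamming` and `rank_candidates` are made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: a cover is already decoded into rows of 0-255 grayscale pixels.
Gray = Sequence[Sequence[int]]


def shrink(pixels: Gray, size: int = 8) -> List[List[float]]:
    """Downscale to size x size by averaging rectangular blocks of pixels."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0 = by * h // size
        y1 = max((by + 1) * h // size, y0 + 1)  # at least one source row
        row = []
        for bx in range(size):
            x0 = bx * w // size
            x1 = max((bx + 1) * w // size, x0 + 1)  # at least one source column
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Gray, size: int = 8) -> int:
    """One bit per cell of the shrunken image: 1 if brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def rank_candidates(target: Gray, candidates: Dict[str, Gray]) -> List[Tuple[int, str]]:
    """Sort candidate covers by hash distance to the target, closest first."""
    t = average_hash(target)
    return sorted((hamming(t, average_hash(p)), name) for name, p in candidates.items())
```

Because the hash is taken after shrinking, a low-resolution metadata cover and a high-resolution cover of the same book should land close together. The ranking only narrows the list for each publisher; the final match would still be confirmed by eye.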

It's still going to be wickedly tedious, of course... but still better than correcting things by hand.

However, the cover image from the metadata is generally going to be lower resolution than the cover image on the entry that has the actual payload. Obviously I can run a bulk 'create cover image from ebook' afterwards, but I'd be happy to skip that step.
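The 'best cover' rule I have in mind is simply "keep whichever image has more pixels". A minimal sketch of that rule follows, assuming the width and height of each cover have already been read by some other means. The `Cover` class and `best_cover` function are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Cover:
    source: str  # hypothetical label, e.g. "metadata" or "payload"
    width: int   # pixels, assumed to be read elsewhere
    height: int  # pixels, assumed to be read elsewhere

    @property
    def area(self) -> int:
        return self.width * self.height


def best_cover(covers: Iterable[Cover]) -> Cover:
    """Return the cover with the most pixels; on a tie the earliest one wins."""
    return max(covers, key=lambda c: c.area)
```

For example, `best_cover([Cover("metadata", 200, 300), Cover("payload", 1200, 1800)])` picks the payload cover.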

Is there a way to do it? Or do I just apply brute force and ignorance?
