MobileRead Forums - metadata download grinder

repudi8or · 05-04-2013, 08:14 AM

Hi Folks,

Calibre is great, but I find the thing I do most is cleaning up my metadata manually after bulk importing new files.

The "download metadata and covers" function tends to give me mixed results. I would say that after the first pass about 50% of my books still have no metadata, mostly for the following reasons:
1. author and title are backwards
2. title has the series in it as well
3. author and title are backwards and the series is hyphenated after the author
4. author and title are backwards and the author is in some variant of URL syntax
5. author name is a slight variant of the one found on Amazon/Google
6. title field has author, title and some other crud (like file type) hyphenated in the one string
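
To make a couple of those cases concrete, here is a rough sketch of the kind of cleanup pass that could run before any searching. It is plain Python and does not touch calibre at all; the names `CRUD`, `clean_field` and `split_parts` are made up for illustration, and the rules (underscores as spaces, " - " as the separator, a short list of file types) are assumptions about typical messy filenames rather than anything calibre does itself.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical list of "crud" tokens (file types); extend to taste.
CRUD = ("epub", "mobi", "pdf", "azw3")
EXT_RE = re.compile(r"\.(?:%s)$" % "|".join(CRUD), re.IGNORECASE)


def clean_field(text):
    """Undo URL-style encoding and tidy up separators in one field."""
    text = unquote_plus(text)          # "Terry%20Pratchett", "Terry+Pratchett"
    text = EXT_RE.sub("", text)        # trailing ".epub", ".mobi", ...
    text = text.replace("_", " ")      # "terry_pratchett"
    return re.sub(r"\s+", " ", text).strip()


def split_parts(text):
    """Split a hyphenated string into cleaned parts, dropping crud tokens."""
    parts = [p.strip() for p in re.split(r"\s+-\s+", clean_field(text))]
    return [p for p in parts if p and p.lower() not in CRUD]


if __name__ == "__main__":
    print(split_parts("Terry_Pratchett_-_Mort.epub"))
    print(split_parts("Mort - Terry%20Pratchett - mobi"))
```

This only separates the pieces; it does not try to decide which piece is the author, the title or the series. Note that `unquote_plus` turns every "+" into a space, so a title that really contains a plus sign would be mangled.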

I am wondering whether there is an existing plugin that will grind away at these variants until it finds a metadata match on Amazon/Google (etc.)?

If not, I would be willing to have a go at creating one. I have reasonable Python skills and basic Java skills.

My idea would be to parse the author and title fields as cleverly as I can, build a search matrix based on the most common problems (as above), and then just grind away with the Amazon and Google plugins until a likely match is found. I did think that generating a report of suggested replacements, which the user has to approve before the db is updated, might be a good idea. Another option would be to store only the ISBN of each grind-match, pull those into the db after user approval, and then use the normal "download meta and covers" by ISBN to complete the job.
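
As a rough illustration of the search matrix and grind loop, here is a minimal sketch. Everything in it is an assumption made for the example: `search_matrix` and `grind` are invented names, `lookup(title, author)` is a stand-in for whatever would really call the Amazon or Google sources, and the ISBN in the demo is a made-up placeholder. Matches are only collected for approval, never written anywhere.

```python
from itertools import permutations


def search_matrix(title, author):
    """Yield (title, author) guesses, most plausible first, without repeats."""
    joined = f"{title} - {author}"
    parts = [p.strip() for p in joined.split(" - ") if p.strip()]
    # As given, then swapped, then every ordered pair of hyphenated pieces.
    guesses = [(title, author), (author, title)]
    guesses += list(permutations(parts, 2))
    seen = set()
    for guess in guesses:
        if guess not in seen:
            seen.add(guess)
            yield guess


def grind(books, lookup):
    """Try each guess per book and keep the first ISBN found, pending approval.

    `lookup(title, author)` is a hypothetical callable that returns an
    ISBN string, or None when the source has no match.
    """
    pending = {}
    for book_id, (title, author) in books.items():
        for t, a in search_matrix(title, author):
            isbn = lookup(t, a)
            if isbn:
                pending[book_id] = {"isbn": isbn, "title": t, "author": a}
                break
    return pending


if __name__ == "__main__":
    # Fake catalogue standing in for a real source; the ISBN is invented.
    catalogue = {("Mort", "Terry Pratchett"): "9780000000001"}
    books = {
        1: ("Terry Pratchett", "Mort"),                # backwards
        2: ("Discworld 04 - Mort", "Terry Pratchett"),  # series in title
        3: ("Unknown", "Nobody"),                       # no match anywhere
    }
    result = grind(books, lambda t, a: catalogue.get((t, a)))
    print(sorted(result))  # prints [1, 2]
```

The number of guesses grows quickly with the number of hyphenated pieces, so a real version would probably want a cap on attempts per book and a pause between requests to the remote sources.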

Some hints on the best way to approach this (i.e. a sensible code hook point, somewhere between "download meta" and the calls to the Amazon and Google plugins) would help.

All thoughts welcome.

Regards Rep
