AAGGGGGGGGGHHHHHHHHH - Memory Errors, Leaks and WTF's!

mightymouse2045 · 05-25-2011, 10:13 PM

Hi there,

I am importing my entire library of over 14,000 ebooks into Calibre.

I have gone to a lot of trouble pre-organising all my books into ln, fn - series - title or ln, fn - title (and vice versa fn ln). So I have 4 groupings of books which have captured 90% of my library - the rest (approx 2000) are in leftover groupings.

I then imported the books into Calibre and cleaned up the few imported incorrectly (over 1000). I wanted to run the ISBN import plugin, but it isn't accurate, it takes aaaaaaaages, and if you select more than 500 at a time chances are it will run into a format that hangs the plugin (right at about 99% of the way through). So I skipped doing the ISBN import.

So I started running the download metadata process, and I thought everything was going along smoothly. However, upon going through my library I find that even if you have the correct title, series, series index, and author (EVERYTHING is CORRECT), and it's taken hours and hours of work to confirm everything is correct, the result at the top is selected even if it's completely wrong. This happens if the search returns 3 results, 2 of which are correct (and would suffice), but there is a newer result (by year), or one with a similar name but different authors, etc.

Now you're thinking: how often can this happen? Very often is my answer, as I am time and time again having to go through and check everything individually. We are talking out of 1000 titles maybe 100 to 200 end up being renamed, either to something entirely incorrect, or the title is correct but the series or author is different, and sometimes everything is incorrect but the correct cover was downloaded?!?

Spending so much time making sure everything is correct, only to have everything renamed automatically, is like WTF?!?

Also, I get memory errors after 4 hours, and the entire process is a waste of time because it doesn't save the results it just downloaded for the 980 out of 1000 books it processed before the memory error. By then the memory usage of Calibre has jumped to 1.4GB of RAM.

OK, so I select 500 books next time, and depending on a number of things it takes 4 hours to do those as well, and sometimes I get a memory error then too. There is a serious memory leak in Calibre, and I've learned the hard way to restart it if its memory usage gets above 300MB (it starts at 150MB), especially before any big jobs.

Long story short - I can use Windows Explorer to accurately search my books. I know they are correct and I don't have to think about it... I am very quickly losing confidence in Calibre's accuracy, and I dread having to manually go through 10,000 books one at a time to confirm they are all correct. That is especially so because once the books are imported into Calibre, if a book has no cover page and happens to be one of those that doesn't state its name anywhere else, you're then having to resort to searching by size, or text search (oops, text searching .lit or .pdf or numerous other formats - had to buy a grep program to do that).

If I did nothing else for the next month and sat at my computer for 12 hours a day going through 1 book at a time to confirm it is correctly titled, or the series hasn't been changed or the correct series number or the correct author is being displayed then I would be just about done with my library.

Can you do something like ComicRack, which shows the results for the titles you are downloading metadata for if more than one result is returned, or if the title and/or series and/or authors are incorrect, allowing you to select the CORRECT title or SKIP that title?!??!!?

I wouldn't mind sitting there babysitting the process, doing 100 metadata downloads at a time, if I knew it wasn't going to screw with what I have that I know is correct UNLESS it prompts me to do so.

You might want to have a history for a file as well, or a short summary for the metadata download that summarises BEFORE and AFTER for each file. For example:

BEFORE
Dragons of Winter Night - Dragonlance: Chronicles Trilogy [2] - Margaret Weis & Tracy Hickman

AFTER
Dragonlance - Chronicles [2] - Tracy Hickman & Douglas Niles

(That would have been a skip choice for me, or a download-cover-only choice. There are other ways that give better cover and/or metadata results (MANUALLY SEARCHING AND PASTING IT IN FOR ONE!!!) than downloading crap metadata which RENAMES EVERYTHING just coz that's all it's returning.)
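To make the BEFORE/AFTER summary idea concrete, here is a minimal Python sketch. It is not Calibre code: `BookMeta` and `changed_fields` are hypothetical names, and splitting the example lines into title, series, index and authors is an assumption about how those lines are meant to be read. It only shows how a download could report which fields it is about to change, so the user can skip or accept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookMeta:
    """Hypothetical record of the fields a metadata download may overwrite."""
    title: str
    series: str
    series_index: int
    authors: tuple


def changed_fields(before, after):
    """Return {field: (old, new)} for every field that differs."""
    diffs = {}
    for name in ("title", "series", "series_index", "authors"):
        old, new = getattr(before, name), getattr(after, name)
        if old != new:
            diffs[name] = (old, new)
    return diffs


# The example from the post, split into fields (assumed reading).
before = BookMeta("Dragons of Winter Night", "Dragonlance: Chronicles Trilogy", 2,
                  ("Margaret Weis", "Tracy Hickman"))
after = BookMeta("Dragonlance", "Chronicles", 2,
                 ("Tracy Hickman", "Douglas Niles"))

diffs = changed_fields(before, after)
for field, (old, new) in diffs.items():
    print(f"{field}: {old!r} -> {new!r}")
```

A summary like this could be shown per book before anything is written, with an empty result meaning nothing would change.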

Sigh Peace

Manichean · 05-26-2011, 01:00 PM

Do single metadata lookups. Those allow you to select the result you want.

kiwidude · 05-26-2011, 01:22 PM

The ISBN Extract plugin should not have issues with memory leaks if you are using the latest versions - it was explicitly rewritten to prevent that issue in v1.3.4. As for "not being accurate" - it is at the mercy of the inputs you give it, it can never replace a human lookup. If publishers decide to put ISBNs for related books at the front of a book there is not a thing the plugin can do about it. However it gets it right more often than not.

Doing metadata lookups in bulk is always a lottery - again the Calibre plugins are at the mercy of whatever the search engine on the website gives them back. A title which is exactly how you want it may cause ambiguity for the website you are querying. A title which is too long and explicit may result in no ideal matches, in which case some websites throw back any old garbage as a "best match". I know my own plugins like Goodreads and B&N etc do their best to validate that the author/titles match what you asked for, but there will always be corner cases where it cannot be perfect. Having the ISBN is your best chance of a match to suit. Alternatively, use the Goodreads Sync plugin to link your books to a Goodreads one, and enable the option to put the Goodreads ISBN into your book field. Then do the metadata lookup - it's an extra step but will almost certainly guarantee you get the right results if you linked to the right book.
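As a rough illustration of what "validating that the author/titles match" can mean, here is a small Python sketch. It is not how the Goodreads or B&N plugins are actually implemented: `pick_result`, the dict layout of a result, and the 0.8 threshold are all assumptions made for the example. It uses the standard library's `difflib.SequenceMatcher` for fuzzy title comparison.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def pick_result(wanted_title, wanted_authors, results, threshold=0.8):
    """Return the first result whose title is close enough and which shares
    at least one author with the request, or None if nothing qualifies.
    `results` is a hypothetical list of {"title": ..., "authors": [...]} dicts."""
    wanted = {a.lower() for a in wanted_authors}
    for result in results:
        title_ok = similarity(wanted_title, result["title"]) >= threshold
        author_ok = any(a.lower() in wanted for a in result["authors"])
        if title_ok and author_ok:
            return result
    return None


# The top result is the wrong book; the second is the one asked for.
results = [
    {"title": "Dragonlance", "authors": ["Tracy Hickman", "Douglas Niles"]},
    {"title": "Dragons of Winter Night", "authors": ["Margaret Weis", "Tracy Hickman"]},
]
chosen = pick_result("Dragons of Winter Night",
                     ["Margaret Weis", "Tracy Hickman"], results)
no_match = pick_result("Dragons of Winter Night",
                       ["Margaret Weis", "Tracy Hickman"], results[:1])
```

Returning `None` rather than the top hit is the design choice that matters here: a book with no validated match is left alone instead of being renamed to whatever the website ranked first.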

If you haven't done so already try using different metadata source plugins, different combinations turned on etc to find the best source for the books you prefer. Don't turn every single one on. For myself I "walk the talk" and only have enabled the B&N, Goodreads and Fantastic Fiction plugins. That way if I find any issues I can push a new version, and they give me back the sort of descriptions, series and covers I prefer. But to each their own.

As for overwriting title/series/authors - well a simple answer to that is to untick those options in the metadata download configuration, so they won't get overwritten. If you are happy with the values from your import then why would you want the values from elsewhere?

Putting 10,000 books in a library is not a trivial exercise if you want that data to have any quality in your library. That is a lifetime's worth of reading. Calibre is not a miracle worker; there is no industry standard for book metadata or how websites make data available. Go to a website like LibraryThing and look at the pile of steaming dung their website data contains from allowing data from any source to be given to it with no validation.




