#1
Enthusiast
Posts: 29
Karma: 324
Join Date: Mar 2008
Device: ebookwise, n800, tablet, etc
The new metadata download plugins are awesome, but attempting more than about 100 books at a time in a single job causes memory issues for me. Creating multiple jobs of 100 each handles a much larger load: 4-5 jobs of 100 work fine, while 400-500 books in one job tends to be problematic (too many simultaneous jobs and it still chokes).
I can't seem to find a way (nor a discussion of one) to add a 'job size limit', so that if you schedule 200 books for a metadata download, it creates 2 jobs of 100 (or 4 jobs of 50, etc). It seems like an easy enough feature (just create multiple jobs instead of one big job), and it would make Calibre MUCH more robust for those of us with 20k+ book libraries, or even just 3-4k. Otherwise, you have to manually select 100 books, schedule a bulk job, select the next 100, and so on.
#2
creator of calibre
Posts: 45,133
Karma: 27110892
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Define "memory issues"
#3
Enthusiast
Posts: 29
Karma: 324
Join Date: Mar 2008
Device: ebookwise, n800, tablet, etc
Not a Python coder, but this is essentially what I mean:
It seems like a small change to /src/calibre/gui2/metadata/bulk_download.py:

Code:
def start_download(gui, ids, callback):
    d = ConfirmDialog(ids, gui)
    ret = d.exec_()
    d.b.clicked.disconnect()
    if ret != d.Accepted:
        return
    # [Add a loop to only add a max of X (new variable setting?) each time,
    #  creating len(ids)/X jobs rather than a single job for all ids...]
    job = ThreadedJob('metadata bulk download',
            _('Download metadata for %d books')%len(ids),
            download, (ids, gui.current_db, d.identify, d.covers),
            {}, callback)
    gui.job_manager.run_threaded_job(job)
    # [end loop]
    gui.status_bar.show_message(_('Metadata download started'), 3000)
    # }}}
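A minimal sketch of what that loop might look like, reusing the names from the snippet above; MAX_BATCH is a hypothetical new setting, not an existing calibre option:

Code:
# Hypothetical batching variant of start_download; MAX_BATCH is an
# assumed new setting, not an existing calibre preference.
MAX_BATCH = 100

def start_download(gui, ids, callback):
    d = ConfirmDialog(ids, gui)
    ret = d.exec_()
    d.b.clicked.disconnect()
    if ret != d.Accepted:
        return
    ids = list(ids)
    # Create one job per batch of MAX_BATCH ids instead of a single big job.
    for i in range(0, len(ids), MAX_BATCH):
        batch = ids[i:i + MAX_BATCH]
        job = ThreadedJob('metadata bulk download',
                _('Download metadata for %d books') % len(batch),
                download, (batch, gui.current_db, d.identify, d.covers),
                {}, callback)
        gui.job_manager.run_threaded_job(job)
    gui.status_bar.show_message(_('Metadata download started'), 3000)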
#4
Enthusiast
Posts: 29
Karma: 324
Join Date: Mar 2008
Device: ebookwise, n800, tablet, etc
"memory issues", as in Out of Memory errors, outright crashes, with resulting loss of metadata downloaded in that job and all remaining jobs, etc. While Calibre is stable enough for short bulk runs, I think you'd agree that when you get into the multiple thousands of books in library, everything begins to slow down at the least.
Not running this on a massively robust machine, only a few gigs of ram. |
#5
creator of calibre
Posts: 45,133
Karma: 27110892
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Batching up the downloads into job lots of 100 is very easy to do; open a bug report for it so I don't forget.
#6
Enthusiast
Posts: 29
Karma: 324
Join Date: Mar 2008
Device: ebookwise, n800, tablet, etc
Kovid rocks...
And less than 24 hours later, the new feature is committed and ready for release.
#7
Member
Posts: 21
Karma: 10
Join Date: Mar 2011
Device: Kindle
As a side note, I have to do my metadata downloads in small batches or I get memory errors consistent with those mentioned above.
Is there a way to have it automatically "Proceed to update library" as well? It's not like we can easily go through the metadata of very large batches and check that it's correct.
#8
creator of calibre
Posts: 45,133
Karma: 27110892
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No, it cannot. The purpose of that dialog is not primarily to let you check the downloaded data; it is to ensure that updates to the database do not happen simultaneously, which could clobber your data.
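A generic illustration of the concern, not calibre's actual code: if several jobs applied their results to the library at the same time, writes could interleave, so all updates are funneled through a single serialized step:

Code:
# Generic illustration, not calibre's code: concurrent updates to the same
# library could clobber each other, so every job's results are applied
# through one lock, one job at a time.
import threading

db_lock = threading.Lock()
library = {}  # book_id -> metadata dict, standing in for the real database

def apply_results(results):
    # Only one job may update the library at any moment.
    with db_lock:
        for book_id, metadata in results:
            library[book_id] = metadata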
#9
Junior Member
Posts: 3
Karma: 10
Join Date: May 2011
Device: iPad
I was really glad to see the 100-book-per-job segmentation. It cuts down on the frustration when Calibre runs out of memory on a truly massive batch metadata download or bulk conversion. I still run into memory problems for jobs in the 3k+ book range; I've got 16 gigs of RAM, but it still eventually eats it all up, though it takes hours. I'd love to see an option to resume jobs, so you could pick up where you left off if Calibre crashes for any reason. You'd probably lose any progress on the job it crashed on, but that isn't a huge problem with the way it's now broken up into batches of 100.
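A hypothetical sketch of how resume support could work; nothing like this exists in calibre, and the checkpoint file name and helper functions below are invented for illustration:

Code:
# Hypothetical resume support; the checkpoint file and helpers are
# invented for illustration, not a calibre feature.
import os

CHECKPOINT = 'bulk_download_done.txt'

def load_done():
    # Book ids whose metadata was already downloaded in a previous run.
    if not os.path.exists(CHECKPOINT):
        return set()
    with open(CHECKPOINT) as f:
        return {int(line) for line in f if line.strip()}

def mark_done(book_id):
    # Append each completed id so a crash loses at most the current batch.
    with open(CHECKPOINT, 'a') as f:
        f.write('%d\n' % book_id)

def ids_to_resume(all_ids):
    done = load_done()
    return [book_id for book_id in all_ids if book_id not in done]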
#10
Zealot
Posts: 127
Karma: 744
Join Date: Oct 2011
Device: Sony PRS-T1
Resuming would be really nice!
#11
Junior Member
Posts: 1
Karma: 10
Join Date: Jan 2012
Device: none
I seem to have the same issue, except that I can only get metadata and covers for a maximum of 19 books. Calibre exits immediately after it finishes locating the data and I confirm adding it.
#12
Enthusiast
Posts: 29
Karma: 324
Join Date: Mar 2008
Device: ebookwise, n800, tablet, etc
Hey, did this functionality get lost again?
I see plugins using a 100-item breakdown (such as ExtractISBN), but not the main metadata download (i.e. pick 200 books and a single 200-item job is created). Looking over the current code, the original changes are still there, but the current code doesn't seem to use them; it sets up internal batches of 10 items at a time without breaking up the actual job. Correct? Is there a reason this functionality was dropped?
#13
creator of calibre
Posts: 45,133
Karma: 27110892
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Since they are split internally and each internal batch is run in a new worker process, there is no point in also splitting them into multiple jobs.
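An illustrative sketch of that pattern, not calibre's actual code: each internal batch runs in a fresh worker process, so whatever memory a batch consumes is returned to the OS when its process exits:

Code:
# Illustrative sketch, not calibre's actual code: one job processes its ids
# in small internal batches, each in a fresh worker process, so any memory
# a batch leaks is reclaimed by the OS when that process exits.
from multiprocessing import Process, Queue

BATCH_SIZE = 10  # per the internal batching described above

def download_batch(batch, results):
    # Stand-in for the real per-batch metadata download.
    results.put([(book_id, 'metadata for %d' % book_id) for book_id in batch])

def run_job(ids):
    for i in range(0, len(ids), BATCH_SIZE):
        batch = ids[i:i + BATCH_SIZE]
        results = Queue()
        worker = Process(target=download_batch, args=(batch, results))
        worker.start()
        batch_results = results.get()  # collect before join, avoiding a queue deadlock
        worker.join()  # worker exits; its memory is returned to the OS
        yield batch_results

if __name__ == '__main__':
    for done in run_job(list(range(25))):
        print('%d books done' % len(done))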
#14
Enthusiast
Posts: 29
Karma: 324
Join Date: Mar 2008
Device: ebookwise, n800, tablet, etc
Interesting... It seems, though, that I get much better performance creating multiple jobs of 100 by hand than running a single job of 500, for example. It only runs one 100-book job at a time, but something seems to slow down with the large grouping of 500... maybe it's an illusion, but....
My example of ExtractISBN not only does the 'expected' thing (give it 500 books and it makes 5 jobs of 100 each), but it then multitasks and runs multiple jobs at once... That was my expectation (and hope) for the main metadata downloader.
#15
creator of calibre
Posts: 45,133
Karma: 27110892
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That is actually another reason bulk download is not split into multiple jobs. When you split into multiple jobs, the server load restrictions no longer work, so you end up hammering the servers, and that is likely to get your IP banned. Remember that many metadata sources don't use an API; they work by scraping websites.
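A generic sketch of why this matters, not calibre's implementation: a rate limit enforced inside one job is a process-wide gate; split the work into separate jobs (separate processes) and each gets its own independent gate, multiplying the request rate the server sees:

Code:
# Generic sketch, not calibre's implementation: one process-wide gate per
# metadata source. All batches inside a single job share this gate; separate
# jobs run as separate processes, each with its own gate, so the combined
# request rate multiplies.
import threading
import time

MIN_INTERVAL = 2.0  # seconds between requests to one metadata source

class RateLimiter:
    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.lock = threading.Lock()
        self.last = 0.0

    def wait(self):
        # Block until at least min_interval has passed since the last request.
        with self.lock:
            now = time.monotonic()
            delay = self.last + self.min_interval - now
            if delay > 0:
                time.sleep(delay)
            self.last = time.monotonic()

limiter = RateLimiter(MIN_INTERVAL)

def fetch_metadata(book_id):
    limiter.wait()
    # ... the actual HTTP request to the metadata source would go here ...
    return book_id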
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Bulk Metadata Download Problem | sweetevilbunies | Library Management | 6 | 07-04-2011 10:39 PM
Bulk metadata download incoherent | madeinlisboa | Calibre | 6 | 06-24-2011 01:18 PM
Split HTML Size to Speed-Up Page Turns | ade_mcc | Conversion | 2 | 02-01-2011 06:06 AM
metadata in bulk | Lorraine Froggy | Calibre | 1 | 11-14-2009 09:42 PM
Bulk Metadata Download | iain_benson | Calibre | 1 | 09-29-2009 11:42 AM