
MobileRead Forums > E-Book Software > Calibre > Library Management

Old 08-17-2011, 04:13 PM   #1
scruffy
Enthusiast
 
Posts: 29
Karma: 324
Join Date: Mar 2008
Device: ebookwise, n800, tablet, etc
Split bulk metadata download job size

The new metadata download plugins are awesome, but attempting to download more than about 100 books in a single job causes memory issues for me. Creating multiple jobs of 100 books each handles a much larger load (too many jobs and it still chokes, but 4-5 jobs of 100 each work fine, while 400-500 books in one job tends to be problematic).

I can't find a way (or a discussion of how) to add a 'job size limit', so that if you schedule 200 books for a metadata download, it creates 2 jobs of 100 (or 4 jobs of 50, etc.). It seems like an easy enough feature (just create multiple jobs instead of one big job), and it would make Calibre MUCH more robust for those of us with 20k+ book libraries, or even just those with 3-4k. Otherwise, you have to manually select 100 books, schedule a bulk job, select the next 100, and so on.
Old 08-17-2011, 04:22 PM   #2
kovidgoyal
creator of calibre
Posts: 29,064
Karma: 7373157
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Define "memory issues"
Old 08-17-2011, 04:23 PM   #3
scruffy
Enthusiast
 
Posts: 29
Karma: 324
Join Date: Mar 2008
Device: ebookwise, n800, tablet, etc
Not a Python coder, but this is essentially what I mean. It seems like a small change to src/calibre/gui2/metadata/bulk_download.py:

def start_download(gui, ids, callback):
    d = ConfirmDialog(ids, gui)
    ret = d.exec_()
    d.b.clicked.disconnect()
    if ret != d.Accepted:
        return

    # [Add a loop here that takes at most X ids per pass (new
    # setting?), creating len(ids)/X jobs instead of a single job
    # of all ids...]
    job = ThreadedJob('metadata bulk download',
        _('Download metadata for %d books') % len(ids),
        download, (ids, gui.current_db, d.identify, d.covers), {}, callback)
    gui.job_manager.run_threaded_job(job)
    # [end loop]

    gui.status_bar.show_message(_('Metadata download started'), 3000)
    # }}}
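The batching idea sketched above boils down to chunking the selected ids. A minimal standalone illustration (MAX_JOB_SIZE and split_into_jobs are hypothetical names, not calibre API):

```python
# Hypothetical sketch of the proposed change: split the selected
# book ids into chunks of at most MAX_JOB_SIZE, so the caller can
# schedule one job per chunk instead of one huge job.
MAX_JOB_SIZE = 100

def split_into_jobs(ids, max_size=MAX_JOB_SIZE):
    """Yield successive slices of ids, each at most max_size long."""
    for start in range(0, len(ids), max_size):
        yield ids[start:start + max_size]

# 250 selected books -> three jobs of 100, 100 and 50 ids
jobs = list(split_into_jobs(list(range(250))))
```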
Old 08-17-2011, 04:27 PM   #4
scruffy
Enthusiast
 
Posts: 29
Karma: 324
Join Date: Mar 2008
Device: ebookwise, n800, tablet, etc
"memory issues", as in Out of Memory errors, outright crashes, with resulting loss of metadata downloaded in that job and all remaining jobs, etc. While Calibre is stable enough for short bulk runs, I think you'd agree that when you get into the multiple thousands of books in library, everything begins to slow down at the least.

Not running this on a massively robust machine, only a few gigs of ram.
Old 08-17-2011, 04:37 PM   #5
kovidgoyal
creator of calibre
Posts: 29,064
Karma: 7373157
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Batching the downloads into job lots of 100 is very easy to do; open a bug report for it so I don't forget.
Old 08-18-2011, 03:36 PM   #6
scruffy
Enthusiast
 
Posts: 29
Karma: 324
Join Date: Mar 2008
Device: ebookwise, n800, tablet, etc
Kovid rocks...

Less than 24 hours later, the new feature is committed and ready for release.

Old 08-19-2011, 02:42 AM   #7
collin8579
Member
 
Posts: 21
Karma: 10
Join Date: Mar 2011
Device: Kindle
As a side note, I have to do my metadata downloads in small batches or I get memory errors consistent with those mentioned above. Is there a way to have it automatically "Proceed to update library" as well? It's not as if we can easily go through the metadata of very large batches and check that it's correct.
Old 08-19-2011, 09:31 AM   #8
kovidgoyal
creator of calibre
Posts: 29,064
Karma: 7373157
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No, it cannot. The purpose of that dialog is not primarily to let you check the downloaded data; it is to ensure that updates to the database do not happen simultaneously, thereby potentially clobbering your data.
Old 10-18-2011, 02:40 AM   #9
Fluidfox
Junior Member
Posts: 3
Karma: 10
Join Date: May 2011
Device: iPad
I was really glad to see the 100-book-per-job segmentation. It cuts down on the frustration when Calibre runs out of memory on a truly massive batch metadata download or bulk conversion. I still run into memory problems for jobs in the 3k+ book range; I've got 16 gigs of RAM, but it still eventually eats it all up, though it takes hours.

I'd love to see an option to resume jobs, so you could pick up where you left off in the event Calibre crashes for any reason. You'd probably lose the progress on the job it crashed on, but that isn't a huge problem with the way it's now broken up into batches of 100.
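A resume option like the one described here could in principle work by checkpointing completed ids. A minimal standalone sketch (not an existing calibre feature; all names are made up):

```python
# Hypothetical resume sketch: record each completed book id in a
# checkpoint file so a restarted run skips ids already processed.
import json
import os

def load_done(path):
    """Return the set of book ids already recorded as completed."""
    if os.path.exists(path):
        with open(path) as f:
            return set(json.load(f))
    return set()

def run_with_resume(ids, process, path):
    done = load_done(path)
    for book_id in ids:
        if book_id in done:
            continue
        process(book_id)            # may crash partway through
        done.add(book_id)
        with open(path, 'w') as f:  # checkpoint after every book
            json.dump(sorted(done), f)
```

On a crash, at most the book being processed at that moment is lost; everything recorded in the checkpoint is skipped on the next run.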
Old 10-18-2011, 09:15 AM   #10
salines
Zealot
 
Posts: 127
Karma: 744
Join Date: Oct 2011
Device: Sony PRS-T1
Quote:
Originally Posted by Fluidfox (quoted in full above)
+1
Resuming would be really nice!
Old 01-13-2012, 08:31 AM   #11
sjack58
Junior Member
 
Posts: 1
Karma: 10
Join Date: Jan 2012
Device: none
I seem to have the same issue, except that I can only get metadata and covers for a maximum of 19 books. Calibre exits immediately after it finishes locating the data and I confirm adding it.
Old 03-01-2016, 04:21 PM   #12
scruffy
Enthusiast
 
Posts: 29
Karma: 324
Join Date: Mar 2008
Device: ebookwise, n800, tablet, etc
Hey, did this functionality get lost again?

I see plugins using a 100-item breakdown (such as Extract ISBN), but not the main metadata download (i.e. pick 200 books and a single 200-item job is created).

Looking over the current code, the original changes are still there, but the current code doesn't seem to use them; it sets up an internal batch of 10 items at a time without breaking up the actual job. Correct? Is there a reason this functionality was dropped?
Old 03-01-2016, 10:42 PM   #13
kovidgoyal
creator of calibre
Posts: 29,064
Karma: 7373157
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Since they are split internally and each internal batch is run in a new worker process, there is no point in also splitting them into multiple jobs.
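The "each internal batch in a new worker process" point can be illustrated with a standalone sketch (assumed names, not calibre's actual code): because every batch runs in a fresh process, memory leaked while handling one batch is returned to the OS when that process exits, so a single long job stays bounded.

```python
# Illustrative only: run each batch of ids in a fresh worker
# process and collect per-batch results through a queue.
import multiprocessing as mp

BATCH_SIZE = 10

def fetch_metadata(ids):
    # Stand-in for the real per-batch download work.
    return {i: 'title-%d' % i for i in ids}

def _worker(ids, queue):
    queue.put(fetch_metadata(ids))

def download_in_batches(ids, batch_size=BATCH_SIZE):
    ctx = mp.get_context('fork')  # assumes a Unix host
    results = {}
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        queue = ctx.Queue()
        proc = ctx.Process(target=_worker, args=(batch, queue))
        proc.start()
        results.update(queue.get())  # read before join to avoid pipe deadlock
        proc.join()
    return results
```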
Old 03-01-2016, 11:10 PM   #14
scruffy
Enthusiast
 
Posts: 29
Karma: 324
Join Date: Mar 2008
Device: ebookwise, n800, tablet, etc
Interesting... It seems, though, that I get much better performance creating multiple jobs of 100 by hand than running a single job of 500, for example. It only runs one 100-book job at a time, but something seems to slow down with the large grouping of 500... maybe it's an illusion, but...

My example of Extract ISBN not only does the 'expected' thing (give it 500 and it makes 5 jobs of 100 each), but it then multitasks and runs multiple jobs at once... That was my expectation (and hope) for the main metadata downloader.
Old 03-01-2016, 11:14 PM   #15
kovidgoyal
creator of calibre
Posts: 29,064
Karma: 7373157
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That is actually another reason bulk download is not split into multiple jobs. When you split it into multiple jobs, the server load restrictions no longer work, so you end up hammering the servers, which is likely to get your IP banned. Remember that many metadata sources don't use an API; they work by scraping web sites.
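The server-load point can be illustrated with a minimal rate-limiter sketch (hypothetical, not calibre's implementation): a limiter enforces its minimum interval only within the job that owns it, so two jobs running concurrently, each holding their own limiter, would double the request rate the site sees.

```python
# Illustrative per-source rate limiter: enforce a minimum interval
# between successive requests. The clock and sleep functions are
# injectable so the behaviour can be tested without real waiting.
import time

class MinIntervalLimiter:
    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        """Block until at least `interval` seconds since the previous call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

Each call to `wait()` before a request guarantees the configured spacing, but only for requests made through this one object.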

