#1
Junior Member
Posts: 3
Karma: 4500
Join Date: Aug 2013
Device: kindle
Hey guys,
I have a massive ebook collection of about 200,000 books. My problem is that they are completely unorganized, which makes finding a specific book a pain. After some searching online, I decided the easiest way to organize them is to add them to calibre. Since I have so many books, I have been looking into the command line interface, hoping the process would be quicker and more reliable. So far I have figured out how to add the books from the command line using calibredb add -d {dir}. I am also splitting up the books I add by using wildcards in {dir} (e.g. calibredb add -d ./A*). I have a few questions:

1) Would calibre be able to handle that many books?

2) Through the GUI, I know how to do a bulk metadata download. However, I ran one last night (7 hours) and it only got through about 3,000 books, even though I had chosen to download only metadata and no covers. At that rate, I estimate it would take about 467 hours to download the metadata for all my books!

3) I have installed some plugins that provide additional metadata sources (mostly from kiwidude), hoping to reduce the number of books for which calibre cannot find metadata. These are the metadata sources I currently have selected: Amazon.com, Anobii Fetcher, Barnes & Noble, Edelweiss, Fantastic Fiction, Fantastic Fiction Adults, FictionDB, Goodreads, Google, and Open Library. Could having all these sources selected be slowing down my metadata downloads? If so, which sources would you recommend? I only care about the metadata, not the covers.

4) Lastly, is there any software other than calibre that would be better suited to my needs?

I greatly appreciate any feedback.

Aaron
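For the wildcard batching described above, here is a minimal Python sketch that builds (but does not run) one `calibredb add` command per leading character, the same split as running `calibredb add ./A*`, then `./B*`, and so on. The library location is an assumption; `--recurse` and `--with-library` are options listed for `calibredb` in the calibre manual, but check `calibredb add --help` on your version. Also worth checking there: the manual documents `-d` as `--duplicates` (add books even if one with the same title and author already exists) rather than as a directory switch, so passing the path on its own may be what you actually want when the goal is to weed out duplicates.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def add_commands_by_letter(root: Path, library: Path) -> list[list[str]]:
    """Group the entries directly under `root` by their first character and
    build one `calibredb add` argument list per group. Nothing is executed."""
    groups: dict[str, list[str]] = {}
    for entry in sorted(root.iterdir()):
        key = entry.name[:1].upper()  # 'apple.epub' and 'Alpha.mobi' share a batch
        groups.setdefault(key, []).append(str(entry))
    return [
        ["calibredb", "add", "--recurse", "--with-library", str(library), *paths]
        for _, paths in sorted(groups.items())
    ]


if __name__ == "__main__":
    # Assumed library location; adjust to your own.
    for cmd in add_commands_by_letter(Path("."), Path.home() / "Calibre Library"):
        print(shlex.join(cmd))
```

With tens of thousands of files under one letter, a single command line can exceed the operating system's argument-length limit, so very large groups may need to be split further.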
#2
null operator (he/him)
Posts: 21,614
Karma: 29710338
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Try this
There are more sophisticated solutions involving database queries and some OS-level scripting, but this will give you a feel for batch performance. I doubt it will be much faster: the delays are mainly due to the speed of the servers and the network. It's my understanding that if you do too many books in one batch (I've seen people mention 100), the metadata source sites will probably throttle your requests, and in some cases they may blacklist your IP address for a while.

BR
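The point that the command line won't be much faster follows from simple arithmetic: if most of the time goes into per-book round trips to the metadata sites, splitting the work into batches only adds pauses on top. A minimal Python sketch, assuming the rate from the first post (about 3,000 books in about 7 hours); the batch size of 100 is the informal figure mentioned above, not a documented limit, and the pause length is a hypothetical parameter.

```python
from __future__ import annotations

from typing import Iterator, Sequence

# Rate observed in the first post: about 3,000 books in about 7 hours.
SECONDS_PER_BOOK = 7 * 3600 / 3000  # = 8.4 seconds per book


def batches(ids: Sequence[int], size: int = 100) -> Iterator[list[int]]:
    """Yield consecutive chunks of at most `size` book ids."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def estimated_hours(n_books: int, seconds_per_book: float,
                    batch_size: int = 100, pause_seconds: float = 0.0) -> float:
    """Per-book lookup time plus a pause between consecutive batches."""
    n_batches = -(-n_books // batch_size)  # ceiling division
    pauses = max(n_batches - 1, 0)
    return (n_books * seconds_per_book + pauses * pause_seconds) / 3600


if __name__ == "__main__":
    print(f"{estimated_hours(200_000, SECONDS_PER_BOOK):.1f} h without pauses")
    print(f"{estimated_hours(200_000, SECONDS_PER_BOOK, pause_seconds=60):.1f} h with 60 s pauses")
```

This gives roughly 467 hours without pauses and roughly 500 hours with a 60-second pause between batches of 100, so the lookups themselves dominate either way.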
#3
US Navy, Retired
Posts: 9,889
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
I believe so. Let us know how it works.
Quote:
Quote:
Quote:
Your best bet would be to get most of your metadata from the filename as you add the book to calibre, even if it means you first need to use a good file renamer to get the names into the correct format. Then you should try to extract the ISBN from the books using the Extract ISBN plugin before downloading metadata.
Quote:
Good luck.

Last edited by DoctorOhh; 09-01-2013 at 12:10 AM.
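The Extract ISBN plugin does this scanning inside calibre. The sketch below is not its code, just a generic Python illustration of the idea: find ISBN-shaped strings in a book's text and keep only those whose ISBN-10 or ISBN-13 check digit is valid, which weeds out random runs of digits. It assumes the text has already been pulled out of the book (for example by converting it to TXT with calibre's `ebook-convert`).

```python
from __future__ import annotations

import re

_ISBN_RE = re.compile(
    r"(?<![0-9])"                       # not the tail of a longer number
    r"(97[89](?:-?[0-9]){10}"           # ISBN-13, optional hyphens
    r"|[0-9](?:-?[0-9]){8}-?[0-9Xx])"   # ISBN-10, check digit may be X
    r"(?![0-9])"                        # not the head of a longer number
)


def _isbn10_ok(s: str) -> bool:
    # Weights 10..1; 'X' (only possible in the last position) counts as 10.
    values = [10 if c == "X" else int(c) for c in s]
    return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0


def _isbn13_ok(s: str) -> bool:
    # Alternating weights 1, 3, 1, 3, ...
    return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0


def find_isbns(text: str) -> list[str]:
    """Return unique, checksum-valid ISBNs (hyphens removed) in order of appearance."""
    found: list[str] = []
    for match in _ISBN_RE.finditer(text):
        isbn = match.group(1).replace("-", "").upper()
        valid = _isbn13_ok(isbn) if len(isbn) == 13 else _isbn10_ok(isbn)
        if valid and isbn not in found:
            found.append(isbn)
    return found
```

Once a book has an ISBN identifier, the metadata download can look it up by that identifier instead of guessing from a possibly wrong title and author.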
#4
null operator (he/him)
Posts: 21,614
Karma: 29710338
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
BR
#5
Handy Elephant
Posts: 1,737
Karma: 26785684
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Samsung Galaxy Tab S8 Ultra
Most likely some of the books in your collection already have good metadata and nice covers.
I would suggest that you do this in two steps.

Start by adding only epub and mobi books. They are likely to have good metadata and covers embedded. When adding the books, you can have calibre use the embedded metadata and ignore the filenames. If the titles and authors are correct, it should be easy to download extra metadata such as series information and book descriptions. This could quickly, and with relatively little effort, give you a very nice library to start with.

Stop there and try to make it perfect. Make sure everything is just as you would like it. Series information. Cover sizes. No duplicates. Consistent author names. Correct sorting, just as you prefer it. Good genre system. Ratings. Consistent tagging system. Links to authors. Information about books read/unread. Conversions to the correct formats for your reading device. Integration with your reading device and reading apps. Add news downloads. Set up a backup system to protect your investment in time and effort. Find complementary metadata sources. Virtual libraries. And so on. Enjoy your perfect calibre library and read some books.

After that, start adding and converting the rest of the books when you are in the mood. Preferably do it in small batches you can fully complete in a few hours, and preferably with books you are more likely to read soon. Perhaps books that fill gaps in series you already have in the library, or books by authors you especially enjoy. Or interesting topics. Or books that you have heard others talk about.

Most likely you will not live long enough to add and fix all your books. Perhaps some books are not worth the effort and can be deleted? Take some time off now and then to read a book or two. Also consider adding books from better sources than your own. If you try to add junk, you're likely to end up with a junk library that is no fun.

Last edited by Adoby; 09-01-2013 at 03:31 AM.
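The first step above amounts to splitting the collection by file type. A minimal Python sketch, assuming the books are loose files on disk (compressed archives such as .rar would simply land in the second group until extracted); the extension set is the one suggested in the post and is easy to extend.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable

# Formats suggested for the first pass: likely to carry embedded metadata.
FIRST_PASS_EXTENSIONS = {".epub", ".mobi"}


def split_first_pass(paths: Iterable[Path]) -> tuple[list[Path], list[Path]]:
    """Return (first_pass, later): files to add now, and everything else."""
    first_pass: list[Path] = []
    later: list[Path] = []
    for path in paths:
        # Compare extensions case-insensitively so 'BOOK.EPUB' counts too.
        target = first_pass if path.suffix.lower() in FIRST_PASS_EXTENSIONS else later
        target.append(path)
    return first_pass, later
```

For example, `split_first_pass(p for p in Path("books").rglob("*") if p.is_file())` would give the list to hand to calibre now and the list to leave for later batches (the `books` directory name is only illustrative).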
#6
Junior Member
Posts: 3
Karma: 4500
Join Date: Aug 2013
Device: kindle
Thank you everyone for the feedback! It sounds like it would still be slow using the command line interface. When I get some time, though, I will try to verify that using BetterRed's advice.

DoctorOhh, I have been using the Extract ISBN plugin and it seems to work for about 2/3 of the books. Unfortunately, a lot of the metadata calibre creates when I first add the books has the author's name in the title field and the title in the author field. After some investigation, it turns out most of my filenames use one of two formats: title-author.filetype or author-title.filetype. For the first format, calibre fills in the metadata correctly. For the second, however, calibre puts the title and author in the wrong fields. I have discovered that when I edit metadata there is a check box to swap the title and author fields! So I am hoping that fixing the author and title fields will speed up the metadata download.

Also, when doing a bulk metadata download, does calibre just use the first match? If so, should I change “Max. time to wait after first match is found” under “Configure Metadata download” to zero? That might also speed things up a lot.

Adoby, because of the vast number of books I am adding, I have most of them compressed into .rar files, which saves a lot of disk space. I will extract the books as I need them. Yes, I agree that I have some junk in my library that I would never read, but it is not worth the time to remove it.

I will keep you guys updated on how this progresses. I just started graduate school, so I am going to get pretty busy and might need to take a break from this. Thanks again!

Aaron
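For the two filename layouts described above, the split can be written as regular expressions with named `title` and `author` groups, which is the style calibre's read-metadata-from-file-name pattern uses (set in the preferences for adding books; the exact location varies by version). A minimal Python sketch for trying patterns outside calibre, assuming a plain hyphen separates the two parts and that author names contain no hyphen (a name like Jean-Paul Sartre would be split wrongly).

```python
from __future__ import annotations

import re
from pathlib import Path

# title-author.ext: split on the LAST hyphen, so titles may contain hyphens.
TITLE_FIRST = re.compile(r"^(?P<title>.+?)\s*-\s*(?P<author>[^-]+)$")
# author-title.ext: split on the FIRST hyphen, so titles may contain hyphens.
AUTHOR_FIRST = re.compile(r"^(?P<author>[^-]+?)\s*-\s*(?P<title>.+)$")


def parse_name(path: str, author_first: bool) -> dict[str, str] | None:
    """Return {'title': ..., 'author': ...} parsed from the file name, or None."""
    stem = Path(path).stem  # drop the directory and the final extension
    match = (AUTHOR_FIRST if author_first else TITLE_FIRST).match(stem)
    if match is None:
        return None
    return {"title": match["title"].strip(), "author": match["author"].strip()}
```

calibre's own handling of anchors and file extensions may differ from plain Python `re`, so it is worth trying a pattern on a handful of files before running a large import.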
#7
null operator (he/him)
Posts: 21,614
Karma: 29710338
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Enjoy your grad school sojourn.

BR
#8
US Navy, Retired
Posts: 9,889
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
#9
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Is there an adjective for folks who hoard 100x more books than they could read in a lifetime?

CLD: compulsive library disorder? It is also noteworthy that a side effect of CLD is a complete disregard for naming structure, metadata, etc. until the book collection reaches virtual Eiffel Tower proportions. Normal folks acquire only what they plan to read, and they label as they go.

Then there is the "how much does it cost to honestly acquire 200,000 ebooks?" question. Not that I am implying piracy or anything.

PS: Let's see... 200,000 books... If you read 3 complete books per day, every day, it will take you a tad under 200 years to read that pile once. So you'd better quit faffing around with labelling and start reading! Or see if you have a really good speed-reading text somewhere in the pile.

Last edited by cybmole; 09-01-2013 at 11:06 AM.
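The back-of-the-envelope reading estimate is easy to check; a quick Python calculation, assuming 365.25 days per year:

```python
BOOKS = 200_000
BOOKS_PER_DAY = 3
DAYS_PER_YEAR = 365.25  # averages in leap years

years = BOOKS / BOOKS_PER_DAY / DAYS_PER_YEAR
print(f"{years:.1f} years")
```

That works out to about 182.5 years of reading three books a day.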
#10
Zealot
Posts: 105
Karma: 414068
Join Date: Feb 2013
Device: iPad Pro, Kobo Aura One
woodapple, even if you manage to import 200,000 books into calibre, are you sure calibre will perform well enough on your system with a library that big? You say your files are compressed with RAR to save space, so your system might not even handle 20% of your library.

And what BetterRed said about getting banned and blacklisted by sites is absolutely true. A few months back I got blacklisted by a publisher's site while I was looking up ISBNs and scraping title and author information. Even though I put a random 10-15 second wait between queries, I got banned after 6 hours. I was able to rename about 85% of my files, so it wasn't that bad. Heck, even Google.com will ban you for much less than that and start making you solve a CAPTCHA before you can use their search; to get rid of the CAPTCHA, you'll have to file a request and wait a few days (trust me, I had to do that after Google blocked me for running a few dozen automated image searches).

cybmole, if they're works of fiction, I completely agree, but if they're non-fiction, then having a lot of books can be extremely useful when you're synthesizing information and need to reference, cross-check and verify facts. I do full-text searches on my library all the time and it has proven extremely useful to me. You never know what you're going to find.

Last edited by MelBr; 09-01-2013 at 12:56 PM.
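The experience above suggests that random waits alone do not guarantee you won't be blocked; capping the number of lookups per session also keeps the total volume down. A minimal Python sketch of that pacing, where `fetch` stands in for whatever lookup you run (hypothetical here) and the 10-15 second range is the one from the post; nothing in it is a known safe rate for any particular site.

```python
from __future__ import annotations

import random
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled(items: Iterable[T], fetch: Callable[[T], R], *,
              low: float = 10.0, high: float = 15.0,
              max_requests: int | None = None,
              rng: random.Random | None = None,
              sleep: Callable[[float], None] = time.sleep) -> list[R]:
    """Call fetch(item) for each item, sleeping a random low..high seconds
    between calls, and stop after max_requests calls if a cap is given."""
    rng = rng or random.Random()
    results: list[R] = []
    for i, item in enumerate(items):
        if max_requests is not None and i >= max_requests:
            break
        if i > 0:
            sleep(rng.uniform(low, high))  # jittered gap, not a fixed interval
        results.append(fetch(item))
    return results
```

Passing `sleep` and `rng` in as parameters makes the pacing testable without actually waiting.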
#11
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
1. Isn't that what Google is for?
2. When and if 200,000 books can be fully indexed, doesn't the metadata become pretty redundant for research purposes? We are talking upwards of 200 GB of compressed text here, so indexing that lot is not trivial. I don't think Windows 7 is able to find occurrences of a word or phrase within the books in my much more modest calibre library, certainly not with the default indexing settings.

Still curious as to how 200,000 fiction or non-fiction texts (published ones with ISBNs) could be legally accumulated by someone who is only just starting grad school.

Last edited by cybmole; 09-01-2013 at 01:49 PM.
#12
Serpent Rider
Posts: 1,123
Karma: 10219804
Join Date: Jun 2009
Device: Sony 350; Nook STR; Oasis
self edit...
Last edited by Ravensknight; 09-01-2013 at 07:33 PM.
#13
Zealot
Posts: 105
Karma: 414068
Join Date: Feb 2013
Device: iPad Pro, Kobo Aura One
Quote:
2. Correct. I find metadata pretty much useless for scientific purposes. Full-text search is vastly superior and I rely on it daily. I'm a grad student and I've converted almost every researcher in my lab to my method of searching and using all of this data (it wasn't hard, and it's nothing super-advanced). We're all AI people, and it's natural for us to find ways to harness all this knowledge.

As for Win7, I don't know; I can't really help you there. My lab uses OS X and Linux for everything. Linux powers a large number of servers (about 100+) and racks of Nvidia and ATI GPUs (probably around 200 GPUs) that we use for simulations. For everything else, it's Macs (mostly laptops). I could go into detail about how we index all this stuff, but it's probably not helpful to you since you're on Windows.

As for your last inquiry, I'm not the OP, woodapple, so I have no idea how he got 200k books. As for us, we got them legally, from publishers, libraries and other researchers.
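The post doesn't describe the lab's indexing setup, and collections of this size are normally handled by dedicated search engines (Apache Lucene and Xapian are two well-known ones). Purely to show the idea behind full-text search, here is a toy Python inverted index that maps each word to the documents containing it and answers AND queries; it assumes the text has already been extracted from each book, and it does not handle phrases or ranking.

```python
from __future__ import annotations

import re
from collections import defaultdict

_WORD = re.compile(r"[a-z0-9]+")


class InvertedIndex:
    """Toy full-text index: word -> set of document ids."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for word in _WORD.findall(text.lower()):
            self._postings[word].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return ids of documents containing every word in the query."""
        words = _WORD.findall(query.lower())
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())  # intersect posting sets
        return result
```

The key property is that a query only touches the posting sets for its own words, rather than rescanning every book.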
#14
Junior Member
Posts: 3
Karma: 4500
Join Date: Aug 2013
Device: kindle
BetterRed and DoctorOhh, it is good to know that you can change the expression calibre uses when adding books. I just need to figure out a good way to separate my books into those two categories.
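One way to separate files into the two layouts, sketched in Python under an assumption: you already have a list of author names you trust (for example, from `calibredb list --fields authors` on books that imported correctly). A file goes to whichever group has a known author on the expected side of the hyphen; anything ambiguous is left for manual review. The function and group names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def classify(paths: Iterable[str], known_authors: Iterable[str]) -> dict[str, list[str]]:
    """Sort files into 'author_first', 'title_first' or 'unknown' by checking
    which side of the hyphen matches a known author name."""
    known = {name.casefold() for name in known_authors}
    groups: dict[str, list[str]] = {"author_first": [], "title_first": [], "unknown": []}
    for path in paths:
        stem = Path(path).stem
        if "-" not in stem:
            groups["unknown"].append(path)
            continue
        head = stem.split("-", 1)[0].strip().casefold()   # text before the first hyphen
        tail = stem.rsplit("-", 1)[1].strip().casefold()  # text after the last hyphen
        if head in known and tail not in known:
            groups["author_first"].append(path)
        elif tail in known and head not in known:
            groups["title_first"].append(path)
        else:
            groups["unknown"].append(path)  # neither or both sides match
    return groups
```

Each group can then be added with the matching file-name pattern, leaving only the "unknown" pile to sort by hand.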
MelBr, I am not sure whether calibre will be able to handle 200,000 books. My goal for this project was to get these books organized into folders and to remove duplicates (hopefully via the Find Duplicates plugin by kiwidude). So if I get to the point where calibre starts having issues, I can back up the calibre folder structure I have so far and start adding the next books to an empty library. I will keep the potential issue of getting banned and blacklisted in mind. By the way, I am running Linux.

cybmole, my collection also contains a lot of journals, articles, and other types of documents besides fiction books. Additionally, I do have a lot of duplicates right now (e.g. the same book in multiple formats), which I am hoping to eliminate with kiwidude's Find Duplicates plugin. For me, having a large collection of books has been useful. Whenever I find a book or article that I would like to read or reference, a lot of the time I already have it in my collection. Family and friends have also used my collection.

However, cybmole, I do not see your purpose on this thread. I created this thread to get some advice on how to perform certain functions in calibre. You have not posted anything helpful regarding my questions, but instead have just posted negative comments about my having a large book collection. So unless you have something helpful to post, I would appreciate it if you minded your own business and left.

Everyone else, thank you so much for your feedback!
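For the same-book-in-several-formats case, kiwidude's Find Duplicates plugin is the in-calibre tool. As a rough illustration of the underlying idea only (not the plugin's algorithm), this Python sketch groups files whose title and author match after normalizing case, accents and punctuation. The `(title, author, path)` input format is an assumption.

```python
from __future__ import annotations

import re
import unicodedata
from collections import defaultdict
from typing import Iterable


def normalize(text: str) -> str:
    """Lower-case, strip accents and collapse punctuation/whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.sub(r"[\W_]+", " ", no_accents.casefold()).split())


def group_duplicates(books: Iterable[tuple[str, str, str]]) -> list[list[str]]:
    """books: (title, author, path) triples. Returns lists of paths that share
    a normalized title and author, keeping only groups with 2+ files."""
    groups: defaultdict[tuple[str, str], list[str]] = defaultdict(list)
    for title, author, path in books:
        groups[(normalize(title), normalize(author))].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

This only catches exact matches after normalization; titles that differ by a subtitle or edition note would need fuzzier comparison.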
Tags |
bulk, bulk edit metadata, calibre, command line, metadata |
|
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Bulk metadata download | anthony.burton4 | Library Management | 2 | 02-07-2013 06:42 AM |
| Command Line Search and Replace Metadata | metalhammer | Calibre | 1 | 05-02-2012 05:13 AM |
| calibredb command line question (setting metadata) | mezme | Calibre | 0 | 12-06-2011 12:17 AM |
| Bulk Command Line Conversion? | blobbo | Conversion | 1 | 05-24-2011 05:32 AM |
| Problem updating metadata (using mobi2mobi command line and gui) | whitearrow | Kindle Formats | 3 | 12-05-2009 07:07 PM |