Old 08-31-2013, 04:08 PM   #1
woodapple
Junior Member
Posts: 3
Karma: 4500
Join Date: Aug 2013
Device: kindle
Question bulk metadata download through command line

Hey guys,

I have a massive ebook collection of about 200,000 books. My problem is that they are completely unorganized, which makes finding a specific book a pain. After doing some searching online I decided the easiest way to get them organized is to add them through calibre. Since I have so many books, I have been looking into the command line interface, so that the process would hopefully be quicker and more reliable. So far I have figured out how to add the books through the command line using calibredb add -d {dir}. I am also splitting up the books I add by using wildcards in the {dir} (e.g. calibredb add -d ./A*). Below are a few questions I have.
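A minimal sketch of the per-letter batching described above, written in Python so the batches can be inspected before anything is run. The helper names `batch_by_initial` and `add_command` are hypothetical, and the code only builds argument lists; it does not call calibre. As far as the calibredb documentation goes, `-r` is the short form of `--recurse` and `-d` is the short form of `--duplicates`, so it is worth checking `calibredb add --help` on your version before relying on either flag.

```python
from collections import defaultdict


def batch_by_initial(paths):
    """Group directory paths by the first letter of their final component."""
    batches = defaultdict(list)
    for path in sorted(paths):
        name = path.rstrip("/").rsplit("/", 1)[-1]
        # Upper-case the initial so "adams" and "Asimov" land in the same batch.
        batches[name[:1].upper()].append(path)
    return dict(batches)


def add_command(paths, recurse_flag="-r"):
    """Build (but do not run) one calibredb invocation for a batch.

    recurse_flag is an assumption; confirm it with `calibredb add --help`.
    """
    return ["calibredb", "add", recurse_flag, *paths]


batches = batch_by_initial(["./Asimov", "./adams", "./Banks"])
for initial, dirs in sorted(batches.items()):
    print(initial, add_command(dirs))
```

Each printed list could be handed to `subprocess.run` once the batches look right, which keeps a failed batch from affecting the others.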

1) Would calibre be able to handle that many books?

2) Through the GUI, I know how to do a bulk metadata download. However, I ran this overnight (7 hours) and it only got about 3000 books done. I had also selected to not download the covers and only download metadata. Therefore, I estimate that it would take about 467 hours to download the metadata for all my books! I have searched for a way to do the bulk download through command line hoping that it will be quicker. However, I have only been able to find how to do individual metadata downloads using fetch-ebook-metadata [options] where you have to specify the author, title, or ISBN. Does anyone know how to do a bulk metadata download through the command line?
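The 467-hour figure follows from the observed rate (about 3000 books in 7 hours) applied to the whole collection. A small Python check of that arithmetic, where `estimated_hours` is just an illustrative helper name:

```python
def estimated_hours(total_books, books_done, hours_elapsed):
    """Extrapolate total run time from an observed throughput."""
    books_per_hour = books_done / hours_elapsed
    return total_books / books_per_hour


hours = estimated_hours(200_000, 3_000, 7)
print(round(hours))  # prints 467
```

The estimate assumes the rate stays constant, which may not hold if the metadata sources start throttling requests.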

3) I have downloaded some plugins providing additional sources to find the metadata (mostly from kiwidude), hoping to decrease the number of books that calibre is unable to find the metadata for. The following are the metadata sources that I have currently selected: Amazon.com, Anobii Fetcher, Barnes & Noble, Edelweiss, Fantastic Fiction, Fantastic Fiction Adults, FictionDB, Goodreads, Google, and Open Library. Could having all these sources selected be slowing down my metadata downloads? If so, which sources would you recommend I use? I only care about getting the metadata and do not care about covers.

4) Last of all, is there any other software other than calibre that would be better suited for my needs?


I greatly appreciate any feedback.


Aaron
Old 08-31-2013, 07:57 PM   #2
BetterRed
null operator
Posts: 4,034
Karma: 2925589
Join Date: Mar 2012
Location: NSW Australia
Device: none
Try this
  1. write a list of books to a csv file with the Catalogue feature (you may need to add Create Catalogue to your icon bar or menubar)
  2. save the csv file to somewhere outside the Calibre library - eg the desktop
  3. edit the csv file into a list of fetch-ebook-metadata commands
  4. rename as a bat file
  5. run batch file
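The steps above can be sketched in Python instead of hand-editing the CSV. This is only an illustration under stated assumptions: the column names `title`, `authors` and `isbn` are guesses at what the catalogue export contains, `metadata_commands` is a hypothetical helper, and the quoting produced by `shlex.join` targets a POSIX shell script rather than a Windows .bat file. The `--isbn`, `--title` and `--authors` options are the ones fetch-ebook-metadata documents for specifying a book.

```python
import csv
import io
import shlex


def metadata_commands(csv_text):
    """Turn catalogue rows into one fetch-ebook-metadata command line each."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        isbn = (row.get("isbn") or "").strip()
        title = (row.get("title") or "").strip()
        authors = (row.get("authors") or "").strip()
        args = ["fetch-ebook-metadata"]
        if isbn:
            # An ISBN is the most specific lookup key, so prefer it.
            args += ["--isbn", isbn]
        else:
            if title:
                args += ["--title", title]
            if authors:
                args += ["--authors", authors]
        if len(args) > 1:  # skip rows with nothing to search on
            commands.append(shlex.join(args))
    return commands


# Sample data only; the ISBN below is a made-up placeholder.
sample = "title,authors,isbn\nDune,Frank Herbert,\nEmma,Jane Austen,9780000000002\n,,\n"
for line in metadata_commands(sample):
    print(line)
```

Writing the returned lines to a file gives a script that can be run in one go, or split into smaller pieces to test batch performance first.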

There are more sophisticated solutions involving database queries and some OS level scripting, but this will give you a feel for batch performance.

I doubt it will be much faster - the delays are mainly due to the speed of the servers and communications.

It's my understanding that if you do too many books in a batch (I've seen people refer to 100), the metadata source sites will probably throttle your requests; in some cases they may blacklist your IP address for a while.
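One way to stay under such a limit is to process the books in batches and pause between them. The sketch below is a generic pattern, not anything calibre provides: `chunks` and `run_in_batches` are hypothetical names, the batch size of 100 comes from the figure mentioned above, and the 600-second pause is an arbitrary assumption.

```python
import time


def chunks(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(items, handle, size=100, pause_seconds=600, sleep=time.sleep):
    """Call handle(item) for every item, sleeping between batches.

    `sleep` is injectable so the pacing can be tested without waiting.
    Returns the number of batches processed.
    """
    batches = list(chunks(items, size))
    for index, batch in enumerate(batches):
        for item in batch:
            handle(item)
        if index < len(batches) - 1:  # no pause needed after the last batch
            sleep(pause_seconds)
    return len(batches)
```

Here `handle` would be whatever performs one metadata lookup, for example a function that runs a single fetch-ebook-metadata command.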

BR
Old 09-01-2013, 01:03 AM   #3
DoctorOhh
US Navy, Retired
Posts: 8,909
Karma: 12755553
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by woodapple View Post
After doing some searching online I decided the easiest way to get them organized is to add them through calibre. Since I have so many books, I have been looking into the command line interface, so that the process would hopefully be quicker and more reliable.
I doubt if the CL will be quicker or more reliable.

Quote:
Originally Posted by woodapple View Post
1) Would calibre be able to handle that many books?
I believe so. Let us know how it works.

Quote:
Originally Posted by woodapple View Post
I had also selected to not download the covers and only download metadata. Therefore, I estimate that it would take about 467 hours to download the metadata for all my books!
So at 4 hours a day it would take less than half a year. Welcome to ebook management.

Quote:
Originally Posted by woodapple View Post
I have searched for a way to do the bulk download through command line hoping that it will be quicker.
It may be easier to schedule but I doubt it will be quicker.

Quote:
Originally Posted by woodapple View Post
3) I have downloaded some plugins providing additional sources to find the metadata (mostly from kiwidude), hoping to decrease the number of books that calibre is unable to find the metadata for. The following are the metadata sources that I have currently selected: Amazon.com, Anobii Fetcher, Barnes & Noble, Edelweiss, Fantastic Fiction, Fantastic Fiction Adults, FictionDB, Goodreads, Google, and Open Library. Could having all these sources selected be slowing down my metadata downloads? If so, which sources would you recommend I use? I only care about getting the metadata and do not care about covers.
I use Amazon.com, Barnes & Noble, Fantastic Fiction, Goodreads, Google, and Open Library. But I only use Goodreads to download tags, and in order to get things right I usually do one book at a time, although when I went through and redid my tags I did raise it to 50 books at a time.

Your best bet would be to get most of your metadata from the filename as you add the book to calibre even if it means you need to use a good file renamer to get the names in the correct format first. Then you should attempt to extract the ISBN from the books using the Extract ISBN plugin before downloading metadata.

Quote:
Originally Posted by woodapple View Post
4) Last of all, is there any other software other than calibre that would be better suited for my needs?
As BetterRed points out there are other methods to deal with what you want, but nothing as comprehensive as calibre.

Good Luck.

Last edited by DoctorOhh; 09-01-2013 at 01:10 AM.
Old 09-01-2013, 04:15 AM   #4
BetterRed
null operator
Quote:
Originally Posted by DoctorOhh View Post
Quote:
Originally Posted by woodapple View Post
4) Last of all, is there any other software other than calibre that would be better suited for my needs?
As BetterRed points out there are other methods to deal with what you want, but nothing as comprehensive as calibre.

Good Luck.
The other methods I had in mind would use the Calibre command line programs and the Calibre database. I don't know of any other product that gets close to Calibre's functionality.

BR
Old 09-01-2013, 04:17 AM   #5
Adoby
Handy Elephant
Posts: 1,124
Karma: 5735944
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Ubuntu Linux, Cybook Opus, Motorola Xoom with Mantano Premium
Most likely some of the books in your collection already have good metadata and nice covers.

I would suggest that you do this in two steps.

Start by only adding epub and mobi books. They are likely to have good metadata and covers embedded. When adding the books you can have calibre use the embedded metadata and ignore the filenames.

If the titles and the authors are correct it should be easy to download extra metadata like series information and descriptions of the books.

This could quickly and with a relatively small effort give you a very nice library to start with. Stop there and try to make it perfect.

Make sure everything is just as you would like it. Series information. Cover sizes. No duplicates. Consistent author names. Correct sorting, just as you prefer it. Good genre system. Ratings. Consistent tagging system. Links to authors. Information about books read/unread. Conversions to correct formats for your reading device. Integration with your reading device and reading apps. Add news downloads. Setup a backup system to protect your investment in time and effort. Find complementing metadata sources. Virtual libraries. And so on.

Enjoy your perfect calibre library and read some books.

After that start adding and converting the rest of the books when you are in the mood. Preferably do it in small batches you can fully complete in a few hours, and preferably books that you are more likely to read soon. Perhaps books that can fill gaps in series you already have in the library, or by authors you enjoy especially. Or interesting topics. Or books that you have heard others talk about. Most likely you will not live long enough to be able to add and fix all your books. Perhaps some books are not worth the effort, and can be deleted? Take some time off now and then to read a book or two.

Also consider adding books from other better sources than your own.

If you try to add junk you're likely to end up with a junk library that is no fun.

Last edited by Adoby; 09-01-2013 at 04:31 AM.
Old 09-01-2013, 08:42 AM   #6
woodapple
Junior Member
Thank you everyone for the feedback! It sounds like it would still be slow using the command line interface. When I get some time though I will try to verify it using BetterRed’s advice.

DoctorOhh, I have been using the Extract ISBN plugin and it seems to work for about 2/3 of the books. Unfortunately, a lot of the metadata from when I initially add the books to calibre has the author’s name in the title field and the title in the author field. After some investigation it turns out most of my filenames have one of the following two formats:
title-author.filetype or author-title.filetype

For the first case, calibre correctly adds the metadata. However, for the second case, calibre puts the title and author in the wrong fields. I have discovered that when I select to edit metadata there is a check box to swap the title and author fields! So, I am thinking that fixing the author and title fields will hopefully speed up the metadata download.

Also, when doing a bulk metadata download, does calibre just use the first match? If so, under “Configure Metadata download” should I change the “Max. time to wait after first match is found” to zero? This also might speed things up by a lot.

Adoby, due to the vast number of books I am adding, I have most of them compressed to .rar files. This is saving a lot of file space. I will extract the books as I need them. Yeah, I agree that I do have some junk in my library that I would never read, but they are not worth the time to remove from my library.

I will keep you guys updated on how this progresses. I just started graduate school, so I am going to get pretty busy and I might need to take a break from this.

Thanks again!
Aaron
Old 09-01-2013, 09:10 AM   #7
BetterRed
null operator
@woodapple If you can separate the author-title books and the title-author books into two 'piles', then you can select the appropriate regular expression in the Add Books configuration for each pile and you won't have to make the adjustments afterwards.

Enjoy your grad school sojourn

BR
Attached Thumbnails: Screenshot - 2013-09-01 , 22_05_53.jpg (98.7 KB)
Old 09-01-2013, 09:22 AM   #8
DoctorOhh
US Navy, Retired
Quote:
Originally Posted by woodapple View Post
Unfortunately, a lot of the metadata from when I initially add the books to calibre has the author’s name in the title field and the title in the author field. After some investigation it turns out most of my filenames have one of the following two formats:
title-author.filetype or author-title.filetype

For the first case, calibre correctly adds the metadata. However, for the second case, calibre puts the title and author in the wrong fields. I have discovered that when I select to edit metadata there is a check box to swap the title and author fields! So, I am thinking that fixing the author and title fields will hopefully speed up the metadata download.
Correct matching is up to the user: either rename your files to match the regex template under adding books, or set up multiple regex templates for the various filename patterns in your collection. I use the Quick Preference plugin and have 5 different adding-books regular expressions that I can quickly switch between without opening preferences. Check out this Google search for more info.
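To illustrate what two such templates might look like, here is a sketch in Python's `re` syntax with named groups `title` and `author`, which is the style calibre's filename pattern uses as far as I know. The two patterns and the `parse` helper are illustrative assumptions, not calibre defaults, and they expect the filename without its extension.

```python
import re

# "Title - Author": the author is whatever follows the last hyphen.
TITLE_AUTHOR = re.compile(r"^(?P<title>.+?)\s*-\s*(?P<author>[^-]+)$")

# "Author - Title": the author is whatever precedes the first hyphen.
AUTHOR_TITLE = re.compile(r"^(?P<author>[^-]+?)\s*-\s*(?P<title>.+)$")


def parse(stem, pattern):
    """Return (title, author) for a filename stem, or None if it does not match."""
    match = pattern.match(stem)
    if match is None:
        return None
    return match.group("title").strip(), match.group("author").strip()


print(parse("Dune - Frank Herbert", TITLE_AUTHOR))
print(parse("Frank Herbert - Dune", AUTHOR_TITLE))
```

Neither pattern can tell on its own which layout a file uses, which is why the files still need to be sorted into the right pile before the matching template is applied.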
Old 09-01-2013, 11:42 AM   #9
cybmole
Wizard
Posts: 2,997
Karma: 1285294
Join Date: Sep 2010
Device: Kobo aura HD, Kobo Arc, Kindle Fire HDX 8.9 , Kindle for PC
is there an adjective for folks who hoard 100 x more books than they could read in a lifetime ?
CLD: compulsive library disorder ?

it is also noteworthy that a side effect of CLD is a complete disregard for naming structure, metadata etc until the book collection reaches virtual eiffel tower proportions.

normal folks acquire only what they plan to read & they label as they go

then there is the how much does it cost to honestly acquire 200,000 ebooks question- not that i am implying piracy or anything

ps let's see... 200,000 books... well, if you read 3 complete books per day - every day - it will take you a tad under 200 years to read that pile once. so you'd better quit faffing around with labelling & start reading !
or see if you have a really good speed reading text somewhere in the pile

Last edited by cybmole; 09-01-2013 at 12:06 PM.
Old 09-01-2013, 01:49 PM   #10
MelBr
Connoisseur
Posts: 89
Karma: 405216
Join Date: Feb 2013
Device: iPad
woodapple, even if you manage to import 200,000 books into calibre, are you sure calibre will perform well enough on your system with that big of a library? You say your files are rared to save space so your system might not even handle 20% of your library.

And what BetterRed said about getting banned & blacklisted by sites is absolutely true. A few months back, I got blacklisted by a publisher's site when I was looking up ISBN numbers and scraping title & author information. Even though I put a 10-15 sec random wait time between queries, I got banned after 6 hrs. I was able to rename about 85% of files so it wasn’t that bad. Heck, even Google.com will ban you for much less than that and start giving you a captcha to solve before you are able to use their search, and to remove the captcha you'll have to file a request and wait a few days (trust me, I had to do that after Google blocked me for doing a few dozen automated image searches).
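For reference, a random wait of the kind described above can be sketched as follows. `fetch_all` is a hypothetical helper and `fetch` stands for whatever performs one lookup; the 10-15 second range is taken from the post, and as the post itself shows, such pacing does not guarantee a site will tolerate the traffic.

```python
import random
import time


def fetch_all(queries, fetch, low=10.0, high=15.0, sleep=time.sleep, rng=random):
    """Run fetch(query) for each query with a random pause between calls.

    `sleep` and `rng` are injectable so the pacing can be tested without waiting.
    """
    results = []
    for index, query in enumerate(queries):
        if index:  # no wait before the very first request
            sleep(rng.uniform(low, high))
        results.append(fetch(query))
    return results
```

Checking a site's terms of use or published rate limits first is the safer route than relying on delays alone.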

cybmole, if they're books of fiction, I completely agree but if they're non-fiction, then having a lot of books can be extremely useful when you're synthesizing information and need to reference, cross-check and verify facts etc.

I do full-text search all the time on my library and it's proven extremely useful to me. You never know what you're gonna find when you do that.

Last edited by MelBr; 09-01-2013 at 01:56 PM.
Old 09-01-2013, 02:46 PM   #11
cybmole
Wizard
Quote:
Originally Posted by MelBr View Post
...but if they're non-fiction, then having a lot of books can be extremely useful when you're synthesizing information and need to reference, cross-check and verify facts etc.

I do full-text search all the time on my library and it's proven extremely useful to me. You never know what you're gonna find when you do that.
ok but
1. isn't that what google is for ?
2.when & if 200,000 books can be fully indexed, the metadata becomes pretty redundant, for research purposes?

We are talking upwards of 200 GB of compressed text here, so indexing that lot is not trivial. I don't think windows 7 is able to find occurrences of a word or phrase within books that are in my much more modest calibre library - certainly not on default indexing settings

still curious as to how 200,000 fiction or non-fiction texts ( published ones with ISBNs) could be legally accumulated- by someone who is only just starting grad school

Last edited by cybmole; 09-01-2013 at 02:49 PM.
Old 09-01-2013, 07:57 PM   #12
Ravensknight
Serpent Rider
Posts: 805
Karma: 5948888
Join Date: Jun 2009
Device: Sony 505, 350; Nook STR; Kindle T, NT4B; Nexus 7; Superpad 10in tablet
self edit...

Last edited by Ravensknight; 09-01-2013 at 08:33 PM.
Old 09-02-2013, 04:12 PM   #13
MelBr
Connoisseur
Quote:
Originally Posted by cybmole View Post
ok but
1. isn't that what google is for ?
2.when & if 200,000 books can be fully indexed, the metadata becomes pretty redundant, for research purposes?

We are talking upwards of 200 GB of compressed text here, so indexing that lot is not trivial. I don't think windows 7 is able to find occurrences of a word or phrase within books that are in my much more modest calibre library - certainly not on default indexing settings

still curious as to how 200,000 fiction or non-fiction texts ( published ones with ISBNs) could be legally accumulated- by someone who is only just starting grad school
1. No, since Google doesn't even have 10% of the books that I have access to scanned & indexed. And Google sucks even more when it comes to journals & papers. My university & my research lab have accumulated 200+ gigs of papers in PDF format and Google has only a tiny sliver of them indexed (I'm at a CS/AI lab at a large uni). Amazon's better when it comes to books but they don't have sci papers scanned/indexed. You can build your own mini-google-like search system.

2. Correct. I find metadata pretty much useless for scientific purposes. Full-text search is vastly superior and I rely on it daily. I'm a grad student and I've converted almost every researcher in my lab to my method of searching and using all of this data (it wasn’t hard + it's nothing super-advanced). We're all AI people and it's natural to us to find ways to harness all this knowledge.
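The post does not say how the lab's search system is built, so the following is only a generic sketch of the idea behind full-text search: an inverted index that maps each word to the documents containing it. The names `build_index` and `search` are hypothetical, and a real setup over hundreds of gigabytes would need text extraction from PDFs and an on-disk index rather than in-memory sets.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


docs = {"a.txt": "Neural networks learn", "b.txt": "Networks of servers"}
index = build_index(docs)
print(sorted(search(index, "networks")))
```

Because every query word must be present, longer queries narrow the result set, which is what makes this kind of lookup useful for cross-checking facts across many texts.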

As for Win7, I don't know. I can't really help you there. My lab uses OS X and Linux for everything. Linux powers a large number of servers (about 100+) and racks of Nvidia and ATI GPUs (we have probably around 200 GPUs) that we use for simulations. For everything else, it's Macs (mostly laptops). I can go into detail about how we index all this stuff but it's probably not helpful to you since you're on Win.

As for your last inquiry, I'm not OP woodapple so I have no idea how he got 200k books. As for us, we got them, legally, from publishers, libraries & other researchers.
Old 09-02-2013, 05:50 PM   #14
woodapple
Junior Member
BetterRed and DoctorOhh, it is good to know that you can change which expression calibre uses to add the books. I just need to figure out a good way to separate my books into those two categories.

MelBr I am not sure if calibre will be able to handle 200,000 books. My goal for this project was to get these books organized into folders and to remove duplicate books (hopefully via duplicate plugin by kiwidude). So if I get to the point that calibre starts having issues, I can back up the calibre folder system that I have so far and then start adding the next books to an empty library. I will keep in mind about the potential issue with getting banned & blacklisted. By the way, I am running linux.

cybmole, my book collection also contains a lot of journals, articles, and other types of documents other than just fiction books. Additionally, I do have a lot of duplicates right now (e.g. same book in multiple formats). I am hoping to use the find duplicate plugin by kiwidude to help eliminate the duplicates. For me, having a large collection of books has been useful. Whenever I find a book or article that I would like to read or reference, a lot of times, I already have it in my collection. Family and friends have also used my collection. However, cybmole, I do not see your purpose here on this thread. I created this thread to get some advice on how to perform certain functions in calibre. You have not posted anything helpful regarding my inquiries, but instead just posted negative comments about me having a large book collection. So unless you have something helpful to post, I would appreciate it if you would mind your own business and leave.

Everyone else,
Thank you so much for your feedback!