MobileRead Forums - Advice on how to scrape or use an API for thousands of books?

Ico · 01-05-2024, 06:04 AM

Hello.

First of all, I use Calibre, but I am learning development and decided to write a CLI script that fetches metadata for all the books in a folder.

I managed to get most of it working, but I can't figure out how to run lookups for several thousand books, or even 100,000.

Google Books has a daily limit of 1,000 requests and Open Library has a limit of 100 requests every 5 minutes.
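To make the problem concrete, this is roughly how I would stay under limits like these: a minimal sketch of a sliding-window throttle. The class name `SlidingWindowLimiter` and its `clock` and `sleep` parameters are my own invention for illustration, and the numbers are only the limits as I understand them, not confirmed values.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any window of `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = deque()    # timestamps of calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have already left the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Window is full: sleep until the oldest call expires, then re-check.
            self.sleep(self.period - (now - self.calls[0]))
```

Calling `limiter.wait()` before each request with `SlidingWindowLimiter(100, 300)` would match the Open Library figure above. Assuming one request per book, 100,000 lookups at that rate would take about 83 hours, and at 1,000 per day they would take 100 days, which is why I am looking for another approach.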

I heard Kovid mention that he used DuckDuckGo. Mr. Goyal, if you read this by any chance, could you please tell me how you did it?

I wanted to use pyppeteer on DuckDuckGo or Google, but I can't figure out from their robots.txt what is and isn't permitted.
I don't want to get blacklisted by mistake.
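For reference, this is how I have been trying to read robots.txt rules programmatically, using `urllib.robotparser` from the Python standard library. It is only a sketch: the `SAMPLE_ROBOTS` text is made up for illustration and is not the real file of any site, and the helper names and the `my-metadata-script` user agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search
Allow: /about
Crawl-delay: 5
"""


def build_parser(robots_text):
    """Parse robots.txt text that has already been downloaded."""
    rp = RobotFileParser()
    rp.parse(robots_text.splitlines())
    return rp


def is_allowed(rp, url, user_agent="my-metadata-script"):
    """Return True if the parsed rules let `user_agent` fetch `url`."""
    return rp.can_fetch(user_agent, url)
```

For a live site I would call `set_url()` and `read()` on the parser instead of passing a string. As far as I understand, robots.txt only covers what crawlers may fetch, so it does not tell me what a site's terms of service allow.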

I also found out that Google considers query volumes above 100,000 chump change and will raise the limit for free if I get an API key, for which I have to enter my credit card information.

I don't think the users, or frankly I, would be comfortable with that.

Thank you for reading. Please let me know if I need to elaborate further.
