01-05-2024, 06:04 AM   #1
Ico
Enthusiast
Posts: 27
Karma: 10000
Join Date: Jan 2019
Device: Kindle PW4
Advice on how to scrape or use an API for thousands of books?

Hello.

First of all, I use calibre, but I am learning development and decided to write a CLI script that fetches metadata for all books in a folder.

I got most of it working, but I can't figure out how to run queries for several thousand books, or even 100,000.

Google Books has a daily limit of 1,000 requests and Open Library has a limit of 100 requests every 5 minutes.
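
Limits like the two quoted above can be respected on the client side by pacing requests. The following is only a minimal sketch, assuming a sliding-window policy; the class name `SlidingWindowLimiter` and its parameters are made up for illustration, and the real services may count requests differently.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any window of `period` seconds.

    Hypothetical helper for illustration; not part of calibre or any API client.
    """

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._stamps = deque()   # times of the calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Window is full: sleep until the oldest call expires, then re-check.
            self._sleep(self.period - (now - self._stamps[0]))


# Figure taken from the post above: 100 requests every 5 minutes.
open_library_limiter = SlidingWindowLimiter(max_calls=100, period=5 * 60)
```

Calling `open_library_limiter.wait()` before each request would keep the script under the stated limit, at the cost of the run taking as long as the limit dictates.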

I heard Kovid mention that he used DuckDuckGo. Mr. Goyal, if you happen to read this, could you please tell me how you did it?

I wanted to use pyppeteer on DuckDuckGo or Google, but I can't figure out from their robots.txt what is and isn't permitted.
I don't want to get blacklisted by mistake.
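
One way to read a robots.txt file mechanically is Python's standard `urllib.robotparser`. The sketch below uses made-up rules (`SAMPLE_ROBOTS_TXT`) and a made-up user agent name, not the real file of any search engine. It only answers what the robots.txt rules say; a site's terms of service and rate limiting are separate questions that it does not cover.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only; not the robots.txt of any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /about
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "my-metadata-bot" is a hypothetical user agent name.
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-metadata-bot", "https://example.com/search?q=book"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-metadata-bot", "https://example.com/about"))
```

For a live site, `RobotFileParser.set_url()` followed by `read()` fetches the file instead of parsing a string.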

I also found out that Google considers even more than 100,000 queries chump change and will raise the limit for free once I am issued an API key, for which I have to enter my credit card information.

I don't think the users, or frankly I myself, would be comfortable with that.

Thank you for reading. Please let me know if I need to elaborate further.
01-05-2024, 06:25 AM   #2
kovidgoyal
creator of calibre
Posts: 43,864
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
There's no free service that will let you make that many queries. Indeed, nowadays even Google restricts you to about 50 or so queries a day.
01-05-2024, 09:13 AM   #3
Ico
Quote:
Originally Posted by kovidgoyal
There's no free service that will let you make that many queries. Indeed, nowadays even Google restricts you to about 50 or so queries a day.
Thank you, I see. So you pay for Google Cloud from donations?

I saw that they publicly state that unverified accounts get 1,000 queries per day.

I guess the days of asking for even a measly 10,000 are over?

I saw recent Stack Overflow threads stating it could still be done with a credit card, but I also saw a lot of posts about requests being denied.
I thought they asked for way too much.

If I decide to scrape the info using DuckDuckGo, Google, and Bing interchangeably, let's say 15,000 queries for each, is there a possibility of them blacklisting me, or blacklisting my MAC address and HWID?
I honestly can't tell from their robots.txt what is and isn't permitted.
01-05-2024, 09:30 AM   #4
kovidgoyal
calibre does not use Google APIs; it queries the same URLs as you do when you use a browser. And Google rate-limits these queries to 50-odd a day.
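
To put the limits mentioned in this thread in perspective, here is a small back-of-the-envelope calculation. It assumes each book needs exactly one query and that the limits are the ones quoted in the posts (they may have changed); `days_needed` is a name made up for this sketch.

```python
SECONDS_PER_DAY = 86400


def days_needed(total_queries, max_calls, period_seconds):
    """Days required to issue `total_queries` at `max_calls` per `period_seconds`."""
    queries_per_day = max_calls * SECONDS_PER_DAY / period_seconds
    return total_queries / queries_per_day


books = 100_000
print(f"Google Books API, 1,000/day:   {days_needed(books, 1000, SECONDS_PER_DAY):.1f} days")  # 100.0 days
print(f"Open Library, 100 per 5 min:   {days_needed(books, 100, 5 * 60):.1f} days")            # 3.5 days
print(f"Browser-style search, ~50/day: {days_needed(books, 50, SECONDS_PER_DAY):.1f} days")    # 2000.0 days
```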
01-05-2024, 10:26 AM   #5
Ico
Enthusiast
Ico knows the difference between 'who' and 'whom'Ico knows the difference between 'who' and 'whom'Ico knows the difference between 'who' and 'whom'Ico knows the difference between 'who' and 'whom'Ico knows the difference between 'who' and 'whom'Ico knows the difference between 'who' and 'whom'Ico knows the difference between 'who' and 'whom'Ico knows the difference between 'who' and 'whom'Ico knows the difference between 'who' and 'whom'Ico knows the difference between 'who' and 'whom'Ico knows the difference between 'who' and 'whom'
 
Posts: 27
Karma: 10000
Join Date: Jan 2019
Device: Kindle PW4
Quote:
Originally Posted by kovidgoyal
calibre does not use Google APIs; it queries the same URLs as you do when you use a browser. And Google rate-limits these queries to 50-odd a day.
That is great. That is what I was actually meaning to ask. Sorry, I got a bit confused; I am trying to finish several projects.

That is what I meant.

Could you please describe the workflow in a sentence or two?
Which websites do you use, and would a headless browser and Selenium be enough?

I also wanted to use a browser but was worried that the site or the search engine might block me.
Someone praised my project and said it would help them with 50,000 books.

I immediately realized that my project couldn't handle that task, and I have spent days trying to make it happen, as it would be a nice feature for my portfolio.

I thought about the complexity of the algorithm, but even if it were O(n^n), a million operations isn't much, and if I get into trouble I could port the Python code to Go and revert when Python removes the GIL.

The logic I found here:
https://github.com/kovidgoyal/calibr...mazon.py#L1094
https://github.com/kovidgoyal/calibr...ngines.py#L177

goes way over my head.

I don't know whether I should focus exclusively on cached pages and instant searches, or whether I could just search for {Title} AND {Author} with {publisher}, {rating}, and {rating_count} included.
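
A title-plus-author query of the kind described above could be assembled as follows. This is a sketch only: it assumes the Open Library search endpoint and its `title`, `author`, and `limit` parameters, builds the URL without sending any request, and is not how calibre does it. `build_search_url` is a hypothetical helper; fields such as publisher or rating would have to be read from whatever the service returns.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the service's own documentation before relying on it.
OPEN_LIBRARY_SEARCH = "https://openlibrary.org/search.json"


def build_search_url(title, author=None, limit=5):
    """Build a search URL for one book; `author` is optional."""
    params = {"title": title}
    if author:
        params["author"] = author
    params["limit"] = limit
    # urlencode escapes spaces and punctuation so the values are URL-safe.
    return f"{OPEN_LIBRARY_SEARCH}?{urlencode(params)}"


print(build_search_url("The Hobbit", "J. R. R. Tolkien"))
```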
01-05-2024, 10:33 AM   #6
Ico
Oh, and of course it would not be O(n^n); that was just an example. It should be in the range of O(n^2) or O(n^3) at worst, and with asyncio, aiosqlite, and SQLAlchemy maybe even approaching O(n).
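
With one lookup per book, the work grows linearly with the number of books, so the daily quota is likely to matter more than the algorithm. The sketch below shows one way to make a run resumable across days. It uses the standard `sqlite3` module instead of aiosqlite or SQLAlchemy to stay self-contained, and the table layout, function names, and `fetch` callback are all assumptions made for illustration.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the progress database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS books (path TEXT PRIMARY KEY, metadata TEXT)"
    )
    return db


def add_books(db, paths):
    """Register book files; files already known are left untouched."""
    db.executemany(
        "INSERT OR IGNORE INTO books (path) VALUES (?)", [(p,) for p in paths]
    )
    db.commit()


def process_batch(db, fetch, budget):
    """Fetch metadata for at most `budget` books that have none yet.

    `fetch` is a caller-supplied function (hypothetical here) that takes a
    path and returns metadata as text. Returns the number of books processed.
    """
    rows = db.execute(
        "SELECT path FROM books WHERE metadata IS NULL ORDER BY path LIMIT ?",
        (budget,),
    ).fetchall()
    for (path,) in rows:
        db.execute(
            "UPDATE books SET metadata = ? WHERE path = ?", (fetch(path), path)
        )
    db.commit()
    return len(rows)
```

Running `process_batch` once a day with `budget` set to the day's quota would work through the library without repeating a lookup that already succeeded.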

I am just trying to build something to show recruiters that I can manage the basics.
01-05-2024, 10:37 PM   #7
kovidgoyal
A headless browser and Selenium are fine, but you will not be able to scrape large numbers of results.
01-08-2024, 07:33 PM   #8
tomsem
Grand Sorcerer
Posts: 6,478
Karma: 26425959
Join Date: Apr 2009
Location: USA
Device: iPhone 15PM, Kindle Scribe, iPad mini 6, PocketBook InkPad Color 3
After running into the Google 'too many requests' error too many times, I switched to using the Goodreads plugin to fetch metadata. Since then I have had no issues, and the metadata (particularly Series) seems better overall. I rarely need to correct it.

It may well be that the Goodreads API has a similar limit, but by the time I switched I no longer had hundreds of books to fetch metadata for.
01-12-2024, 05:52 AM   #9
Ico
Quote:
Originally Posted by tomsem
After running into the Google 'too many requests' error too many times, I switched to using the Goodreads plugin to fetch metadata. Since then I have had no issues, and the metadata (particularly Series) seems better overall. I rarely need to correct it.

It may well be that the Goodreads API has a similar limit, but by the time I switched I no longer had hundreds of books to fetch metadata for.
That is great.
Unfortunately, I missed my chance: Goodreads no longer issues new API keys.
Tags
api, googlebooks api, openlibrary api, scraping