
MobileRead Forums > E-Book Software > Calibre > Recipes

07-15-2023, 07:15 AM   #1
fengli
Connoisseur
fengli began at the beginning.
 
Posts: 80
Karma: 10
Join Date: Aug 2022
Device: PC
Bloomberg Weekly is failing too: the recipe cannot fetch the article content, and the output is full of broken icons.

Please take the time to check it out. Thank you very much!
07-17-2023, 12:01 AM   #2
unkn0wn
Evangelist
unkn0wn can do the Funky Gibbon.
 
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
https://github.com/unkn0w7n/calibre/...ef128ea65a5d3c

https://github.com/unkn0w7n/calibre/...ebf0c8faa5580e

Fixed both for now, but there must be a better way to do this.

I'm no longer able to get the graphs/data images as before, or the lists/tables from the JSON, and hyperlink tags are also missing.

If someone knows how to make it better, feel free to make those changes to the recipe and submit it here or on GitHub.
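For the missing lists, tables and hyperlinks mentioned above, one general approach is to build the article HTML locally by walking the JSON body recursively and emitting a tag per node. This is a minimal sketch, not the recipe's code: the node schema (`type`, `value`, `href`, `content`) and the function names are hypothetical, since Bloomberg's actual JSON format isn't shown in this thread.

```python
import html
import json

# HYPOTHETICAL node schema (an assumption, not Bloomberg's real format):
#   {"type": "paragraph" | "text" | "link" | "list" | "item" | "table" | "row" | "cell",
#    "value": str   (text nodes only),
#    "href": str    (link nodes only),
#    "content": [child nodes]}
TAGS = {"paragraph": "p", "list": "ul", "item": "li",
        "table": "table", "row": "tr", "cell": "td"}


def node_to_html(node):
    """Recursively convert one JSON node into an HTML string."""
    kind = node.get("type")
    if kind == "text":
        # Escape text so characters like < and & don't break the markup.
        return html.escape(node.get("value", ""))
    inner = "".join(node_to_html(child) for child in node.get("content", []))
    if kind == "link":
        href = html.escape(node.get("href", ""), quote=True)
        return f'<a href="{href}">{inner}</a>'
    tag = TAGS.get(kind)
    # Unknown node types are unwrapped so their text is kept rather than dropped.
    return f"<{tag}>{inner}</{tag}>" if tag else inner


def body_to_html(body_json):
    """Build a complete article HTML page from a JSON string."""
    body = json.loads(body_json)
    parts = "".join(node_to_html(node) for node in body.get("content", []))
    return f"<html><body>{parts}</body></html>"


if __name__ == "__main__":
    sample = json.dumps({"content": [
        {"type": "paragraph", "content": [
            {"type": "text", "value": "See "},
            {"type": "link", "href": "https://example.com/a?x=1&y=2",
             "content": [{"type": "text", "value": "this chart"}]}]},
        {"type": "list", "content": [
            {"type": "item", "content": [{"type": "text", "value": "one"}]}]}]})
    print(body_to_html(sample))
```

Unwrapping unknown node types is a deliberate choice: if the site adds a new block type, its text still appears in the e-book instead of silently vanishing.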
07-17-2023, 06:03 AM   #3
fengli
Connoisseur
fengli began at the beginning.
 
Posts: 80
Karma: 10
Join Date: Aug 2022
Device: PC
At least the content can be extracted now. Thanks a lot!
07-18-2023, 01:10 AM   #4
unkn0wn
Evangelist
unkn0wn can do the Funky Gibbon.
 
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
https://github.com/unkn0w7n/calibre/...6cd643daffec24

Fixed hyperlinks, charts, etc.
07-18-2023, 06:47 AM   #5
fengli
Connoisseur
fengli began at the beginning.
 
Posts: 80
Karma: 10
Join Date: Aug 2022
Device: PC
There is a problem: the download tends to fail partway through. I increased the delay to 20 seconds, but the problem remains. It cannot fetch all the articles, only a few of them. Please take another look. Thank you very much!
07-18-2023, 09:08 AM   #6
unkn0wn
Evangelist
unkn0wn can do the Funky Gibbon.
 
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
I also ran into this while testing, so I increased the delay, and it seems to be working now. 20 seconds is too much, but I think the delay is the only tool we have... maybe we could add a random pause.
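A fixed delay produces perfectly regular request intervals, which may be easy for a server to spot; a random pause breaks that regularity. Here is a minimal plain-Python sketch of the idea (jitter around a base delay), assuming a sequential fetch loop. `jittered_delay`, `fetch_all` and the injected `fetch`/`sleep` callables are hypothetical names, not calibre APIs, and whether this actually avoids Bloomberg's detection is only what the thread reports.

```python
import random
import time


def jittered_delay(base, jitter, rng=random):
    """Return a pause of base +/- jitter seconds, never negative."""
    return max(0.0, base + rng.uniform(-jitter, jitter))


def fetch_all(urls, fetch, base=3.0, jitter=2.0, rng=random, sleep=time.sleep):
    """Fetch urls one at a time with a randomised pause between requests.

    `fetch` and `sleep` are injected (hypothetical parameters) so the pacing
    can be exercised without touching the network.
    """
    results = []
    for i, url in enumerate(urls):
        if i:  # no pause before the very first request
            sleep(jittered_delay(base, jitter, rng))
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    pauses = []
    # Record the pauses instead of sleeping; str.upper stands in for a download.
    fetch_all(["a", "b", "c"], str.upper, sleep=pauses.append,
              rng=random.Random(42))
    print([round(p, 2) for p in pauses])  # two pauses, each between 1.0 and 5.0
```

With `base=3.0` and `jitter=2.0` each pause falls somewhere between 1 and 5 seconds, so the average wait stays modest while no two intervals are identical.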

It has also become a lot faster than before, since the HTML is now built locally.

Last edited by unkn0wn; 07-18-2023 at 09:28 AM.
07-18-2023, 09:39 AM   #7
fengli
Connoisseur
fengli began at the beginning.
 
Posts: 80
Karma: 10
Join Date: Aug 2022
Device: PC
Bloomberg's detection is very strong. Once about 20% of the articles have been fetched, the download fails and the remaining content cannot be retrieved.
07-18-2023, 09:52 AM   #8
unkn0wn
Evangelist
unkn0wn can do the Funky Gibbon.
 
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
Test this one, using a VPN or some other fresh IP address.
Attached Files
File Type: recipe Bloomberg.recipe (8.2 KB, 44 views)
07-18-2023, 10:18 AM   #9
fengli
Connoisseur
fengli began at the beginning.
 
Posts: 80
Karma: 10
Join Date: Aug 2022
Device: PC
It worked, thanks a lot. There were no failures partway through, but I don't know yet whether it is stable.
07-18-2023, 10:44 AM   #10
unkn0wn
Evangelist
unkn0wn can do the Funky Gibbon.
 
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
Hmm, it worked for me too. I'll implement this. Here the delay is random, and it is longer for articles with more text and images.
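The attached recipe's exact formula isn't shown here, but the idea of "a random delay that is longer for articles with more text and images" can be sketched as a deterministic size-based cost plus uniform noise. The function name and all weights (`per_1000_chars`, `per_image`, `max_extra`) are hypothetical tuning values chosen for illustration, not numbers taken from the recipe.

```python
import random


def article_delay(text, image_count, rng=random, per_1000_chars=0.5,
                  per_image=0.3, max_extra=2.0):
    """Pause length in seconds that grows with article size, plus noise.

    All weights are HYPOTHETICAL tuning values, not the recipe's numbers.
    """
    # Deterministic part: longer text and more images give a longer pause.
    size_cost = len(text) / 1000 * per_1000_chars + image_count * per_image
    # Random part: uniform noise so two articles of the same size still
    # get different pauses.
    return size_cost + rng.uniform(0, max_extra)


if __name__ == "__main__":
    # With the noise switched off, the size-based part is easy to check:
    print(article_delay("x" * 1000, 0, max_extra=0))  # 1000 chars -> 0.5 s
    print(article_delay("x" * 4000, 2, max_extra=0))  # 2.0 + 2 * 0.3 = 2.6 s
    # With noise, the same article lands somewhere in [2.6, 4.6] seconds.
    print(article_delay("x" * 4000, 2, rng=random.Random(1)))
```

Keeping the size-based part separate from the noise makes it easy to tune each one independently: the weights control how much big articles slow things down, while `max_extra` controls how irregular the timing looks.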
07-19-2023, 01:30 AM   #11
unkn0wn
Evangelist
unkn0wn can do the Funky Gibbon.
 
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
https://github.com/unkn0w7n/calibre/...c43dcccd1a4570

I don't know exactly why this works: the delay has been removed and 5 articles are downloaded simultaneously, but somehow the random pauses make the timing irregular enough that Bloomberg fails to detect it.
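The pattern described in that post (no fixed delay, 5 articles in parallel, a random pause before each request) can be modelled in plain Python with a thread pool. This is a sketch under stated assumptions, not calibre's downloader: `fetch_concurrently`, `workers`, `max_pause` and `seed` are hypothetical names, with `workers` playing roughly the role of calibre's `simultaneous_downloads` setting. It illustrates the mechanism only; it says nothing about why Bloomberg stops flagging the traffic.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_concurrently(urls, fetch, workers=5, max_pause=2.0, seed=None):
    """Fetch urls with `workers` threads, each request preceded by a random pause.

    Results are returned in the same order as `urls`.
    """
    rng = random.Random(seed)
    # Draw every pause up front in the calling thread, so the worker
    # threads never share the random generator.
    pauses = [rng.uniform(0, max_pause) for _ in urls]

    def task(url, pause):
        time.sleep(pause)  # random pause: requests don't leave in lockstep
        return fetch(url)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() over two iterables pairs each url with its own pause.
        return list(pool.map(task, urls, pauses))


if __name__ == "__main__":
    # str.upper stands in for a real download function.
    print(fetch_concurrently(["a", "b", "c", "d", "e", "f"], str.upper,
                             max_pause=0.2, seed=7))
```

Because `pool.map` preserves input order, the article list comes back in section order even though the downloads finish at random times.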
07-19-2023, 01:33 AM   #12
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
 
Posts: 44,008
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Note that you can also lower simultaneous_downloads to reduce the number of parallel downloads below 5.
07-19-2023, 02:25 AM   #13
unkn0wn
Evangelist
unkn0wn can do the Funky Gibbon.
 
Posts: 462
Karma: 82692
Join Date: May 2021
Device: kindle
I did try that, but this approach works, and the whole recipe takes less than 15 minutes to fetch.

Earlier it took half an hour to fetch, and Bloomberg would still somehow detect it halfway through.

Last edited by unkn0wn; 07-19-2023 at 02:33 AM.

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
bloomberg Failed to download articles | fengli | Recipes | 0 | 07-15-2023 06:08 AM
Los Angeles Times Crawl failed, only title | fengli | Recipes | 2 | 03-24-2023 04:27 AM
PC word Crawl failed | fengli | Recipes | 4 | 01-06-2023 03:08 AM
Focus (DE)Only the title, content crawl failure | fengli | Recipes | 0 | 12-20-2022 09:02 PM
LA Weekly - Trouble - Full articles? | kidblue | Recipes | 21 | 10-09-2010 04:16 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.