#1 |
Connoisseur
Posts: 97
Karma: 10
Join Date: Aug 2022
Device: PC
|
The Bloomberg Weekly recipe is also failing: it cannot fetch the article content, and the output is full of broken image icons.
Please take a look when you have time. Thank you very much.
|
#2 |
Guru
Posts: 615
Karma: 85520
Join Date: May 2021
Device: kindle
|
https://github.com/unkn0w7n/calibre/...ef128ea65a5d3c
https://github.com/unkn0w7n/calibre/...ebf0c8faa5580e
Fixed both for now, but there must be a better way to do this. I can no longer get the graphs and data images as before, or the lists and tables from the JSON, and the hyperlink tags are also missing. If anyone knows how to improve it, feel free to change the recipe and submit it here or on GitHub. |
#3 |
Connoisseur
|
At least the articles can be extracted now. Thank you!
|
#4 |
Guru
|
|
#5 |
Connoisseur
|
There is still a problem: the download tends to fail partway through. I increased the delay to 20 seconds and got the same result. Only a few of the articles are extracted, not all of them. Please take another look. Thank you very much.
|
#6 |
Guru
|
I ran into this too while testing, so I increased the delay, and it seems to be working now. 20 seconds is too long, but I think a delay is the only tool we have. Maybe we could add a random pause.
The recipe is also a lot faster than before, because the HTML is now built locally.
Last edited by unkn0wn; 07-18-2023 at 09:28 AM. |
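For anyone wondering what a "random pause" could look like, here is a minimal sketch in plain Python. It is not taken from the recipe: the helper name `random_pause`, its default bounds, and the injectable `sleep` argument are all assumptions made for illustration.

```python
import random
import time


def random_pause(min_seconds=2.0, max_seconds=6.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds] and return it.

    Hypothetical helper: the name and the default bounds are illustrative only.
    The `sleep` argument exists so the function can be exercised without waiting.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    duration = random.uniform(min_seconds, max_seconds)
    sleep(duration)
    return duration
```

Calling `random_pause()` before each article request would spread the requests out unevenly, instead of at one fixed interval.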
#7 |
Connoisseur
|
Bloomberg's bot detection is very strong. After about 20% of the content has been fetched, the download fails and none of the remaining content can be retrieved.
|
#8 |
Guru
|
Please test this once, using a VPN or some other new IP address.
|
#9 |
Connoisseur
|
It worked, thanks a lot. There were no failures partway through, but I don't know yet whether it is stable.
|
#10 |
Guru
|
Hmm, it also worked for me. I'll implement this. Here the delay is random, and it takes longer for articles with more text and images.
|
#11 |
Guru
|
https://github.com/unkn0w7n/calibre/...c43dcccd1a4570
I don't know how this is working: the delay has been removed, so 5 articles are downloaded simultaneously, yet somehow the pauses make the timing so random that Bloomberg fails to detect it. |
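One way to picture the combination described here (several downloads in flight, each preceded by its own random pause) is the rough sketch below. It is not the code from the linked commit. The names `fetch_all` and `fetch_with_jitter`, the caller-supplied `fetch` callable, and the 3-second cap are assumptions made for illustration.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_with_jitter(url, fetch, max_pause=3.0, sleep=time.sleep):
    """Wait a random 0..max_pause seconds, then fetch one URL.

    `fetch` is any callable taking a URL; it stands in for the real download.
    """
    sleep(random.uniform(0.0, max_pause))
    return fetch(url)


def fetch_all(urls, fetch, workers=5, max_pause=3.0, sleep=time.sleep):
    """Fetch every URL with at most `workers` requests running at once.

    Results come back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda url: fetch_with_jitter(url, fetch, max_pause, sleep),
            urls,
        ))
```

Because every worker sleeps for a different random interval, the requests arrive at irregular times even though up to five are in flight. Whether that is what actually defeats the detection is a guess.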
#12 |
creator of calibre
Posts: 45,310
Karma: 27111242
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Note that you can also lower simultaneous_downloads to reduce the number of parallel downloads from 5.
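As a minimal sketch of where that setting lives: `simultaneous_downloads` and `delay` are class attributes on a recipe. The class name, the title, and the value 2 below are made up for illustration, and the `ImportError` fallback is only there so the snippet can be run outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so this sketch runs outside calibre; not a real recipe base class.
    BasicNewsRecipe = object


class ThrottledExample(BasicNewsRecipe):  # hypothetical recipe name
    title = 'Throttled example'           # hypothetical title
    # Illustrative value; the default discussed in this thread is 5.
    simultaneous_downloads = 2
    # Seconds between consecutive downloads; 0 means no fixed delay.
    delay = 0
```

If I remember the calibre documentation correctly, setting `delay` above 0 automatically reduces the downloads to one at a time, which would explain why removing the delay brought back 5 parallel downloads.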
|
#13 |
Guru
|
I did try that, but the current approach is working, and the whole recipe takes less than 15 minutes to fetch.
Earlier it took half an hour to fetch, and Bloomberg would still somehow detect it halfway through.
Last edited by unkn0wn; 07-19-2023 at 02:33 AM. |