		#1 | 
| 
			
			
			
			 Connoisseur 
			
			Posts: 98 
				Karma: 10 
				Join Date: Aug 2022 
				
				
				
				Device: PC 
				
				
				 | 
	
	
	
		
		
			
			 
				
				The Bloomberg Weekly recipe has also failed: it cannot fetch the article content, and the output is full of invalid icons
			 
			
			
			Please take the time to look into it. Thank you very much.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#2 | 
| 
			
			
			
			 Guru 
			
			Posts: 645 
				Karma: 85520 
				Join Date: May 2021 
				
				
				
				Device: kindle 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			https://github.com/unkn0w7n/calibre/...ef128ea65a5d3c 
		
	
		
		
		
		
		
		
		
		
		
		
	
	https://github.com/unkn0w7n/calibre/...ebf0c8faa5580e Both are fixed for now, but there must be a better way to do this. I'm not able to get the graphs/data images like before, or the lists/tables from the JSON, and hyperlink tags are also missing. If anyone knows how to make it better, feel free to change the recipe and submit it here or on GitHub.  | 
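For anyone who wants to try recovering the links and lists, here is a rough sketch of the general idea: walk the article JSON recursively and emit a tag for each node type. The node layout below (`type`, `content`, `value`, `href`) is invented for illustration and is not Bloomberg's actual schema, and `render_node` is a hypothetical helper, not something from the recipe.

```python
from html import escape


def render_node(node):
    """Render one node of a hypothetical article-body JSON tree as HTML."""
    if isinstance(node, str):
        return escape(node)
    kind = node.get('type')
    # Render the children first; most node types simply wrap them in a tag.
    inner = ''.join(render_node(child) for child in node.get('content', []))
    if kind == 'text':
        return escape(node.get('value', ''))
    if kind == 'paragraph':
        return '<p>' + inner + '</p>'
    if kind == 'link':
        return '<a href="' + escape(node.get('href', '')) + '">' + inner + '</a>'
    if kind == 'list':
        return '<ul>' + inner + '</ul>'
    if kind == 'listItem':
        return '<li>' + inner + '</li>'
    # Unknown node types: keep the children so no text is silently dropped.
    return inner
```

The fallback at the end matters most: an unrecognised node still contributes its text, so a new node type degrades to plain text instead of disappearing.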
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#3 | 
| 
			
			
			
			 Connoisseur 
			
			Posts: 98 
				Karma: 10 
				Join Date: Aug 2022 
				
				
				
				Device: PC 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			At least the content can be extracted now. Thanks a lot!
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#4 | 
| 
			
			
			
			 Guru 
			
			Posts: 645 
				Karma: 85520 
				Join Date: May 2021 
				
				
				
				Device: kindle 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#5 | 
| 
			
			
			
			 Connoisseur 
			
			Posts: 98 
				Karma: 10 
				Join Date: Aug 2022 
				
				
				
				Device: PC 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			There is a problem: the download easily fails partway through. I increased the delay to 20 seconds, but the problem is the same: only a few articles are extracted, not all of them. Please take another look. Thank you very much.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#6 | 
| 
			
			
			
			 Guru 
			
			Posts: 645 
				Karma: 85520 
				Join Date: May 2021 
				
				
				
				Device: kindle 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I also ran into this when testing, so I increased the delay, and it seems to be working now. 20 seconds is too much, but I think delay is the only tool we have. Maybe we could include a random pause. 
		
	
		
		
		
		
		
		
		
		
		
		
		
			It also became a lot faster than before, as the HTML is built locally. Last edited by unkn0wn; 07-18-2023 at 10:28 AM.  | 
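To illustrate the random-pause idea, here is a minimal standalone sketch, not necessarily what the recipe ends up doing. `random_pause` and its default bounds are made up for the example; the `sleep` parameter is only there so the function can be tested without actually waiting.

```python
import random
import time


def random_pause(min_seconds=2.0, max_seconds=8.0, sleep=time.sleep):
    """Sleep for a random time between the two bounds and return the time slept."""
    if not 0 <= min_seconds <= max_seconds:
        raise ValueError('need 0 <= min_seconds <= max_seconds')
    # A uniform draw gives a different gap before every request, so the
    # requests do not arrive at a fixed, easily recognised interval.
    pause = random.uniform(min_seconds, max_seconds)
    sleep(pause)
    return pause
```

Compared with a fixed 20 second delay, the average wait here is 5 seconds, and the spacing between requests is irregular.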
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#7 | 
| 
			
			
			
			 Connoisseur 
			
			Posts: 98 
				Karma: 10 
				Join Date: Aug 2022 
				
				
				
				Device: PC 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Bloomberg's detection is very strong. After about 20% of the content has been fetched, the download fails, and nothing after that can be fetched.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#8 | 
| 
			
			
			
			 Guru 
			
			Posts: 645 
				Karma: 85520 
				Join Date: May 2021 
				
				
				
				Device: kindle 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Test this once, using a VPN or some other new IP.
		 
		
	
		
		
			 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#9 | 
| 
			
			
			
			 Connoisseur 
			
			Posts: 98 
				Karma: 10 
				Join Date: Aug 2022 
				
				
				
				Device: PC 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			It worked, thanks a lot. There was no failure partway through, but I don't know yet whether it is stable.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#10 | 
| 
			
			
			
			 Guru 
			
			Posts: 645 
				Karma: 85520 
				Join Date: May 2021 
				
				
				
				Device: kindle 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Hmm, it also worked for me. I'll implement this. Here the delay is random: articles with more text and images take longer.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#11 | 
| 
			
			
			
			 Guru 
			
			Posts: 645 
				Karma: 85520 
				Join Date: May 2021 
				
				
				
				Device: kindle 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			https://github.com/unkn0w7n/calibre/...c43dcccd1a4570 
		
	
		
		
		
		
		
		
		
		
		
		
	
	I don't know how this is working: the delay is removed and 5 articles are downloaded simultaneously, but somehow the pauses make things random enough that Bloomberg fails to detect it.  | 
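One way to picture what is described here (several downloads in flight, each preceded by its own random pause) is the standalone sketch below. It is an illustration only: calibre does its own threading inside the recipe machinery, and `fetch_with_jitter`, `fetch_all`, the `fetch` callable and the 3 second cap are all assumptions made for the example.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_with_jitter(url, fetch, max_pause=3.0, sleep=time.sleep):
    """Wait a random time of up to max_pause seconds, then fetch one URL."""
    sleep(random.uniform(0, max_pause))
    return fetch(url)


def fetch_all(urls, fetch, workers=5, max_pause=3.0, sleep=time.sleep):
    """Fetch all URLs with `workers` threads, pausing randomly before each one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `urls`, even though
        # the requests themselves start and finish at irregular times.
        return list(pool.map(
            lambda url: fetch_with_jitter(url, fetch, max_pause, sleep), urls))
```

With 5 workers each waiting a different random time, the requests reach the server in irregular bursts, which may be why a fixed delay is no longer needed.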
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#12 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Note that you can also lower simultaneous_downloads to reduce the 5.
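As a sketch of what that looks like in a recipe: `simultaneous_downloads` is a class attribute on the recipe. The class name, title and the value 2 below are placeholders, and the fallback base class exists only so the snippet also runs outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Outside calibre: use an empty stand-in so the sketch still runs.
    class BasicNewsRecipe:
        pass


class BloombergSketch(BasicNewsRecipe):
    # Hypothetical recipe skeleton; only the attribute below is the point.
    title = 'Bloomberg (sketch)'
    # The thread mentions 5 simultaneous downloads; 2 is an arbitrary lower value.
    simultaneous_downloads = 2
```

As far as I know from the calibre recipe documentation, setting a non-zero `delay` already forces downloads to run one at a time, so this attribute only has an effect when `delay` is left at 0.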
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#13 | 
| 
			
			
			
			 Guru 
			
			Posts: 645 
				Karma: 85520 
				Join Date: May 2021 
				
				
				
				Device: kindle 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I did try that, but this is working, and the whole recipe takes less than 15 minutes to fetch.  
		
	
		
		
		
		
		
		
		
		
		
		
		
			Earlier it used to take half an hour to fetch, and Bloomberg would still somehow detect it halfway through. Last edited by unkn0wn; 07-19-2023 at 03:33 AM.  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 