|  10-12-2023, 08:36 PM | #1 | 
| Zealot            Posts: 131 Karma: 2136220 Join Date: May 2019 Device: Kindle | 
				
				The Spectator - only title and synopsis
			 
			
			Since October, the last two issues of "The Spectator" fetch only the title and synopsis of each article. The article body content is missing.
|  10-14-2023, 11:11 AM | #2 | 
| Guru            Posts: 644 Karma: 85520 Join Date: May 2021 Device: kindle | 
			
			You can use the attached recipe. It loads all articles, but it is still a temporary solution (it might fail due to too many requests). It's time for someone to figure out the login and add login code to the recipe.
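Because the recipe may fail when the site sees too many requests, one generic mitigation is to retry failed downloads with exponential backoff. This is a plain-Python sketch, not a calibre API: `fetch_with_retry`, its `fetch` callback and the default delays are hypothetical choices. (calibre recipes also have a `delay` attribute that sets a pause, in seconds, between consecutive downloads.)

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[str], T],
    url: str,
    attempts: int = 3,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url), retrying with exponential backoff on network errors.

    Sleeps base_delay, 2 * base_delay, 4 * base_delay, ... seconds between
    attempts and re-raises the last error once all attempts are used up.
    OSError covers socket errors such as http.client.RemoteDisconnected.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch(url)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Retrying only helps with transient failures; if a site deliberately blocks or CAPTCHAs automated clients, backing off will not get around it.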
|  10-15-2023, 01:43 AM | #3 | 
| Guru            Posts: 644 Karma: 85520 Join Date: May 2021 Device: kindle | 
			
			@kovidgoyal, how can I make use of the Wayback Machine? Is it NYTimes-only?
|  10-15-2023, 02:49 AM | #4 | 
| creator of calibre            Posts: 45,604 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			Yes, I would need to add support for the Spectator to it. What is the URL scheme for the Spectator? If it has a decent URL scheme, I might be able to do it.
|  10-15-2023, 03:13 AM | #5 | 
| Guru            Posts: 644 Karma: 85520 Join Date: May 2021 Device: kindle | 
			
			At https://web.archive.org/web/20231013...support-hamas/ it looks like the Wayback Machine doesn't have access to these articles. https://archive.today/ works, but it uses a different URL format and has CAPTCHA checks. Can we do something for archive.today? https://archive.ph/K6f5r
|  10-15-2023, 03:39 AM | #6 | 
| creator of calibre            Posts: 45,604 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			The archive.org entries are paywalled as well, so there is no point there. As for archive.today, no idea; I've never used it.
|  10-15-2023, 03:58 AM | #7 | 
| creator of calibre            Posts: 45,604 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			I took a brief look at archive.is. Changing the recipe to use it should be as simple as replacing the article URLs with URLs of the form https://archive.is/latest/original_url. I don't know what their rate-limiting and CAPTCHA policies are; that will require experimentation.
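The URL rewrite described above amounts to prefixing each article URL. A minimal standalone sketch; the helper name `archive_url` and the choice to drop the `#fragment` are assumptions, not part of calibre. In a recipe, one place such a mapping could be applied is `print_version(self, url)`, the `BasicNewsRecipe` hook that lets a recipe download a different URL for each article.

```python
from urllib.parse import urldefrag

ARCHIVE_LATEST = "https://archive.is/latest/"


def archive_url(original_url: str) -> str:
    """Map an article URL to the archive.is 'latest snapshot' form.

    Assumption: the #fragment is dropped, since it is only a client-side
    anchor and not part of the page that was archived.
    """
    url, _fragment = urldefrag(original_url.strip())
    return ARCHIVE_LATEST + url
```

Whether archive.is serves these URLs to a non-browser client at all is a separate question that only experimentation will answer.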
|  10-15-2023, 07:05 AM | #8 | 
| Guru            Posts: 644 Karma: 85520 Join Date: May 2021 Device: kindle | 
			
			Although it loads content in the browser, there's no response in calibre for these URLs. Code:
Traceback (most recent call last):
  File "mechanize\_urllib2_fork.py", line 1238, in do_open
  File "http\client.py", line 1374, in getresponse
  File "http\client.py", line 318, in begin
  File "http\client.py", line 287, in _read_status
http.client.RemoteDisconnected: Remote end closed connection without response
			If we can get a response, we can also fix the WSJ recipe.
|  10-15-2023, 10:34 AM | #9 | 
| creator of calibre            Posts: 45,604 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			Does it work if you use the read_url() function from calibre.scraper.simple?
|  10-15-2023, 01:41 PM | #10 | 
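| Guru            Posts: 644 Karma: 85520 Join Date: May 2021 Device: kindle | 
			
			Code:
from calibre.scraper.simple import read_url
from calibre.ptempfile import PersistentTemporaryFile
...
    # read_url() keeps its browser instance in this list between calls
    storage = []
    # tell calibre to fetch each article via get_obfuscated_article()
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        # fetch the latest archive.is snapshot of the article
        raw = read_url(self.storage, 'https://archive.is/latest/' + url)
        # save the HTML to a temp file; calibre reads the article from this path
        pt = PersistentTemporaryFile('.html')
        pt.write(raw.encode('utf-8'))
        pt.close()
        return pt.name
			It works, but is there a simpler way?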
|  10-15-2023, 10:40 PM | #11 | 
| creator of calibre            Posts: 45,604 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			Easier in what sense?
|  10-15-2023, 11:01 PM | #12 | 
| Guru            Posts: 644 Karma: 85520 Join Date: May 2021 Device: kindle | 
			
			I don't know; is this the right method, though? I'm a noob here. Can we do it in get_obfuscated_article() without writing to a temp file?
|  10-16-2023, 01:58 AM | #13 | 
| creator of calibre            Posts: 45,604 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			The cost of creating a temp file is insignificant compared to the download itself, so it doesn't matter, but I added some code to allow avoiding the temp file: https://github.com/kovidgoyal/calibr...6689de07213fbe
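My understanding of that change is that `get_obfuscated_article()` may now return a dict holding the HTML under `data` (plus an optional `url`) instead of a file path; treat that as an assumption and check the method's docstring in your calibre version, including whether `data` should be text or bytes. The standalone sketch below only builds such a dict; `build_article_payload` and the injected `fetch_html` callback (standing in for `read_url(self.storage, ...)`) are hypothetical.

```python
from typing import Callable, Optional, TypedDict

ARCHIVE_LATEST = "https://archive.is/latest/"


class ArticlePayload(TypedDict):
    data: str           # the article HTML, kept in memory (no temp file)
    url: Optional[str]  # the URL the HTML was actually fetched from


def build_article_payload(url: str, fetch_html: Callable[[str], str]) -> ArticlePayload:
    """Fetch the latest archive.is snapshot of url and return it as a dict.

    fetch_html is a stand-in for the real downloader; inside a recipe it
    might be something like: lambda u: read_url(self.storage, u)
    """
    archived = ARCHIVE_LATEST + url
    return {"data": fetch_html(archived), "url": archived}
```

Inside the recipe, the method body would then reduce to a single `return` of this dict, with no `PersistentTemporaryFile` involved.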
|  10-16-2023, 12:13 PM | #14 | 
| Guru            Posts: 644 Karma: 85520 Join Date: May 2021 Device: kindle | 
			
			I guess I'll be able to use this in the next update. Thanks.
| Similar Threads |
| Thread | Thread Starter | Forum | Replies | Last Post | 
| The Spectator failed | darrenma | Recipes | 8 | 11-17-2022 07:17 PM | 
| Spectator Magazine has no content | mkgtu | Recipes | 9 | 10-01-2022 01:17 PM | 
| Recipe fails - The Spectator UK | nano5 | Recipes | 4 | 08-02-2022 06:20 AM | 
| Business Spectator | soctec | Recipes | 0 | 09-27-2012 03:29 AM | 
| Recipe for UK Spectator? | 7db | Recipes | 1 | 03-23-2011 05:52 AM |