06-05-2020, 03:49 AM | #1 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
|
FetchError: Not Found likely due to double slash in URL
Hi All,
I have an issue with my custom recipe for a Hungarian political magazine. The recipe usually works fine, except on rare occasions when the site makes the mistake of creating URLs with double slashes in them, for example this week: Code:
Fetching http://www.es.hu/cikk/2020-06-05//a-het-konyvei.html
Could not fetch link http://www.es.hu/cikk/2020-06-05//a-het-konyvei.html
Traceback (most recent call last):
  File "site-packages/calibre/web/fetch/simple.py", line 520, in process_links
  File "site-packages/calibre/web/fetch/simple.py", line 279, in fetch_url
FetchError: Not Found

How should I solve this? Thanks in advance for any help!
|
06-05-2020, 04:31 AM | #2 |
creator of calibre
Posts: 43,778
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You're saying the version with the double slash works, but the one with the single slash does not?
|
06-06-2020, 05:09 AM | #3 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
|
Hi Kovid,
Yes, you can check here:
- https://www.es.hu/cikk/2020-06-05//a-het-konyvei.html This is the article page
- https://www.es.hu/cikk/2020-06-05/a-het-konyvei.html This is a 404 page
The feed in my recipe gets built from the actual homepage of the magazine ( https://www.es.hu/ ), and the feed does contain the correct link (with the double slash). Still, while downloading via ebook-convert (version: 4.6.0), I get the FetchError: Not Found. Any help is appreciated.
|
06-06-2020, 06:10 AM | #4 |
creator of calibre
Posts: 43,778
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I don't really see a way to fix this; mechanize normalizes double slashes in URLs.
|
06-07-2020, 04:33 AM | #5 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
|
Hi Kovid,
I just experimented a bit with mechanize, and it happily fetched the proper page this way: Code:
import re
import mechanize

br = mechanize.Browser()
# This is a subpage of the main domain, with fewer articles
br.open("https://www.es.hu/rovat/kritika")
# "A H" is part of the title of the article whose URL contains the double slash
resp = br.follow_link(text_regex=r"A H", nr=0)
resp.geturl()  # 'https://www.es.hu/cikk/2020-06-05//a-het-konyvei.html'
|
06-07-2020, 05:02 AM | #6 |
creator of calibre
Posts: 43,778
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
No idea then. There could be any number of places where the URL is parsed and reconstituted.
|
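To illustrate how a parse-and-rebuild step can lose the second slash, here is a minimal standard-library sketch. It does not claim to show where calibre drops the slash. The helper names (`roundtrip`, `normalize_path`, `keeps_double_slash`) are hypothetical and exist only for this example. It contrasts a plain `urlsplit`/`urlunsplit` round trip, which leaves the path alone, with a rebuild that runs the path through `posixpath.normpath`, which collapses empty segments.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# The failing article URL from the first post.
URL = "http://www.es.hu/cikk/2020-06-05//a-het-konyvei.html"


def roundtrip(url):
    """Parse and rebuild the URL without touching the path."""
    return urlunsplit(urlsplit(url))


def normalize_path(url):
    """Parse and rebuild the URL, normalizing the path as a POSIX path.

    This is an assumed example of a 'cleanup' step, not calibre's code.
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=posixpath.normpath(parts.path)))


def keeps_double_slash(transform, url=URL):
    """Report whether the transformed URL still has '//' in its path."""
    return "//" in urlsplit(transform(url)).path


if __name__ == "__main__":
    for transform in (roundtrip, normalize_path):
        print(transform.__name__, keeps_double_slash(transform))
```

Passing each suspect URL-handling function through a check like `keeps_double_slash` is one way to narrow down which step rewrites the link.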
06-07-2020, 05:59 AM | #7 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
|
OK, thanks for spending some time on it. I might write a wrapper to download this kind of article locally and then add it to the feed with a file:/// URL.
|
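A rough sketch of the wrapper idea from the last post, using only the Python standard library. Everything here is an assumption rather than tested recipe code: the function names (`fetch_bytes`, `save_locally`, `localize`) are hypothetical, the download goes through `urllib.request` instead of calibre's own fetcher, and whether the recipe then handles the resulting file:/// links as intended has not been verified.

```python
import hashlib
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit


def fetch_bytes(url, timeout=30):
    """Download url with the standard library and return the raw body."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def has_double_slash(url):
    """True if the path part of the URL contains an empty segment ('//')."""
    return "//" in urlsplit(url).path


def save_locally(url, dest_dir, fetch=fetch_bytes):
    """Save the page at url under dest_dir and return a file:/// URL for it."""
    dest_dir = Path(dest_dir).resolve()
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the file after a hash of the URL so distinct articles never collide.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    target = dest_dir / name
    target.write_bytes(fetch(url))
    return target.as_uri()


def localize(url, dest_dir, fetch=fetch_bytes):
    """Return url unchanged unless its path has a double slash.

    Affected URLs are downloaded first and replaced by a file:/// URL.
    """
    if has_double_slash(url):
        return save_locally(url, dest_dir, fetch)
    return url
```

The idea would be to pass each article link through `localize` while building the feed, so that only the affected links are replaced. The `fetch` parameter is there so the download step can be swapped out, for example for a mechanize-based fetch like the one shown in post #5.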