Go Back   MobileRead Forums > E-Book Software > Calibre > Recipes

Old 06-05-2020, 03:49 AM   #1
hiperlink
Enthusiast
hiperlink began at the beginning.
 
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
FetchError: Not Found likely due to double slash in URL

Hi All,

I have an issue with my custom recipe for a Hungarian political magazine.

My recipe usually works fine, except on rare occasions when the site mistakenly publishes URLs containing a double slash, for example this week:

Code:
Fetching http://www.es.hu/cikk/2020-06-05//a-het-konyvei.html
Could not fetch link http://www.es.hu/cikk/2020-06-05//a-het-konyvei.html
Traceback (most recent call last):
  File "site-packages/calibre/web/fetch/simple.py", line 520, in process_links
  File "site-packages/calibre/web/fetch/simple.py", line 279, in fetch_url
FetchError: Not Found
The double-slash URL ('http://www.es.hu/cikk/2020-06-05//a-het-konyvei.html') works just fine, but the "normalized" version with a single slash after the date really does return a 404 page.

How should I solve this?
Thanks in advance for any help!
Old 06-05-2020, 04:31 AM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 43,778
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You're saying the version with the double slash works, but the one with the single slash does not?
Old 06-06-2020, 05:09 AM   #3
hiperlink
Hi Kovid,

Yes, you can check here:

- https://www.es.hu/cikk/2020-06-05//a-het-konyvei.html This is the article page
- https://www.es.hu/cikk/2020-06-05/a-het-konyvei.html This is a 404 page

The feed in my recipe is built from the magazine's actual homepage ( https://www.es.hu/ ), and the feed does contain the correct link (with the double slash).

Still, while downloading via ebook-convert (version: 4.6.0), I get the FetchError: Not Found.

Any help is appreciated.
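
To spot which feed links are affected, the URL path can be checked for an empty segment. This is only a minimal sketch using the Python standard library; `has_empty_path_segment` is a hypothetical helper name and is not part of calibre or of the recipe discussed here.

```python
from urllib.parse import urlsplit


def has_empty_path_segment(url):
    """Return True if the path part of the URL contains '//'.

    Only the path is inspected, so the '//' that follows the scheme
    (as in 'https://') does not count.
    """
    return "//" in urlsplit(url).path


if __name__ == "__main__":
    # The two URLs from this thread.
    for link in (
        "https://www.es.hu/cikk/2020-06-05//a-het-konyvei.html",
        "https://www.es.hu/cikk/2020-06-05/a-het-konyvei.html",
    ):
        print(link, has_empty_path_segment(link))
```

A recipe could use a check like this to log or single out the problematic articles before the download step.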
Old 06-06-2020, 06:10 AM   #4
kovidgoyal
I don't really see a way to fix this; mechanize normalizes double slashes in URLs.
Old 06-07-2020, 04:33 AM   #5
hiperlink
Hi Kovid,

I just experimented a bit with mechanize, and it happily fetched the proper page this way:

Code:
import mechanize

br = mechanize.Browser()
# Open a section page of the site; it lists fewer articles than the homepage.
br.open("https://www.es.hu/rovat/kritika")
# Follow the first link whose text matches the start of the affected article's title.
resp = br.follow_link(text_regex=r"A H", nr=0)
print(resp.geturl())
# https://www.es.hu/cikk/2020-06-05//a-het-konyvei.html
So mechanize follows the URL without any trouble. Where does it get normalized, then?
Old 06-07-2020, 05:02 AM   #6
kovidgoyal
No idea, then. There could be any number of places where the URL is parsed and reconstituted.
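
One way to narrow the search is to check which kinds of URL handling keep the double slash and which collapse it. The sketch below uses only the Python standard library and does not reproduce calibre's or mechanize's actual code paths, which are not identified in this thread; `roundtrip` and `normalize_path` are hypothetical helper names used for illustration.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

URL = "https://www.es.hu/cikk/2020-06-05//a-het-konyvei.html"


def roundtrip(url):
    """Split the URL into components and join them again, unchanged."""
    return urlunsplit(urlsplit(url))


def normalize_path(url):
    """Rebuild the URL after running its path through posixpath.normpath,
    which collapses repeated slashes inside the path."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=posixpath.normpath(parts.path)))


if __name__ == "__main__":
    # A plain split/unsplit round trip keeps the empty path segment.
    print(roundtrip(URL) == URL)  # True
    # Path normalization yields the single-slash URL, which is the 404 page.
    print(normalize_path(URL))  # https://www.es.hu/cikk/2020-06-05/a-het-konyvei.html
```

Any step in the download pipeline that behaves like `normalize_path` would turn the working URL into the one that returns 404, so comparing the URL before and after each stage against these two results may help locate it.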
Old 06-07-2020, 05:59 AM   #7
hiperlink
OK, thanks for spending some time on it. I might write a wrapper that downloads these kinds of articles locally and then adds them to the feed with a file:/// URL.
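
A rough sketch of what such a wrapper could look like follows. It assumes the article can be fetched by some callable that returns the page as bytes; `save_article_locally` and its `fetch` parameter are hypothetical names, and whether calibre accepts file:/// links in a recipe feed is not confirmed in this thread.

```python
import hashlib
import tempfile
from pathlib import Path


def save_article_locally(url, fetch, directory):
    """Fetch `url` with the given callable and store the result on disk.

    `fetch` is any callable that takes a URL and returns bytes.
    Returns a file:// URL that points at the saved copy.
    """
    html = fetch(url)
    # Derive a stable file name from the URL so reruns overwrite the same file.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    target = Path(directory) / name
    target.write_bytes(html)
    # as_uri() needs an absolute path, hence resolve().
    return target.resolve().as_uri()


if __name__ == "__main__":
    # Stand-in fetcher so the example runs without network access.
    def fake_fetch(url):
        return b"<html><body>article</body></html>"

    with tempfile.TemporaryDirectory() as tmp:
        local_url = save_article_locally(
            "https://www.es.hu/cikk/2020-06-05//a-het-konyvei.html",
            fake_fetch,
            tmp,
        )
        print(local_url)
```

With the mechanize browser from the earlier experiment, `fetch` could be something like `lambda u: br.open(u).read()`, since that path was shown to reach the double-slash URL correctly.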
All times are GMT -4.