|04-20-2016, 01:03 PM||#1|
Join Date: Jul 2010
Recipe broken due to missing index.html on site
I'm trying to repair the recipe for the Vancouver Sun. The pre-existing recipe references the pages that contain the article links with 'index.html'. For example, (u'National',u'/news/national/index.html').
However, the site has changed, and the section pages now have paths such as '/category/news/national'. There is no longer an index.html (or an index.htm, default.html, or default.htm; I've tried them all in a browser), so the recipe breaks.
I've tried changing the recipe so that the page is referenced as (u'National',u'/news/national') but it does not work.
Does anyone have any suggestions for fixing it?
|04-20-2016, 10:49 PM||#2|
creator of calibre
Join Date: Oct 2006
Location: Mumbai, India
Since the structure of the HTML on the index pages has likely also changed since the recipe was written, you'd also have to fix the code that actually parses the HTML to extract the links. From a brief look at the recipe, that means fixing parse_web_index() and handle_article().
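To make the two parts of the fix concrete, here is a rough, standalone sketch. It is not the actual Vancouver Sun recipe code. The base URL, the helper names (section_url, extract_articles, ArticleLinkParser) and the assumption that article links sit inside h2/h3 headings are all hypothetical; the real markup has to be checked in a browser first. It uses only the Python 3 standard library so it can be tried outside calibre. Inside a recipe you would normally fetch the page with self.index_to_soup(url) and do the equivalent work in parse_web_index().

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed base URL; take the real one from the existing recipe.
BASE_URL = 'https://vancouversun.com'

# Section list in the same (title, path) shape the old recipe used,
# but with the new-style paths and no index.html.
SECTIONS = [
    (u'National', u'/category/news/national'),
]


def section_url(path, base_url=BASE_URL):
    """Build the full URL of a section index page from its path."""
    return urljoin(base_url, path)


class ArticleLinkParser(HTMLParser):
    """Collect links found inside <h2>/<h3> headings.

    The heading-based selection is a guess at the page layout, not
    something confirmed against the live site.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.articles = []
        self._in_heading = 0   # nesting depth of h2/h3 tags
        self._href = None      # URL of the link currently being read
        self._text = []        # text pieces of the current link

    def handle_starttag(self, tag, attrs):
        if tag in ('h2', 'h3'):
            self._in_heading += 1
        elif tag == 'a' and self._in_heading:
            href = dict(attrs).get('href')
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ('h2', 'h3') and self._in_heading:
            self._in_heading -= 1
        elif tag == 'a' and self._href is not None:
            # Collapse runs of whitespace in the link text.
            title = ' '.join(''.join(self._text).split())
            if title:
                self.articles.append({'title': title, 'url': self._href})
            self._href = None


def extract_articles(html, base_url=BASE_URL):
    """Return a list of {'title', 'url'} dicts for one index page."""
    parser = ArticleLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.articles
```

The 'title' and 'url' keys are the ones calibre expects for each article when building a feed. If no articles come back, the selection rule (here h2/h3) is the first thing to compare against the page source.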