04-20-2016, 12:03 PM | #1 |
Member
Posts: 22
Karma: 116
Join Date: Jul 2010
Device: none
Recipe broken due to missing index.html on site
I'm trying to repair the recipe for the Vancouver Sun. The existing recipe refers to the section pages that list the article links by their 'index.html' URLs, for example (u'National',u'/news/national/index.html').
However, the site has changed, and those pages now use URLs of the form '/category/news/national'. There is no longer an index.html (or index.htm, default.html or default.htm; I've tried them all in a browser), so the recipe breaks. I've tried changing the recipe to reference the page as (u'National',u'/news/national'), but that does not work either. Does anyone have any suggestions for fixing it?
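A minimal sketch of updating the section list, assuming every section moved the same way as the single example in the post (drop the trailing 'index.html' and prefix '/category'). `OLD_SECTIONS` and `to_new_path` are hypothetical names, not taken from the recipe. Note that the path tried in the post, '/news/national', lacks the '/category' prefix that appears in the new URLs.

```python
# Section list in the (title, path) form the old recipe uses.
OLD_SECTIONS = [
    (u'National', u'/news/national/index.html'),
]

def to_new_path(old_path):
    """Map '/news/national/index.html' to '/category/news/national'.

    Assumption: all sections follow the one example from the post, i.e.
    strip a trailing '/index.html' (or '/') and prefix '/category'.
    """
    suffix = '/index.html'
    if old_path.endswith(suffix):
        path = old_path[:-len(suffix)]
    else:
        path = old_path.rstrip('/')
    return '/category' + path

# Rebuild the section list with the new-style paths.
NEW_SECTIONS = [(title, to_new_path(path)) for title, path in OLD_SECTIONS]
print(NEW_SECTIONS)
```

Fixing the URLs is only half the job: if the markup of those pages also changed, the parsing code must be updated too.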
04-20-2016, 09:49 PM | #2 |
creator of calibre
Posts: 43,853
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Since the structure of the HTML on the index pages has likely also changed since the recipe was written, you would also have to fix the code that actually parses that HTML to extract the links. From a brief look at the recipe, that means fixing parse_web_index() and handle_article().
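As a rough, standalone illustration of what the link-extraction part of parse_web_index() has to do, the sketch below pulls headline links out of a section page using only the standard library. The markup it expects (article links inside <h2>/<h3> headings) and the class name `ArticleLinkParser` are hypothetical; inspect the live page source and adjust. In an actual recipe you would normally fetch the page with the recipe's `self.index_to_soup(url)` and walk the resulting BeautifulSoup tree instead. The dicts use the `title`/`url` keys that calibre's parse_index() expects for each article.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ArticleLinkParser(HTMLParser):
    """Collect {'title', 'url'} dicts from <a> tags nested in <h2>/<h3>.

    Hypothetical markup: assumes each article headline is a link inside
    a heading. Check the real page source and adjust HEADINGS/filters.
    """
    HEADINGS = {'h2', 'h3'}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.articles = []
        self._heading_depth = 0   # >0 while inside a heading tag
        self._href = None         # href of the link currently being read
        self._text = []           # text chunks of that link

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._heading_depth += 1
        elif tag == 'a' and self._heading_depth:
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._heading_depth:
            self._heading_depth -= 1
        elif tag == 'a' and self._href is not None:
            # Collapse whitespace in the link text and resolve relative URLs.
            title = ' '.join(''.join(self._text).split())
            if title:
                self.articles.append({'title': title,
                                      'url': urljoin(self.base_url, self._href)})
            self._href = None

# Usage on made-up sample HTML (example.com stands in for the real site).
sample = '''
<div><h2><a href="/news/national/story-one">Story one</a></h2>
<p><a href="/ads">ad</a></p>
<h3><a href="https://example.com/other">Story   two</a></h3></div>
'''
parser = ArticleLinkParser('https://www.example.com/category/news/national')
parser.feed(sample)
print(parser.articles)
```

Links outside headings (the "ad" link here) are ignored, which is the kind of filtering the old parsing code likely did against the previous page layout.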
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
"The Atlantic" recipe broken by web site change | mikebw | Recipes | 3 | 05-04-2016 06:25 AM |
Providence Journal recipe broken by web site changes | mikebw | Recipes | 3 | 04-05-2015 11:30 PM |
Instapaper recipe - broken by site redesign? | adfadfsasdfafafd | Recipes | 11 | 06-02-2014 08:31 AM |
ESPN recipe broken due to new print urls | Odyseus | Recipes | 1 | 01-18-2012 12:23 AM |
Calibre Recipe HTML content differs from raw html of index.html. | krunk | Calibre | 4 | 09-20-2010 09:48 PM |