04-20-2016, 12:03 PM   #1
kentmatt
Recipe broken due to missing index.html on site

I'm trying to repair the recipe for the Vancouver Sun. The existing recipe points at the section pages that list the article links using paths ending in 'index.html', for example (u'National', u'/news/national/index.html').

However, the site has changed, and the section pages now look like '/category/news/national'. There is no longer an index.html (or an index.htm, default.html, or default.htm; I've tried them all in a browser), so the recipe breaks.

I've tried changing the recipe so that the page is referenced as (u'National', u'/news/national'), but that does not work either.

Does anyone have any suggestions for fixing it?
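
For anyone following along, here is a minimal sketch of rewriting old-style section paths into the new format quoted in the post. The helper name to_new_path and the idea that every section sits under '/category' are assumptions based only on the single example above ('/category/news/national'); the real site may not be that regular. Note that the path tried in the post, '/news/national', lacks the '/category' prefix that the post's own example shows.

```python
import posixpath

# Old-style section list, using the one example given in the post.
OLD_SECTIONS = [
    (u'National', u'/news/national/index.html'),
]


def to_new_path(old_path, prefix='/category'):
    """Drop a trailing index.html and prepend the prefix.

    The '/category' prefix is an assumption taken from the single
    example in the post; it is not confirmed for every section.
    """
    head, tail = posixpath.split(old_path)
    if tail == 'index.html':
        old_path = head
    return prefix + old_path.rstrip('/')


NEW_SECTIONS = [(title, to_new_path(path)) for title, path in OLD_SECTIONS]
```

Fixing the paths this way only gets the recipe to the right page. It does nothing about how the links on that page are parsed.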
04-20-2016, 09:49 PM   #2
kovidgoyal
creator of calibre
Since the structure of the HTML on the index pages has likely also changed since the recipe was written, you'd also have to fix the code that actually parses the HTML to extract the links. From a brief look at the recipe, that means fixing parse_web_index() and handle_article().
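
To illustrate the kind of link extraction that would need updating, here is a rough, standalone sketch. It is not the recipe's actual parse_web_index() or handle_article(), whose bodies are not shown in this thread. The names LinkCollector and extract_links, the must_contain filter, and the idea that article URLs share a recognisable path fragment are all hypothetical. A real recipe would normally fetch the page with calibre's index_to_soup() instead of the standard-library parser used here to keep the example self-contained.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (title, absolute_url) pairs for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            # Collapse whitespace so nested tags do not leave odd gaps.
            title = ' '.join(''.join(self._text).split())
            if title:
                self.links.append((title, self._href))
            self._href = None


def extract_links(html, base_url, must_contain):
    """Return unique article links whose URL contains must_contain.

    must_contain is a hypothetical filter; the pattern that really
    separates articles from navigation links has to be read off the
    live page.
    """
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    seen = set()
    result = []
    for title, url in parser.links:
        if must_contain in url and url not in seen:
            seen.add(url)
            result.append((title, url))
    return result
```

The filter string and the tags to look at are exactly the parts that tend to go stale when a site is redesigned, so those are the first things to check against the current page source.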
All times are GMT -4.

