06-19-2011, 12:20 PM #10
scissors
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g

Hi Starson.

Sorry, I never noticed all those controls. Hopefully this is better.

I finally got the code that replaces the URL with the print version working, but it made no difference. Following on from Kovid's and your tips, I loaded the various articles into Notepad++ and deleted everything between <head> and </head>.

This stopped the crash on every article where I removed it. (I've attached a copy of the resulting EPUB with the header removed from the first article.) Apart from the Sony no longer crashing, the only visible effect is that the calibre-generated "previous / next / section / main" navigation header appears in larger text.

Here is the recipe as it stands


Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1306061239(BasicNewsRecipe):
    title = u'Out and about live'
    description = 'Camping and Caravan - News and Reviews'
    author = 'Dave Asbury'

    cover_url = 'http://www.outandaboutlive.co.uk/img/template/footer/illustration_3.jpg'
    masthead_url = 'http://www.outandaboutlive.co.uk/img/template/cloud_logo.gif'

    oldest_article = 56
    max_articles_per_feed = 100
    remove_empty_feeds = True
    remove_javascript = True
    no_stylesheets = True

    # remove_tags_before = dict(id='Body')

    # (compiled regex, function of the match) pairs applied to the raw HTML
    # before parsing; here the matched text is simply deleted.
    preprocess_regexps = [
        (re.compile(r'Other News'), lambda match: ''),
        (re.compile(r'Magazines'), lambda match: ''),
    ]

    # Keep only the elements whose class is "Content".
    keep_only_tags = [
        dict(attrs={'class': ['Content']}),
    ]

    remove_tags = [
        dict(attrs={'class': ['ItemSummary', 'Buttons', 'jcarousel-skin-oal_magselector']}),
        # dict(name='head'),
        # dict(name='style'),
        # dict(name='h4', attrs={'Magazines'}),
    ]

    # remove_attributes expects HTML attribute names (e.g. 'style'), not page text,
    # so remove_attributes = ['Other News'] had no effect and has been dropped;
    # the "Other News" text is already deleted by preprocess_regexps above.

    def print_version(self, url):
        # Turn '.../_ch3_nw1433' style article URLs into the '.../Print-_ch3_nw1433' print page.
        myurl = url.replace('/_', '/Print-_')
        self.log('New URL =', myurl)
        return myurl

    feeds = [
        (u'Camping News', u'http://feeds.feedburner.com/OAL/News/Camping'),
        # (u'Camping Features', u'http://feeds.feedburner.com/OAL/Features/Camping'),
        # (u'Camping Reviews', u'http://feeds.feedburner.com/OAL/Reviews/Camping'),
        # (u'Caravan News', u'http://feeds.feedburner.com/OAL/News/Caravans'),
        # (u'Caravan Features', u'http://feeds.feedburner.com/OAL/Features/Caravans'),
        # (u'Caravan Reviews', u'http://feeds.feedburner.com/OAL/Reviews/Caravans'),
    ]
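The recipe's two string hooks can be tried outside calibre with plain Python. This sketch applies the same `preprocess_regexps` pairs with `re.sub` (an approximation of how calibre applies them, not calibre's own code) and repeats the `print_version` URL rewrite. The helper names `apply_preprocess_regexps` and `to_print_url` are hypothetical, and the non-print article URL used to test it is an assumption inferred from the `og:url` in the header quoted below.

```python
import re

# The same (pattern, replacement) pairs as preprocess_regexps in the recipe.
PREPROCESS_REGEXPS = [
    (re.compile(r'Other News'), lambda match: ''),
    (re.compile(r'Magazines'), lambda match: ''),
]


def apply_preprocess_regexps(raw_html):
    """Apply each pair in order with re.sub (approximating calibre's behaviour)."""
    for pattern, replacement in PREPROCESS_REGEXPS:
        raw_html = pattern.sub(replacement, raw_html)
    return raw_html


def to_print_url(url):
    """Same string rewrite as the recipe's print_version, without the logging."""
    return url.replace('/_', '/Print-_')
```

Note that the patterns delete the words wherever they occur: the surrounding `<h2>`/`<h4>` tags survive as empty headings, and any article text containing "Other News" loses those words too.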


Here are the contents of the header I removed from the first article:


Code:
<head>
<title>Double honour at green awards for Cornish campsite</title>
<meta name="keywords" content=""/>
<meta name="description" content=""/>
<meta property="og:type" content="article"/>
<meta property="og:site_name" content="Camping"/>
<meta property="og:url" content="http://www.outandaboutlive.co.uk/Camping/News/General/Double-honour-at-green-awards-for-Cornish-campsite/Print-_ch3_nw1433"/>
<meta property="og:title" content="Double honour at green awards for Cornish campsite"/>
<meta property="og:description" content="Dolbeare Park campsite landed two prizes at the Green Tourism Week awards in London"/>
<meta property="og:image" content="http://www.outandaboutlive.co.uk//userfiles/news/116_414329.jpg"/>
<meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"/><link href="../../stylesheet.css" type="text/css" rel="stylesheet"/><style type="text/css">
		@page { margin-bottom: 5.000000pt; margin-top: 5.000000pt; }</style></head>
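One line in this header stands out: the Content-Type `<meta>` gives `http://www.w3.org/1999/xhtml` (the XHTML namespace URI) where a MIME type such as `text/html` would normally go. Whether that is what trips up the Sony is not confirmed; it is only a plausible suspect. The sketch below, which uses only Python 3's `html.parser`, flags such tags in saved article files. The class and function names are hypothetical, and the set of "normal" MIME types is an assumption.

```python
from html.parser import HTMLParser

# MIME types a Content-Type meta would normally declare for (X)HTML pages.
VALID_HTML_MIME_TYPES = {"text/html", "application/xhtml+xml"}


class ContentTypeMetaChecker(HTMLParser):
    """Collects the content="..." value of every <meta http-equiv="Content-Type">."""

    def __init__(self):
        super().__init__()
        self.content_types = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta .../> tags reach here too: the default
        # handle_startendtag calls handle_starttag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            self.content_types.append(attributes.get("content") or "")


def suspicious_content_types(html):
    """Return Content-Type values whose MIME part is not a usual (X)HTML type."""
    checker = ContentTypeMetaChecker()
    checker.feed(html)
    checker.close()
    suspicious = []
    for value in checker.content_types:
        mime = value.split(";", 1)[0].strip().lower()
        if mime not in VALID_HTML_MIME_TYPES:
            suspicious.append(value)
    return suspicious
```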


At one point I also removed the <head> contents from the second article, and it stopped crashing too.

Can a post-process step be done to remove the <head></head> contents, a second run so to speak? Is it possible there is a bug in calibre?
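A "second run" could also be done outside calibre on the finished EPUB, since an EPUB is a ZIP archive. This hypothetical sketch (the names `empty_head` and `strip_heads_in_epub` are made up for illustration, and it assumes the HTML files are UTF-8 encoded) reproduces the manual Notepad++ edit: it empties every `<head>` element with a regular expression and writes a new EPUB, keeping the `mimetype` entry first and uncompressed as the EPUB container format requires.

```python
import re
import zipfile

# Matches the first <head ...>...</head> element; \b stops it matching <header>.
HEAD_RE = re.compile(r"(<head\b[^>]*>).*?(</head>)", re.DOTALL | re.IGNORECASE)
HTML_SUFFIXES = (".html", ".htm", ".xhtml")


def empty_head(markup):
    """Keep the <head> tags but drop everything between them."""
    return HEAD_RE.sub(r"\1\2", markup, count=1)


def strip_heads_in_epub(src_path, dst_path):
    """Copy src_path to dst_path, emptying <head> in each HTML entry.

    Returns the number of HTML files that were changed.
    """
    changed = 0
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        names = src.namelist()
        # The EPUB container expects "mimetype" as the first, uncompressed entry.
        if "mimetype" in names:
            dst.writestr("mimetype", src.read("mimetype"), compress_type=zipfile.ZIP_STORED)
        for name in names:
            if name == "mimetype":
                continue
            data = src.read(name)
            if name.lower().endswith(HTML_SUFFIXES):
                text = data.decode("utf-8")  # assumption: UTF-8 content files
                new_text = empty_head(text)
                if new_text != text:
                    changed += 1
                    data = new_text.encode("utf-8")
            dst.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return changed
```

Emptying `<head>` also drops the `<title>` and the stylesheet link, which likely explains the larger navigation text; a narrower variant would remove only the offending `<meta>` tag.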

(I'm on a two-week course starting tomorrow, so replies may be difficult.)

Edit: I forgot to attach the EPUB; it's attached in the next message.

Last edited by scissors; 06-19-2011 at 02:49 PM.