#1351 | Guru
Posts: 800 | Karma: 194644 | Join Date: Dec 2007 | Location: Argentina | Device: Kindle Voyage

New recipe for Digital Spy UK:

#1352 | Member
Posts: 12 | Karma: 42 | Join Date: Jan 2010 | Device: Kindle

Code:
keep_only_tags = [dict(attrs={'class':['print-title','print-subtitle','print-author','print-date-issue','print-content']})]

I put this in the recipe and it worked very nicely. However, the author and date are not coming through. Do I need to add something else?

Denny

#1353 | Guru
Posts: 800 | Karma: 194644 | Join Date: Dec 2007 | Location: Argentina | Device: Kindle Voyage

OK, try this one:

Code:
keep_only_tags = [dict(attrs={'class':['print-title','print-subtitle','print-author','author','print-date','print-date-issue','print-content']})]
	
#1354 | Member
Posts: 12 | Karma: 42 | Join Date: Jan 2010 | Device: Kindle

Brilliant. That worked. Thank you.

BTW, what's the best method to capture the cover image when the URL changes each time? In this case the URL includes the volume number, issue number, and the date.

Denny
	
#1355 | US Navy, Retired
Posts: 9,897 | Karma: 13806776 | Join Date: Feb 2009 | Location: North Carolina | Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen

Quote:

Code:
masthead_url = 'http://www.weeklystandard.com/sites/all/themes/weeklystandard/images/logo_red.png'
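
A masthead_url puts the logo at the top of the Kindle edition. For a cover whose URL changes with every issue, one option is to override get_cover_url() and scrape the current cover image from the site instead of rebuilding the volume/issue/date URL by hand. A minimal sketch; the img class name below is a guess, not the site's real markup:

Code:
from calibre.web.feeds.recipes import BasicNewsRecipe

class WeeklyStandardSketch(BasicNewsRecipe):
    title = 'Weekly Standard (sketch)'

    def get_cover_url(self):
        # Scrape the cover from the front page rather than hard-coding
        # a volume/issue/date URL; the 'cover' class is hypothetical.
        soup = self.index_to_soup('http://www.weeklystandard.com/')
        img = soup.find('img', attrs={'class': 'cover'})
        if img and img.get('src'):
            return img['src']
        return None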
	
#1356 | Member
Posts: 12 | Karma: 42 | Join Date: Jan 2010 | Device: Kindle

I had included "print-logo" in the recipe, which shows the logo at the beginning of each article, but that's a nice way to include it just once at the beginning on the Kindle.

Thanks, Denny
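
If the per-article logo is no longer wanted once the masthead carries it, a remove_tags entry can filter it out. A minimal sketch using the "print-logo" class name mentioned above:

Code:
# Drop the per-article logo block; masthead_url already supplies the logo.
remove_tags = [dict(attrs={'class': ['print-logo']})]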
	
#1357 | US Navy, Retired
Posts: 9,897 | Karma: 13806776 | Join Date: Feb 2009 | Location: North Carolina | Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen

When you zip it up to send to this forum, include the icon in the zip. I've attached it for you.
	
#1358 | Connoisseur
Posts: 59 | Karma: 4212 | Join Date: Feb 2010 | Device: Sony

Topeka Capital Journal recipe

Hello,

I am totally new to the ebook world and trying to learn. I would like to have a recipe for the Topeka Capital Journal (http://cjonline.com/). I tried the "easy" way, but all I can get is garbage. Thank you for any help you can provide!

Gianfranco
	
#1359 | Guru
Posts: 800 | Karma: 194644 | Join Date: Dec 2007 | Location: Argentina | Device: Kindle Voyage

New recipe for the Topeka Capital Journal:
	
#1360 | Member
Posts: 12 | Karma: 42 | Join Date: Jan 2010 | Device: Kindle

Walt,

1. Why include the icon?
2. I'm having trouble copying my recipe from calibre to Notepad. The indents change and the recipe won't work when it's copied back to calibre.

Denny
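
On the indents (point 2 above): Python recipes are whitespace-sensitive, so an editor that converts tabs to spaces (or the reverse) can silently break one. A quick way to spot stray tabs in a saved recipe file; the file name is hypothetical:

Code:
# Flag any line containing a tab; mixed tabs and spaces are what
# usually break a recipe after a round-trip through an editor.
with open('my_recipe.recipe') as f:
    for num, line in enumerate(f, 1):
        if '\t' in line:
            print('tab on line %d' % num)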
	
#1361 | onlinenewsreader.net
Posts: 332 | Karma: 10143 | Join Date: Dec 2009 | Location: Phoenix, AZ & Victoria, BC | Device: Kindle 3, Kindle Fire, iPad 3, iPhone 4, Playbook, HTC Inspire

The Register (biting the hand that feeds IT)

Recipe for The Register -- a UK Information Technology news site.

Code:
#!/usr/bin/env python
__license__   = 'GPL v3'
__copyright__ = '2010, Nick Redding'
'''
www.theregister.co.uk
'''
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from datetime import timedelta, datetime, date
class TheRegister(BasicNewsRecipe):
    title = u'The Register'
    language = 'en_GB'
    __author__ = 'Nick Redding'
    oldest_article = 2
    timefmt = '' # '[%b %d]'
    needs_subscription = False
    keep_only_tags = [dict(name='div', attrs={'id':'article'})]
    #remove_tags_before = []
    remove_tags = [
        {'id':['related-stories','ad-mpu1-spot']},
        {'class':['orig-url','article-nav','wptl btm','wptl top']}
        ]
    #remove_tags_after = []
    no_stylesheets = True
    extra_css = '''
                h2 {font-size: x-large; }
                h3 {font-size: large; font-weight: bold; }
                .byline {font-size: x-small; }
                .dateline {font-size: x-small; }
                '''
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        return br
    def get_masthead_url(self):
        masthead = 'http://www.theregister.co.uk/Design/graphics/std/logo_414_80.png'
        br = BasicNewsRecipe.get_browser()
        try:
            br.open(masthead)
        except:
            self.log("\nMasthead unavailable")
            masthead = None
        return masthead
    def preprocess_html(self,soup):
        # this removes the explicit url after links
        for span_tag in soup.findAll('span','URL'):
            span_tag.previous.replaceWith(re.sub(r" \($","",self.tag_to_string(span_tag.previous)))
            span_tag.next.next.replaceWith(re.sub(r"^\)","",self.tag_to_string(span_tag.next.next)))
            span_tag.extract()
        return soup
                                   
    # Build the section lists by scraping each section's index page
    # directly instead of relying on RSS feeds.
    def parse_index(self):
        # Convert a dateline like '15 feb' into a date; the year is
        # assumed to be the current year.
        def decode_date(datestr):
            udate = datestr.strip().lower().split()
            m = ['jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec'].index(udate[1])+1
            d = int(udate[0])
            y = date.today().year
            return date(y,m,d)
        articles = {}
        key = None
        ans = []
        def parse_index_page(page_name,page_title):
            def article_title(tag):
                atag = tag.find('a',href=True)
                return ''.join(atag.findAll(text=True, recursive=False)).strip()
            def article_date(tag):
                t = tag.find(True, {'class' : 'date'})
                if t:
                    return ''.join(t.findAll(text=True, recursive=False)).strip()
                return ''
            def article_summary(tag):
                t = tag.find(True, {'class' : 'standfirst'})
                if t:
                    return ''.join(t.findAll(text=True, recursive=False)).strip()
                return ''
            def article_url(tag):
                atag = tag.find('a',href=True)
                url = atag['href']
                return url
            mainurl = 'http://www.theregister.co.uk'
            soup = self.index_to_soup(mainurl+page_name)
            # Find each instance of class="section-headline", class="story", class="story headline"
            for div in soup.findAll('div',attrs={'class':re.compile('^story-ref')}):
                # div contains all article data
                # check if article is too old
                datetag = div.find('span','date')
                if datetag:
                    dateline_string = self.tag_to_string(datetag,False)
                    a_date = decode_date(dateline_string)
                    earliest_date = date.today() - timedelta(days=self.oldest_article)
                    if a_date < earliest_date:
                        self.log("Skipping article dated %s" % dateline_string)
                        continue
                url = article_url(div)
                if 'http' in url:
                    # skip absolute (off-site) links; relative article URLs
                    # are turned into print-version URLs below
                    continue
                url = mainurl + url + 'print.html'
                self.log("URL %s" % url)
                title = article_title(div)
                self.log("Title %s" % title)
                pubdate = article_date(div)
                self.log("Date %s" % pubdate)
                description = article_summary(div)
                self.log("Description %s" % description)
                author = ''
                if not articles.has_key(page_title):
                    articles[page_title] = []
                articles[page_title].append(dict(title=title,url=url,date=pubdate,description=description,author=author,content=''))
        parse_index_page('','Front Page')
        ans.append('Front Page')
        parse_index_page('/hardware','Hardware')
        ans.append('Hardware')
        parse_index_page('/software','Software')
        ans.append('Software')
        parse_index_page('/music_media','Music & Media')
        ans.append('Music & Media')
        parse_index_page('/networks','Networks')
        ans.append('Networks')
        parse_index_page('/security','Security')
        ans.append('Security')
        parse_index_page('/public_sector','Public Sector')
        ans.append('Public Sector')
        parse_index_page('/business','Business')
        ans.append('Business')
        parse_index_page('/science','Science')
        ans.append('Science')
        parse_index_page('/odds','Odds & Sods')
        ans.append('Odds & Sods')
        ans = [(key, articles[key]) for key in ans if articles.has_key(key)]
        return ans
 | 
	
#1362 | Connoisseur
Posts: 59 | Karma: 4212 | Join Date: Feb 2010 | Device: Sony

Wow! Thanks!
	
#1363 | US Navy, Retired
Posts: 9,897 | Karma: 13806776 | Join Date: Feb 2009 | Location: North Carolina | Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen

Quote:

You can just paste the code in a post and wrap it in code tags (the # button in the post editor).
	
#1364 | Member
Posts: 21 | Karma: 10 | Join Date: Jul 2008 | Device: EZ Reader Pocket Pro

Thanks for the recipe; I was looking for one for this site. I tried to do it myself, but I don't know anything about programming. Just two questions: how do I change the default image? And is there a way to show the pictures of the snips saved on Read It Later (it retrieves only text)? Thank you.
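
On the default image: if that means the cover calibre assigns, a recipe can point the cover at any image by setting cover_url on the recipe class. A minimal sketch; the class name and URL are placeholders:

Code:
from calibre.web.feeds.recipes import BasicNewsRecipe

class ReadItLaterSketch(BasicNewsRecipe):
    title = 'Read It Later (sketch)'
    # Placeholder URL; point this at the image to use as the cover.
    cover_url = 'http://example.com/my_cover.png'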
	
#1365 | Junior Member
Posts: 3 | Karma: 10 | Join Date: Jan 2010 | Device: none

Thanks for the tip; it works 70% of the time. The problem is with RSS feeds. Occasionally I want to use an RSS feed from a blog or a discussion board, and my fetch may not repeat more than once. The Instapaper solution on an RSS feed will not work, as I cannot ask calibre to do a recursive get from the Instapaper recipe.
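
For following links beyond the articles themselves, BasicNewsRecipe has recursions (how many levels of links to follow from article pages) and match_regexps (which links qualify). A minimal sketch; the feed URL and pattern are placeholders:

Code:
from calibre.web.feeds.recipes import BasicNewsRecipe

class RecursiveFeedSketch(BasicNewsRecipe):
    title = 'Recursive Feed (sketch)'
    # Placeholder feed; any RSS URL works here.
    feeds = [('Blog', 'http://example.com/feed.rss')]
    recursions = 1  # follow links found on article pages, one level deep
    # Only follow links whose URL matches this placeholder pattern.
    match_regexps = [r'example\.com/\d+/comments']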