				How To Geek - Recipe Update
			 
			
			
Today I updated my first recipe, so I would appreciate any suggestions.

			Improvements 
Bugs

Page break after each converted <h2> tag in the created EPUB: <div class="mbp_pagebreak"></div>
How can I get rid of it? (I tried changing Calibre's common conversion options, but they don't seem to affect the news fetch, or do they?) This causes a page break after each article heading, so the heading sits alone on the first page and the content starts on the next page.

Calibre apparently can't fetch 'lazy load' images? Images in the articles aren't fetched; only a gray circle indicating the 'lazy load' feature of these images appears.

Code:
# Based on TonytheBookworm's original recipe
__license__   = 'GPL v3'
__copyright__ = '2013, Johannes Kopf'
import re
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = u'How To Geek'
    language = 'en'
    __author__ = 'Johannes Kopf'
    description = 'Daily Computer Tips and Tricks'
    publisher = 'Howtogeek'
    category = 'PC,tips,tricks'
    oldest_article = 2
    max_articles_per_feed = 50
    no_stylesheets = True
    remove_javascript = True
    masthead_url = 'http://blog.stackoverflow.com/wp-content/uploads/how-to-geek-logo.png'
    cover_url = 'http://www.howtogeek.com/geekers/up/sshot4ebc09559ecbf.jpg'
    recursions = 1
    # Fetch only links from howtogeek.com/number
    match_regexps = [r'http://www\.howtogeek\.com/\d+']
    # Drop the 'read more' button and the lazy-load placeholder images
    remove_tags = [
        dict(name='img', attrs={'src': re.compile(r'.*readmore-button\.png.*', re.IGNORECASE)}),
        dict(name='img', attrs={'class': re.compile(r'.*lazyLoad.*', re.IGNORECASE)})]
    remove_tags_before = dict(name='div', attrs={'class':['thecontent']})
    remove_tags_after = dict(name='div', attrs={'class':['thecontent']})
    keep_only_tags = [
        dict(name='div', attrs={'class': ['thecontent']}),
        dict(name=['h2', 'h3']),
        dict(name='a', attrs={'href': re.compile(r'.*http://www\.howtogeek\.com/\d*.*', re.IGNORECASE)})]
    feeds = [(u'Tips', u'http://feeds.howtogeek.com/howtogeek')]
Last edited by JoxX; 05-10-2013 at 02:55 PM.
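
A possible direction for the 'lazy load' image bug, offered only as a sketch: lazy-loading scripts often put a placeholder in `src` and keep the real image URL in a separate attribute. The attribute name `data-src` below is an assumption and has not been checked against howtogeek.com's markup, so the page source needs to be inspected first. The `(pattern, callable)` pair has the shape that `BasicNewsRecipe.preprocess_regexps` accepts; `promote_lazy_src` and `apply_regexps` are hypothetical helper names used here so the idea can be tried outside Calibre.

```python
import re

# Assumption: the real image URL lives in a "data-src" attribute.
# This is hypothetical; check the page source for the actual attribute name.
LAZY_IMG = re.compile(r'<img\b[^>]*?\bdata-src="([^"]+)"[^>]*>', re.IGNORECASE)


def promote_lazy_src(match):
    """Rebuild the <img> tag so that src carries the real image URL."""
    return '<img src="%s" />' % match.group(1)


# Same (pattern, callable) shape as entries in BasicNewsRecipe.preprocess_regexps.
preprocess_regexps = [(LAZY_IMG, promote_lazy_src)]


def apply_regexps(html, rules):
    """Standalone stand-in that applies each rule to the raw HTML in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


sample = ('<p><img class="lazyLoad" src="grey.gif" '
          'data-src="http://example.com/a.png"></p>')
print(apply_regexps(sample, preprocess_regexps))
```

If something like this were added to the recipe, the `remove_tags` entry that matches the `lazyLoad` class would have to go; the rewritten tag above no longer carries that class, but any lazy image the pattern misses would still be stripped by that rule.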