Old 05-10-2013, 09:53 AM   #1
Junior Member
JoxX began at the beginning.
Join Date: May 2013
Device: Kindle Paperwhite
How To Geek - Recipe Update

Today I updated my first recipe, so I'd appreciate any suggestions.

  • Instead of fetching only the first lines of every article,
    this fetches the whole article
  • Fetch time is now much shorter because only the needed content is fetched:
    mine takes 0:28 minutes vs. 2:33 minutes for the old one

Page break after each converted <h2> tag in the created EPUB:
<div class="mbp_pagebreak"></div>
How can I get rid of it? (I tried changing Calibre's common conversion
options, but they don't seem to affect the news fetch, or do they?)
This causes a page break after each article heading, so the heading
sits alone on the first page and the content starts on the next page.
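
One thing that might be worth trying here: recipes can carry their own conversion settings in a `conversion_options` dict, which calibre applies when it converts the downloaded news. The sketch below assumes the breaks come from the default page-break and chapter-mark rules; that is a guess and not something confirmed in this thread. The name `CONVERSION_OPTIONS` is only used for illustration.

```python
# Sketch only: per-recipe conversion settings. Inside the recipe class this
# would be assigned as the class attribute `conversion_options`.
CONVERSION_OPTIONS = {
    # '/' is the XPath expression calibre documents for disabling
    # automatic page breaks
    'page_breaks_before': '/',
    # do not mark detected chapters with a page break or a rule
    'chapter_mark': 'none',
}
```

Whether this removes the `mbp_pagebreak` divs in news downloads would still need to be checked on the device.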

And I guess Calibre can't fetch 'lazy load' images?
The images in the articles aren't fetched; only a gray circle,
the placeholder of the 'lazy load' feature, shows up in their place.
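
Lazy-load scripts usually keep the real image URL in a separate attribute and swap it into `src` in the browser, so one possible workaround is to do that swap on the raw HTML before calibre parses it. The following is a minimal sketch under that assumption. The attribute name `data-original` and the helper `promote_lazy_src` are hypothetical; the real attribute has to be looked up in the page source.

```python
import re

# Assumed attribute name; check the page source for the one the site really uses.
LAZY_ATTR = 'data-original'

IMG_RE = re.compile(r'<img\b[^>]*>', re.IGNORECASE)
# The lookbehind keeps "src" from matching inside names such as "data-src"
SRC_RE = re.compile(r'(?<![\w-])src\s*=\s*(["\']).*?\1', re.IGNORECASE)


def promote_lazy_src(raw_html, lazy_attr=LAZY_ATTR):
    """Point each lazy-loaded <img> at its real image URL."""
    lazy_re = re.compile(
        r'(?<![\w-])' + re.escape(lazy_attr) + r'\s*=\s*(["\'])(.*?)\1',
        re.IGNORECASE)

    def fix_img(match):
        tag = match.group(0)
        lazy = lazy_re.search(tag)
        if lazy is None:
            return tag  # ordinary image, leave it alone
        new_src = 'src=%s%s%s' % (lazy.group(1), lazy.group(2), lazy.group(1))
        if SRC_RE.search(tag):
            # Replace the placeholder src (the gray circle)
            return SRC_RE.sub(lambda m: new_src, tag, count=1)
        # No src at all: add one right after "<img"
        return tag[:4] + ' ' + new_src + tag[4:]

    return IMG_RE.sub(fix_img, raw_html)
```

In a recipe this could be called from `preprocess_raw_html(self, raw_html, url)` by returning `promote_lazy_src(raw_html)`. The `remove_tags` entry that strips `lazyLoad` images would presumably have to be dropped as well, otherwise the repaired images are removed again.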

# Based on TonytheBookworm's original recipe
__license__   = 'GPL v3'
__copyright__ = '2013, Johannes Kopf'

import re
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = u'How To Geek'
    language = 'en'
    __author__ = 'Johannes Kopf'
    description = 'Daily Computer Tips and Tricks'
    publisher = 'Howtogeek'
    category = 'PC,tips,tricks'
    oldest_article = 2
    max_articles_per_feed = 50
    no_stylesheets = True
    remove_javascript = True
    masthead_url = ''
    cover_url = ''
    recursions = 1
    # Fetch only links from
    match_regexps = [r'\d*']
    # Drop the "read more" button and the lazy-load placeholder images
    remove_tags = [
        dict(name='img', attrs={'src': re.compile(r'.*readmore-button\.png.*', re.IGNORECASE)}),
        dict(name='img', attrs={'class': re.compile(r'.*lazyLoad.*', re.IGNORECASE)})]
    remove_tags_before = dict(name='div', attrs={'class':['thecontent']})
    remove_tags_after = dict(name='div', attrs={'class':['thecontent']})
    keep_only_tags = [
        dict(name='div', attrs={'class': ['thecontent']}),
        dict(name=['h2', 'h3']),
        # raw string so that \d reaches the regex engine unchanged
        dict(name='a', attrs={'href': re.compile(r'.*\d*.*', re.IGNORECASE)})]
    feeds = [(u'Tips', u'')]
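
A quick way to sanity-check link patterns like the ones in `match_regexps` and the `href` filter is to try them against a few sample URLs outside calibre. The sketch below uses Python's `re` module directly. The sample URLs, the helper `matching_links` and the stricter pattern are made up for illustration, and it assumes the patterns are applied as a search over the link, which may not be exactly what calibre does internally.

```python
import re


def matching_links(pattern, urls):
    """Return the URLs in which `pattern` matches somewhere (re.search)."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [u for u in urls if compiled.search(u)]


# Hypothetical sample links, not taken from the real feed
sample_urls = [
    'http://example.com/12345/some-article/',
    'http://example.com/about/',
]

# \d* also matches the empty string, so every link passes
print(matching_links(r'\d*', sample_urls))

# Requiring at least one digit inside a path segment is more selective
print(matching_links(r'/\d+/', sample_urls))
```

If the first call returns every URL, the pattern is not filtering anything and the recursion will follow more links than intended.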

Last edited by JoxX; 05-10-2013 at 01:55 PM.