05-10-2013, 09:53 AM   #1
JoxX
Junior Member
JoxX began at the beginning.
 
Posts: 2
Karma: 10
Join Date: May 2013
Device: Kindle Paperwhite
How To Geek - Recipe Update

Today I updated my first recipe, so I'd appreciate any suggestions.

Improvements
  • Instead of fetching only the first lines of each article,
    this version fetches the whole article
  • Fetching is now much faster because only the needed content is downloaded:
    0:28 minutes for my version vs. 2:33 minutes for the old one

Bugs
A page break is inserted after each converted <h2> tag in the generated EPUB:
<div class="mbp_pagebreak"></div>
How can I get rid of it? (I tried changing Calibre's common conversion options,
but they don't seem to affect news fetching, do they?)
This causes a page break after each article heading, so the heading
sits alone on the first page and the content starts on the next page.
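One thing that might be worth trying, assuming the break comes from calibre's structure detection rather than from the recipe itself: a recipe can set its own `conversion_options` dictionary, which overrides the user's conversion settings for that recipe only. The option name `page_breaks_before` and the claim that the XPath `/` disables it come from calibre's "Insert page breaks before" setting (whose default covers h1 and h2 elements); this is an untested sketch. If the heading and the article body end up in separate HTML files because of `recursions = 1`, each file will still start on a new page and this setting will not change that.

```python
# Untested sketch: goes inside the recipe class body.
# Assumption: calibre's default page-break XPath (h1/h2) causes the break,
# and the expression "/" switches the "page breaks before" option off.
conversion_options = {
    'page_breaks_before': '/',
}
```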

Also, I guess Calibre can't fetch 'lazy load' images?
The images in the articles aren't fetched; all I get is
a gray circle, the placeholder used by the site's lazy-load feature.
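A possible workaround, sketched under assumptions: lazy-load scripts usually leave a placeholder in `src` and keep the real URL in another attribute. The attribute name `data-lazy-src` below is a guess (check the page source for the real one), and `unlazy_images` is a hypothetical helper that rewrites such tags with a plain regex pass. It could be called from the recipe's `preprocess_raw_html(self, raw_html, url)` method, which calibre runs on each downloaded page before parsing. Note that the recipe's `remove_tags` entry matching the `lazyLoad` class removes any image carrying that class, so that entry would have to go as well.

```python
import re

# Hypothetical attribute name: assumes the real image URL sits in
# data-lazy-src and a gray placeholder sits in src. Check the actual markup.
LAZY_ATTR = re.compile(r'\sdata-lazy-src\s*=\s*"([^"]*)"', re.IGNORECASE)
# A plain src="..." attribute; the leading \s keeps it from matching
# the "src" at the end of "data-lazy-src".
SRC_ATTR = re.compile(r'\ssrc\s*=\s*"[^"]*"', re.IGNORECASE)
# Simplistic tag matcher: breaks if an attribute value contains '>'.
IMG_TAG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)


def _fix_img(match):
    tag = match.group(0)
    if LAZY_ATTR.search(tag) is None:
        return tag  # ordinary image: leave it alone
    tag = SRC_ATTR.sub('', tag)  # drop the placeholder src
    # Turn data-lazy-src="URL" into src="URL"
    return LAZY_ATTR.sub(lambda m: ' src="%s"' % m.group(1), tag)


def unlazy_images(raw_html):
    """Hypothetical helper: point lazy-loaded <img> tags at their real URL."""
    return IMG_TAG.sub(_fix_img, raw_html)


# Possible hook inside the recipe class:
#
#     def preprocess_raw_html(self, raw_html, url):
#         return unlazy_images(raw_html)
```

Images without the lazy-load attribute pass through unchanged, so the helper is safe to run on every page.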

Code:
# Based on TonytheBookworm's original recipe
__license__   = 'GPL v3'
__copyright__ = '2013, Johannes Kopf'

import re

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = u'How To Geek'
    language = 'en'
    __author__ = 'Johannes Kopf'
    description = 'Daily Computer Tips and Tricks'
    publisher = 'Howtogeek'
    category = 'PC,tips,tricks'
    oldest_article = 2
    max_articles_per_feed = 50
    no_stylesheets = True
    remove_javascript = True
    masthead_url = 'http://blog.stackoverflow.com/wp-content/uploads/how-to-geek-logo.png'
    cover_url = 'http://www.howtogeek.com/geekers/up/sshot4ebc09559ecbf.jpg'

    # Follow one level of links from each feed item to reach the full article
    recursions = 1
    # Fetch only links from howtogeek.com/number: \d+ requires at least one
    # digit (\d* also matched zero), and dots are escaped to match literally
    match_regexps = [r'http://www\.howtogeek\.com/\d+']

    # Drop the "read more" button image and the lazy-load placeholder images
    remove_tags = [
        dict(name='img', attrs={'src': re.compile(r'.*readmore-button\.png.*', re.IGNORECASE)}),
        dict(name='img', attrs={'class': re.compile(r'.*lazyLoad.*', re.IGNORECASE)}),
    ]
    # Trim everything outside the article body
    remove_tags_before = dict(name='div', attrs={'class': ['thecontent']})
    remove_tags_after = dict(name='div', attrs={'class': ['thecontent']})
    # Keep the article body, the headings, and links to numbered article pages
    keep_only_tags = [
        dict(name='div', attrs={'class': ['thecontent']}),
        dict(name=['h2', 'h3']),
        dict(name='a', attrs={'href': re.compile(r'.*http://www\.howtogeek\.com/\d+.*', re.IGNORECASE)}),
    ]

    feeds = [(u'Tips', u'http://feeds.howtogeek.com/howtogeek')]

Last edited by JoxX; 05-10-2013 at 01:55 PM.
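A note on the "howtogeek.com/number" patterns: `\d*` also matches zero digits, so `http://www.howtogeek.com/\d*` accepts any howtogeek.com URL, while `\d+` requires at least one digit. The quick check below uses Python's `re` module, assumes the patterns are applied with a search (as calibre's link filter does, as far as I know), and uses made-up example URLs.

```python
import re

# Zero-or-more digits vs. at least one digit; dots escaped to match literally.
ANY_DIGITS = re.compile(r'http://www\.howtogeek\.com/\d*')
SOME_DIGITS = re.compile(r'http://www\.howtogeek\.com/\d+')

# Made-up example URLs for illustration only.
urls = [
    'http://www.howtogeek.com/123456/some-article/',
    'http://www.howtogeek.com/tag/windows/',
]

for url in urls:
    # Prints the URL, then whether each pattern finds a match
    print(url, bool(ANY_DIGITS.search(url)), bool(SOME_DIGITS.search(url)))
# http://www.howtogeek.com/123456/some-article/ True True
# http://www.howtogeek.com/tag/windows/ True False
```

Only the `\d+` version rejects the tag page, which is what the "number" comment intends.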

Tags
how to geek, recipe update



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
metro uk recipe update | fleclerc | Recipes | 2 | 01-20-2013 02:30 PM
The Economist Recipe Update | rainrdx | Recipes | 1 | 01-17-2013 10:17 PM
shortlist.com recipe update | scissors | Recipes | 3 | 05-19-2012 01:22 AM
Den of Geek Recipe (Nerdy News Feed) | mrjaded | Recipes | 0 | 09-25-2011 11:10 AM
Kurier recipe update | clanger9 | Recipes | 0 | 09-24-2011 09:45 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.