Old 11-02-2010, 01:30 PM   #16
orcpac7
Junior Member
orcpac7 began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Jun 2010
Device: cailbre
I couldn't find the thread where Starson17 posted the original Fudzilla script, so I'm tagging this on to a thread where I found his comment.
Fudzilla at some point changed the URL of their feed. Fixing that was a gentle introduction to recipes for me. I ended up saving the code in a text file, modifying it, deleting the old recipe and importing the new one from the file.
The only line I changed was the feeds entry, which now points to "http://www.fudzilla.com/?format=feed".
Quote:
#!/usr/bin/env python

__license__ = 'GPL v3'
__copyright__ = '2010 Starson17'
'''
fudzilla.com
'''

import re
from calibre.web.feeds.news import BasicNewsRecipe

class Fudzilla(BasicNewsRecipe):
    title = u'Fudzilla'
    __author__ = 'Starson17'
    language = 'en'

    description = 'Tech news'
    oldest_article = 7
    remove_javascript = True
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False

    remove_tags_before = dict(name='div', attrs={'class':['padding']})

    remove_tags = [dict(name='td', attrs={'class':['left','right']}),
                   dict(name='div', attrs={'id':['toolbar','buttons']}),
                   dict(name='div', attrs={'class':['artbannersxtd','back_button']}),
                   dict(name='span', attrs={'class':['pathway']}),
                   dict(name='th', attrs={'class':['pagenav_next','pagenav_prev']}),
                   dict(name='table', attrs={'class':['headlines']}),
                   ]

    feeds = [
        (u'Posts', u'http://www.fudzilla.com/?format=feed')
    ]

    preprocess_regexps = [
        (re.compile(r'<p class="MsoNormal"> Welcome.*</p> ', re.DOTALL|re.IGNORECASE), lambda match: '')
    ]
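
For anyone wondering what the preprocess_regexps entry in the recipe does, here is a minimal sketch using only Python's re module. It assumes each entry is a (compiled pattern, replacement function) pair applied to the raw page source; the apply_regexps helper and the sample HTML are made up for illustration and are not part of calibre or of Starson17's recipe.

```python
import re

# Same (pattern, replacement) pair as in the recipe above.
preprocess_regexps = [
    (re.compile(r'<p class="MsoNormal"> Welcome.*</p> ', re.DOTALL | re.IGNORECASE),
     lambda match: ''),
]


def apply_regexps(html, rules):
    """Hypothetical helper: run each (pattern, replacement) pair over the page source."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


# Made-up page fragment, for illustration only.
sample = '<div><p class="MsoNormal"> Welcome to the site</p> <p>Article text</p></div>'
cleaned = apply_regexps(sample, preprocess_regexps)
print(cleaned)
```

Note that `.*` is greedy and re.DOTALL lets it cross newlines, so on a real page the match runs up to the last `</p> ` (with the trailing space) it can find, not the first.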
Old 11-13-2011, 08:02 AM   #17
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
A few changes I made to the guardian recipe...

Made a few changes to the guardian recipe:
  • Removed the adverts that appeared at the bottom of some articles - found that there were two html sections in the soup, with all the relevant stuff in the first.
  • Removed some of the info under the headline. This includes: the author mugshot, "a version of this article appeared" spiel and the link to article history. I have put a comment by each of these in the remove_tags list to make it easy to re-enable if you choose.
  • Removed the number next to the ratings stars (that appear in reviews) - you will probably want to undo this change if you disable the images (just remove the relevant stuff in preprocess_html)
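
The attached recipe is not shown in the thread, so the following is only a rough sketch of the first idea above (keeping the first html section and dropping the second one that held the adverts). It works on the raw page source as a string rather than on the soup, and keep_first_html_section is a hypothetical name, not something from the recipe or from calibre.

```python
import re


def keep_first_html_section(raw):
    """Return the page source up to and including the first closing </html> tag."""
    match = re.search(r'</html\s*>', raw, re.IGNORECASE)
    if match is None:
        return raw  # no closing tag found, leave the page untouched
    return raw[:match.end()]


# Made-up page with two html sections, the second holding the advert.
page = ('<html><body><p>Article</p></body></html>'
        '<html><body><p>Advert</p></body></html>')
trimmed = keep_first_html_section(page)
print(trimmed)
```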
Attached Files
File Type: zip guardian.recipe.zip (2.1 KB, 203 views)

Last edited by NotTaken; 11-13-2011 at 08:15 AM.
Old 11-22-2011, 03:14 PM   #18
dasym
Connoisseur
dasym began at the beginning.
 
Posts: 50
Karma: 10
Join Date: Dec 2008
Location: Scotland
Device: Kindle DX, Kindle. iPad 3
This works really well. Thanks for the updated recipe.
Old 11-24-2011, 05:43 PM   #19
mrjaded
Junior Member
mrjaded began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Mar 2011
Device: kindle 4
Thanks from me too!

I've made this little addition to my own version of the recipe which adds a nice graphic masthead for each of the titles. I always found that the text version was a bit ugly and was so big it got truncated on my Kindle...

Code:
    # Needs "from datetime import date" at the top of the recipe file.
    title = u'The Guardian & The Observer'
    if date.today().weekday() == 6:  # Monday is 0, so 6 is Sunday (The Observer)
        base_url = "http://www.guardian.co.uk/theobserver"
        cover_pic = 'Observer digital edition'
        masthead_url = 'http://static.guim.co.uk/sys-images/Guardian/Pix/site_furniture/2010/10/19/1287478087992/The-Observer-001.gif'
    else:
        base_url = "http://www.guardian.co.uk/theguardian"
        cover_pic = 'Guardian digital edition'
        masthead_url = 'http://static.guim.co.uk/static/f76b43f9dcfd761f0ecf7099a127b603b2922118/common/images/logos/the-guardian/titlepiece.gif'
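
The same weekday check can be tried outside calibre. This is a standalone sketch, not part of the recipe: pick_edition is a hypothetical function name, and it takes the date as an argument so both branches can be exercised without waiting for Sunday. The URLs are the ones from the snippet above.

```python
from datetime import date


def pick_edition(today):
    """Return (base_url, cover_pic, masthead_url) for the given date."""
    if today.weekday() == 6:  # Monday is 0, so 6 is Sunday
        return (
            "http://www.guardian.co.uk/theobserver",
            'Observer digital edition',
            'http://static.guim.co.uk/sys-images/Guardian/Pix/site_furniture/2010/10/19/1287478087992/The-Observer-001.gif',
        )
    return (
        "http://www.guardian.co.uk/theguardian",
        'Guardian digital edition',
        'http://static.guim.co.uk/static/f76b43f9dcfd761f0ecf7099a127b603b2922118/common/images/logos/the-guardian/titlepiece.gif',
    )


sunday = pick_edition(date(2011, 11, 27))    # a Sunday
thursday = pick_edition(date(2011, 11, 24))  # a Thursday
print(sunday[1])
print(thursday[1])
```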
Old 11-25-2011, 07:12 AM   #20
NotTaken
Connoisseur
NotTaken is fluent in JavaScript as well as Klingon.
 
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
Glad people are finding it useful. I made a few more changes:
  • Hardcoded the encoding to utf-8 as I noticed a few artifacts when using auto detection
  • Added mrjaded's masthead_url addition (see above)
  • Removed embedded flash video/captions (these may still be of some use if you aren't reading on e-ink, as they provide a link to the video)
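
On the encoding point: in a recipe this is normally just the class attribute encoding = 'utf-8'. The post does not say what the artifacts looked like, so the sketch below only shows one common kind, assuming the page bytes were UTF-8 but were guessed to be cp1252. The sample text is made up.

```python
# Made-up page text containing a pound sign (U+00A3).
text = 'Price: \u00a3'
raw = text.encode('utf-8')    # bytes as served by a UTF-8 site

wrong = raw.decode('cp1252')  # what a wrong guess of the encoding produces
right = raw.decode('utf-8')   # what hardcoding utf-8 produces

print(repr(wrong))
print(repr(right))
```

With the wrong guess, the two UTF-8 bytes of the pound sign are decoded as two separate characters, which is the usual source of stray accented letters in front of symbols.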
Attached Files
File Type: zip guardian.recipe.zip (2.3 KB, 191 views)

Last edited by NotTaken; 11-25-2011 at 07:18 AM.
Old 01-10-2014, 11:02 AM   #21
paddyrm
Connoisseur
paddyrm began at the beginning.
 
Posts: 67
Karma: 10
Join Date: Oct 2012
Device: Kindle 3
Quote:
Originally Posted by ajnorman
2. There is a variable "ignore_sections" which skips sections you don't want to see (I have absolutely no interest in seeing the Sport section, for example).
I've had no luck with this: I've tried ignore_sections = ['Sport'] and many variations (e.g. sport, Sports, football), but the whole Sport section still arrives. Any suggestions?

Thanks

Paddy
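
The recipe code that reads ignore_sections is not shown in this thread, so how it compares names is unknown. As a rough sketch only, a name-based filter of this kind might look like the following; if the real comparison is exact and case-sensitive, an entry would have to match the section title as it appears on the site, character for character. The filter_sections function and the section list are hypothetical.

```python
def filter_sections(sections, ignore_sections):
    """Drop (title, url) pairs whose title matches an ignored name, ignoring case and outer spaces."""
    ignored = {name.strip().lower() for name in ignore_sections}
    return [(title, url) for title, url in sections
            if title.strip().lower() not in ignored]


# Made-up section list; real titles and URLs come from the site's index page.
sections = [
    ('Top stories', 'http://example.com/top'),
    ('Sport ', 'http://example.com/sport'),
    ('Comment & debate', 'http://example.com/comment'),
]
kept = filter_sections(sections, ['sport'])
print([title for title, url in kept])
```

Printing the section titles the recipe actually sees is probably the quickest way to find out which string an entry needs to match.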