#1 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jan 2009
Location: East Midlands, UK
Device: Sony PRS-505
|
Recipe for Metro UK
Has anyone out there got a recipe for the Metro (UK) ?
I had a look on their site (www.metro.co.uk) and they do provide RSS feeds, but when I tried a basic recipe it fetched the articles but spent ages processing the style sheets, and the resulting output had no photos or headlines and had extraneous links. Here is the basic recipe...
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1289146844(BasicNewsRecipe):
    title = u'MetroUK'
    oldest_article = 7
    max_articles_per_feed = 40

    feeds = [
        (u'News', u'http://metro.co.uk/rss/news'),
        (u'Travel', u'http://metro.co.uk/rss/travel'),
        (u'Film', u'http://metro.co.uk/rss/metrolife/film'),
        (u'TV', u'http://www.metro.co.uk/rss/tv/'),
        (u'Tech & Gadgets', u'http://www.metro.co.uk/rss/tech/'),
        (u'Weird', u'http://metro.co.uk/rss/weird'),
        (u'Sport', u'http://www.metro.co.uk/rss/sport')
    ]
BossHogg. |
#2 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
Try adding:
Code:
keep_only_tags = [
    dict(name='h1', attrs={'':''}),
    dict(name='h2', attrs={'class':'h2'}),
    dict(name='div', attrs={'class':'art-lft'}),
]
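For anyone unsure what a keep_only_tags entry does: each dict(name=..., attrs=...) describes the tags to keep, and an element is selected only when its tag name and every listed attribute match. The sketch below is a simplified stand-in written with the standard library, not calibre's own code (calibre hands these specs to its HTML parser). The helper names spec_matches and kept_tags and the sample HTML are made up for illustration, and attribute values are compared as whole strings, which is an assumption. It shows that an attribute key has to be spelled exactly as it appears in the page, otherwise nothing is selected.

```python
from html.parser import HTMLParser


def spec_matches(spec, tag, attrs):
    """Return True if a start tag satisfies a keep_only_tags-style spec.

    Simplification (assumption): attribute values are compared as whole strings.
    """
    if 'name' in spec and spec['name'] != tag:
        return False
    wanted = spec.get('attrs', {})
    return all(attrs.get(key) == value for key, value in wanted.items())


class _Collector(HTMLParser):
    """Records every start tag that matches at least one spec."""

    def __init__(self, specs):
        super().__init__()
        self.specs = specs
        self.hits = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if any(spec_matches(spec, tag, attr_map) for spec in self.specs):
            self.hits.append((tag, attr_map.get('class')))


def kept_tags(html, specs):
    """List (tag, class) pairs for the start tags the specs would select."""
    collector = _Collector(specs)
    collector.feed(html)
    return collector.hits


# Hypothetical page fragment, loosely modelled on the class names in this thread.
SAMPLE = (
    '<div class="art-rgt">sidebar</div>'
    '<h2 class="h2">Sub-heading</h2>'
    '<div class="art-lft">Article text</div>'
)

good = [dict(name='h2', attrs={'class': 'h2'}),
        dict(name='div', attrs={'class': 'art-lft'})]
bad = [dict(name='div', attrs={'clas': 'art-lft'})]  # misspelled attribute key

print(kept_tags(SAMPLE, good))  # the h2 and the art-lft div are selected
print(kept_tags(SAMPLE, bad))   # nothing is selected
```

If a recipe suddenly produces articles with no body text, checking the spelling of every key and class name in keep_only_tags is a cheap first step.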
#3 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jan 2009
Location: East Midlands, UK
Device: Sony PRS-505
|
Thanks. I tried that, but I think I must have broken something else, as it now only produces a news document with links to where the articles came from rather than the text of the articles themselves.
I think it may be a bit beyond my current skill level, so I will leave it until someone who knows what they're doing can produce a recipe for this popular UK daily. Thanks for your suggestion though. |
#4 |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
Here's a recipe I've managed to cobble together.
The EPUB output isn't so good, but I set the default output to LRF in Calibre and the result is much better on my PRS300 (it takes over 20 minutes to process on my laptop).
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1306097511(BasicNewsRecipe):
    title = u'Metro UK'
    oldest_article = 1
    max_articles_per_feed = 100

    # Keep only the headline, the sub-heading and the 'art-lft' article block.
    keep_only_tags = [
        dict(name='h1', attrs={'':''}),
        dict(name='h2', attrs={'class':'h2'}),
        dict(name='div', attrs={'class':'art-lft'})
    ]

    # Drop the comment-form blocks, the 'art-rgt' div and any h3 headings.
    remove_tags = [
        dict(name='div', attrs={'class':[
            'metroCommentFormWrap', 'commentForm', 'metroCommentInnerWrap',
            'art-rgt', 'pluck-app pluck-comm'
        ]}),
        dict(name='h3', attrs={'':''})
    ]

    feeds = [
        (u'News', u'http://www.metro.co.uk/rss/news/'),
        (u'Money', u'http://www.metro.co.uk/rss/money/'),
        (u'Sport', u'http://www.metro.co.uk/rss/sport/'),
        (u'Film', u'http://www.metro.co.uk/rss/metrolife/film/'),
        (u'Music', u'http://www.metro.co.uk/rss/metrolife/music/'),
        (u'TV', u'http://www.metro.co.uk/rss/tv/'),
        (u'Showbiz', u'http://www.metro.co.uk/rss/showbiz/'),
        (u'Weird News', u'http://www.metro.co.uk/rss/weird/'),
        (u'Travel', u'http://www.metro.co.uk/rss/travel/'),
        (u'Lifestyle', u'http://www.metro.co.uk/rss/lifestyle/'),
        (u'Books', u'http://www.metro.co.uk/rss/lifestyle/books/'),
        (u'Food', u'http://www.metro.co.uk/rss/lifestyle/restaurants/')
    ]
Last edited by scissors; 05-24-2011 at 02:36 PM. |
#5 |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
Recipe edited.
CSS thrown out: this cuts processing down to less than 2 minutes, and EPUB works.
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1306097511(BasicNewsRecipe):
    title = u'Metro UK'
    # Skip the site's style sheets; this is the change that cuts processing time.
    no_stylesheets = True
    oldest_article = 1
    max_articles_per_feed = 200
    __author__ = 'Dave Asbury'
    simultaneous_downloads = 3
    masthead_url = 'http://e-edition.metro.co.uk/images/metro_logo.gif'

    # Keep only the headline, the sub-heading and the 'art-lft' article block.
    keep_only_tags = [
        dict(name='h1', attrs={'':''}),
        dict(name='h2', attrs={'class':'h2'}),
        dict(name='div', attrs={'class':'art-lft'})
    ]

    # Drop the comment-form blocks and other divs listed by class.
    remove_tags = [
        dict(name='div', attrs={'class':[
            'metroCommentFormWrap', 'commentForm', 'metroCommentInnerWrap',
            'art-rgt', 'pluck-app pluck-comm', 'news m12 clrd clr-l p5t',
            'flt-r'
        ]})
    ]

    feeds = [
        (u'News', u'http://www.metro.co.uk/rss/news/'),
        (u'Money', u'http://www.metro.co.uk/rss/money/'),
        (u'Sport', u'http://www.metro.co.uk/rss/sport/'),
        (u'Film', u'http://www.metro.co.uk/rss/metrolife/film/'),
        (u'Music', u'http://www.metro.co.uk/rss/metrolife/music/'),
        (u'TV', u'http://www.metro.co.uk/rss/tv/'),
        (u'Showbiz', u'http://www.metro.co.uk/rss/showbiz/'),
        (u'Weird News', u'http://www.metro.co.uk/rss/weird/'),
        (u'Travel', u'http://www.metro.co.uk/rss/travel/'),
        (u'Lifestyle', u'http://www.metro.co.uk/rss/lifestyle/'),
        (u'Books', u'http://www.metro.co.uk/rss/lifestyle/books/'),
        (u'Food', u'http://www.metro.co.uk/rss/lifestyle/restaurants/')
    ]
#6 |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
Edited again...
Spoiler:
Last edited by scissors; 06-19-2011 at 01:58 PM. Reason: remove_empty_feeds = True added |
#7 |
Junior Member
Posts: 6
Karma: 10
Join Date: May 2011
Device: kindle 3
|
Scissors/David, this is an excellent recipe. Metro seems nice and "clean" and easy to read, just the simple stories.
It also convinced me to DONATE to Kovid's program. Superb chaps. |
#8 |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
Latest edit. Takes out readers' comments and moves second headings above images.
Spoiler:
Last edited by scissors; 06-19-2011 at 01:56 PM. |
#9 |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
Minor Fix (but still nowhere near perfect)
Bit of extra code added - removes the word "Tweet" that appeared occasionally at the end of articles.
Spoiler:
Last edited by scissors; 06-19-2011 at 01:56 PM. |
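The recipe for this step is tucked away in the spoiler, but the general shape of a preprocess_regexps entry is a pair: a compiled pattern and a function that returns the replacement text. Here is a minimal sketch, assuming the stray word appears on its own right before a closing tag. The sample markup and the helper strip_stray_tweet are invented for illustration and are not taken from Metro's pages or from the spoilered recipe.

```python
import re

# One preprocess_regexps-style entry: (compiled pattern, replacement function).
# \b limits the match to the whole word, and the lookahead only accepts it when
# it sits directly before a closing tag, so "tweeted" or a "Tweet" in the middle
# of a sentence is left alone.
TWEET_RULE = (
    re.compile(r'\bTweet\b\s*(?=</)'),
    lambda match: '',
)


def strip_stray_tweet(html):
    """Apply the single rule above to a string of HTML."""
    pattern, replace = TWEET_RULE
    return pattern.sub(replace, html)


# Hypothetical article fragment with a leftover share-button label at the end.
sample = '<p>He tweeted the result.</p><div>Tweet</div>'
print(strip_stray_tweet(sample))
```

Keeping the pattern this narrow is a design choice: a bare, case-insensitive match on the word would also eat it out of the article text itself.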
#10 |
Junior Member
Posts: 6
Karma: 10
Join Date: May 2011
Device: kindle 3
|
Inspiring. I'm going to get the manual out and start making my own recipes.
Thanks! |
#11 |
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
|
Reduced the size of the headlines and some other minor text formatting.
(looks better on my prs300) Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1306097511(BasicNewsRecipe):
    title = u'Metro UK'
    description = 'News as provided by The Metro - UK'
    __author__ = 'Dave Asbury'
    cover_url = 'http://profile.ak.fbcdn.net/hprofile-ak-snc4/276636_117118184990145_2132092232_n.jpg'
    no_stylesheets = True
    oldest_article = 1
    max_articles_per_feed = 20
    remove_empty_feeds = True
    remove_javascript = True

    #preprocess_regexps = [(re.compile(r'Tweet'), lambda a : '')]
    # Both substitutions live in one list; assigning preprocess_regexps
    # twice would keep only the second assignment.
    preprocess_regexps = [
        # Start a new paragraph before each image caption.
        (re.compile(r'<span class="img-cap legend">', re.IGNORECASE | re.DOTALL),
         lambda match: '<p></p><span class="img-cap legend"> '),
        # Remove the word "tweet" (any case) wherever it appears.
        (re.compile(r'tweet', re.IGNORECASE | re.DOTALL),
         lambda match: '')
    ]

    language = 'en_GB'
    masthead_url = 'http://e-edition.metro.co.uk/images/metro_logo.gif'

    # Keep headlines, sub-headings, image blocks, the article block and paragraphs.
    keep_only_tags = [
        dict(name='h1'),
        dict(name='h2', attrs={'class':'h2'}),
        dict(attrs={'class':['img-cnt figure']}),
        dict(attrs={'class':['art-img']}),
        dict(name='div', attrs={'class':'art-lft'}),
        dict(name='p')
    ]

    # Drop share buttons, comment blocks and other elements listed by class.
    remove_tags = [
        dict(name='div', attrs={'class':[
            'news m12 clrd clr-b p5t shareBtm', 'commentForm',
            'metroCommentInnerWrap', 'art-rgt', 'pluck-app pluck-comm',
            'news m12 clrd clr-l p5t', 'flt-r'
        ]}),
        dict(attrs={'class':[
            'metroCommentFormWrap', 'commentText', 'commentsNav', 'avatar',
            'submDateAndTime'
        ]}),
        dict(name='div', attrs={'class':'clrd art-fd fd-gr1-b'})
    ]

    feeds = [
        (u'News', u'http://www.metro.co.uk/rss/news/'),
        (u'Money', u'http://www.metro.co.uk/rss/money/'),
        (u'Sport', u'http://www.metro.co.uk/rss/sport/'),
        (u'Film', u'http://www.metro.co.uk/rss/metrolife/film/'),
        (u'Music', u'http://www.metro.co.uk/rss/metrolife/music/'),
        (u'TV', u'http://www.metro.co.uk/rss/tv/'),
        (u'Showbiz', u'http://www.metro.co.uk/rss/showbiz/'),
        (u'Weird News', u'http://www.metro.co.uk/rss/weird/'),
        (u'Travel', u'http://www.metro.co.uk/rss/travel/'),
        (u'Lifestyle', u'http://www.metro.co.uk/rss/lifestyle/'),
        (u'Books', u'http://www.metro.co.uk/rss/lifestyle/books/'),
        (u'Food', u'http://www.metro.co.uk/rss/lifestyle/restaurants/')
    ]

    extra_css = '''
        body {font: medium sans-serif;}
        h1 {text-align: center; font-family: Arial,Helvetica,sans-serif; font-size: 20px; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: bold;}
        h2 {text-align: center; color: #4D4D4D; font-family: Arial,Helvetica,sans-serif; font-size: 15px; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: bold;}
        span {font-size: 9.5px; font-weight: bold; font-style: italic}
        p {text-align: justify; font-family: Arial,Helvetica,sans-serif; font-size: 11px; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal;}
    '''
Last edited by scissors; 10-07-2011 at 02:15 PM. Reason: cover url added |
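One thing worth knowing when a recipe needs more than one substitution: a class attribute that is assigned twice keeps only the last value, so all the pattern/function pairs have to go into a single preprocess_regexps list. The sketch below applies such a list in order using plain re. The apply_rules helper and the sample HTML are illustrative assumptions, calibre's own processing may differ in detail, and the word boundaries on the second pattern are an addition made here so that words such as "tweeted" are left alone.

```python
import re

# A single list holding both rules, in the order they should run.
preprocess_regexps = [
    # Start a new paragraph before each image caption.
    (re.compile(r'<span class="img-cap legend">', re.IGNORECASE),
     lambda match: '<p></p><span class="img-cap legend"> '),
    # Drop the share-button label (whole word only, any case).
    (re.compile(r'\btweet\b', re.IGNORECASE),
     lambda match: ''),
]


def apply_rules(html, rules):
    """Run each (pattern, function) pair over the HTML, one after another."""
    for pattern, replace in rules:
        html = pattern.sub(replace, html)
    return html


# Hypothetical fragment: a caption followed by a leftover share label.
sample = '<span class="img-cap legend">Caption</span><div>Tweet</div>'
print(apply_rules(sample, preprocess_regexps))
```

Running a test string through the rules like this is a quick way to check a pattern before waiting for a full download.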