06-09-2013, 12:34 PM | #1 |
Enthusiast
Posts: 32
Karma: 10
Join Date: Apr 2012
Device: Amazon Kindle Paperwhite
|
How to exclude Video from CNN Recipe?
I’m enjoying reading CNN news on my Kindle but I’m having a problem with links to videos being included in the mobi file. Since the actual video isn’t included in the mobi file, these links bring up the next valid story rather than the one I expected to read.
I found this code from Starson17 in the Re-usable Code posting in the Recipe forum that can be used to remove videos:

    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            # Iterate over a copy so articles can be removed while looping
            for article in feed.articles[:]:
                print('article.title is: %s' % article.title)
                if 'VIDEO' in article.title.upper() or 'GOAT' in article.url:
                    feed.articles.remove(article)
        return feeds

I can’t get this code to block videos in the CNN recipe. I believe this is because the recipe code is looking at the link on the CNN feed page, which doesn’t contain the word ‘video’. However, the feed page link is redirected to a link that does contain the word ‘video’. Is there a way to change the CNN recipe to eliminate articles based on the content of the redirected link?

Here are some examples.

Link to CNN feeds: http://www.cnn.com/services/rss/

Example 1
A feed titled ‘See ship explode Hollywood style’ dated June 9 @ 10:29 AM contains this link:
http://rss.cnn.com/~r/rss/cnn_topsto...osion.cnn.html
Following the above link, the browser redirects to this link that contains “/video/”:
http://www.cnn.com/video/data/2.0/vi...osion.cnn.html

Example 2
A feed titled ‘Who is the suspected gunman?’ dated June 9 @ 11:07 AM contains this link:
http://rss.cnn.com/~r/rss/cnn_topsto...ified.cnn.html
Following the above link, the browser redirects to this link that contains “/video/”:
http://www.cnn.com/video/data/2.0/vi...ified.cnn.html |
06-09-2013, 12:36 PM | #2 |
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You need to implement preprocess_html, detect the video element and either remove it or return None, in which case the whole article will be skipped.
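A minimal sketch of that approach, assuming Python 3 and assuming a video page can be recognised by a <video> element or a canonical link containing "/video/" (neither is confirmed for CNN in this thread). VideoDetector and is_video_page are hypothetical helpers, not part of calibre; only preprocess_html is the recipe hook.

```python
from html.parser import HTMLParser


class VideoDetector(HTMLParser):
    """Hypothetical helper: flags markup that looks like a video page."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'video':
            # Assumption: a <video> element means the page is a video page
            self.found = True
        elif tag == 'link' and attrs.get('rel') == 'canonical':
            # Assumption: video pages carry '/video/' in their canonical URL
            if '/video/' in (attrs.get('href') or ''):
                self.found = True


def is_video_page(html):
    """Return True if the HTML string looks like a video page."""
    detector = VideoDetector()
    detector.feed(html)
    return detector.found


# Inside the recipe class (calibre passes the parsed page as `soup`):
#
#     def preprocess_html(self, soup):
#         if is_video_page(str(soup)):
#             return None  # skip the whole article
#         return soup
```

The check runs on the downloaded page rather than on the feed link, so it does not depend on what the RSS URL looks like before the redirect.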
|
06-09-2013, 07:43 PM | #3 |
Enthusiast
Posts: 32
Karma: 10
Join Date: Apr 2012
Device: Amazon Kindle Paperwhite
|
Thanks Kovid. I was able to find where preprocess_html is mentioned in the documentation but couldn't find any examples of how to use it to filter articles based on the redirected URL. I searched other recipes hoping to find a working example but so far have had no success.
If you or one of the other experts in this forum have time to take a look at this issue, I would appreciate the help. |
06-09-2013, 08:15 PM | #4 |
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If you wish to filter based on URL, you can implement get_article_url and return None for those articles you want skipped.
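A minimal sketch of that, assuming Python 3 and assuming the redirect target can be obtained by opening the feed link with the recipe's browser. filter_video_url is a hypothetical helper, not part of calibre, and the "/video/" test is taken from the examples in the first post.

```python
def filter_video_url(feed_url, resolve):
    """Return the redirected URL, or None if it points at a video page.

    `resolve` is any callable that maps a feed link to the URL it
    finally redirects to (hypothetical parameter, for illustration).
    """
    final_url = resolve(feed_url)
    if '/video/' in final_url:
        return None  # None tells the recipe to skip this article
    return final_url


# Inside the recipe class:
#
#     def get_article_url(self, article):
#         link = BasicNewsRecipe.get_article_url(self, article)
#         if not link:
#             return link
#         # Assumption: opening the link follows the RSS redirect
#         resolve = lambda u: self.browser.open_novisit(u).geturl()
#         return filter_video_url(link, resolve)
```

Resolving every link costs one extra request per article, so the download will be slower than filtering on the title alone.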
|