#1
Junior Member
Posts: 6
Karma: 10
Join Date: Oct 2010
Device: Kindle
Partial Feeds and Using Info from XML content
Hi,
I'm not sure whether this has been asked before; if it has, I couldn't find it. I am trying to download feeds from http://www.sciencebasedmedicine.org/, and my recipe is as follows:
```python
#!/usr/bin/env python
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag

class SBM(BasicNewsRecipe):
    title                 = 'Science Based Medicine'
    __author__            = 'Multiple Authors'
    oldest_article        = 5
    max_articles_per_feed = 15
    no_stylesheets        = True
    use_embedded_content  = False
    encoding              = 'utf-8'
    publisher             = 'SBM'
    category              = 'science, sbm, ebm, blog'
    language              = 'en'
    lang                  = 'en-US'

    conversion_options = {
        'tags'         : category,
        'publisher'    : publisher,
        'language'     : lang,
        'pretty_print' : True,
    }

    # Keep only the article body from each downloaded page
    keep_only_tags = [dict(name='div', attrs={'class': 'entry'})]

    feeds = [(u'Science Based Medicine', u'http://www.sciencebasedmedicine.org/?feed=rss2')]

    def preprocess_html(self, soup):
        # Declare the charset explicitly; the attribute is 'content', not 'context'
        mtag = Tag(soup, 'meta', [('http-equiv', 'Content-Type'),
                                  ('content', 'text/html; charset=utf-8')])
        soup.head.insert(0, mtag)
        soup.html['lang'] = self.lang
        return self.adeify_images(soup)
```

The author's name appears in the feed XML as:

```xml
<dc:creator>Kimball Atwood</dc:creator>
```

and on the article page as:

```html
<div class="meta"> Published by <a href="http://www.sciencebasedmedicine.org/?author=6" title="Posts by Kimball Atwood">Kimball Atwood</a> under .....
```

Any clue would be much appreciated.

BuzzKill
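The feed-side location of the name can be checked outside calibre. Below is a minimal sketch, using only Python's standard xml.etree.ElementTree, that reads <dc:creator> from each RSS <item>. The function name creators_by_title and the sample feed are made up for illustration, and this does not show how calibre itself hands that value to a recipe.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the "dc:" prefix in RSS feeds (Dublin Core elements)
DC_NS = "http://purl.org/dc/elements/1.1/"

def creators_by_title(rss_text):
    """Hypothetical helper: map each <item>'s <title> to its <dc:creator>.

    Returns None for an item whose <dc:creator> is missing or empty.
    """
    root = ET.fromstring(rss_text)
    result = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        # ElementTree addresses namespaced tags as "{uri}localname"
        creator = item.findtext(f"{{{DC_NS}}}creator")
        result[title] = creator.strip() if creator else None
    return result

# Hand-written sample feed for illustration only (not fetched from the site)
SAMPLE_RSS = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Science Based Medicine</title>
    <item>
      <title>Example post</title>
      <dc:creator>Kimball Atwood</dc:creator>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    print(creators_by_title(SAMPLE_RSS))
```

Note that "dc:" is only a prefix: ElementTree matches on the namespace URI declared by xmlns:dc, which is why the lookup spells out the full URI instead of "dc:creator".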
#2
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
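If you don't like the additional stuff in the div tag, you could keep just the name by keeping only the <a> tag with the "Posts by" title, using this:

```python
keep_only_tags = [
    # The author link, e.g. <a title="Posts by Kimball Atwood">Kimball Atwood</a>
    dict(name='a', attrs={'title': re.compile(r'Posts by.*', re.DOTALL | re.IGNORECASE)}),
    # The article body
    dict(name='div', attrs={'class': 'entry'}),
]
```

This needs the re module, so also add this at the top of the recipe:

```python
import re
```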
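To see what that keep_only_tags entry selects, here is a minimal sketch in plain Python. It uses the standard library's html.parser rather than calibre or BeautifulSoup, and only mimics the "<a> whose title matches the pattern" selection; the names AuthorLinkFinder and author_names, and the second link in the sample, are hypothetical.

```python
import re
from html.parser import HTMLParser

# Same pattern as in the keep_only_tags entry
POSTS_BY = re.compile(r'Posts by.*', re.DOTALL | re.IGNORECASE)

class AuthorLinkFinder(HTMLParser):
    """Hypothetical helper: collect text of <a> tags whose title matches POSTS_BY."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_match = False

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            title = dict(attrs).get('title') or ''
            # search() looks for the pattern anywhere in the title
            self._in_match = bool(POSTS_BY.search(title))
            if self._in_match:
                self.names.append('')

    def handle_endtag(self, tag):
        if tag == 'a':
            self._in_match = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign
        if self._in_match:
            self.names[-1] += data

def author_names(html):
    finder = AuthorLinkFinder()
    finder.feed(html)
    finder.close()
    return [name.strip() for name in finder.names]

# The first link follows the page snippet from the question;
# the second link is made up to show a title that does not match.
SAMPLE = ('<div class="meta"> Published by '
          '<a href="http://www.sciencebasedmedicine.org/?author=6" '
          'title="Posts by Kimball Atwood">Kimball Atwood</a> under '
          '<a href="#" title="Hypothetical category link">Example</a></div>')

if __name__ == "__main__":
    print(author_names(SAMPLE))
```

Reading the pattern piece by piece: "Posts by" is matched literally, ".*" allows any run of characters after it, re.IGNORECASE lets "posts by" or "POSTS BY" match too, and re.DOTALL lets "." cross line breaks in case a title attribute spans lines. BeautifulSoup, as far as I know, tests a compiled pattern against an attribute with search(), so r'Posts by' alone would select the same tags; the ".*" does no harm.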
#3
Junior Member
Posts: 6
Karma: 10
Join Date: Oct 2010
Device: Kindle
Starson17,
Thank you very much for the answer. That did it. I knew regular expressions could be used, but I just don't understand them yet.
#4
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
When your recipe is done, you should submit it here. I enjoyed reading some of the posts. (I needed to see the page to understand your problem.)
Tags: calibre, recipe, xml
Thread | Thread Starter | Forum | Replies | Last Post |
Getting Full Content from Partial Content Feeds | thread314 | Calibre | 5 | 05-05-2012 10:49 AM |
Read full-content feeds on iPhone Kindle App | bthoven | Apple Devices | 15 | 08-08-2010 04:11 AM |
Is there a good way to convert partial rss to full rss feeds. | Zorz | Other formats | 5 | 05-29-2010 12:17 PM |
A rather partial review of the 700 | akira28 | Sony Reader | 6 | 04-14-2009 05:19 AM |
iLiad Partial screen refresh? | hansel | iRex Developer's Corner | 11 | 09-15-2008 09:51 AM |