#1
Junior Member
Posts: 6
Karma: 10
Join Date: Oct 2010
Device: Kindle
Partial Feeds and Using Info from XML content
Hi,
I am not sure whether this has been asked before; if it has, I couldn't find it. I am trying to download feeds from http://www.sciencebasedmedicine.org/, and my recipe is as follows:
Code:
#!/usr/bin/env python
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag


class SBM(BasicNewsRecipe):
    title = 'Science Based Medicine'
    __author__ = 'Multiple Authors'
    oldest_article = 5
    max_articles_per_feed = 15
    no_stylesheets = True
    use_embedded_content = False
    encoding = 'utf-8'
    publisher = 'SBM'
    category = 'science, sbm, ebm, blog'
    language = 'en'
    lang = 'en-US'

    conversion_options = {
        'tags': category,
        'publisher': publisher,
        'language': lang,
        'pretty_print': True,
    }

    # Keep only the article body.
    keep_only_tags = [dict(name='div', attrs={'class': 'entry'})]

    feeds = [(u'Science Based Medicine', u'http://www.sciencebasedmedicine.org/?feed=rss2')]

    def preprocess_html(self, soup):
        # Insert a Content-Type meta tag and set the document language.
        mtag = Tag(soup, 'meta', [('http-equiv', 'Content-Type'), ('content', 'text/html; charset=utf-8')])
        soup.head.insert(0, mtag)
        soup.html['lang'] = self.lang
        return self.adeify_images(soup)
Code:
<dc:creator>Kimball Atwood</dc:creator>
Code:
<div class="meta">
Published by <a href=
"http://www.sciencebasedmedicine.org/?author=6" title=
"Posts by Kimball Atwood">Kimball Atwood</a> under
.....
Any clue would be much appreciated.
BuzzKill
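For reference, the author name carried in the feed's dc:creator element can be read with Python's standard library alone. The following is only a minimal sketch and not part of the recipe: the sample feed text and the helper name creators_from_rss are made up for illustration, and it assumes the feed binds the dc prefix to the usual Dublin Core namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed binds the "dc" prefix to the standard Dublin Core namespace.
DC_NS = {'dc': 'http://purl.org/dc/elements/1.1/'}

# Hypothetical, trimmed-down feed with one item like the one quoted above.
SAMPLE_RSS = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Science Based Medicine</title>
    <item>
      <title>Example post</title>
      <dc:creator>Kimball Atwood</dc:creator>
    </item>
  </channel>
</rss>"""


def creators_from_rss(rss_text):
    """Return a list of (item title, dc:creator) pairs found in the feed text."""
    # Parse from bytes so the encoding declaration in the XML prolog is honoured.
    root = ET.fromstring(rss_text.encode('utf-8'))
    pairs = []
    for item in root.iter('item'):
        title = item.findtext('title', default='')
        creator = item.findtext('dc:creator', default='', namespaces=DC_NS)
        pairs.append((title, creator))
    return pairs


if __name__ == '__main__':
    for title, creator in creators_from_rss(SAMPLE_RSS):
        print(title, '-', creator)
```

This only shows where the name lives in the XML; whether and how a recipe can pass that value into the downloaded article is a separate question.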
#2
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
If you don't want the additional stuff in the div tag, you can keep just the name by keeping only the <a> tag whose title starts with "Posts by", like this:
Code:
keep_only_tags = [
    dict(name='a', attrs={'title': re.compile(r'Posts by.*', re.DOTALL|re.IGNORECASE)}),
    dict(name='div', attrs={'class': 'entry'})
]
The regular expression needs this import at the top of the recipe:
Code:
import re
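To see what that pattern accepts, it can be tried outside calibre with plain Python. This is a rough sketch only: the sample title strings and the helper names is_author_link and author_from_title are hypothetical, and it assumes the tag matcher applies the compiled pattern to the attribute value with a search.

```python
import re

# Same pattern as in the keep_only_tags entry above.
POSTS_BY = re.compile(r'Posts by.*', re.DOTALL | re.IGNORECASE)

# Hypothetical title attributes of <a> tags that might appear on an article page.
SAMPLE_TITLES = [
    'Posts by Kimball Atwood',
    'posts by kimball atwood',
    'Permanent Link to this post',
]


def is_author_link(title):
    """True if the title attribute looks like an author link."""
    return title is not None and POSTS_BY.search(title) is not None


def author_from_title(title):
    """Drop everything up to and including 'Posts by', or return None if no match."""
    if not is_author_link(title):
        return None
    return re.sub(r'^.*?Posts by\s*', '', title, flags=re.IGNORECASE | re.DOTALL)


if __name__ == '__main__':
    for title in SAMPLE_TITLES:
        print(repr(title), is_author_link(title), author_from_title(title))
```

re.IGNORECASE lets "posts by" match in any capitalisation, and re.DOTALL lets the trailing .* run across line breaks if the attribute value happens to contain one.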
#3
Starson17,
Thank you very much for the answer. That did it. I knew regular expressions could be used, but I just don't understand them yet.
#4
When your recipe is done, you should submit it here. I enjoyed reading some of the posts. (I needed to see the page to understand your problem.)