Quote:
Originally Posted by Starson17
... Otherwise, you can just use index_to_soup to grab a soup of the feed page and parse it to find what you want (e.g. search for the article title and grab the other elements/text you want once it's found).
Once you have the text, you would use preprocess_html or postprocess_html and modify the page soup.
OK, I've spent a morning looking through this and I'm really not making much progress. I've narrowed what I need more information on down to the section quoted above. I assume the parsing would go inside the "for id in soup.findAll" loop below, but I'm not sure of the format, and yes, I am not a Python developer.
Code:
def get_feeds(self):
    feeds = []
    soup = self.index_to_soup('http://www.google.com/reader/api/0/tag/list')
    for id in soup.findAll(True, attrs={'name':['id']}):
        url = id.contents[0].replace('broadcast','reading-list')
        feeds.append((re.search('/([^/]*)$', url).group(1),
                      self.base_url + urllib.quote(url.encode('utf-8')) + self.get_options))
    return feeds
I need to parse the following out of the source XML:
<title type="html">Article Title</title> <-- Need this
<author>
    <name>Article Author</name> <-- Need this
</author>
<source gr:stream-id="feed URL">
    <id>Google ID Tag</id>
    <title type="html">Feed Title</title> <-- Need this
    <link rel="alternate" href="Source Link" type="text/html"/> <-- Need this
</source>
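To make the target concrete, here is a rough, standalone sketch of pulling those four values out of XML shaped like the fragment above. It is not the recipe's method: it uses Python's standard xml.etree.ElementTree instead of calibre's index_to_soup so that it runs on its own. The surrounding <feed>/<entry> wrapper, the namespace URIs, the SAMPLE text, and the parse_entries name are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the elements live in the standard Atom namespace.
ATOM = '{http://www.w3.org/2005/Atom}'

# Assumption: hypothetical sample wrapping the fragment above in <feed>/<entry>;
# the real feed may be structured differently.
SAMPLE = '''<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gr="http://www.google.com/schemas/reader/atom/">
  <entry>
    <title type="html">Article Title</title>
    <author>
      <name>Article Author</name>
    </author>
    <source gr:stream-id="feed URL">
      <id>Google ID Tag</id>
      <title type="html">Feed Title</title>
      <link rel="alternate" href="Source Link" type="text/html"/>
    </source>
  </entry>
</feed>'''


def parse_entries(xml_text):
    """Return one dict per <entry> with the four wanted values (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    articles = []
    for entry in root.iter(ATOM + 'entry'):
        # find/findtext only look at direct children, so the entry's own
        # <title> is not confused with the <title> nested inside <source>.
        source = entry.find(ATOM + 'source')
        link = source.find(ATOM + 'link') if source is not None else None
        articles.append({
            'title': entry.findtext(ATOM + 'title', default=''),
            'author': entry.findtext(ATOM + 'author/' + ATOM + 'name', default=''),
            'feed_title': source.findtext(ATOM + 'title', default='') if source is not None else '',
            'source_link': link.get('href', '') if link is not None else '',
        })
    return articles


if __name__ == '__main__':
    for article in parse_entries(SAMPLE):
        print(article)
```

Inside a recipe the same lookups would presumably be done on the soup returned by index_to_soup, keyed on the article title, with the results then written into the page in preprocess_html or postprocess_html as the quote suggests.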
Thanks,
Dave F.