MobileRead Forums - View Single Post

badhaggis · 10-10-2011, 03:37 PM

Quote:

Originally Posted by Starson17

... Otherwise, you can just use index_to_soup to grab a soup of the feed page and parse it to find what you want (e.g. search for the article title and grab the other elements/text you want once it's found).

Once you have the text, you would use preprocess_html or postprocess_html and modify the page soup.

OK, I've spent a morning looking through this and I'm really not making much progress. I've narrowed what I need more information on down to the section quoted above. I assume the parsing would go into the "for id in soup.findAll" loop below, but I'm not sure of the format, and yes, I am not a Python developer.

Code:

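    # Uses re and urllib (Python 2); both must be imported at the top of the recipe.
    def get_feeds(self):
        feeds = []
        # Fetch the list of Google Reader tags as a soup.
        soup = self.index_to_soup('http://www.google.com/reader/api/0/tag/list')
        # Every element with name="id" holds the stream id of one tag.
        for id in soup.findAll(True, attrs={'name':['id']}):
            url = id.contents[0].replace('broadcast','reading-list')
            # Feed title is the last path component; feed URL is the quoted stream id.
            feeds.append((re.search('/([^/]*)$', url).group(1),
                          self.base_url + urllib.quote(url.encode('utf-8')) + self.get_options))
        return feeds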

I need to parse the following out of the source XML:
<title type="html">Article Title</title> <-- Need this

<author>
<name>Article Author</name> <-- Need this
</author>

<source gr:stream-id="feed URL">

<id>Google ID Tag</id>
<title type="html">Feed Title</title> <-- Need this
<link rel="alternate" href="Source Link" type="text/html"/> <-- Need this

</source>

Thanks,
Dave F.
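
For anyone following along, here is a minimal sketch of pulling those four values out of XML shaped like the fragment above. It is not calibre's own method: it uses Python's standard xml.etree.ElementTree instead of the soup returned by index_to_soup, and it assumes the elements sit inside an Atom <entry> in the usual Atom namespace. The SAMPLE document, the gr namespace URI, the parse_entries name, and the dictionary keys are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace; assumed to be what the feed declares.
ATOM = '{http://www.w3.org/2005/Atom}'

# Hypothetical sample built from the fragment in the post.
# The gr namespace URI here is an assumption, used only so the sample parses.
SAMPLE = '''<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gr="http://www.google.com/reader/atom/">
  <entry>
    <title type="html">Article Title</title>
    <author>
      <name>Article Author</name>
    </author>
    <source gr:stream-id="feed URL">
      <id>Google ID Tag</id>
      <title type="html">Feed Title</title>
      <link rel="alternate" href="Source Link" type="text/html"/>
    </source>
  </entry>
</feed>'''


def parse_entries(xml_text):
    """Return one dict per <entry> with the four wanted values."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter(ATOM + 'entry'):
        # <source> carries the feed-level title and link.
        source = entry.find(ATOM + 'source')
        link = source.find(ATOM + 'link') if source is not None else None
        entries.append({
            # A plain tag path matches direct children only, so this is
            # the article title, not the feed title nested inside <source>.
            'title': entry.findtext(ATOM + 'title', default=''),
            'author': entry.findtext(ATOM + 'author/' + ATOM + 'name', default=''),
            'feed_title': source.findtext(ATOM + 'title', default='') if source is not None else '',
            'source_link': link.get('href', '') if link is not None else '',
        })
    return entries


if __name__ == '__main__':
    for item in parse_entries(SAMPLE):
        print(item)
```

Inside a recipe you would need the raw feed text rather than a soup to feed this function; I believe index_to_soup accepts raw=True for that, but check it against your calibre version. The resulting values could then be looked up by article title in preprocess_html or postprocess_html, as Starson17 suggested.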