Change article display format

badhaggis · 10-09-2011, 04:18 PM

Hi all,

I'm trying to modify the Google Reader uber recipe. I am specifically trying to address the following issues.

Issue #1
Reverse the order of articles so that the oldest is first. (Done)
I added the "reverse_article_order = True" attribute to the GoogleReaderUber(BasicNewsRecipe) class.

Issue #2
Reformat the article display:
From

Article Title
Content

To

Feed Title
Author
Article Title
Content
Source Link

Issue #2 is the area I need help with. The feed from http://www.google.com/reader/atom/ includes the tags I need; I'm just not sure how to get Calibre to reformat the articles.

The included tags in the feed are:

<title type="html">Article Title</title>

<author>
<name>Article Author</name>
</author>

<source gr:stream-id="feed URL">
<id>Google ID Tag</id>
<title type="html">Feed Title</title>
<link rel="alternate" href="Source Link" type="text/html"/>
</source>

Any help on this is GREATLY appreciated.

Thanks,
Dave F.

Starson17 · 10-10-2011, 10:29 AM

Quote:

Originally Posted by badhaggis

Hi all,

I'm trying to modify the Google Reader uber recipe.
Issue #2 is the area I need help with. The feed from http://www.google.com/reader/atom/ includes the tags I need; I'm just not sure how to get Calibre to reformat the articles.

It looks like you want to add text to the article page and the text is available from the RSS feed? If that's right, then there are two parts to do what you want - 1) how to get the text you want to add, and 2) how to put it on the article page.

If the text you want on the article is appearing in your finished ebook on the page that links to the article, then calibre has already found it, and you could use populate_article_metadata to access it. Otherwise, you can just use index_to_soup to grab a soup of the feed page and parse it to find what you want (e.g. search for the article title and grab the other elements/text you want once it's found).
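A rough sketch of the "parse the feed" part, for illustration only. It is written for Python 3 and uses the standard library's xml.etree.ElementTree rather than the soup that index_to_soup returns, so it can be tried outside calibre. The names parse_entries and SAMPLE_FEED are made up for this example, the <feed>/<entry> wrapper is assumed around the tags quoted in the first post, and the gr namespace URI is an assumption.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace, written in ElementTree's {uri}tag form.
ATOM = '{http://www.w3.org/2005/Atom}'


def parse_entries(feed_xml):
    """Return one dict per <entry> holding the four fields wanted on the page."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.iter(ATOM + 'entry'):
        source = entry.find(ATOM + 'source')
        link = source.find(ATOM + 'link') if source is not None else None
        entries.append({
            # find/findtext only look at direct children, so this is the
            # article title, not the feed title nested inside <source>.
            'title': entry.findtext(ATOM + 'title', default=''),
            'author': entry.findtext(ATOM + 'author/' + ATOM + 'name', default=''),
            'feed_title': source.findtext(ATOM + 'title', default='') if source is not None else '',
            'source_link': link.get('href', '') if link is not None else '',
        })
    return entries


# Hypothetical sample built from the tags quoted in the first post.
# The gr namespace URI below is an assumption.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gr="http://www.google.com/schemas/reader/atom/">
  <entry>
    <title type="html">Article Title</title>
    <author><name>Article Author</name></author>
    <source gr:stream-id="feed URL">
      <id>Google ID Tag</id>
      <title type="html">Feed Title</title>
      <link rel="alternate" href="Source Link" type="text/html"/>
    </source>
  </entry>
</feed>"""

entries = parse_entries(SAMPLE_FEED)
print(entries[0])
```

Inside a recipe the same lookups would be done on the soup with BeautifulSoup calls instead; the point here is only which tags hold which piece of text.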

Once you have the text, you would use preprocess_html or postprocess_html and modify the page soup.

If you don't know what a "soup" is, it's just the HTML from the page, parsed by BeautifulSoup into a tree of tags that you can search and modify.

badhaggis · 10-10-2011, 12:09 PM

Quote:

Originally Posted by Starson17

It looks like you want to add text to the article page and the text is available from the RSS feed? If that's right, then there are two parts to do what you want - 1) how to get the text you want to add, and 2) how to put it on the article page.

Yes. The information is available in the feed but is not currently displayed as part of the final product. So I want to pull the information from the feed and place it in the article.

Your recommendation sounds like what I need so I'll run back to my corner and do some research on the functions you listed and see what I can horribly mangle.

Thank you very much for the feedback.

DaveF

badhaggis · 10-10-2011, 04:37 PM

Quote:

Originally Posted by Starson17

... Otherwise, you can just use index_to_soup to grab a soup of the feed page and parse it to find what you want (e.g. search for the article title and grab the other elements/text you want once it's found).

Once you have the text, you would use preprocess_html or postprocess_html and modify the page soup.

OK, I spent a morning looking through this and I'm really not making much progress. I've narrowed what I need more information on down to the section quoted. I assume the parsing would go into the "for id in soup.findAll" loop below, but I'm not sure of the format, and yes, I am not a Python developer.

Code:

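    # Needs "import re, urllib" at the top of the recipe (Python 2).
    def get_feeds(self):
        feeds = []
        # Ask Google Reader for the list of tags; each one becomes a feed.
        soup = self.index_to_soup('http://www.google.com/reader/api/0/tag/list')
        for id in soup.findAll(True, attrs={'name':['id']}):
            url = id.contents[0].replace('broadcast','reading-list')
            # The feed title is the last path segment of the tag URL.
            feeds.append((re.search('/([^/]*)$', url).group(1),
                          self.base_url + urllib.quote(url.encode('utf-8')) + self.get_options))
        return feeds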

Need to parse out from the source xml:
<title type="html">Article Title</title> <-- Need this

<author>
<name>Article Author</name> <-- Need this
</author>

<source gr:stream-id="feed URL">

<id>Google ID Tag</id>
<title type="html">Feed Title</title> <-- Need this
<link rel="alternate" href="Source Link" type="text/html"/> <--Need this

</source>

Thanks,
Dave F.

Starson17 · 10-10-2011, 05:00 PM

Quote:

Originally Posted by badhaggis

OK, I spent a morning looking through this and I'm really not making much progress. I've narrowed what I need more information on down to the section quoted. I assume the parsing would go into the "for id in soup.findAll" loop below

I was thinking of two options. One was in get_feeds. You'd grab what you needed while the feeds were being worked on.

The other option was to do it at the article stage. You quoted my "Otherwise", which was about doing it at the article stage, so get_feeds is not the right place for it. You want to be in preprocess_html, which works on the articles as they are fetched.

To do it there: Basically, after the article has been fetched, you can modify it, either before it's processed or after (using pre or postprocess_html). I was thinking you would regrab the RSS feed page (yes, at this point it's already been processed, the articles have been identified, etc. but that's OK).

You are just going to grab the RSS feed page again (you'd do it multiple times, once for each article) and pull some parts from it. So how do you do this? At the pre/postprocess stage you know the Article Title. It's part of the "soup" of the article page (you need to use BeautifulSoup to find it there so you can use it). You want something from the feed page. That "something" is associated with the matching Article Title on the feed page, so while you are in preprocess_html (or postprocess_html - it doesn't matter) you use index_to_soup to grab a second soup - the soup of the feed page. As I posted, you would "parse it (the second soup, from the feed page) to find what you want (e.g. search for the article title and grab the other elements/text you want once it's found)."

It would basically be a loop that looks through the feed page for the article title tag that matches the current article being worked on in pre/postprocess_html, then grabs whatever you need for the current article from the second soup (the RSS feed page). Then use BeautifulSoup to stick it into the first soup (the article being worked on).
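To make the loop concrete, here is a minimal sketch in Python 3, not a tested recipe. It swaps BeautifulSoup for plain regular expressions and string handling so that it runs outside calibre. The helper names find_article_title and add_feed_details, the CSS class names, and the shape of feed_entries (a list of dicts with title, author, feed_title and source_link keys) are all assumptions made for the example.

```python
import re
from html import escape


def find_article_title(article_html):
    """Return the text of the first <h1>, falling back to <title>."""
    for tag in ('h1', 'title'):
        match = re.search(r'<%s[^>]*>(.*?)</%s>' % (tag, tag),
                          article_html, re.I | re.S)
        if match:
            # Drop any nested markup and surrounding whitespace.
            return re.sub(r'<[^>]+>', '', match.group(1)).strip()
    return None


def add_feed_details(article_html, feed_entries):
    """Put feed title and author above the article, source link below it."""
    title = find_article_title(article_html)
    # The loop described above: look for the feed entry whose title
    # matches the article currently being processed.
    for entry in feed_entries:
        if entry['title'] == title:
            break
    else:
        return article_html  # no match found: leave the article untouched

    header = '<p class="feed-title">%s</p><p class="author">%s</p>' % (
        escape(entry['feed_title']), escape(entry['author']))
    footer = '<p class="source-link"><a href="%s">%s</a></p>' % (
        escape(entry['source_link']), escape(entry['source_link']))

    # Insert the header right after <body> and the footer right before </body>.
    html = re.sub(r'(<body[^>]*>)', lambda m: m.group(1) + header,
                  article_html, count=1, flags=re.I)
    html = re.sub(r'(</body>)', lambda m: footer + m.group(1),
                  html, count=1, flags=re.I)
    return html


# Hypothetical data standing in for what was parsed out of the feed page.
feed_entries = [{
    'title': 'Article Title',
    'author': 'Article Author',
    'feed_title': 'Feed Title',
    'source_link': 'http://example.com/post',
}]
article = '<html><body><h1>Article Title</h1><p>Content</p></body></html>'
result = add_feed_details(article, feed_entries)
print(result)
```

The exact string comparison on the title is the weak point: if the feed title contains HTML entities or extra whitespace that the article page does not, the match will fail and the article is returned unchanged. In a recipe, the same steps would live in preprocess_html and operate on the soup it receives.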
