Old 10-09-2011, 03:18 PM   #1
badhaggis
Junior Member
badhaggis began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Oct 2011
Device: Kindle
Change article display format

Hi all,

I'm trying to modify the Google Reader uber recipe. I am specifically trying to address the following issues.

Issue #1
Reverse the order of articles so that the oldest is first. (Done)
I added the "reverse_article_order = True" attribute to the GoogleReaderUber(BasicNewsRecipe) class.

Issue #2
Reformat the article display:
From
Article Title
Content
To
Feed Title
Author
Article Title
Content
Source Link

-
Issue #2 is the area I need help with. The feed from http://www.google.com/reader/atom/ includes the tags I need; I'm just not sure how to get Calibre to reformat the articles.

The included tags in the feed are:

<title type="html">Article Title</title>

<author>
<name>Article Author</name>
</author>

<source gr:stream-id="feed URL">
<id>Google ID Tag</id>
<title type="html">Feed Title</title>
<link rel="alternate" href="Source Link" type="text/html"/>
</source>

Any help on this is GREATLY appreciated.

Thanks,
Dave F.
Old 10-10-2011, 09:29 AM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by badhaggis View Post
Hi all,

I'm trying to modify the Google Reader uber recipe.
Issue #2 is the area I need help with. The feed from http://www.google.com/reader/atom/ includes the tags I need; I'm just not sure how to get Calibre to reformat the articles.
It looks like you want to add text to the article page and the text is available from the RSS feed? If that's right, then there are two parts to do what you want - 1) how to get the text you want to add, and 2) how to put it on the article page.

If the text you want on the article is appearing in your finished ebook on the page that links to the article, then calibre has already found it, and you could use populate_article_metadata to access it. Otherwise, you can just use index_to_soup to grab a soup of the feed page and parse it to find what you want (e.g. search for the article title and grab the other elements/text you want once it's found).

Once you have the text, you would use preprocess_html or postprocess_html and modify the page soup.

If you don't know what a "soup" is, it's just the HTML from the page, parsed by BeautifulSoup into a tree of tags that you can search and modify.
Old 10-10-2011, 11:09 AM   #3
badhaggis
Quote:
Originally Posted by Starson17 View Post
It looks like you want to add text to the article page and the text is available from the RSS feed? If that's right, then there are two parts to do what you want - 1) how to get the text you want to add, and 2) how to put it on the article page.
Yes, the information is available in the feed but is not currently displayed as part of the final product. So I want to pull the information from the feed and place it in the article.

Your recommendation sounds like what I need so I'll run back to my corner and do some research on the functions you listed and see what I can horribly mangle.

Thank you very much for the feedback.

DaveF
Old 10-10-2011, 03:37 PM   #4
badhaggis
Quote:
Originally Posted by Starson17 View Post
... Otherwise, you can just use index_to_soup to grab a soup of the feed page and parse it to find what you want (e.g. search for the article title and grab the other elements/text you want once it's found).

Once you have the text, you would use preprocess_html or postprocess_html and modify the page soup.
OK, I spent the morning looking through this and am really not making much progress. I've narrowed what I need more information on down to the section quoted. I assume the parsing would go into the "for id in soup.findAll" loop below, but I'm not sure of the format, and yes, I am not a Python developer.

Code:
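    # Requires "import re, urllib" at the top of the recipe (Python 2 urllib.quote).
    def get_feeds(self):
        feeds = []
        # Every element with name="id" on the tag list page carries one stream id.
        soup = self.index_to_soup('http://www.google.com/reader/api/0/tag/list')
        for id in soup.findAll(True, attrs={'name':['id']}):
            # Swap 'broadcast' for 'reading-list' in the stream id.
            url = id.contents[0].replace('broadcast','reading-list')
            # Feed name = last path segment; feed URL = base + quoted id + query options.
            feeds.append((re.search('/([^/]*)$', url).group(1),
                          self.base_url + urllib.quote(url.encode('utf-8')) + self.get_options))
        return feeds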
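For readers who, like the poster, are not Python developers: the loop above only builds the list of feeds (a name and a URL for each stream id). It does not look at individual articles. Here is a minimal standalone sketch of what one pass of that loop computes. It is written for Python 3, so `urllib.parse.quote` stands in for the recipe's `urllib.quote`. `BASE_URL` and `GET_OPTIONS` are assumed stand-ins for the recipe's `self.base_url` and `self.get_options`, and the sample ids are made up.

```python
import re
from urllib.parse import quote  # Python 3 location of the old urllib.quote

# Assumption: the recipe's self.base_url points at the atom endpoint named in the thread.
BASE_URL = 'http://www.google.com/reader/atom/'
# Hypothetical stand-in for the recipe's self.get_options query string.
GET_OPTIONS = '?n=100'


def feed_for_tag(tag_id):
    """Return the (feed name, feed URL) pair the loop would append for one id."""
    # Same substitution as in the recipe.
    url = tag_id.replace('broadcast', 'reading-list')
    # Everything after the last '/' becomes the feed name.
    name = re.search('/([^/]*)$', url).group(1)
    # Percent-encode the id so spaces and similar characters are URL-safe.
    return name, BASE_URL + quote(url) + GET_OPTIONS


if __name__ == '__main__':
    for sample in ('user/123/state/com.google/broadcast',
                   'user/123/label/My Feeds'):
        print(feed_for_tag(sample))
```

Because this stage only sees stream ids, the per-article fields (author, feed title, source link) are not available inside this loop. They live in the entries of each feed.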
Need to parse out from the source xml:
<title type="html">Article Title</title> <-- Need this

<author>
<name>Article Author</name> <-- Need this
</author>

<source gr:stream-id="feed URL">
<id>Google ID Tag</id>
<title type="html">Feed Title</title> <-- Need this
<link rel="alternate" href="Source Link" type="text/html"/> <--Need this
</source>
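As a rough sketch of pulling those four values out of one entry: the example below uses the standard library's `xml.etree.ElementTree` rather than calibre's `index_to_soup` and BeautifulSoup, so it can be run outside calibre. `SAMPLE_FEED` is a hypothetical document shaped like the tags listed above. The `gr` namespace URI in it is an assumption, and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = '{http://www.w3.org/2005/Atom}'  # standard Atom namespace

# Hypothetical sample shaped like the entry in the post.
# The xmlns:gr URI is an assumption; it is only needed so the attribute parses.
SAMPLE_FEED = '''<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gr="http://www.google.com/schemas/reader/atom/">
  <entry>
    <title type="html">Article Title</title>
    <author><name>Article Author</name></author>
    <source gr:stream-id="feed URL">
      <id>Google ID Tag</id>
      <title type="html">Feed Title</title>
      <link rel="alternate" href="Source Link" type="text/html"/>
    </source>
  </entry>
</feed>'''


def entry_metadata(entry):
    """Collect article title, author, feed title and source link from one <entry>."""
    source = entry.find(ATOM + 'source')
    link = source.find(ATOM + 'link') if source is not None else None
    return {
        'title': entry.findtext(ATOM + 'title', default=''),
        'author': entry.findtext(ATOM + 'author/' + ATOM + 'name', default=''),
        'feed_title': (source.findtext(ATOM + 'title', default='')
                       if source is not None else ''),
        'source_link': link.get('href', '') if link is not None else '',
    }


def feed_metadata(feed_xml):
    """Return one metadata dict per <entry> in the feed document."""
    root = ET.fromstring(feed_xml)
    return [entry_metadata(entry) for entry in root.iter(ATOM + 'entry')]


if __name__ == '__main__':
    print(feed_metadata(SAMPLE_FEED))
```

Inside a recipe, the same lookups would be done on the soup returned by `index_to_soup` instead. The point here is only which element each value comes from.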

Thanks,
Dave F.
Old 10-10-2011, 04:00 PM   #5
Starson17
Quote:
Originally Posted by badhaggis View Post
OK, I spent the morning looking through this and am really not making much progress. I've narrowed what I need more information on down to the section quoted. I assume the parsing would go into the "for id in soup.findAll" loop below
I was thinking of two options. One was in get_feeds. You'd grab what you needed while the feeds were being worked on.

The other option was to do it at the article stage. You quoted my "Otherwise" which was to do it at the article stage, so you're not at the right point. You want to be in preprocess_html which works on the articles as they are fetched.

To do it there: Basically, after the article has been fetched, you can modify it, either before it's processed or after (using pre or postprocess_html). I was thinking you would regrab the RSS feed page (yes, at this point it's already been processed, the articles have been identified, etc. but that's OK).

You are just going to grab the RSS feed page again (you'd do it multiple times, once for each article) and grab some parts from it. So how do you do this? At the pre/postprocess stage you know the Article Title. It's part of the "soup" of the article page (you need to use BeautifulSoup to find it there so you can use it). You want something from the feed page. That "something" is associated with the matching Article Title on the feed page, so while you are in preprocess_html (or postprocess_html, it doesn't matter) you use index_to_soup to grab a second soup: the soup of the feed page. As I posted, you would "parse it (the second soup, from the feed page) to find what you want (e.g. search for the article title and grab the other elements/text you want once it's found)."

It would basically be a loop that looks through the feed page for the article title tag that matches the current article being worked on in pre/postprocess_html, then grabs whatever you need for the current article from the second soup (the RSS feed page). Then use BeautifulSoup to stick it into the first soup (the article being worked on).
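The match-then-insert loop described above might look roughly like the sketch below. To stay runnable outside calibre it works on plain strings and dicts: `entries` stands in for whatever was parsed from the feed page, and the article is an HTML string rather than a BeautifulSoup object. The function names, the dict keys and the CSS class names are all hypothetical. In a real recipe the insertion would be done on the soup passed to `preprocess_html`.

```python
import re
from html import escape


def normalize(text):
    """Collapse whitespace and case so trivial differences do not block a match."""
    return ' '.join(text.split()).lower()


def find_entry(article_title, entries):
    """Loop through the feed entries and return the one whose title matches."""
    wanted = normalize(article_title)
    for entry in entries:
        if normalize(entry['title']) == wanted:
            return entry
    return None


def reformat_article(article_html, entry):
    """Put feed title and author above the article, and the source link below it."""
    header = '<p class="feed-title">%s</p><p class="author">%s</p>' % (
        escape(entry['feed_title']), escape(entry['author']))
    link = escape(entry['source_link'], quote=True)
    footer = '<p class="source-link"><a href="%s">%s</a></p>' % (link, link)
    # Insert the header right after the opening <body> tag ...
    out, opened = re.subn(r'<body[^>]*>', lambda m: m.group(0) + header,
                          article_html, count=1, flags=re.IGNORECASE)
    # ... and the footer right before the closing </body> tag.
    out, closed = re.subn(r'</body\s*>', lambda m: footer + m.group(0),
                          out, count=1, flags=re.IGNORECASE)
    # Fall back to plain concatenation for fragments without a <body>.
    if not opened:
        out = header + out
    if not closed:
        out = out + footer
    return out
```

One design note: since the feed page is the same for every article in a feed, the parsed entries could be kept on the recipe object after the first fetch instead of re-fetching the page for each article. Matching on title alone will also pick the first hit if two entries share a title.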

Last edited by Starson17; 10-10-2011 at 04:04 PM.
All times are GMT -4.