#1
Junior Member
Posts: 3
Karma: 10
Join Date: Oct 2011
Device: Kindle
Change article display format
Hi all,
I'm trying to modify the Google Reader uber recipe, specifically to address two issues.

Issue #1: Reverse the order of articles so that the oldest is first. (Done: I added the attribute "reverse_article_order = True" to the GoogleReaderUber(BasicNewsRecipe) class.)

Issue #2: Reformat the article display from

Article Title
Content

to

Feed Title - Author
Article Title
Content
Source Link

Issue #2 is the area I need help with. The feed from http://www.google.com/reader/atom/ includes the tags I need; I'm just not sure how to get Calibre to reformat the articles. The relevant tags in the feed are:

<title type="html">Article Title</title>
<author>
  <name>Article Author</name>
</author>
<source gr:stream-id="feed URL">
  <id>Google ID Tag</id>
  <title type="html">Feed Title</title>
  <link rel="alternate" href="Source Link" type="text/html"/>
</source>

Any help on this is GREATLY appreciated.

Thanks,
Dave F.
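For reference, here is a minimal Python 3 sketch (standard library only, not calibre code) of pulling the four fields listed above out of every entry of the Atom feed. It assumes the feed declares the standard Atom namespace (http://www.w3.org/2005/Atom) and that each <entry> carries the <author> and <source> children shown above; parse_reader_entries and the dict keys are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed uses the standard Atom namespace.
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_reader_entries(atom_xml):
    """Return one dict per <entry> holding the four fields listed above."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        # <source> describes the original feed the entry came from.
        source = entry.find(ATOM + "source")
        feed_title, source_link = "", ""
        if source is not None:
            feed_title = source.findtext(ATOM + "title", default="")
            link = source.find(ATOM + "link[@rel='alternate']")
            if link is not None:
                source_link = link.get("href", "")
        entries.append({
            # Direct child <title> only, so the <source>'s title is not picked up here.
            "title": entry.findtext(ATOM + "title", default=""),
            "author": entry.findtext(ATOM + "author/" + ATOM + "name", default=""),
            "feed_title": feed_title,
            "source_link": source_link,
        })
    return entries
```

Inside a calibre recipe you would more likely search the BeautifulSoup object returned by index_to_soup than parse a raw XML string, but the lookups follow the same idea.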

#2
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
If the text you want on the article already appears in your finished ebook, on the page that links to the article, then calibre has already found it, and you could use populate_article_metadata to access it. Otherwise, you can use index_to_soup to grab a soup of the feed page and parse it to find what you want (e.g. search for the article title and, once it's found, grab the other elements/text you want). Once you have the text, you would use preprocess_html or postprocess_html to modify the page soup. If you don't know what a "soup" is: it's just the HTML of the page, parsed by BeautifulSoup into a tree of tags that you can search and modify.
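The "search for the article title" step can be sketched as a plain loop. This is a hypothetical helper, not part of calibre: it assumes the feed has already been reduced to a list of dicts with a "title" key, and it normalizes both sides (HTML entities, whitespace, case) because a title stored in a type="html" feed element and the title found on the article page may not match character for character.

```python
import html
import re


def normalize_title(text):
    """Unescape HTML entities, collapse runs of whitespace and ignore case."""
    return re.sub(r"\s+", " ", html.unescape(text)).strip().casefold()


def find_entry_by_title(entries, article_title):
    """Return the first feed entry whose title matches article_title, else None."""
    wanted = normalize_title(article_title)
    for entry in entries:
        if normalize_title(entry["title"]) == wanted:
            return entry
    # No match: the caller should leave the article unchanged.
    return None
```

Returning None rather than raising lets the caller skip articles whose titles were rewritten by the site, which is a common source of mismatches.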

#3
Junior Member
Posts: 3
Karma: 10
Join Date: Oct 2011
Device: Kindle
Quote:
Your recommendation sounds like what I need, so I'll run back to my corner, do some research on the functions you listed, and see what I can horribly mangle. Thank you very much for the feedback.

DaveF

#4
Junior Member
Posts: 3
Karma: 10
Join Date: Oct 2011
Device: Kindle
Quote:
Code:
    # Requires "import re, urllib" at the top of the recipe (Python 2, as calibre used in 2011).
    def get_feeds(self):
        feeds = []
        # Fetch the tag list and walk every element whose name attribute is "id"
        soup = self.index_to_soup('http://www.google.com/reader/api/0/tag/list')
        for tag_id in soup.findAll(True, attrs={'name': ['id']}):
            # Use the reading-list stream instead of the shared ("broadcast") one
            url = tag_id.contents[0].replace('broadcast', 'reading-list')
            # (feed name = last path segment, feed URL = base_url + quoted id + options)
            feeds.append((re.search('/([^/]*)$', url).group(1),
                          self.base_url + urllib.quote(url.encode('utf-8')) + self.get_options))
        return feeds

<title type="html">Article Title</title>                          <-- Need this
<author>
  <name>Article Author</name>                                     <-- Need this
</author>
<source gr:stream-id="feed URL">
  <id>Google ID Tag</id>
  <title type="html">Feed Title</title>                           <-- Need this
  <link rel="alternate" href="Source Link" type="text/html"/>     <-- Need this
</source>

Thanks,
Dave F.
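To see what each pass of that get_feeds loop produces, here is a Python 3 port of a single iteration, with urllib.parse.quote standing in for Python 2's urllib.quote. feed_for_tag is a hypothetical name, and the tag id, base_url and get_options values in the usage lines are made up, since the recipe's real values are not shown in this thread. Note that this loop only builds (feed name, feed URL) pairs; it never touches the per-article fields marked "Need this".

```python
import re
from urllib.parse import quote


def feed_for_tag(tag_id, base_url, get_options):
    """Mirror one iteration of the get_feeds loop: return (feed name, feed URL)."""
    # Swap the shared ("broadcast") stream for the reading-list stream.
    url = tag_id.replace("broadcast", "reading-list")
    # The feed name is whatever follows the last "/".
    name = re.search(r"/([^/]*)$", url).group(1)
    # Percent-encode the id (spaces, non-ASCII); "/" is left as-is by default.
    return name, base_url + quote(url.encode("utf-8")) + get_options


# Made-up values for illustration only.
name, url = feed_for_tag("user/123/label/My Tech", "http://www.google.com/reader/atom/", "?n=25")
print(name)  # My Tech
print(url)   # http://www.google.com/reader/atom/user/123/label/My%20Tech?n=25
```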

#5
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
The other option was to do it at the article stage. You quoted my "Otherwise", which was that article-stage option, so you're not at the right point yet. You want to be in preprocess_html, which works on the articles as they are fetched.

To do it there: after an article has been fetched, you can modify it, either before it's processed or after (using preprocess_html or postprocess_html). I was thinking you would regrab the RSS feed page. Yes, at this point it has already been processed and the articles have been identified, but that's OK. You are just going to grab the RSS feed page again (you'd do it multiple times, once for each article) and take some parts from it.

So how do you do this? At the pre/postprocess stage you know the Article Title: it's part of the "soup" of the article page (you need to use BeautifulSoup to find it there so you can use it). You want something from the feed page, and that "something" is associated with the matching Article Title on the feed page. So while you are in preprocess_html (or postprocess_html, it doesn't matter), you use index_to_soup to grab a second soup: the soup of the feed page. As I posted, you would "parse it (the second soup, from the feed page) to find what you want (e.g. search for the article title and grab the other elements/text you want once it's found)."

It would basically be a loop that looks through the feed page for the article title tag that matches the current article being worked on in pre/postprocess_html, then grabs whatever you need for the current article from that second soup (the RSS feed page). Then use BeautifulSoup to insert it into the first soup (the article being worked on).

Last edited by Starson17; 10-10-2011 at 04:04 PM.
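The last step, sticking the grabbed fields into the article, can be sketched as follows. In a real recipe this would happen in preprocess_html on the BeautifulSoup object; to keep the example self-contained it splices plain HTML strings instead, assuming a single lowercase <body>...</body> pair. add_feed_header and the entry dict keys are hypothetical, and the layout follows the "Feed Title - Author / Article Title / Content / Source Link" request from the first post.

```python
import html


def add_feed_header(article_html, entry):
    """Lay out the article as: Feed Title - Author, Article Title, Content, Source Link.

    entry is a hypothetical dict with title, author, feed_title and source_link keys.
    """
    header = "<p>{} - {}</p><h2>{}</h2>".format(
        html.escape(entry["feed_title"]),
        html.escape(entry["author"]),
        html.escape(entry["title"]),
    )
    footer = '<p><a href="{}">Source</a></p>'.format(html.escape(entry["source_link"]))
    # Naive string splice: assumes exactly one lowercase <body> and </body>.
    start = article_html.find("<body>")
    end = article_html.rfind("</body>")
    if start == -1 or end == -1:
        # No body tags found: wrap the whole fragment instead.
        return header + article_html + footer
    start += len("<body>")
    return article_html[:start] + header + article_html[start:end] + footer + article_html[end:]
```

A string splice like this breaks on a tag such as <body class="x">, which is one reason to do the real insertion on the soup rather than on raw text.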
|
Thread | Thread Starter | Forum | Replies | Last Post |
PRS-650 how to change display cover on 650? | wlwbishop | Sony Reader | 12 | 10-26-2010 07:06 PM |
Great Article about New Display Technologies in IEEE Spectrum | kennyc | News | 4 | 04-10-2010 08:53 PM |
Nice article on the Mirasol color display technology | Daithi | News | 9 | 10-22-2009 10:44 AM |
Change display of titles | hippy1948 | Workshop | 2 | 01-25-2009 04:19 PM |
Dual display navigation - New Scientist article | ePossum | News | 37 | 06-30-2008 04:42 AM |