12-15-2010, 04:16 PM | #1 |
Member
Posts: 11
Karma: 10
Join Date: Feb 2009
Device: Sony PRS505
|
Changing article titles in recipes
I have a new Kindle 3. I have used Calibre before (with my old PRS 505), but it works especially well with the Kindle. I set it to download The Guardian and email it to my Kindle, and it magically appeared on my device.
I have done some searching and understand that there is no TOC on the Kindle for periodicals. I want to keep the recipe as a periodical, as that way the Kindle handles new versions. But I'd also like some indication in the article title of which feed the article came from. The feeds in the built-in recipe are:

    feeds = [
        ('Front Page', 'http://www.guardian.co.uk/rss'),
        ('Business', 'http://www.guardian.co.uk/business/rss'),
        ('Sport', 'http://www.guardian.co.uk/sport/rss'),
        ('Culture', 'http://www.guardian.co.uk/culture/rss'),
        ('Money', 'http://www.guardian.co.uk/money/rss'),
        ('Life & Style', 'http://www.guardian.co.uk/lifeandstyle/rss'),
        ('Travel', 'http://www.guardian.co.uk/travel/rss'),
        ('Environment', 'http://www.guardian.co.uk/environment/rss'),
        ('Comment', 'http://www.guardian.co.uk/commentisfree/rss'),
    ]

I'd like to append the feed name to the end of the article name, as an indication of which section it came from. Is this easy to do? I tried adding the url to the end of title in this section:

    yield {
        'title': title,
        'url': url,
        'description': desc,
        'date': strftime('%a, %d %b'),
    }

But perhaps I didn't do it right, and perhaps it's the wrong section.

So has anyone tried this in recipes? Do they have a better suggestion, or is it a silly idea? I was trying to use the url, but the name of the feed would be better if possible. Thank you.
12-16-2010, 02:31 PM | #2 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
See https://www.mobileread.com/forums/sho...62&postcount=6 which will show you how to access the article title. I'm pretty sure you can grab the feed title there as well, then concatenate it onto the article title. You can probably also do this in populate_article_metadata, but it's not used much. Look at the API for info on it.
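A minimal sketch of the concatenation idea, runnable without calibre. The Article and Feed classes here are hypothetical stand-ins with only the title and articles attributes the loop needs, and tag_titles is an illustrative helper name, not part of calibre's API. In a real recipe the feed and article objects would come from calibre itself.

```python
# Hypothetical stand-ins for calibre's feed and article objects.
# Only the attributes used by the loop are modelled.
class Article:
    def __init__(self, title):
        self.title = title


class Feed:
    def __init__(self, title, articles):
        self.title = title
        self.articles = articles


def tag_titles(feeds):
    """Append each feed's name to the titles of its articles, in place."""
    for feed in feeds:
        for article in feed.articles:
            article.title = '%s [%s]' % (article.title, feed.title)
    return feeds


feeds = [
    Feed('Sport', [Article('Match report')]),
    Feed('Travel', [Article('Ten walks'), Article('City breaks')]),
]
for feed in tag_titles(feeds):
    for article in feed.articles:
        print(article.title)
```

Putting the feed name in brackets at the end is just one choice of format; a prefix works the same way.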
|
12-18-2010, 08:20 PM | #3 |
Member
Posts: 11
Karma: 10
Join Date: Feb 2009
Device: Sony PRS505
|
Thanks for that Starson17.
So you're saying that within parse_feeds I should be able to retrieve and then set the value of article.title? Hopefully there is a variable feed.title which contains the feed name to add to the article title. I won't get to my laptop until Monday but I'll give it a go then. Thanks again :-) |
12-21-2010, 09:40 AM | #4 |
Member
Posts: 11
Karma: 10
Join Date: Feb 2009
Device: Sony PRS505
|
Thanks for that. I ended up with this:
    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            for article in feed.articles[:]:
                feedps = feed.title + ' '
                newtitle = feedps + article.title
                article.title = newtitle
                print('New article title is: %s' % article.title)
        return feeds

Edit: The indenting looks correct in the above code segment but not when it's shown in the post...

Unfortunately it seems to have no effect, and the print line doesn't seem to get run. I've noticed that in the existing Guardian recipe, and in the Wikileaks recipe that someone nicely posted, I get an error:

    Parsing feed_1/index.html ...
    Initial parse failed:
    Traceback (most recent call last):
      File "/usr/lib/calibre/calibre/ebooks/oeb/base.py", line 816, in first_pass
        data = etree.fromstring(data, parser=parser)
      File "lxml.etree.pyx", line 2532, in lxml.etree.fromstring (src/lxml/lxml.etree.c:48634)
      File "parser.pxi", line 1545, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:72245)
      File "parser.pxi", line 1417, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:71041)
      File "parser.pxi", line 898, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:67581)
      File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)
      File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)
      File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64521)
    XMLSyntaxError: Opening and ending tag mismatch: hr line 38 and div, line 39, column 7

I'm running the latest version on Ubuntu Linux. Has anyone had this error and know a solution? The recipes still create output. Thanks.

Last edited by tbaac; 12-21-2010 at 09:42 AM.
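The XMLSyntaxError at the end of that traceback is the kind of message a strict XML parser gives when an HTML-style tag such as <hr> is opened but never closed before the enclosing </div>. A small sketch of the same class of error, using Python's standard-library xml.etree.ElementTree instead of the lxml parser calibre uses (an assumption made so the example runs without extra packages; the wording of the error differs between the two). The is_well_formed helper is an illustrative name.

```python
import xml.etree.ElementTree as ET

# An HTML-style <hr> with no closing tag is fine in HTML but not in XML.
html_style = '<div><hr></div>'
# The XML-style self-closing form is well-formed.
xml_style = '<div><hr/></div>'


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(html_style))
print(is_well_formed(xml_style))
```

The "Initial parse failed" wording, together with the fact that the recipes still create output, suggests the failure is recovered from later in the conversion, though the thread does not confirm how.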
12-21-2010, 10:44 AM | #5 | ||||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Quote:
Last edited by Starson17; 12-21-2010 at 10:46 AM. |
||||
12-21-2010, 10:51 AM | #6 |
Member
Posts: 11
Karma: 10
Join Date: Feb 2009
Device: Sony PRS505
|
Thanks for the swift reply Starson17.
I agree (with my limited experience) that it doesn't appear to be a recipe problem, because it happens with both the built-in Guardian recipe and the Wikileaks recipe posted in this forum. It just seemed to be something that, in my case, was preventing the added code from working. I'd never noticed the error before; to see it I had to look at the job details. When you say "page with bad html", do you mean that the html from the RSS feed is bad and that there's nothing I can do about it?
12-21-2010, 11:27 AM | #7 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
So why did your code not work for you? |
|
12-22-2010, 10:53 AM | #8 |
Member
Posts: 11
Karma: 10
Join Date: Feb 2009
Device: Sony PRS505
|
Hi again Starson17. Because the error referred to a parsing problem and the code I added also relates to parsing, I assumed the run had fallen over after retrieving all the articles but before doing any parsing. Which recipe did you try the code with, if you don't mind me asking? I had the error with the base Guardian recipe, although it did not cause any problem with the retrieval.
I also noticed that the error mentions the "div" tag, and the div tag is mentioned in the "remove_tags" part of the recipe, so I wondered if it was a slight (but usually non-problematic) problem with the recipe. I also had a look at populate_article_metadata but couldn't see whether I'd have access to the feed name at that point. Thanks again.
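On the populate_article_metadata question: that hook is called as populate_article_metadata(self, article, soup, first), so it receives the article and its parsed page but not the feed. One possible workaround is to remember the feed name earlier, keyed by article URL, and look it up later. A rough sketch of that idea, assuming each article has a unique url; FakeArticle, FakeFeed, build_feed_lookup and retitle are hypothetical names used so the snippet runs without calibre. In a recipe the lookup would be built in parse_feeds and consulted from populate_article_metadata.

```python
# Hypothetical stand-ins for calibre's feed and article objects.
class FakeArticle:
    def __init__(self, title, url):
        self.title = title
        self.url = url


class FakeFeed:
    def __init__(self, title, articles):
        self.title = title
        self.articles = articles


def build_feed_lookup(feeds):
    """Map each article URL to the title of the feed it came from."""
    lookup = {}
    for feed in feeds:
        for article in feed.articles:
            lookup[article.url] = feed.title
    return lookup


def retitle(article, lookup):
    """Append the remembered feed name, if any, to the article title."""
    feed_title = lookup.get(article.url)
    if feed_title:
        article.title = '%s [%s]' % (article.title, feed_title)
    return article


feeds = [FakeFeed('Money', [FakeArticle('Savings rates', 'http://example.com/a1')])]
lookup = build_feed_lookup(feeds)
orphan = FakeArticle('No feed', 'http://example.com/zz')

print(retitle(feeds[0].articles[0], lookup).title)
print(retitle(orphan, lookup).title)
```

Whether a title changed at that stage shows up everywhere on the device is not something this thread confirms, so changing it in parse_feeds is probably the simpler route.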
12-22-2010, 12:03 PM | #9 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Code:
#!/usr/bin/env python
__license__ = 'GPL v3'

import re
from calibre.web.feeds.news import BasicNewsRecipe


class SkepticBlog(BasicNewsRecipe):
    oldest_article = 5
    max_articles_per_feed = 15
    no_stylesheets = True
    use_embedded_content = False
    encoding = 'utf-8'
    publisher = 'Skeptic Magazine'
    category = 'science, pseudoscience'

    feeds = [(u'SkepticBlog', u'http://skepticblog.org/feed')]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        br.addheaders = [('Accept', 'text/html')]
        return br

    def parse_feeds(self):
        # Let calibre build the feed list, then prefix each article
        # title with the name of the feed it came from.
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            for article in feed.articles[:]:
                print('New1 article title is: %s' % article.title)
                feedps = feed.title + ' '
                newtitle = feedps + article.title
                article.title = newtitle
                print('New2 article title is: %s' % article.title)
        return feeds
|
|