MobileRead Forums > E-Book Software > Calibre > Recipes
Old 12-15-2010, 04:16 PM   #1
tbaac
Member
tbaac began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Feb 2009
Device: Sony PRS505
Changing article titles in recipes

I have a new Kindle 3. I have used Calibre before (with my old PRS 505), but it works especially well with the Kindle. I set it to download The Guardian and email it to my Kindle, and it magically appeared on my device.

I have done some searching and understand that there is no TOC on the Kindle for Periodicals. I want to keep the recipe as a periodical as that way the Kindle handles new versions. But I'd also like some indication in the article title of which feed the article came from.

The feeds in the built in recipe are:


feeds = [
    ('Front Page', 'http://www.guardian.co.uk/rss'),
    ('Business', 'http://www.guardian.co.uk/business/rss'),
    ('Sport', 'http://www.guardian.co.uk/sport/rss'),
    ('Culture', 'http://www.guardian.co.uk/culture/rss'),
    ('Money', 'http://www.guardian.co.uk/money/rss'),
    ('Life & Style', 'http://www.guardian.co.uk/lifeandstyle/rss'),
    ('Travel', 'http://www.guardian.co.uk/travel/rss'),
    ('Environment', 'http://www.guardian.co.uk/environment/rss'),
    ('Comment', 'http://www.guardian.co.uk/commentisfree/rss'),
]

And I'd like to append the feed name to the end of the article name, as an indication as to which section it came from. Is this easy to do?

I tried adding the url to the end of title in this section:

yield {
    'title': title, 'url': url, 'description': desc,
    'date': strftime('%a, %d %b'),
}

But perhaps I didn't do it right, or perhaps it's the wrong section.

So has anyone tried this in recipes, do they have a better suggestion or is it a silly idea?
I was trying to use the url, but the name of the feed would be better if possible.

Thank you.
Old 12-16-2010, 02:31 PM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by tbaac View Post
I'd also like some indication in the article title of which feed the article came from....

So has anyone tried this in recipes, do they have a better suggestion or is it a silly idea?
There are two places you could modify the article title by inserting the feed title. You could do this in parse_feeds or in populate_article_metadata. Take a look at this post:
https://www.mobileread.com/forums/sho...62&postcount=6
It will show you how to access the article title. I'm pretty sure you can grab the feed title there as well, then concatenate it onto the article title.

You can probably also do this in populate_article_metadata, but it's not used much. Look at the API for info on it.
Old 12-18-2010, 08:20 PM   #3
tbaac
Thanks for that Starson17.

So you're saying that within parse_feeds I should be able to retrieve and then set the value of article.title?

Hopefully there is a variable feed.title which contains the feed name to add to the article title.

I won't get to my laptop until Monday but I'll give it a go then. Thanks again :-)
Old 12-21-2010, 09:40 AM   #4
tbaac
Thanks for that. I ended up with this:

def parse_feeds (self):
feeds = BasicNewsRecipe.parse_feeds(self)
for feed in feeds:
for article in feed.articles[:]:
feedps = feed.title + ' '
newtitle = feedps + article.title
article.title = newtitle
print 'New article title is: ', article.title
return feeds

Edit: The indenting looks correct in the above code segment but not when it's shown in the post...

Unfortunately it seems to have no effect and the print line doesn't seem to get run.

I've noticed that in the existing Guardian recipe and in the Wikileaks recipe that someone nicely posted I get an error:

Parsing feed_1/index.html ...
Initial parse failed:
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/ebooks/oeb/base.py", line 816, in first_pass
    data = etree.fromstring(data, parser=parser)
  File "lxml.etree.pyx", line 2532, in lxml.etree.fromstring (src/lxml/lxml.etree.c:48634)
  File "parser.pxi", line 1545, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:72245)
  File "parser.pxi", line 1417, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:71041)
  File "parser.pxi", line 898, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:67581)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64521)
XMLSyntaxError: Opening and ending tag mismatch: hr line 38 and div, line 39, column 7

I'm running the latest version on Ubuntu linux. Anyone had this error and know a solution?
The recipes still create output.

Thanks.

Last edited by tbaac; 12-21-2010 at 09:42 AM.
Old 12-21-2010, 10:44 AM   #5
Starson17
Quote:
Originally Posted by tbaac View Post
Thanks for that. I ended up with this:

Code:
    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            for article in feed.articles[:]:
                feedps = feed.title + ' '
                newtitle = feedps + article.title
                article.title = newtitle
                print 'New article title is: ', article.title
        return feeds
Edit: The indenting looks correct in the above code segment but not when it's shown in the post...
It's correct now. Use the hash/pound mark (#) to wrap your code in CODE tags, which preserves the indenting.
Quote:

Unfortunately it seems to have no effect and the print line doesn't seem to get run.
I pasted your code into a working recipe and it worked perfectly. The article titles had the feed title prepended, and the print worked.
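
For anyone who wants the feed name at the end of the title, as first asked, here is a minimal sketch of that variant. It runs without calibre: the Article and Feed classes are hypothetical stand-ins that only mimic the title and articles attributes used in the code above, and the parenthesised suffix format is just one possible choice.

```python
class Article:
    """Hypothetical stand-in for a calibre article object (only .title is used)."""
    def __init__(self, title):
        self.title = title


class Feed:
    """Hypothetical stand-in for a calibre feed object (.title and .articles)."""
    def __init__(self, title, articles):
        self.title = title
        self.articles = articles


def append_feed_name(feeds):
    """Append each feed's title, in parentheses, to the titles of its articles."""
    for feed in feeds:
        suffix = ' (%s)' % feed.title
        for article in feed.articles:
            # Skip titles that already carry the suffix, so a second pass is harmless.
            if not article.title.endswith(suffix):
                article.title += suffix
    return feeds


feeds = [
    Feed('Sport', [Article('Ashes latest')]),
    Feed('Money', [Article('Rates rise')]),
]
append_feed_name(feeds)
for feed in feeds:
    for article in feed.articles:
        print(article.title)
```

In a recipe, the same loop would presumably sit inside parse_feeds, after the call to BasicNewsRecipe.parse_feeds(self), exactly where the prepending code sits now.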

Quote:
I've noticed that in the existing Guardian recipe and in the Wikileaks recipe that someone nicely posted I get an error:
Quote:
XMLSyntaxError: Opening and ending tag mismatch: hr line 38 and div, line 39, column 7
I'm running the latest version on Ubuntu linux. Anyone had this error and know a solution?
The recipes still create output.
It looks like a bad page with bad html, not a recipe error.
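
To illustrate the kind of failure in that traceback: the error comes from a strict XML parse (etree.fromstring), and an unclosed hr tag is fine in HTML but not in XML. The sketch below is only an illustration, not calibre's code. It uses Python's standard xml.etree.ElementTree instead of lxml so it runs anywhere, which means the exact error message differs, and the two sample strings are made up.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if markup is well-formed XML, False if the strict parser rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Made-up samples: <hr> left open (acceptable in HTML) versus self-closed (valid XML).
bad_page = '<div><p>Story text</p><hr></div>'
good_page = '<div><p>Story text</p><hr/></div>'

print(parses_as_xml(bad_page))
print(parses_as_xml(good_page))
```

The log line "Initial parse failed" together with the fact that the recipes still create output suggests that the strict parse is only a first attempt, which would fit with the error being harmless.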

Last edited by Starson17; 12-21-2010 at 10:46 AM.
Old 12-21-2010, 10:51 AM   #6
tbaac
Thanks for the swift reply Starson17.

I agree (with my limited experience) that it appeared to not be a recipe problem because it happens with the built in Guardian recipe and with the Wikileaks recipe posted in this forum. It just seemed to be something which in my case was preventing the added code from working.
I'd never noticed the error before, to see it I had to look at job details.

When you say "page with bad html" you mean that the html from the RSS feed is bad and that there's nothing that I can do about it?
Old 12-21-2010, 11:27 AM   #7
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by tbaac View Post
When you say "page with bad html" you mean that the html from the RSS feed is bad and that there's nothing that I can do about it?
That is what I meant. I've seen that error before and ignored it. I can't tell you if what I meant is in fact correct. It was just a guess, and without doing some tests I don't really know if my guess is correct. Feel free to test more and report back.

So why did your code not work for you?
Old 12-22-2010, 10:53 AM   #8
tbaac
Hi again Starson17. I was assuming that as the error referred to a problem parsing and the code that I added related to parsing, it had fallen over after retrieving all the articles but prior to doing any parsing. Which recipe did you try the code with if you don't mind me asking? I had the error with the base Guardian recipe, although it did not cause any problem with the retrieval.

I also noticed that the error seems to mention the "div" tag and the div tag is mentioned in the "remove_tags" part of the recipe so I wondered if it was a slight (but usually non problematic) problem with the recipe?

I also had a look at populate_article_metadata but couldn't see if I'd have access to the feed name at that time?
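
One possible way around that, offered only as an untested idea: store the feed name on each article while parse_feeds still has both objects in hand, then read it back later. The sketch below runs without calibre. Article and Feed are hypothetical stand-ins, feed_title is an invented attribute that is not part of calibre, and it rests on the unverified assumption that the hook later receives the same article objects that parse_feeds returned.

```python
class Article:
    """Hypothetical stand-in; a real calibre article object has more fields."""
    def __init__(self, title):
        self.title = title


class Feed:
    """Hypothetical stand-in for a calibre feed object."""
    def __init__(self, title, articles):
        self.title = title
        self.articles = articles


def remember_feed_titles(feeds):
    """Step 1 (would run inside parse_feeds): stash the feed title on each article."""
    for feed in feeds:
        for article in feed.articles:
            article.feed_title = feed.title  # invented attribute, not a calibre field
    return feeds


def retitle(article):
    """Step 2 (would run inside populate_article_metadata): use the stashed name."""
    feed_title = getattr(article, 'feed_title', None)
    if feed_title:
        article.title = '%s: %s' % (feed_title, article.title)
        # Clear the marker so a repeated call does not prefix the title twice.
        article.feed_title = None


story = Article('Gallery opens')
remember_feed_titles([Feed('Culture', [story])])
retitle(story)
print(story.title)
```

Since the parse_feeds approach already changes the title directly, this is only of interest if the title also has to depend on something found in the downloaded page.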

Thanks again.
Old 12-22-2010, 12:03 PM   #9
Starson17
Quote:
Originally Posted by tbaac View Post
Which recipe did you try the code with if you don't mind me asking?
I keep a test recipe and batch file all set up to paste code into when trying to give assistance. I just pasted your code into the end of whatever was already in that recipe. It happened to be SkepticBlog. I knew it worked before pasting in your code. Feel free to test it yourself.
Code:
#!/usr/bin/env  python
__license__   = 'GPL v3'
import re
from calibre.web.feeds.news import BasicNewsRecipe

class SkepticBlog(BasicNewsRecipe):
    oldest_article        = 5
    max_articles_per_feed = 15
    no_stylesheets        = True
    use_embedded_content  = False
    encoding              = 'utf-8'
    publisher             = 'Skeptic Magazine'
    category              = 'science, pseudoscience'

    def get_browser(self):
        # Ask the server for HTML explicitly.
        br = BasicNewsRecipe.get_browser(self)
        br.addheaders = [('Accept', 'text/html')]
        return br

    feeds = [(u'SkepticBlog', u'http://skepticblog.org/feed')]

    def parse_feeds(self):
        # Let calibre build the feed and article objects first.
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            for article in feed.articles[:]:
                # Single-argument print() runs under both Python 2 and Python 3.
                print('New1 article title is: ' + article.title)
                # Prepend the feed title to the article title.
                feedps = feed.title + ' '
                newtitle = feedps + article.title
                article.title = newtitle
                print('New2 article title is: ' + article.title)
        return feeds
All times are GMT -4.