Old 07-02-2011, 11:27 AM   #1
UnWeave
Junior Member
UnWeave began at the beginning.
 
Posts: 4
Karma: 12
Join Date: Jun 2011
Device: none
[Solved] Engadget recipe - full article text

Is there a way to alter the engadget recipe such that it downloads the entire article?

E.g. for the recent Windows Phone 7.5 Preview, the recipe only gives the first paragraph, and then a link. This is fine for short articles, but having to load up the browser to read the entirety of the longer ones seems (to me) to undermine the point of saving to an eBook format in the first place.

I don't mind having a go myself (I have some experience with Python), but I don't know where to start. I have looked at other recipes, and the API docs, but the few things I thought might work just broke it ...


Thanks!

Last edited by UnWeave; 07-03-2011 at 09:27 AM. Reason: Clarification
Old 07-02-2011, 11:37 AM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 25,987
Karma: 5036765
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You'll need to set use_embedded_content = False, then add code to clean up the article HTML.
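
As a rough illustration of what that setting changes, here is a minimal sketch using only the Python standard library. It is not calibre's implementation: the feed snippet, the `example.com` URL, and the helper names `feed_items` and `article_source` are all made up for this example. The idea is only that a feed item carries both a short embedded summary and a link, and with use_embedded_content = False the recipe follows the link instead of using the summary.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed snippet, invented for illustration; not real Engadget data.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Long review</title>
      <link>http://example.com/long-review</link>
      <description>First paragraph only.</description>
    </item>
  </channel>
</rss>"""


def feed_items(rss_text):
    """Return (title, link, summary) tuples for each <item> in an RSS 2.0 string."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter('item'):
        items.append((
            item.findtext('title', default=''),
            item.findtext('link', default=''),
            item.findtext('description', default=''),
        ))
    return items


def article_source(item, use_embedded_content):
    """Mimic the choice: the embedded summary text, or the URL to fetch in full."""
    title, link, summary = item
    return summary if use_embedded_content else link


for entry in feed_items(SAMPLE_RSS):
    print('embedded:', article_source(entry, True))
    print('full    :', article_source(entry, False))
```

Once the full page is fetched from the link, it still contains navigation, footers and so on, which is why cleanup code is the second step.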
Old 07-02-2011, 08:56 PM   #3
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by UnWeave View Post
Is there a way to alter the engadget recipe such that it downloads the entire article?
Here is a new recipe that gets the entire article. The previous recipe just pulled the RSS feed summary and didn't retrieve the articles themselves. For most articles that was enough, but scattered here and there were longer articles for which the feed included only the first paragraph.

Kovid - This can probably replace the existing Engadget recipe, but I've named it Engadget_Full.
Spoiler:
Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = 'Copyright 2011 Starson17'
'''
engadget.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Engadget(BasicNewsRecipe):
    title                 = u'Engadget_Full'
    __author__            = 'Starson17'
    __version__           = 'v1.00'
    __date__              = '02, July 2011'
    description           = 'Tech news'
    language              = 'en'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    remove_empty_feeds    = True

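    # Keep only the div that holds the full post (two class variants).
    keep_only_tags = [dict(name='div', attrs={'class':['post_content permalink ','post_content permalink alt-post-full']})]
    # Drop the "filed under" line and the post footer from the kept content.
    remove_tags = [dict(name='div', attrs={'class':['filed_under','post_footer']})]
    # Discard everything that follows the post footer.
    remove_tags_after = [dict(name='div', attrs={'class':['post_footer']})]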
    
    feeds = [(u'Posts', u'http://www.engadget.com/rss.xml')]

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
                '''
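
To make the cleanup step in the recipe above easier to picture, here is a small standalone sketch of the keep/remove idea using only the Python standard library. It is an approximation, not what calibre does internally: the sample page, the exact-string class comparison, and the helper names `clean_article` and `visible_text` are assumptions made for this example, and remove_tags_after is not modelled at all.

```python
import xml.etree.ElementTree as ET

# Class names taken from the recipe above.
KEEP_CLASSES = ['post_content permalink ', 'post_content permalink alt-post-full']
REMOVE_CLASSES = ['filed_under', 'post_footer']

# Hypothetical, well-formed page invented for illustration; not real Engadget markup.
SAMPLE_PAGE = """<html><body>
<div class="nav">Site navigation</div>
<div class="post_content permalink alt-post-full">
  <p>Full article text.</p>
  <div class="filed_under">Filed under: phones</div>
  <div class="post_footer">Share links</div>
</div>
<div class="comments">Reader comments</div>
</body></html>"""


def clean_article(page_text):
    """Return the kept <div> elements with the unwanted child divs removed."""
    root = ET.fromstring(page_text)
    # Step 1: keep only the divs whose class is in KEEP_CLASSES.
    kept = [div for div in root.iter('div') if div.get('class') in KEEP_CLASSES]
    for block in kept:
        # Step 2: collect (parent, child) pairs first so the tree is not
        # modified while it is being iterated.
        doomed = [(parent, child)
                  for parent in block.iter()
                  for child in list(parent)
                  if child.tag == 'div' and child.get('class') in REMOVE_CLASSES]
        for parent, child in doomed:
            parent.remove(child)
    return kept


def visible_text(element):
    """Collapse the text content of an element into a single line."""
    return ' '.join(''.join(element.itertext()).split())


for block in clean_article(SAMPLE_PAGE):
    print(visible_text(block))
```

The navigation and comments divs never make it into the result because they are outside the kept div, and the "filed under" and footer divs are stripped from inside it.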
Old 07-03-2011, 09:26 AM   #4
UnWeave
Awesome! Thank you for the quick (and very helpful) responses. Might have a go at writing a couple of my own in light of this.
Old 07-03-2011, 09:50 AM   #5
Starson17
Quote:
Originally Posted by UnWeave View Post
Awesome! Thank you for the quick (and very helpful) responses. Might have a go at writing a couple of my own in light of this.
My wife reads Engadget, and she'd occasionally comment that the recipe-created ebook seemed to have short summaries instead of full articles, but when I looked, I'd always find that the articles were just as short as the summary in the feed.

I never saw the longer articles, and I didn't realize that the current recipe pulled the feed summaries, not the articles, so I never found anything to fix.

I saw your post and tried to find the article you referenced, but Engadget changes so quickly that the post had already scrolled off. (There was a sublink to that article in an article on their podcast, but no direct RSS link.) I almost posted that you were seeing all there was to see, but I had my wife review your post, and she said "He's right!" It wasn't until I looked at the recipe and read Kovid's post that I realized it would never find a long article; it just grabbed summaries. I had to hunt through a dozen RSS feed links to find a long article, but from there it was fairly easy to write the recipe.

Let me know if you ever find anything missing. A few of the articles were formatted oddly. I fixed those in the recipe, but there could be some more from time to time that it won't handle correctly.
Old 07-03-2011, 11:01 PM   #6
UnWeave
I think that the article was probably gone from the feed before I posted - it was the only long article from them I could think of off the top of my head.

And thanks, again - it seems to be working perfectly so far, but if I do notice any problems with it I will let you know (and also offer a fix, if I can work one out).
All times are GMT -4.