Custom recipes (archive, read-only) - Page 102

Starson17 · 03-02-2010, 08:07 AM

Quote:

Originally Posted by jrollmorton

A newbie, here. I've spent some time trying to write a recipe for my favourite Canadian journal, The Walrus: http://www.walrusmagazine.com

Following the calibre user guide, I've written recipes to generate the RSS feed from the magazine, which links to full articles elsewhere on the website. I've tried accessing the print version of the articles, but get loads of crud and ads and only page one of the multi-page articles.

I'm having fun trying this, but not having satisfactory success. Can anyone help me to get at least the full print version of these articles? Then I can try getting out the crud by modifying the CSS or whatever.

Many thanks!

I don't know if someone is going to do this for you, but if you want to do it yourself, and just get help on the steps, you may want to post your RSS feed and what you've tried so far.
I took a quick look at the site. Getting the print version of an article seems to be a simple URL change, so:

http://www.walrusmagazine.com/articl...lighted-stage/

becomes:
http://www.walrusmagazine.com/print/...lighted-stage/

I think you just want:

Code:

classmethod BasicNewsRecipe.print_version(url)

    Take a url pointing to the webpage with article content and return the URL pointing to the print version of the article. By default does nothing. For example:

    def print_version(self, url):
        return url + '?&pagewanted=print'

You'd adapt the example to process the url, substituting "print" for "articles".

This may work:

Code:

    def print_version(self, url):
        return url.replace('articles', 'print')

Multipage is trickier, but once you've got single-page articles working, more help is available in the examples.
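For the single-page step, here is a minimal sketch (Python 3) of a print_version that rewrites only the first path segment, so a slug that happens to contain the word "articles" is not mangled the way a plain url.replace() would mangle it. It assumes article URLs look like http://www.walrusmagazine.com/articles/<slug>/ and print pages like .../print/<slug>/; the helper name walrus_print_url, the class name and the slugs used below are hypothetical, and the try/except only exists so the sketch runs outside calibre.

```python
from urllib.parse import urlsplit, urlunsplit

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:  # lets this sketch run (and be tested) outside calibre
    BasicNewsRecipe = object


def walrus_print_url(url):
    """Map .../articles/<slug>/ to .../print/<slug>/; leave other URLs unchanged."""
    parts = urlsplit(url)
    segments = parts.path.split('/')
    # The path starts with '/', so segments[0] == '' and segments[1] is the first segment.
    if len(segments) > 1 and segments[1] == 'articles':
        segments[1] = 'print'
        return urlunsplit(parts._replace(path='/'.join(segments)))
    return url


class WalrusSketch(BasicNewsRecipe):
    title = u'The Walrus'
    # feeds = [...]  # hypothetical: put the magazine's RSS feed here

    def print_version(self, url):
        # calibre calls this for each article URL taken from the feed
        return walrus_print_url(url)
```

With this, a URL such as .../articles/top-articles-of-2009/ becomes .../print/top-articles-of-2009/ rather than having "articles" replaced inside the slug as well.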

anabelee · 03-02-2010, 08:45 AM

Quote:

Originally Posted by kiklop74

Recipe for Diario Vasco:

Thanks a lot, kiklop!!

aoitenshi · 03-02-2010, 11:16 AM

Quote:

Originally Posted by aoitenshi

Can I request a recipe for TODAY online (SG)?

http://www.todayonline.com/RSS

Hello, I posted a request but I decided to try my hand at it by using the Basic editor for custom news source on Calibre. I followed the guide on the website but I obviously don't understand enough code to get very far...

This is what I have done:

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1267281853(BasicNewsRecipe):
    title          = u'Today ONLINE'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True


    feeds          = [(u'TODAYonline', u'http://www.todayonline.com/RSS/Todayonline'), (u'Hot News', u'http://www.todayonline.com/RSS/Hotnews'), (u'Singapore', u'http://www.todayonline.com/RSS/Singapore'), (u'World', u'http://www.todayonline.com/RSS/World'), (u'Plus', u'http://www.todayonline.com/RSS/Plus'), (u'Sports', u'http://www.todayonline.com/RSS/Sports'), (u'Business', u'http://www.todayonline.com/RSS/Business'), (u'Tech', u'http://www.todayonline.com/RSS/Tech'), (u'Health', u'http://www.todayonline.com/RSS/Health'), (u'Traveller', u'http://www.todayonline.com/RSS/Traveller'), (u'Voices', u'http://www.todayonline.com/RSS/Voices'), (u'Weekend Voices', u'http://www.todayonline.com/RSS/Weekendvoices'), (u'Comment', u'http://www.todayonline.com/RSS/Comment'), (u'Food', u'http://www.todayonline.com/RSS/Food'), (u'Style', u'http://www.todayonline.com/RSS/Style'), (u'Shop', u'http://www.todayonline.com/RSS/Shop'), (u'At Home', u'http://www.todayonline.com/RSS/Athome'), (u'Car', u'http://www.todayonline.com/RSS/Car'), (u'Silver', u'http://www.todayonline.com/RSS/Silver'), (u'Kids', u'http://www.todayonline.com/RSS/Kids'), (u'Two Days', u'http://www.todayonline.com/RSS/Twodays')]

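    def print_version(self, url):
        # Note: replacing 'http://' puts the host name into the path a second time,
        # e.g. http://www.todayonline.com/Print/www.todayonline.com/...
        return url.replace('http://', 'http://www.todayonline.com/Print/')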

Starson17 · 03-02-2010, 12:48 PM

Quote:

Originally Posted by aoitenshi

Hello, I posted a request but I decided to try my hand at it by using the Basic editor for custom news source on Calibre. I followed the guide on the website but I obviously don't understand enough code to get very far...
The website doesn't provide full RSS feeds so I tried to load up the print version of linked articles. What I don't understand is that I seem to be getting the header / footer but I can't see the article itself. It's a free newspaper so all the content should load.

Why is this so?

I'm no expert, but I can help a bit.
I loaded up LiveHeaders and took a look at your feeds. They don't actually link to articles, as you can see by clicking the links for your feeds. There are cookies, fancy scripting and redirecting going on.

To get the recipe to follow your feeds, you'll need to have it set up a browser session and then it can handle the cookies, etc. to get to your content.

Start here and look at the get_browser method.

I like to do a search of all recipes that use the method I'm trying to figure out and look at them first. A search of file contents for "get_browser" in the recipes folder will find them.

Edit: You're probably also going to need the get_obfuscated_article method here.

kovidgoyal · 03-02-2010, 01:09 PM

@Logseman: Not at the moment, no.

Starson17 · 03-02-2010, 02:23 PM

Quote:

Originally Posted by aoitenshi

Hello, I posted a request but I decided to try my hand at it by using the Basic editor for custom news source on Calibre. I followed the guide on the website but I obviously don't understand enough code to get very far...
The website doesn't provide full RSS feeds so I tried to load up the print version of linked articles. What I don't understand is that I seem to be getting the header / footer but I can't see the article itself. It's a free newspaper so all the content should load.

Why is this so?

I became interested in this, so I decided to rewrite my comments as a new post. You can mostly ignore the one above as it relates to getting through the RSS feed, and that is not needed.

Your RSS feeds do have cookies, etc., but AFAICT they simply send you to the same place every time. So the .../RSS/Hotnews RSS feed just sends you to the .../Hotnews page. Removing "/RSS" from your feed addresses will give you everything the RSS feed does with a lot less trouble.
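Dropping "/RSS" from each feed address is mechanical, so the list of section pages can be derived from the existing feed list instead of retyped. A sketch, assuming every feed URL has the form http://www.todayonline.com/RSS/<Section> as in aoitenshi's recipe; section_page_url, FEEDS and SECTIONS are hypothetical names, and only two feeds are shown. The resulting pages are ordinary HTML pages, not RSS, so they are meant as input for a custom index rather than for the feeds attribute.

```python
def section_page_url(feed_url):
    """Turn .../RSS/<Section> into .../<Section> by dropping the first '/RSS'."""
    return feed_url.replace('/RSS/', '/', 1)


# Two of the feeds from the recipe above; the rest follow the same pattern.
FEEDS = [
    (u'Hot News', u'http://www.todayonline.com/RSS/Hotnews'),
    (u'Singapore', u'http://www.todayonline.com/RSS/Singapore'),
]

# (title, section page URL) pairs, e.g. ('Hot News', 'http://www.todayonline.com/Hotnews')
SECTIONS = [(title, section_page_url(url)) for title, url in FEEDS]
```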

The destination page for each of your feeds has an article teaser, with a "Read More" link inside an <a> tag having id=moreLink.

I'd approach this as follows:

Use the parse_index method described here:

Then use soup (as described there) to grab the moreLink as your article for that feed.
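Here is a rough sketch of that approach (Python 3, as in current calibre). parse_index returns a list of (feed title, list of article dicts), each dict needing at least 'title' and 'url'. To keep the sketch runnable outside calibre, it finds the moreLink anchor with Python's standard html.parser instead of the soup Starson17 mentions; inside a real recipe, soup.find('a', attrs={'id': 'moreLink'}) on self.index_to_soup(url) does the same job. Assumptions: each section page carries one teaser with an <a id="moreLink">, the pages decode as UTF-8, the article title is simply the section name, and MoreLinkParser, find_more_link, fetch_html and the class name are hypothetical. Only two sections are listed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:  # lets this sketch run (and be tested) outside calibre
    BasicNewsRecipe = object


class MoreLinkParser(HTMLParser):
    """Remember the href of the first <a id="moreLink"> on a page."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.href is None and tag == 'a' and attrs.get('id') == 'moreLink':
            self.href = attrs.get('href')


def find_more_link(html, page_url):
    """Return the absolute "Read More" URL on a section page, or None."""
    parser = MoreLinkParser()
    parser.feed(html)
    parser.close()
    return urljoin(page_url, parser.href) if parser.href else None


class TodayOnlineSketch(BasicNewsRecipe):
    title = u'Today ONLINE'
    no_stylesheets = True
    # Section pages: the feed URLs with '/RSS' removed (only two shown).
    sections = [
        (u'Hot News', u'http://www.todayonline.com/Hotnews'),
        (u'Singapore', u'http://www.todayonline.com/Singapore'),
    ]

    def fetch_html(self, url):
        # Hypothetical helper: index_to_soup(url, raw=True) returns the raw page.
        raw = self.index_to_soup(url, raw=True)
        return raw.decode('utf-8', 'replace') if isinstance(raw, bytes) else raw

    def parse_index(self):
        # Build [(feed title, [article dict, ...]), ...] for calibre.
        index = []
        for section_title, section_url in self.sections:
            link = find_more_link(self.fetch_html(section_url), section_url)
            if link:
                article = {'title': section_title, 'url': link,
                           'description': '', 'date': ''}
                index.append((section_title, [article]))
        return index
```

Sections whose page has no moreLink are skipped rather than producing an empty feed.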

jrollmorton · 03-02-2010, 02:31 PM

Thank you for your generous response re: The Walrus magazine, Starson17! Your analysis of the publication was bang on.

It is definitely the multipage problem that continues to vex me. I tried both of your options and they yield the first of several pages for each article, plus all the crud. I am thinking the replacement of 'articles' with 'print' is not working, because that action does not yield the print version of the page at all.

Any further suggestions?

Starson17 · 03-02-2010, 03:02 PM

Quote:

Originally Posted by jrollmorton

Thank you for your generous response re: The Walrus magazine, Starson17! Your analysis of the publication was bang on.

It is definitely the multipage problem that continues to vex me. I tried both of your options and they yield the first of several pages for each article, plus all the crud. I am thinking the replacement of 'articles' with 'print' is not working, because that action does not yield the print version of the page at all.

Any further suggestions?

I set up a testbed for aoitenshi's problem, but not yours. I was just about to start writing some code to merge records, but I'll give you a hand for a minute. I'm not great at recipes, but yours looked easier than aoitenshi's problem.

What happened when you tried my suggested replace?

Hold on, I'll set up a recipe testbed for you and see for myself.

Starson17 · 03-02-2010, 03:25 PM

Quote:

Originally Posted by Starson17

Hold on, I'll set up a recipe testbed for you and see for myself.

OK, I tried it, and it seemed like it worked as I expected. Here's the basic recipe I tried. It only has two things in it, the RSS feed and my suggested replacement:

Spoiler:

What's happening with your recipe? You said you tried my two suggestions, but I only gave one, the url.replace('articles', 'print') change. The other one was just the quote from the API about how print_version worked.

Edit: I can't work on the merger code because Calibre is busy doing a massive metadata update (I love the automation since Kovid let me add an option that locks author/title).

The code above seemed to pull a pretty clean print version for me, with some links that, when followed, led mostly to crud. Setting recursions to 0 would turn that off. You'll have to be more specific about what problems you're having.

jrollmorton · 03-02-2010, 07:08 PM

Quote:

Originally Posted by Starson17

OK, I tried it, and it seemed like it worked as I expected. Here's the basic recipe I tried. It only has two things in it, the RSS feed and my suggested replacement:

This worked beautifully! Nice clean text, nicely rendered images and mastheads. Couldn't ask for a better overall appearance.

I ran it a few times and encountered errors ... the fetch process could not complete after labouring away for 2.5-3 minutes.

I corrected some indentation errors (as a result of my copy/paste job). But only when I removed the first four lines of your code did I get the desired result:

Quote:

from __future__ import with_statement
__license__ = 'GPL 3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

Now I realize this might not be ideal for Kovid, whose name is stripped out of my recipe, now. I can't see errors in the above lines, but when they are in the recipe, I don't get results. If you can suggest a way to re-insert this information, I certainly will. Credit where credit is due!

So the successful recipe is:

Spoiler:

(I also changed the recursions value to '0').

Thank you for doing this, Starson17! The recipe has rendered a great version of the magazine, and it was very instructive for me to see how to import the BasicNewsRecipe and play with some of the variables. When I have a little more time, I will try to adapt some more news recipes!

Cheers!

Starson17 · 03-02-2010, 08:39 PM

Quote:

Originally Posted by jrollmorton

This worked beautifully!

I'm glad it's working for you.

mrgrossm · 03-03-2010, 10:57 AM

I didn't get any responses to my last post, so I figured I'd repost and instead of a zip file I'd copy and paste my recipes.

I created recipes for both the Detroit News and Free Press, but I can't get them right! The biggest problem is that both have a background: the News one is light enough, but the Free Press one is really dark. Also, both have lots of junk after the article that I don't know how to get rid of.

Can anybody help?

Free Press recipe:

class BasicUserRecipe1266601171(AutomaticNewsRecipe):
    title = u'Detroit Free Press'
    oldest_article = 1
    max_articles_per_feed = 40

    feeds = [(u'News', u'http://www.freep.com/apps/pbcs.dll/section?category=news&template=rss&mime=xml'), (u'General', u'http://www.freep.com/feeds/RSS.xml'), (u'Entertainment', u'http://www.freep.com/apps/pbcs.dll/section?category=ent&template=rss&mime=xml'), (u'Columns', u'http://www.freep.com/apps/pbcs.dll/section?category=col&template=rss&mime=xml')]

Detroit News recipe:

class BasicUserRecipe1266555833(AutomaticNewsRecipe):
    title = u'Detroit News'
    oldest_article = 1
    max_articles_per_feed = 40

    feeds = [(u'Local News', u'http://www.detnews.com/feeds/rss36.xml'), (u'Oakland', u'http://www.detnews.com/feeds/rss02.xml'), (u'Wayne', u'http://www.detnews.com/feeds/rss01.xml'), (u'Macomb', u'http://www.detnews.com/feeds/rss03.xml'), (u'Politics', u'http://www.detnews.com/feeds/rss10.xml'), (u'Editorials', u'http://www.detnews.com/feeds/rss07.xml'), (u'Nation', u'http://www.detnews.com/feeds/rss09.xml'), (u'Business', u'http://www.detnews.com/feeds/rss21.xml')]

Starson17 · 03-03-2010, 02:23 PM

Quote:

Originally Posted by mrgrossm

The biggest problem is that both have a background: the News one is light enough, but the Free Press one is really dark. Also, both have lots of junk after the article that I don't know how to get rid of.

Can anybody help?

Code:

no_stylesheets = True

Will kill the backgrounds.
Junk after article is usually easiest to kill with

Code:

remove_tags_after = [dict(name='div', attrs={'id':'whatever_id_firebug_says_is_last_desired_content'})]

Use Firebug to find tags for:
keep_only_tags (this may be enough)
remove_tags
remove_tags_after
remove_tags_before

This will do it for the junk in FreePress:

Code:

    keep_only_tags = [dict(name='div', attrs={'id':'article-wrapper'})]
    remove_tags = [dict(name='div', attrs={'id':'sharelinks'})]
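Putting those pieces together with the Free Press recipe posted above gives something like the following. It is a sketch: it subclasses BasicNewsRecipe (the usual base for hand-written recipes) rather than the AutomaticNewsRecipe the basic editor generated, the class name is hypothetical, and the try/except fallback only exists so it can be loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:  # lets this sketch run (and be tested) outside calibre
    BasicNewsRecipe = object


class DetroitFreePress(BasicNewsRecipe):
    title = u'Detroit Free Press'
    oldest_article = 1
    max_articles_per_feed = 40
    # Ignore the site's stylesheets; per Starson17 this removes the dark background.
    no_stylesheets = True
    # Keep only the article container, then drop the share links inside it.
    keep_only_tags = [dict(name='div', attrs={'id': 'article-wrapper'})]
    remove_tags = [dict(name='div', attrs={'id': 'sharelinks'})]

    feeds = [
        (u'News', u'http://www.freep.com/apps/pbcs.dll/section?category=news&template=rss&mime=xml'),
        (u'General', u'http://www.freep.com/feeds/RSS.xml'),
        (u'Entertainment', u'http://www.freep.com/apps/pbcs.dll/section?category=ent&template=rss&mime=xml'),
        (u'Columns', u'http://www.freep.com/apps/pbcs.dll/section?category=col&template=rss&mime=xml'),
    ]
```

The same pattern should carry over to the Detroit News recipe once Firebug shows which container holds its article text.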

t3d · 03-03-2010, 05:35 PM

Quote:

Originally Posted by kovidgoyal

@t3d: Cool, but if you want them included in calibre, you'll have to ping me.

Here I am again

Two new good quality recipes from our repository:
http://github.com/t3d/kalibrator/raw.../fronda.recipe
http://github.com/t3d/kalibrator/raw/master/runa.recipe

BTW, my first name is Tomasz, not Tomaz as you wrote in the changelog for 0.6.41.

Moreover, some of the recipes (including one listed above) were made by my pal nicknamed 'Mori'.

mrgrossm · 03-03-2010, 09:57 PM

Thanks so much, Starson17! The Free Press one is perfect now! I'm working on the Detroit News one and have got some of the junk eliminated. I'll keep working on it!
