Old 03-02-2010, 07:07 AM   #1516
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by jrollmorton View Post
A newbie, here. I've spent some time trying to write a recipe for my favourite Canadian journal, The Walrus: http://www.walrusmagazine.com

Following the calibre user guide, I've written recipes to generate the RSS feed from the magazine, which links to full articles elsewhere on the website. I've tried accessing the print version of the articles, but get loads of crud and ads and only page one of the multi-page articles.

I'm having fun trying this, but not having satisfactory success. Can anyone help me to get at least the full print version of these articles? Then I can try getting out the crud by modifying the CSS or whatever.

Many thanks!
I don't know if someone is going to do this for you, but if you want to do it yourself, and just get help on the steps, you may want to post your RSS feed and what you've tried so far.
I looked at the page quickly. The print articles seem to be a simple change, so:

http://www.walrusmagazine.com/articl...lighted-stage/

becomes:
http://www.walrusmagazine.com/print/...lighted-stage/


I think you just want:
Code:
classmethod BasicNewsRecipe.print_version(url)

    Take a url pointing to the webpage with article content and return the URL pointing to the print version of the article. By default does nothing. For example:

    def print_version(self, url):
        return url + '?&pagewanted=print'
You'd change the example to process url and substitute "print" for "articles" in the url.

This may work:
Code:
     def print_version(self, url):
         return url.replace('articles', 'print')
Multipage is trickier, but once you've got singlepage working, more help is available in the examples.
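
A minimal sketch of the same substitution as a stand-alone function, so it can be tried outside calibre. It swaps only the first `/articles/` path segment instead of every occurrence of the word; that narrowing is an assumption about what is safest here, not something the calibre docs prescribe. The function name and the slug in the example URL are hypothetical.

```python
def to_print_url(url):
    """Return the print-version URL for an article URL.

    Replaces only the first '/articles/' path segment, so a slug that
    happens to contain the word 'articles' is left untouched.
    """
    return url.replace('/articles/', '/print/', 1)


# Hypothetical article URL, for illustration only.
example = 'http://www.walrusmagazine.com/articles/some-example-slug/'
print(to_print_url(example))
```

Inside a recipe, `print_version(self, url)` could then simply return `to_print_url(url)`.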

Last edited by Starson17; 03-02-2010 at 07:29 AM.
Old 03-02-2010, 07:45 AM   #1517
anabelee
Zealot
Posts: 114
Karma: 583
Join Date: Dec 2009
Location: Vigo, Spain
Device: Woxter Scriba 150, pocketbook 360
Quote:
Originally Posted by kiklop74 View Post
Recipe for Diario Vasco:
thank you a lot, kiklop!!
Old 03-02-2010, 10:16 AM   #1518
aoitenshi
Junior Member
Posts: 2
Karma: 10
Join Date: Feb 2010
Device: Sony PRS 600
Quote:
Originally Posted by aoitenshi View Post
Can I request a recipe for TODAY online (SG)?

http://www.todayonline.com/RSS
Hello, I posted a request but I decided to try my hand at it by using the Basic editor for custom news source on Calibre. I followed the guide on the website but I obviously don't understand enough code to get very far...

This is what I have done:
Code:
class AdvancedUserRecipe1267281853(BasicNewsRecipe):
    title          = u'Today ONLINE'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True


    feeds          = [
        (u'TODAYonline', u'http://www.todayonline.com/RSS/Todayonline'),
        (u'Hot News', u'http://www.todayonline.com/RSS/Hotnews'),
        (u'Singapore', u'http://www.todayonline.com/RSS/Singapore'),
        (u'World', u'http://www.todayonline.com/RSS/World'),
        (u'Plus', u'http://www.todayonline.com/RSS/Plus'),
        (u'Sports', u'http://www.todayonline.com/RSS/Sports'),
        (u'Business', u'http://www.todayonline.com/RSS/Business'),
        (u'Tech', u'http://www.todayonline.com/RSS/Tech'),
        (u'Health', u'http://www.todayonline.com/RSS/Health'),
        (u'Traveller', u'http://www.todayonline.com/RSS/Traveller'),
        (u'Voices', u'http://www.todayonline.com/RSS/Voices'),
        (u'Weekend Voices', u'http://www.todayonline.com/RSS/Weekendvoices'),
        (u'Comment', u'http://www.todayonline.com/RSS/Comment'),
        (u'Food', u'http://www.todayonline.com/RSS/Food'),
        (u'Style', u'http://www.todayonline.com/RSS/Style'),
        (u'Shop', u'http://www.todayonline.com/RSS/Shop'),
        (u'At Home', u'http://www.todayonline.com/RSS/Athome'),
        (u'Car', u'http://www.todayonline.com/RSS/Car'),
        (u'Silver', u'http://www.todayonline.com/RSS/Silver'),
        (u'Kids', u'http://www.todayonline.com/RSS/Kids'),
        (u'Two Days', u'http://www.todayonline.com/RSS/Twodays'),
    ]

def print_version(self, url):
    return url.replace('http://', 'http://www.todayonline.com/Print/')
I was attempting to configure the following feeds

Today Online
http://www.todayonline.com/RSS/Todayonline

Hot News
http://www.todayonline.com/RSS/Hotnews

Singapore
http://www.todayonline.com/RSS/Singapore

World
http://www.todayonline.com/RSS/World

Plus
http://www.todayonline.com/RSS/Plus

Sports
http://www.todayonline.com/RSS/Sports

Business
http://www.todayonline.com/RSS/Business

Tech
http://www.todayonline.com/RSS/Tech

Health
http://www.todayonline.com/RSS/Health

Traveller
http://www.todayonline.com/RSS/Traveller

Voices
http://www.todayonline.com/RSS/Voices

Weekend Voices
http://www.todayonline.com/RSS/Weekendvoices

Comment
http://www.todayonline.com/RSS/Comment

Food
http://www.todayonline.com/RSS/Food

Style
http://www.todayonline.com/RSS/Style

Shop
http://www.todayonline.com/RSS/Shop

At Home
http://www.todayonline.com/RSS/Athome

Car
http://www.todayonline.com/RSS/Car

Silver
http://www.todayonline.com/RSS/Silver

Kids
http://www.todayonline.com/RSS/Kids

Two Days
http://www.todayonline.com/RSS/Twodays


The website doesn't provide full RSS feeds so I try to load up the print version of linked articles. What I don't understand is that I seem to be getting the header / footer but I can't see the article itself. It's a free newspaper so all the content should load.

Why is this so?
Old 03-02-2010, 11:48 AM   #1519
Starson17
Quote:
Originally Posted by aoitenshi View Post
Hello, I posted a request but I decided to try my hand at it by using the Basic editor for custom news source on Calibre. I followed the guide on the website but I obviously don't understand enough code to get very far......
The website doesn't provide full RSS feeds so I try to load up the print version of linked articles. What I don't understand is that I seem to be getting the header / footer but I can't see the article itself. It's a free newspaper so all the content should load.

Why is this so?
I'm no expert, but I can help a bit.
I loaded up LiveHeaders and took a look at your feeds. They don't actually link to articles. You can see that by just clicking on the links for your feeds. There are some cookies, fancy scripting and redirecting going on.

To get the recipe to follow your feeds, you'll need to have it set up a browser session and then it can handle the cookies, etc. to get to your content.

Start here and look at the get_browser method.

I like to do a search of all recipes that use the method I'm trying to figure out and look at them first. A search of file contents for "get_browser" in the recipes folder will find them.

Edit: You're probably also going to need the get_obfuscated_article method here.

Last edited by Starson17; 03-02-2010 at 12:33 PM.
Old 03-02-2010, 12:09 PM   #1520
kovidgoyal
creator of calibre
Posts: 45,410
Karma: 27757236
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@Logseman: Not at the moment, no.
Old 03-02-2010, 01:23 PM   #1521
Starson17
Quote:
Originally Posted by aoitenshi View Post
Hello, I posted a request but I decided to try my hand at it by using the Basic editor for custom news source on Calibre. I followed the guide on the website but I obviously don't understand enough code to get very far......
The website doesn't provide full RSS feeds so I try to load up the print version of linked articles. What I don't understand is that I seem to be getting the header / footer but I can't see the article itself. It's a free newspaper so all the content should load.

Why is this so?
I became interested in this, so I decided to rewrite my comments as a new post. You can mostly ignore the one above as it relates to getting through the RSS feed, and that is not needed.

Your RSS feeds do have cookies, etc., but AFAICT, they simply send you to the same place every time. So the .../RSS/Hotnews RSS feed just sends you to the .../Hotnews page. Modifying your feed addresses by removing "/RSS" will give you everything that the RSS feed does with a lot less trouble.

The destination page for each of your feeds has an article teaser, with a "Read More" link inside an <a> tag having id=moreLink.

I'd approach this as follows:

Use the parse_index method described here:

Then use soup (as described there) to grab the moreLink as your article for that feed.
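
The approach above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the sample markup and its href are made up, and using the section title as the article title is a placeholder. In a real recipe the lookup would more likely go through `self.index_to_soup(url)` and soup's `find`; the standard-library parser is swapped in here only so the sketch runs without calibre.

```python
from html.parser import HTMLParser


class MoreLinkFinder(HTMLParser):
    """Collect the href of the first <a> tag whose id is 'moreLink'."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and attrs.get('id') == 'moreLink' and self.href is None:
            self.href = attrs.get('href')


def section_page_url(feed_url):
    """Drop the '/RSS' path segment to get the section page address."""
    return feed_url.replace('/RSS/', '/', 1)


def find_more_link(html):
    """Return the 'Read More' href found in html, or None if absent."""
    finder = MoreLinkFinder()
    finder.feed(html)
    return finder.href


def build_index(sections):
    """Build the list of (section title, [article dicts]) that parse_index returns.

    sections maps a section title to the HTML of its section page.
    """
    index = []
    for title, html in sections.items():
        url = find_more_link(html)
        if url:
            # Placeholder: the section title doubles as the article title.
            article = {'title': title, 'url': url, 'description': '', 'date': ''}
            index.append((title, [article]))
    return index


# Made-up markup standing in for a downloaded section page.
sample = ('<div><p>Teaser text</p>'
          '<a id="moreLink" href="http://www.todayonline.com/Hotnews/example-story">'
          'Read More</a></div>')
print(section_page_url('http://www.todayonline.com/RSS/Hotnews'))
print(build_index({'Hot News': sample}))
```

A recipe's `parse_index` would download each section page, then return a structure shaped like the one `build_index` produces.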
Old 03-02-2010, 01:31 PM   #1522
jrollmorton
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2010
Location: Vancouver, BC, Canada
Device: Amazon Kindle, iPhone
The Walrus magazine

Thank you for your generous response re: The Walrus magazine, Starson17! Your analysis of the publication was bang on.

It is definitely the multipage problem that continues to vex me. I tried both of your options and they yield the first of several pages for each article, plus all the crud. I am thinking the replacement of 'articles' with 'print' is not working, because that action does not yield the print version of the page at all.

Any further suggestions?

Last edited by jrollmorton; 03-02-2010 at 01:35 PM. Reason: missed information
Old 03-02-2010, 02:02 PM   #1523
Starson17
Quote:
Originally Posted by jrollmorton View Post
Thank you for your generous response re: The Walrus magazine, Starson17! Your analysis of the publication was bang on.

It is definitely the multipage problem that continues to vex me. I tried both of your options and they yield the first of several pages for each article, plus all the crud. I am thinking the replacement of 'articles' with 'print' is not working, because that action does not yield the print version of the page at all.

Any further suggestions?
I set up a testbed for aoitenshi's problem, but not yours. I was just about to try writing some merger of records code, but I'll give you a hand for a minute. I'm not great at recipes, but yours looked easier than aoitenshi's problem.

What happened when you tried my suggested replace?

Hold on, I'll set up a recipe testbed for you and see for myself.
Old 03-02-2010, 02:25 PM   #1524
Starson17
Quote:
Originally Posted by Starson17 View Post
Hold on, I'll set up a recipe testbed for you and see for myself.
OK, I tried it, and it seemed like it worked as I expected. Here's the basic recipe I tried. It only has two things in it, the RSS feed and my suggested replacement:
Spoiler:
from __future__ import with_statement
__license__ = 'GPL 3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

from calibre.web.feeds.news import BasicNewsRecipe

class Walrus(BasicNewsRecipe):
    title = u'The_Walrus'
    __author__ = 'Starson17'
    description = 'The_Walrus'
    language = 'en'
    use_embedded_content = False
    no_stylesheets = True
    remove_javascript = True
    recursions = 1
    oldest_article = 90

    def print_version(self, url):
        return url.replace('articles', 'print')

    feeds = [
        (u'Main', u'http://feeds.feedburner.com/WalrusFeatureArticles')
    ]


What's happening with your recipe? You said you tried my two suggestions, but I only gave one, the url.replace('articles', 'print') change. The other one was just the quote from the API about how print_version worked.

Edit: I can't work on the merger code because Calibre is busy doing a massive metadata update (I love the automation since Kovid let me add an option that locks author/title).

The code above seemed to pull a pretty clean print version for me, with some links that, when followed, were mostly crud. Setting recursions to 0 would turn that off. You'll have to be more specific about what problems you're having.

Last edited by Starson17; 03-02-2010 at 02:52 PM.
Old 03-02-2010, 06:08 PM   #1525
jrollmorton
The Walrus magazine - recipe

Quote:
Originally Posted by Starson17 View Post
OK, I tried it, and it seemed like it worked as I expected. Here's the basic recipe I tried. It only has two things in it, the RSS feed and my suggested replacement:
This worked beautifully! Nice clean text, nicely rendered images and mastheads. Couldn't ask for a better overall appearance.

I ran it a few times and encountered errors ... the fetch process could not complete after labouring away for 2.5-3 minutes.

I corrected some indentation errors (as a result of my copy/paste job). But only when I removed the first four lines of your code did I get the desired result:

Quote:
from __future__ import with_statement
__license__ = 'GPL 3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
Now I realize this might not be ideal for Kovid, whose name is stripped out of my recipe, now. I can't see errors in the above lines, but when they are in the recipe, I don't get results. If you can suggest a way to re-insert this information, I certainly will. Credit where credit is due!

So the successful recipe is:

Spoiler:
from calibre.web.feeds.news import BasicNewsRecipe

class Walrus(BasicNewsRecipe):
    title = u'The_Walrus'
    __author__ = 'Starson17'
    description = 'The_Walrus'
    language = 'en'
    use_embedded_content = False
    no_stylesheets = True
    remove_javascript = True
    recursions = 0
    oldest_article = 90

    def print_version(self, url):
        return url.replace('articles', 'print')

    feeds = [
        (u'Main', u'http://feeds.feedburner.com/WalrusFeatureArticles')
    ]


(I also changed the recursions value to '0').

Thank you for doing this, Starson17! The recipe has rendered a great version of the magazine, and it was very instructive for me to see how to import the BasicNewsRecipe and play with some of the variables. When I have a little more time, I will try to adapt some more news recipes!

Cheers!
Old 03-02-2010, 07:39 PM   #1526
Starson17
Quote:
Originally Posted by jrollmorton View Post
This worked beautifully!
I'm glad it's working for you.
Old 03-03-2010, 09:57 AM   #1527
mrgrossm
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2010
Device: Barnes & Noble Nook, Sony 505
Detroit papers

I didn't get any responses to my last post, so I figured I'd repost and instead of a zip file I'd copy and paste my recipes.

I created recipes for both the Detroit News and the Free Press, but I can't get them right! The biggest problem is that both have a background: the News one is light enough, but the Free Press one is really dark. Both also have lots of junk after the article that I don't know how to get rid of.

Can anybody help?

Free Press recipe:
class BasicUserRecipe1266601171(AutomaticNewsRecipe):
    title = u'Detroit Free Press'
    oldest_article = 1
    max_articles_per_feed = 40

    feeds = [
        (u'News', u'http://www.freep.com/apps/pbcs.dll/section?category=news&template=rss&mime=xml'),
        (u'General', u'http://www.freep.com/feeds/RSS.xml'),
        (u'Entertainment', u'http://www.freep.com/apps/pbcs.dll/section?category=ent&template=rss&mime=xml'),
        (u'Columns', u'http://www.freep.com/apps/pbcs.dll/section?category=col&template=rss&mime=xml'),
    ]


Detroit News recipe:
class BasicUserRecipe1266555833(AutomaticNewsRecipe):
    title = u'Detroit News'
    oldest_article = 1
    max_articles_per_feed = 40

    feeds = [
        (u'Local News', u'http://www.detnews.com/feeds/rss36.xml'),
        (u'Oakland', u'http://www.detnews.com/feeds/rss02.xml'),
        (u'Wayne', u'http://www.detnews.com/feeds/rss01.xml'),
        (u'Macomb', u'http://www.detnews.com/feeds/rss03.xml'),
        (u'Politics', u'http://www.detnews.com/feeds/rss10.xml'),
        (u'Editorials', u'http://www.detnews.com/feeds/rss07.xml'),
        (u'Nation', u'http://www.detnews.com/feeds/rss09.xml'),
        (u'Business', u'http://www.detnews.com/feeds/rss21.xml'),
    ]
Old 03-03-2010, 01:23 PM   #1528
Starson17
Quote:
Originally Posted by mrgrossm View Post
The biggest problem is that both have a background, the News one is light enough, but the Free Press is really dark. Also both have lots of junk after the article that I don't know how to get rid of.

Can anybody help?
Code:
no_stylesheets = True
Will kill the backgrounds.
Junk after article is usually easiest to kill with
Code:
remove_tags_after = [dict(name='div', attrs={'id':'whatever_id_firebug_says_is_last_desired_content'})]
Use Firebug to find tags for:
keep_only_tags (this may be enough)
remove_tags
remove_tags_after
remove_tags_before

This will do it for the junk in FreePress:
Code:
    keep_only_tags = [dict(name='div', attrs={'id':'article-wrapper'})]
    remove_tags = [dict(name='div', attrs={'id':'sharelinks'})]
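
Putting those pieces together, the Free Press recipe might look roughly like the sketch below. It assumes the `article-wrapper` and `sharelinks` ids are still what the site uses, derives from `BasicNewsRecipe` rather than the `AutomaticNewsRecipe` in the original post, and uses a made-up class name. The `try`/`except` stand-in is there only so the file also loads outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in base class so the sketch can be loaded outside calibre.
    class BasicNewsRecipe(object):
        pass


class DetroitFreePress(BasicNewsRecipe):  # hypothetical class name
    title = u'Detroit Free Press'
    oldest_article = 1
    max_articles_per_feed = 40

    # Ignore the site's stylesheets, which removes the dark background.
    no_stylesheets = True

    # Keep only the article container, then drop the share links inside it.
    keep_only_tags = [dict(name='div', attrs={'id': 'article-wrapper'})]
    remove_tags = [dict(name='div', attrs={'id': 'sharelinks'})]

    feeds = [
        (u'News', u'http://www.freep.com/apps/pbcs.dll/section?category=news&template=rss&mime=xml'),
        (u'General', u'http://www.freep.com/feeds/RSS.xml'),
        (u'Entertainment', u'http://www.freep.com/apps/pbcs.dll/section?category=ent&template=rss&mime=xml'),
        (u'Columns', u'http://www.freep.com/apps/pbcs.dll/section?category=col&template=rss&mime=xml'),
    ]
```

The Detroit News recipe would follow the same pattern once Firebug shows which ids wrap its article body.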

Last edited by Starson17; 03-03-2010 at 01:46 PM.
Old 03-03-2010, 04:35 PM   #1529
t3d
Enthusiast
Posts: 38
Karma: 10
Join Date: Nov 2009
Location: Poland
Device: kindle 1st gen, kindle dxg, kindle paperwhite2
Quote:
Originally Posted by kovidgoyal View Post
@t3d: Cool, but if you want them included in calibre, you'll have to ping me.
Here I am again
Two new good quality recipes from our repository:
http://github.com/t3d/kalibrator/raw.../fronda.recipe
http://github.com/t3d/kalibrator/raw/master/runa.recipe

BTW, my first name is Tomasz, not Tomaz as written in the changelog for 0.6.41.
Moreover, some of the recipes (including one listed above) were made by my pal nicknamed 'Mori'.
Old 03-03-2010, 08:57 PM   #1530
mrgrossm
Thanks so much, Starson17! The Free Press is perfect now! I'm working on the Detroit News one and have got some of the junk eliminated. I'll keep working on it!