#1516
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
I looked at the page quickly. The print articles seem to be a simple change, so:
http://www.walrusmagazine.com/articl...lighted-stage/
becomes:
http://www.walrusmagazine.com/print/...lighted-stage/
I think you just want:
Code:
classmethod BasicNewsRecipe.print_version(url)
Take a url pointing to the webpage with article content and return the URL pointing to the print version of the article. By default does nothing. For example:
def print_version(self, url):
    return url + '?&pagewanted=print'
This may work:
Code:
def print_version(self, url):
    return url.replace('articles', 'print')
Last edited by Starson17; 03-02-2010 at 07:29 AM.
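One caveat on the replace above: str.replace rewrites every occurrence, so an article slug that itself contains the word "articles" would be mangled as well. A slightly more defensive sketch, assuming (as the two example URLs suggest) that only the first path segment changes from "articles" to "print", rewrites just that segment. The helper name walrus_print_url and the sample slug used to test it are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def walrus_print_url(url):
    """Hypothetical helper: swap a leading /articles/ path segment for /print/.

    Assumes the print URL differs from the article URL only in that first
    path segment, as in the walrusmagazine.com example above.
    """
    parts = urlsplit(url)
    segments = parts.path.split('/')
    # A path like '/articles/slug/' splits to ['', 'articles', 'slug', '']
    if len(segments) > 1 and segments[1] == 'articles':
        segments[1] = 'print'
    return urlunsplit(parts._replace(path='/'.join(segments)))


# Inside a BasicNewsRecipe subclass it would be used as:
#     def print_version(self, url):
#         return walrus_print_url(url)
```

For comparison, the plain replace would turn a hypothetical slug such as how-articles-work into how-print-work, while this version leaves the slug alone.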
#1517
Zealot
Posts: 114
Karma: 583
Join Date: Dec 2009
Location: Vigo, Spain
Device: Woxter Scriba 150, pocketbook 360
#1518
Junior Member
Posts: 2
Karma: 10
Join Date: Feb 2010
Device: Sony PRS 600
Quote:
This is what I have done:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1267281853(BasicNewsRecipe):
    title = u'Today ONLINE'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    feeds = [
        (u'TODAYonline', u'http://www.todayonline.com/RSS/Todayonline'),
        (u'Hot News', u'http://www.todayonline.com/RSS/Hotnews'),
        (u'Singapore', u'http://www.todayonline.com/RSS/Singapore'),
        (u'World', u'http://www.todayonline.com/RSS/World'),
        (u'Plus', u'http://www.todayonline.com/RSS/Plus'),
        (u'Sports', u'http://www.todayonline.com/RSS/Sports'),
        (u'Business', u'http://www.todayonline.com/RSS/Business'),
        (u'Tech', u'http://www.todayonline.com/RSS/Tech'),
        (u'Health', u'http://www.todayonline.com/RSS/Health'),
        (u'Traveller', u'http://www.todayonline.com/RSS/Traveller'),
        (u'Voices', u'http://www.todayonline.com/RSS/Voices'),
        (u'Weekend Voices', u'http://www.todayonline.com/RSS/Weekendvoices'),
        (u'Comment', u'http://www.todayonline.com/RSS/Comment'),
        (u'Food', u'http://www.todayonline.com/RSS/Food'),
        (u'Style', u'http://www.todayonline.com/RSS/Style'),
        (u'Shop', u'http://www.todayonline.com/RSS/Shop'),
        (u'At Home', u'http://www.todayonline.com/RSS/Athome'),
        (u'Car', u'http://www.todayonline.com/RSS/Car'),
        (u'Silver', u'http://www.todayonline.com/RSS/Silver'),
        (u'Kids', u'http://www.todayonline.com/RSS/Kids'),
        (u'Two Days', u'http://www.todayonline.com/RSS/Twodays'),
    ]

    def print_version(self, url):
        # Puts the site's /Print/ path in front of the original host and path
        return url.replace('http://', 'http://www.todayonline.com/Print/')

The website doesn't provide full RSS feeds, so I try to load the print version of the linked articles. What I don't understand is that I seem to get the header and footer, but I can't see the article itself. It's a free newspaper, so all the content should load. Why is this so?
#1519
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
I loaded up LiveHeaders and took a look at your feeds. They don't actually link to articles; you can see that by just clicking on the links in your feeds. There are some cookies, fancy scripting and redirecting going on. To get the recipe to follow your feeds, you'll need to have it set up a browser session; then it can handle the cookies, etc. to get to your content. Start here and look at the get_browser method. I like to search for all recipes that use the method I'm trying to figure out and look at them first. A search of file contents for "get_browser" in the recipes folder will find them. Edit: You're probably also going to need the get_obfuscated_article method here. Last edited by Starson17; 03-02-2010 at 12:33 PM.
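The point of get_browser is that the recipe fetches pages through a browser object that, like a real browser, keeps cookies between requests and follows redirects. To experiment with that idea outside calibre, a minimal standard-library sketch is a urllib opener backed by a CookieJar: cookies set by one response are sent back on later requests made through the same opener. This is an analogy for testing feed links by hand, not calibre's implementation; the function name make_cookie_session and the User-Agent string are assumptions.

```python
import urllib.request
from http.cookiejar import CookieJar


def make_cookie_session(user_agent='Mozilla/5.0'):
    """Return (opener, jar): an opener that stores and resends cookies.

    HTTPCookieProcessor records Set-Cookie headers from responses in `jar`
    and adds matching Cookie headers to later requests; build_opener also
    installs the default HTTPRedirectHandler, so redirects are followed.
    """
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [('User-Agent', user_agent)]
    return opener, jar


# Usage (needs network access, so it is not run here):
#     opener, jar = make_cookie_session()
#     html = opener.open('http://www.todayonline.com/RSS/Hotnews').read()
#     print([cookie.name for cookie in jar])
```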
#1520
creator of calibre
Posts: 45,411
Karma: 27757236
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@Logseman: Not at the moment, no.
#1521
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Your RSS feeds do have cookies, etc., but AFAICT they simply send you to the same place every time. So the .../RSS/HotNews RSS feed just sends you to the .../Hotnews page. Removing "/RSS" from your feed addresses will give you everything the RSS feed does, with a lot less trouble. The destination page for each of your feeds has an article teaser, with a "Read More" link inside an <a> tag having id=moreLink. I'd approach this as follows: use the parse_index method described here, then use soup (as described there) to grab the moreLink as your article for that feed.
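In calibre, parse_index returns a list of (section title, list of articles) tuples, where each article is a dict with keys such as 'title', 'url', 'date' and 'description'. Inside a recipe you would fetch each section page with self.index_to_soup(url) and call soup.find('a', attrs={'id':'moreLink'}). The sketch below reproduces only the link-finding and index-building logic with the standard library's HTMLParser, so it can be tried offline. The helper names, the sample HTML, and using the section title as the article title are assumptions; the section URLs are derived by dropping "/RSS" from the feed URLs, as suggested above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MoreLinkFinder(HTMLParser):
    """Records the href of the first <a id="moreLink"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a' and self.href is None:
            attributes = dict(attrs)
            if attributes.get('id') == 'moreLink':
                self.href = attributes.get('href')


def section_url(feed_url):
    """Turn .../RSS/Hotnews into .../Hotnews (drops the first '/RSS')."""
    return feed_url.replace('/RSS', '', 1)


def find_more_link(html, base_url):
    finder = MoreLinkFinder()
    finder.feed(html)
    finder.close()
    # Resolve a relative "Read More" link against the section page URL.
    return urljoin(base_url, finder.href) if finder.href else None


def build_index(sections):
    """sections: iterable of (title, page_url, page_html).

    Returns a list of (section title, [article dict, ...]) tuples, the
    shape parse_index is expected to return.
    """
    index = []
    for title, page_url, page_html in sections:
        link = find_more_link(page_html, page_url)
        articles = []
        if link:
            articles.append({'title': title, 'url': link,
                             'date': '', 'description': ''})
        index.append((title, articles))
    return index
```

A real recipe would more likely take the headline from the teaser than reuse the section title.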
#1522
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2010
Location: Vancouver, BC, Canada
Device: Amazon Kindle, iPhone
The Walrus magazine
Thank you for your generous response re: The Walrus magazine, Starson17! Your analysis of the publication was bang on.
It is definitely the multipage problem that continues to vex me. I tried both of your options, and they yield the first of several pages for each article, plus all the crud. I think the replacement of 'articles' with 'print' is not working, because it does not yield the print version of the page at all. Any further suggestions? Last edited by jrollmorton; 03-02-2010 at 01:35 PM. Reason: missed information
#1523
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
What happened when you tried my suggested replace? Hold on, I'll set up a recipe testbed for you and see for myself.
#1524
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Spoiler:
What's happening with your recipe? You said you tried my two suggestions, but I only gave one: the url.replace('articles', 'print') change. The other was just the quote from the API documentation about how print_version works. Edit: I can't work on the merger code because Calibre is busy doing a massive metadata update (I love the automation since Kovid let me add an option that locks author/title). The code above seemed to pull a pretty clean print version for me, with some links that, when followed, were mostly crud. Setting recursions to 0 would turn that off. You'll have to be more specific about what problems you're having. Last edited by Starson17; 03-02-2010 at 02:52 PM.
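For anyone unsure what the recursions setting controls: it is the number of levels of links calibre follows from each article page, so at 0 only the article page itself is fetched and the crud links are never followed. The toy breadth-first model below, run over a made-up link graph, only illustrates that depth limit; it is not calibre's fetcher, and the function name pages_fetched is hypothetical.

```python
from collections import deque


def pages_fetched(start, links, recursions):
    """Return pages reached from `start`, following links at most
    `recursions` levels deep (a toy model of the recipe setting).

    `links` maps a page to the pages it links to (a made-up link graph).
    """
    seen = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == recursions:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:
                seen.append(target)
                queue.append((target, depth + 1))
    return seen


graph = {'article': ['related', 'comments'], 'related': ['ad-page']}
print(pages_fetched('article', graph, 0))  # ['article']
print(pages_fetched('article', graph, 1))  # ['article', 'related', 'comments']
```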
#1525
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2010
Location: Vancouver, BC, Canada
Device: Amazon Kindle, iPhone
The Walrus magazine - recipe
Quote:
I ran it a few times and encountered errors: the fetch process could not complete after labouring away for 2.5 to 3 minutes. I corrected some indentation errors (caused by my copy/paste job), but only when I removed the first four lines of your code did I get the desired result:
Quote:
So the successful recipe is:
Spoiler:
(I also changed the recursions value to 0.) Thank you for doing this, Starson17! The recipe has rendered a great version of the magazine, and it was very instructive for me to see how to import BasicNewsRecipe and play with some of the variables. When I have a little more time, I will try to adapt some more news recipes! Cheers!
#1526
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
#1527
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2010
Device: Barnes & Noble Nook, Sony 505
Detroit papers
I didn't get any responses to my last post, so I figured I'd repost, and this time paste my recipes instead of attaching a zip file.
I created recipes for both the Detroit News and the Free Press, but I can't get them right! The biggest problem is that both have a background: the News one is light enough, but the Free Press one is really dark. Also, both have lots of junk after the article that I don't know how to get rid of. Can anybody help?
Free Press recipe:
class BasicUserRecipe1266601171(AutomaticNewsRecipe):
    title = u'Detroit Free Press'
    oldest_article = 1
    max_articles_per_feed = 40
    feeds = [
        (u'News', u'http://www.freep.com/apps/pbcs.dll/section?category=news&template=rss&mime=xml'),
        (u'General', u'http://www.freep.com/feeds/RSS.xml'),
        (u'Entertainment', u'http://www.freep.com/apps/pbcs.dll/section?category=ent&template=rss&mime=xml'),
        (u'Columns', u'http://www.freep.com/apps/pbcs.dll/section?category=col&template=rss&mime=xml'),
    ]
Detroit News recipe:
class BasicUserRecipe1266555833(AutomaticNewsRecipe):
    title = u'Detroit News'
    oldest_article = 1
    max_articles_per_feed = 40
    feeds = [
        (u'Local News', u'http://www.detnews.com/feeds/rss36.xml'),
        (u'Oakland', u'http://www.detnews.com/feeds/rss02.xml'),
        (u'Wayne', u'http://www.detnews.com/feeds/rss01.xml'),
        (u'Macomb', u'http://www.detnews.com/feeds/rss03.xml'),
        (u'Politics', u'http://www.detnews.com/feeds/rss10.xml'),
        (u'Editorials', u'http://www.detnews.com/feeds/rss07.xml'),
        (u'Nation', u'http://www.detnews.com/feeds/rss09.xml'),
        (u'Business', u'http://www.detnews.com/feeds/rss21.xml'),
    ]
#1528
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Code:
no_stylesheets = True
Junk after the article is usually easiest to kill with:
Code:
remove_tags_after = [dict(name='div', attrs={'id':'whatever_id_firebug_says_is_last_desired_content'})]
The attributes for trimming a page are:
keep_only_tags (this may be enough)
remove_tags
remove_tags_after
remove_tags_before
This will do it for the junk in FreePress:
Code:
keep_only_tags = [dict(name='div', attrs={'id':'article-wrapper'})]
remove_tags = [dict(name='div', attrs={'id':'sharelinks'})]
Last edited by Starson17; 03-03-2010 at 01:46 PM.
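To see what the two FreePress settings do, here is a toy model on a small, well-formed XHTML string: keep_only_tags keeps only the matching elements (with everything inside them), and remove_tags then deletes matching elements from what is left. Calibre itself works on a BeautifulSoup tree and copes with messy real-world HTML; this ElementTree version, its helper names (matches, apply_filters) and the sample markup are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def matches(elem, spec):
    """True if elem fits a spec like dict(name='div', attrs={'id': 'x'})."""
    if 'name' in spec and elem.tag != spec['name']:
        return False
    return all(elem.get(key) == value for key, value in spec.get('attrs', {}).items())


def apply_filters(xhtml, keep_only_tags=(), remove_tags=()):
    root = ET.fromstring(xhtml)
    # keep_only_tags: keep just the matching elements (whole page if none given)
    if keep_only_tags:
        kept = [e for e in root.iter() if any(matches(e, s) for s in keep_only_tags)]
    else:
        kept = [root]
    # remove_tags: delete matching elements inside whatever was kept
    for top in kept:
        for parent in list(top.iter()):
            for child in list(parent):
                if any(matches(child, s) for s in remove_tags):
                    parent.remove(child)
    return ''.join(ET.tostring(e, encoding='unicode') for e in kept)


page = ('<html><body><div id="nav">Menu</div>'
        '<div id="article-wrapper"><p>Story</p><div id="sharelinks">Share</div></div>'
        '</body></html>')
keep = [dict(name='div', attrs={'id': 'article-wrapper'})]
drop = [dict(name='div', attrs={'id': 'sharelinks'})]
print(apply_filters(page, keep, drop))
```

With the sample page, the navigation menu and the share links are both gone and only the story paragraph inside article-wrapper survives.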
#1529
Enthusiast
Posts: 38
Karma: 10
Join Date: Nov 2009
Location: Poland
Device: kindle 1st gen, kindle dxg, kindle paperwhite2
Quote:
Two new, good-quality recipes from our repository:
http://github.com/t3d/kalibrator/raw.../fronda.recipe
http://github.com/t3d/kalibrator/raw/master/runa.recipe
BTW, my first name is Tomasz, not Tomaz as written in the changelog for 0.6.41. Also, some of the recipes (including one listed above) were made by my pal nicknamed 'Mori'.
#1530
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2010
Device: Barnes & Noble Nook, Sony 505
Thanks so much, Starson17! The Free Press recipe is perfect now! I'm working on the Detroit News one and have got some of the junk eliminated. I'll keep working on it!
Thread | Thread Starter | Forum | Replies | Last Post |
Custom column read ? | pchrist7 | Calibre | 2 | 10-04-2010 02:52 AM |
Archive for custom screensavers | sleeplessdave | Amazon Kindle | 1 | 07-07-2010 12:33 PM |
How to back up preferences and custom recipes? | greenapple | Calibre | 3 | 03-29-2010 05:08 AM |
Donations for Custom Recipes | ddavtian | Calibre | 5 | 01-23-2010 04:54 PM |
Help understanding custom recipes | andersent | Calibre | 0 | 12-17-2009 02:37 PM |