Custom recipes (archive, read-only) - Page 52

kiklop74 · 09-25-2009, 02:28 PM

Quote:

Originally Posted by Andreiko

http://www.inosmi.ru/misc/export/xml...ranslation.xml

Can someone please make a recipe out of this feed?
I know I have already asked, but maybe it went unnoticed.

Sorry for repeating.

Here you go, but this assumes you have already patched your reader to use Russian fonts:

macsilber · 09-25-2009, 04:51 PM

Can someone help me with this feed please

http://www.inc.com/rss.xml

CABITSS · 09-25-2009, 05:02 PM

Quote:

Originally Posted by kiklop74

The Toronto Star:

Hi Kiklop74
Thanks once again
When I run the recipe to download, I notice that all the article headings are repeating, as per sample below.

Sample 1
Ron James's big tent of comedy TheStar.com - Television - Ron James's big tent of comedy

Sample 2
Skates iced for love of dance TheStar.com - Television - Skates iced for love of dance

kiklop74 · 09-25-2009, 05:02 PM

Quote:

Originally Posted by macsilber

Can someone help me with this feed please

http://www.inc.com/rss.xml

Here goes:

kiklop74 · 09-25-2009, 05:03 PM

Quote:

Originally Posted by CABITSS

Hi Kiklop74
Thanks once again
When I run the recipe to download, I notice that all the article headings are repeating, as per sample below.

Sample 1
Ron James's big tent of comedy TheStar.com - Television - Ron James's big tent of comedy

Sample 2
Skates iced for love of dance TheStar.com - Television - Skates iced for love of dance

Yes they do. No time to work on that. Just live with that for now.

CABITSS · 09-25-2009, 05:05 PM

Can you create one for The Toronto Sun?

I did try, but the downloaded file is really large and takes about 9 to 12 minutes to download, so I must be doing something wrong.
Could you create a new recipe for The Toronto Sun?

Thanks

MichaelMSeattle · 09-25-2009, 05:39 PM

Hi again,
I was wondering if anyone had any suggestions for my issue with the New York Times Magazine? I tried the recipe but it didn't return any of the sub-articles, only their headers.

Also, I've heard discussion about using Firefox to determine the sections to remove from returned feeds. Can someone please elaborate a little more about how to do this?

Thanks again for your help!

- Mike

kiklop74 · 09-25-2009, 05:42 PM

That is because of the famous anti-scraping protection they employ. Everything related to NYT is a pain.

macsilber · 09-25-2009, 07:25 PM

Thanks so much!

Could you attempt this one for me please

http://feeds.feedburner.com/entrepreneur/latest

MichaelMSeattle · 09-25-2009, 07:37 PM

Quote:

Originally Posted by kiklop74

That is because of the famous anti-scraping protection they employ. Everything related to NYT is a pain.

I understand, I just thought I'd ask again.

Thanks!

-M

olaf · 09-26-2009, 11:03 AM

Figured out the smart-quotes thing with encoding. But now I am trying to determine how to replace actual text that is in error. In several places in the actual RSS feed there is an appearance of 'and #8216;' instead of a single quote. The preprocess_regexps command seems to replace everything between x and y with z - that is the only thing I know to make text replacements with. But I tried the following command to no avail. Is this the right command? Do I have the syntax wrong? I just want to replace the entire string, but do I say replace everything between 'and #8217' and semicolon with "'"? (the latter being a single-quote embedded in double-quotes).

preprocess_regexps = [(re.compile(r'and #8216.?;', re.DOTALL|re.IGNORECASE), lambda match: '"')]

Also, I am trying to convert '<STRONG>' to '<b>', but it doesn't seem to work. The command I am using is:
preprocess_regexps = [(re.compile(r'<strong.?>', re.DOTALL|re.IGNORECASE), lambda match: '<b>')]

(also doing a similar command for the end tag.) What am I doing wrong?
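For readers following along, here is a minimal sketch of how a list of (compiled pattern, replacement function) pairs behaves, using only the standard `re` module. It assumes the pairs are applied one after another with `pattern.sub`; the `apply_rules` helper and the sample string are made up for illustration and are not calibre code. Note that the pattern matches the whole unwanted string, so there is no need to think in terms of "everything between x and y".

```python
import re

# Each rule is a (compiled pattern, replacement function) pair.
rules = [
    # The literal text 'and #8216;' or 'and #8217;' becomes a plain single quote.
    (re.compile(r'and #821[67];'), lambda match: "'"),
    # An opening <strong ...> tag (any case, any attributes) becomes <b>.
    (re.compile(r'<strong[^>]*>', re.IGNORECASE), lambda match: '<b>'),
    # The closing tag becomes </b>.
    (re.compile(r'</strong\s*>', re.IGNORECASE), lambda match: '</b>'),
]

def apply_rules(html, rules):
    # Hypothetical helper: run every rule over the text, in list order.
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html

# Made-up sample input containing both problems.
sample = "<STRONG>Ron Jamesand #8217;s big tent</STRONG>"
print(apply_rules(sample, rules))
```

The replacement function returns `"'"` (a single quote inside double quotes); returning `'"'` would insert a double quote instead.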

olaf · 09-26-2009, 01:06 PM

And next question! Is there a way to get rid of the top image in this feed? (I've cut out the majority of feeds for this example, but each article is preceded by the ad images, starting with "Share" and "Larger Text".) Whatever I try hasn't worked so far.

Here's the recipe:

import string, re

class AdvancedUserRecipe1252944207(BasicNewsRecipe):
    title = u'Worcester Telegram test'
    oldest_article = 1
    max_articles_per_feed = 50
    timefmt = ''
    no_stylesheets = True

    preprocess_regexps = [(re.compile(r'<strong.?>', re.DOTALL|re.IGNORECASE), lambda match: '<b>')]
    preprocess_regexps = [(re.compile(r'</strong.?>', re.DOTALL|re.IGNORECASE), lambda match: '</b>')]
    preprocess_regexps = [(re.compile(r'and #8217.?;', re.DOTALL|re.IGNORECASE), lambda match: '"')]
    preprocess_regexps = [(re.compile(r'and #8216.?;', re.DOTALL|re.IGNORECASE), lambda match: '"')]

    keep_only_tags = [dict(id=['frontpage_section', 'articleWell', 'headline', 'subheadline', 'SuperHeading', 'byline', 'articleBody', 'zoom1'])]
    remove_tags = [dict(id=['factBoxes'])]
    preprocess_regexps = [(re.compile(r'<!-- This code displays columnist headshots: -->.*?<p>', re.DOTALL|re.IGNORECASE), lambda match: '')]
    preprocess_regexps = [(re.compile(r'<div class="verdana11">.*?<!-- END ARTICLE COMMENTS -->', re.DOTALL|re.IGNORECASE), lambda match: '')]

    encoding = 'cp1252'

    remove_tags_after = [dict(id='leaderboardBot')]

    feeds = [(u'Local News', u' http://www.telegram.com/apps/pbcs.dl...le=1101')]
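One thing worth noting about a recipe written this way: in a Python class body, every `preprocess_regexps = [...]` line rebinds the same name, so only the last assignment survives. The sketch below shows this with two throwaway classes (`Overwritten` and `Combined` are hypothetical names, not part of calibre) and the usual alternative of putting all pairs in one list.

```python
import re

class Overwritten:
    # Each assignment replaces the previous one; only the last list is kept.
    preprocess_regexps = [(re.compile(r'<strong.?>', re.IGNORECASE), lambda match: '<b>')]
    preprocess_regexps = [(re.compile(r'</strong.?>', re.IGNORECASE), lambda match: '</b>')]

class Combined:
    # A single list keeps every rule.
    preprocess_regexps = [
        (re.compile(r'<strong.?>', re.IGNORECASE), lambda match: '<b>'),
        (re.compile(r'</strong.?>', re.IGNORECASE), lambda match: '</b>'),
    ]

print(len(Overwritten.preprocess_regexps))
print(len(Combined.preprocess_regexps))
```

In the first class only the closing-tag rule remains, which would explain why earlier rules in a recipe appear to do nothing.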

danielc · 09-26-2009, 01:49 PM

I still can't get it to work... can you give me a bit more help?
I can get to the 2nd page and see the picture...

Quote:

Originally Posted by danielc

Hi, I would like some help too...
I went to Jamie's website and I want to make a recipe from the RSS feed for the daily recipe of J. Oliver. Can you give me a hand? The pictures don't show up. It would be great to carry it with me and make a different dish of his every day. Ha ha, thanks.

http://rss.feedsportal.com/c/32402/f/467087/index.rss

kiklop74 · 09-26-2009, 06:01 PM

Quote:

Originally Posted by olaf

And next question! Is there a way to get rid of the top image in this feed (i've cut out the majority of feeds for this example

You are complicating things.

This is how it should look:

Code:

from calibre.web.feeds.recipes import BasicNewsRecipe

class Telegram(BasicNewsRecipe):
    title                 = 'Telegram'
    oldest_article        = 2
    max_articles_per_feed = 100
    no_stylesheets        = True
    encoding              = 'cp1252'
    use_embedded_content  = False
    language              = 'en'
    extra_css             = ' .headline{font-size: x-large} '

    keep_only_tags     = [dict(name='div', attrs={'class':['headline','subHeadline','byline','articleBody']})]

    remove_tags = [
                     dict(name=['object','link','embed'])
                    ,dict(name='div',attrs={'class':['relatedContent','verdana11']})
                  ]

    remove_tags_after  = dict(name='div', attrs={'class':'verdana11'})

    feeds = [(u'Frontpage News', u'http://www.telegram.com/apps/pbcs.dll/section?Category=RSS03&MIME=xml')]

bhandarisaurabh · 09-27-2009, 09:25 AM

Can anyone help me with a recipe for Business World? I have been using its RSS feeds, but I want the whole magazine, just like the Economist.

http://www.businessworld.in/bw/Magazine_Current_Issue
