Custom recipes (archive, read-only) - Page 51

GRiker · 09-23-2009, 08:05 AM

MichaelMSeattle:

Add the following to your recipe:

Code:

	def print_version(self, url):
		return url + '?pagewanted=print'

This will append the necessary suffix to fetch the print version. You can find a description of the function here.

G
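For anyone who wants to try the suffix outside of calibre first, here is a rough sketch in plain Python 3. The helper name `to_print_url` is made up for illustration and is not part of calibre; in a recipe the same logic would sit inside `print_version(self, url)`. Unlike the one-liner above, it also copes with URLs that already carry a query string.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def to_print_url(url):
    """Return url with pagewanted=print set in its query string.

    Hypothetical helper for illustration only, not a calibre API.
    """
    parts = urlsplit(url)
    # Keep existing parameters, but drop any earlier pagewanted value.
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != 'pagewanted']
    query.append(('pagewanted', 'print'))
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))


print(to_print_url('http://www.nytimes.com/2009/09/20/magazine/20Letters-t-001.html'))
```

Whether the site actually serves the print page to a scraper is a separate matter; this only builds the URL.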

kiklop74 · 09-23-2009, 09:38 AM

That will not work since NYT has quite good scraping protection.

This recipe works for the NYT Magazine; it can easily be modified for other parts of the NYT site.

MichaelMSeattle · 09-23-2009, 02:01 PM

Quote:

Originally Posted by kiklop74

That will not work since NYT has quite good scraping protection.

This recipe works for the NYT Magazine; it can easily be modified for other parts of the NYT site.

Thanks very much for responding so quickly! I love how you were able to get the cover image.

Your recipe returned the main articles of the magazine but not the sub-sections (which are listed in the TOC). I modified the recipe to add the sub-section feeds, but that only added them to the TOC.

For all the sub articles (not those in the main section) I just see:
"This article was downloaded by calibre from http://www.nytimes.com/2009/09/20/magazine/20Letters-t-001.html" (or whatever was the source).

I'm attaching the full recipe below. Thanks again for your help!
-Mike

==============================================
#!/usr/bin/env python

__license__ = 'GPL v3'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'
'''
nytimes.com/pages/magazine
'''

import time
from calibre.web.feeds.news import BasicNewsRecipe

class NewYorkTimesMagazine(BasicNewsRecipe):
    title = 'The New York Times Magazine3'
    __author__ = 'Darko Miletic'
    description = 'News from New York'
    publisher = 'The New York Times'
    category = 'news, politics, US'
    delay = 1
    language = 'en_US'
    oldest_article = 10
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    encoding = 'cp1252'
    INDEX = 'http://www.nytimes.com/pages/magazine/'

    conversion_options = {
        'comments' : description
        ,'tags' : category
        ,'language' : language
        ,'publisher': publisher
    }

    # Keep only the article container and strip navigation, tools and comments.
    keep_only_tags = [dict(name='div', attrs={'id':'article'})]

    remove_tags = [
        dict(name='div', attrs={'class':['header','nextArticleLink clearfix','correctionNote']})
        ,dict(name='div', attrs={'id':['toolsRight','articleInline','readerscomment','authorId']})
        ,dict(name=['object','link'])
    ]

    remove_tags_after = dict(name='div',attrs={'id':'pageLinks'})

    feeds = [(u'Articles', u'http://feeds.nytimes.com/nyt/rss/Magazine' ),
             (u'The Ethicist', u'http://ethicist.blogs.nytimes.com/feed/'),
             (u'Medium', u'http://themedium.blogs.nytimes.com/feed/'),
             (u'Motherload', u'http://parenting.blogs.nytimes.com/feed/')
            ]

    def append_page(self, soup, appendtag, position):
        # Follow the 'Next Page' link recursively and splice the body of each
        # following page into appendtag at the given position.
        pager = soup.find('div',attrs={'id':'pageLinks'})
        if pager:
            atag = pager.find('a',attrs={'title':'Next Page'})
            if atag:
                soup2 = self.index_to_soup('http://www.nytimes.com' + atag['href'])
                st = soup2.find('div',attrs={'id':'articleInline'})
                if st:
                    st.extract()
                tt = soup2.find('div',attrs={'class':'nextArticleLink clearfix'})
                if tt:
                    tt.extract()
                texttag = soup2.find('div', attrs={'id':'articleBody'})
                for it in texttag.findAll(style=True):
                    del it['style']
                for it in texttag.findAll(attrs={'id':'authorId'}):
                    it.extract()
                for it in texttag.findAll(attrs={'class':'correctionNote'}):
                    it.extract()
                newpos = len(texttag.contents)
                self.append_page(soup2,texttag,newpos)
                pager.extract()
                pager2 = texttag.find('div',attrs={'id':'pageLinks'})
                if pager2:
                    pager2.extract()
                texttag.extract()
                appendtag.insert(position,texttag)

    def get_article_url(self, article):
        # Use the feed entry's guid as the article URL.
        return article.get('guid', None)

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        self.append_page(soup, soup.body, 3)
        return soup

    def get_cover_url(self):
        # Build the cover image URL from the issue date shown on the index page.
        cover = None
        soup = self.index_to_soup(self.INDEX)
        tag = soup.find('div',attrs={'id':'ABcolumnPromo'})
        if tag:
            st = time.strptime(tag.h3.string,'%m.%d.%Y')
            year = str(st.tm_year)
            month = "%.2d" % st.tm_mon
            day = "%.2d" % st.tm_mday
            cover = 'http://graphics8.nytimes.com/images/' + year + '/' + month +'/' + day +'/magazine/' + day +'cover-395.jpg'
        return cover
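
To see what the cover lookup in the recipe above produces, here is the same date arithmetic as a standalone sketch. The function name `cover_url_for` and the sample date '09.20.2009' are assumptions for illustration (the date matches the issue in the article URL quoted earlier); the real recipe reads the date from the `ABcolumnPromo` block on the index page.

```python
import time


def cover_url_for(date_text):
    """Build the magazine cover URL from a date string like '09.20.2009'.

    Hypothetical standalone version of the logic in get_cover_url.
    """
    st = time.strptime(date_text, '%m.%d.%Y')
    year = str(st.tm_year)
    month = '%.2d' % st.tm_mon   # zero-padded month
    day = '%.2d' % st.tm_mday    # zero-padded day
    return ('http://graphics8.nytimes.com/images/' + year + '/' + month + '/'
            + day + '/magazine/' + day + 'cover-395.jpg')


print(cover_url_for('09.20.2009'))
# http://graphics8.nytimes.com/images/2009/09/20/magazine/20cover-395.jpg
```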

gregcd · 09-23-2009, 10:29 PM

Hi all, I'm updating a custom recipe to change the base font size for an LRF with html2lrf_options =
However, the recipe (New Scientist) already uses this. What is the command to use?

kiklop74 · 09-23-2009, 10:39 PM

Quote:

Originally Posted by gregcd

Hi all, I'm updating a custom recipe to change the base font size for an LRF with html2lrf_options =
However, the recipe (New Scientist) already uses this. What is the command to use?

html2lrf_options is obsolete (it applies to 0.5.x and earlier versions of calibre). You should use the new directive instead:

conversion_options

See example here
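
For the font-size question, a minimal sketch of what the replacement could look like. It assumes the `base_font_size` conversion option (the one behind the `--base-font-size` switch of ebook-convert); the value 12 is only a placeholder, and in a real recipe the dictionary would be a class attribute, merged with whatever keys the recipe already sets.

```python
# Would sit inside the recipe class as a class attribute.
conversion_options = {
    'base_font_size': 12,  # in pt; placeholder value, adjust to taste
}

print(conversion_options)
```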

kiklop74 · 09-24-2009, 10:30 AM

Business Standard (India's daily newspaper)

CABITSS · 09-24-2009, 12:59 PM

Can someone help create a recipe for The Toronto Star?
Thanks in advance.

kiklop74 · 09-24-2009, 01:11 PM

It is already present in this thread:

https://www.mobileread.com/forums/sho...&postcount=747

Andreiko · 09-24-2009, 06:10 PM

I am asking again whether you guys could make a recipe for
inosmi.ru.
Here is the RSS: http://www.inosmi.ru/misc/export/xml...ranslation.xml

If this is possible, of course; I understand it takes time.

Andreiko · 09-24-2009, 06:11 PM

http://www.inosmi.ru/misc/export/xml...ranslation.xml

Can someone please make a recipe out of this feed?
I know I have already asked, but maybe it went unnoticed.

Sorry for repeating.

bhandarisaurabh · 09-24-2009, 11:18 PM

Quote:

Originally Posted by kiklop74

Business Standard (India's daily newspaper)

Thanks a lot, you are a genius.

L4ur3nt · 09-25-2009, 04:25 AM

Quote:

Originally Posted by L4ur3nt

Hello all,

I have a problem with this RSS feed:

http://rss.futura-sciences.com/packfs

I get some strange things in the text, like this

Does somebody know why? Thank you very much!

highwaykind · 09-25-2009, 04:57 AM

Quote:

Originally Posted by kiklop74

Here goes:

Thank you!!

olaf · 09-25-2009, 08:54 AM

What is the best way to change "smart quotes" (beginning quote, end quote) into a fixed single quote? My recipe is showing a special question mark character in each place where one of those quote marks occurs.
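
One possible approach, sketched in plain Python 3 instead of any calibre-specific API: map the curly quote characters to straight ones with `str.translate`. The helper name `straighten_quotes` is made up for illustration, and it maps double curly quotes to a plain double quote instead of a single one. A question-mark glyph often points to a wrong `encoding` setting in the recipe, so this only helps if the text has already been decoded correctly.

```python
# Map typographic quotes to their plain ASCII counterparts.
SMART_QUOTES = {
    0x2018: "'",  # left single quotation mark
    0x2019: "'",  # right single quotation mark
    0x201C: '"',  # left double quotation mark
    0x201D: '"',  # right double quotation mark
}


def straighten_quotes(text):
    """Return text with curly quotes replaced by straight quotes."""
    return text.translate(SMART_QUOTES)


print(straighten_quotes('\u201cHi,\u201d she said. It\u2019s fine.'))
```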

CABITSS · 09-25-2009, 10:45 AM

Quote:

Originally Posted by kiklop74

The Toronto Star:

Thank you so much.
You are brilliant.
Thanks once again for your quick and workable response.
