View Full Version : Custom recipes (archive, read-only)



rc006
04-29-2009, 09:30 AM
OK, thanks.
I have not found how to use "self.rower" in print_version, so I grab the real link from the "send link to" URL:
def get_article_url(self, article):
    url = None
    # Use the feed's own link when there is one
    if 'link' in article:
        url = article.get('link', None)
    # The real article URL is embedded in the description as a 'link=' parameter
    # (the "send link to" URL), so extract it from there when a guid is present
    if 'guid' in article:
        texte = article.get('description', None)
        texte = texte[texte.find('link=') + 5:]
        url = texte[0:texte.find('"')]
    return url

I have "solved" the problem for "Le Figaro", but for the website "Les Echos" I think I really need to use the function you mentioned, because the printed version is reached from the HTML page through a link to "http://lesechos.fr/imprimer.php".
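
(A minimal sketch of that approach, assuming the article page links to its print version through an anchor whose href contains 'imprimer.php'; the base URL handling is hypothetical, not taken from the actual Les Echos markup:)

def print_version(self, url):
    # Fetch the article page and look for the link to the print version.
    # index_to_soup is the BasicNewsRecipe helper that downloads a page
    # and returns a BeautifulSoup tree.
    soup = self.index_to_soup(url)
    for a in soup.findAll('a', href=True):
        href = a['href']
        if 'imprimer.php' in href:
            if href.startswith('/'):
                # Hypothetical base URL; adjust to the site's real one
                href = 'http://www.lesechos.fr' + href
            return href
    # Fall back to the normal article page if no print link is found
    return url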

kiklop74
04-30-2009, 09:14 PM
Has anyone worked up a recipe for US News yet? If so, would certainly appreciate it. --John

New recipe for US News:

Sydney's Mom
05-01-2009, 11:17 AM
Anyone else having problems with Chicago Tribune? It just stopped working for me.

thegillons
05-01-2009, 11:58 AM
kiklop74, you are a friend to all of us. Thanks. I hope to learn how to make good recipes so I too can contribute to the community. Thanks again.

kovidgoyal
05-01-2009, 02:21 PM
Anyone else having problems with Chicago Tribune? It just stopped working for me.

When you say it stopped working, what do you mean?

EDIT: never mind, found the bug (it was in the conversion to MOBI; it will be fixed in the next release)

kiklop74
05-01-2009, 04:31 PM
I can report that it works fine on my machine. You probably have internet connection issues or somehow calibre got corrupted.

kiklop74
05-01-2009, 11:19 PM
New recipe for Twitch films:

dforsyth
05-02-2009, 05:04 AM
The IHT seems to have become the New York Times Global Edition and I would appreciate it if someone can set up a recipe for this revised link.
Many thanks.

Sydney's Mom
05-02-2009, 12:27 PM
When you say it stopped working, what do you mean?

EDIT: never mind, found the bug (it was in the conversion to MOBI; it will be fixed in the next release)

Thank you, 5.10 works! Have to keep up with Swine Flu!

eidolon5861
05-03-2009, 10:34 PM
I know this is not a news feed, but I've been trying for ages to create a recipe that will pull the HTML version of this great book (Without Hot Air - David MacKay). The key 250 pages of it are available here:

http://www.inference.phy.cam.ac.uk/withouthotair/sewthacontents.shtml

Really appreciate it.

kiklop74
05-04-2009, 10:20 AM
You can just try with this:


web2disk --encoding="utf-8" --verbose --dont-download-stylesheets http://www.inference.phy.cam.ac.uk/withouthotair/sewthacontents.shtml

html2epub --linearize-tables -t"Sustainable Energy - without the hot air" -a"David JC MacKay" sewthacontents.xhtml
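
(Roughly: web2disk mirrors the linked pages to local HTML files, and html2epub then packages the downloaded pages into a single EPUB with the given title and author; --linearize-tables flattens the book's table-based layout so it reads better on a small screen.)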

joshdu1125
05-04-2009, 11:02 AM
I am trying to create a custom recipe with the feed below:
http://www.infzm.com/rss/home/rss2.0.xml

It either takes forever or shows an error message, but I can reach this feed very quickly in my browser.
Can somebody help me? It could just be the server, but I am not sure. Thanks.

kiklop74
05-05-2009, 11:33 AM
Post your recipe code

joshdu1125
05-05-2009, 12:54 PM
class AdvancedUserRecipe1241538875(BasicNewsRecipe):
    title = u'\u5357\u65b9\u5468\u672b'
    oldest_article = 7
    max_articles_per_feed = 100

    feeds = [(u'\u71b1\u9ede\u65b0\u805e', u'http://www.infzm.com/rss/home/rss2.0.xml')]


thanks

kiklop74
05-05-2009, 07:12 PM
The slowdown, I guess, happened because you were downloading the entire news page, which is something you should never do. Here is a filtered recipe:
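
(A minimal sketch of what such filtering looks like, keeping only the article's content container; the class name here is purely hypothetical and has to be read from the site's actual page source:)

class InfzmFiltered(BasicNewsRecipe):
    title = u'Infzm'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False

    # Keep only the article body instead of the whole page;
    # 'articleContent' is a hypothetical class name, not the site's real markup.
    keep_only_tags = [dict(name='div', attrs={'class': 'articleContent'})]

    feeds = [(u'Hot news', u'http://www.infzm.com/rss/home/rss2.0.xml')]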

joshdu1125
05-06-2009, 06:47 PM
thanks, how long does it take you to grab that feed?

kiklop74
05-06-2009, 08:37 PM
thanks, how long does it take you to grab that feed?

This site behaves really badly. It's unstable and quite slow, and some articles just time out. Also, I had to give the feed and recipe non-Chinese titles, because Chinese titles also produce errors on my Windows machine. There is nothing you can really do here.

joshdu1125
05-07-2009, 12:09 AM
okay, cause it takes me almost an hour...

kiklop74
05-08-2009, 10:23 AM
The Straits Times recipe:

chinesealbumart
05-08-2009, 11:00 AM
The Straits Times recipe:

On behalf of the (few) Singaporeans who have Sony Reader, I thank you :)

hardav
05-13-2009, 10:49 AM
Hello,
Below is the recipe I created for the Denver Post. I have tried to customize it a few times (large headers, description, etc.), and every time I do, it errors out. Any suggestions on cleaning this up?

class AdvancedUserRecipe1242222423(BasicNewsRecipe):
    title = u'Denver Post'
    oldest_article = 1
    max_articles_per_feed = 100

    feeds = [
        (u'Breaking Stories', u'http://feeds.denverpost.com/dp-news-breaking?format=xml'),
        (u'Business', u'http://feeds.denverpost.com/dp-business?format=xml'),
        (u'Entertainment', u'http://feeds.denverpost.com/dp-entertainment?format=xml'),
        (u'Sports', u'http://feeds.denverpost.com/dp-sports?format=xml'),
        (u'Woody Paige', u'http://feeds.denverpost.com/dp-sports-columnists-woody_paige?format=xml'),
        (u'Mike Klis', u'http://feeds.denverpost.com/dp-sports-columnists-mike_klis?format=xml'),
        (u'Food', u'http://feeds.denverpost.com/dp-food?format=xml'),
        (u'Mike Rosen', u'http://feeds.denverpost.com/dp-opinion-columnists-mike_rosen?format=xml')
    ]

rampo
05-14-2009, 09:05 PM
The IHT seems to have become the New York Times Global Edition and I would appreciate it if someone can set up a recipe for this revised link.
Many thanks.

I would like to second this request for the Global Edition of the New York Times (http://global.nytimes.com/), which is, as dforsyth noted, the reincarnation of the International Herald Tribune.
Thank you.

slm
05-18-2009, 10:25 PM
I just noticed that there is no recipe for Slate magazine. Is there a technical reason for this.
Or am I just the only weirdo who would love one?

kiklop74
05-18-2009, 10:50 PM
For now you are the only one asking for that magazine.

oroig
05-19-2009, 07:25 PM
hi,

I would like to ask for some recipes for spanish newspapers:

http://www.lavanguardia.es

http://www.marca.com

catalan version
http://www.elperiodico.cat

& spanish version
http://www.elperiodico.com

http://www.expansion.com


Thanks in advance to anyone who wishes to try

Rafardeon
05-21-2009, 04:58 PM
I just worked out my first (very simple) recipe (and am learning Python along the way :) ). It works pretty well on my PRS-505.
If anyone's interested: it's from PHD Comics:

class AdvancedUserRecipe1242934654(BasicNewsRecipe):
    title = u'PHD Comics'
    oldest_article = 14
    max_articles_per_feed = 100

    feeds = [(u'PHD Comics', u'http://www.phdcomics.com/gradfeed.php')]

    def print_version(self, url):
        return url.replace('http://www.phdcomics.com/comics.php?f=', 'http://www.phdcomics.com/comics/archive_print.php?comicid=')


Sorry, if there's a duplicate (couldn't scan all posts).

fromthewest
05-21-2009, 08:55 PM
I have been trying the following recipe, but it only downloads the first article from the rss feed.

What is wrong?


class AdvancedUserRecipe1242938672(BasicNewsRecipe):
    title = u'The Hamilton Spectator'
    oldest_article = 1
    max_articles_per_feed = 100
    __author__ = 'Me'
    description = 'News from the Hamilton Spectator'
    no_stylesheets = True

    keep_only_tags = [dict(id=['AssetWebPart1'])]

    feeds = [(u'Top Stories', u'http://www.thespec.com/rss/82672?searchMode=Lineup')]

    def print_version(self, url):
        return url.replace('http://www.thespec.com/article/', 'http://www.thespec.com/printArticle/')


Thanks

kovidgoyal
05-21-2009, 10:45 PM
Increase the value of oldest_article

fromthewest
05-24-2009, 08:12 AM
Increase the value of oldest_article

Thank You.

I increased the value of oldest_article to 7.

It still only downloaded the first article in the rss feed.

-fromthewest

Derry
05-24-2009, 08:52 AM
For those looking for a Global NYTimes recipe, here is my attempt,
there are a couple of problems: too many blank pages at the end of each article, and it doesn't grab the second page (etc.) of longer articles. I tried a URL replace, opening a browser, etc., but I don't understand enough to get it working properly. Still, it might be of use to some people.

Derry

class AdvancedUserRecipe1241195948(BasicNewsRecipe):
    title = u'IHT/Global NYT'
    oldest_article = 1
    max_articles_per_feed = 10
    remove_tags_before = dict(id='article')
    remove_tags_after = dict(id='article')
    remove_tags = [dict(attrs={'class':['articleTools', 'post-tools', 'side_tool', 'nextArticleLink clearfix']}),
                   dict(id=['footer', 'toolsRight', 'articleInline', 'navigation', 'archive', 'side_search', 'blog_sidebar', 'side_tool', 'side_index']),
                   dict(name=['script', 'noscript', 'style'])]
    encoding = 'cp1252'
    no_stylesheets = True
    extra_css = 'h1 {font: sans-serif large;}\n.byline {font:monospace;}'
    feeds = [
        (u'Frontpage', u'http://www.nytimes.com/services/xml/rss/nyt/GlobalHome.xml'),
        (u'Europe', u'http://www.nytimes.com/services/xml/rss/nyt/Europe.xml'),
        (u'Americas', u'http://www.nytimes.com/services/xml/rss/nyt/Americas.xml'),
        (u'Africa', u'http://www.nytimes.com/services/xml/rss/nyt/Africa.xml'),
        (u'Asia Pacific', u'http://www.nytimes.com/services/xml/rss/nyt/AsiaPacific.xml'),
        (u'Middle East', u'http://www.nytimes.com/services/xml/rss/nyt/MiddleEast.xml'),
        (u'Opinion', u'http://www.nytimes.com/services/xml/rss/nyt/GlobalOpinion.xml'),
        (u'Business', u'http://www.nytimes.com/services/xml/rss/nyt/WorldBusiness.xml'),
        (u'Technology', u'http://feeds.nytimes.com/nyt/rss/Technology'),
        (u'Sports', u'http://www.nytimes.com/services/xml/rss/nyt/GlobalSports.xml'),
        (u'Science', u'http://www.nytimes.com/services/xml/rss/nyt/Science.xml'),
        (u'Environment', u'http://www.nytimes.com/services/xml/rss/nyt/Environment.xml'),
        (u'Health', u'http://www.nytimes.com/services/xml/rss/nyt/Health.xml'),
        (u'Arts', u'http://www.nytimes.com/services/xml/rss/nyt/Arts.xml'),
        (u'Travel', u'http://www.nytimes.com/services/xml/rss/nyt/Travel.xml')
    ]

kovidgoyal
05-24-2009, 02:47 PM
Thank You.

I increased the value of oldest_article to 7.

It still only downloaded the first article in the rss feed.

-fromthewest

Look at the debug output; get it by clicking on the job name.

cartman
05-28-2009, 08:18 AM
Hello, I would like to request a recipe for http://climateprogress.org. It is one of the best climate change blogs out there, and with large articles, so I think it is required reading for anybody concerned with the environment.

kiklop74
05-28-2009, 12:12 PM
New recipe for climate progress blog:

steviej
05-29-2009, 03:37 PM
I am having trouble following the directions in the user manual to create a recipe for a web site. Can anyone help? The website's RSS feed is "http://www.humanevents.com/rss/viewfromtheright.xml". An example of an individual article is "http://www.humanevents.com/article.php?id=29232", and the print version of the same is "http://www.humanevents.com/article.php?print=yes&id=29232". I cannot make this work to save my life. It looks simple, but I am not a programmer, and I think that is my problem. I am using a PRS-505. Many thanks......


....forget it, figured it out, the articles on the feed are over a year old!!!
StevieJ

cartman
05-30-2009, 01:53 PM
Hi kiklop74, thanks for the recipe! works like a charm! btw i seem to not be able to find the slashdot recipe in calibre (english news)

kovidgoyal
05-30-2009, 02:29 PM
Hi kiklop74, thanks for the recipe! works like a charm! btw i seem to not be able to find the slashdot recipe in calibre (english news)

Typo on my part, will be in next release.

cartman
06-01-2009, 05:16 AM
About the climate progress recipe.
It works fine, but anything that is in double quotes is missing. For example, the text “What do you get when you buy a nuke? You get a lot of delays and rate increases….”, gets shown as “”

kiklop74
06-01-2009, 09:14 AM
About the climate progress recipe.
It works fine, but anything that is in double quotes is missing. For example, the text “What do you get when you buy a nuke? You get a lot of delays and rate increases….”, gets shown as “”

That was my mistake. Here is updated recipe:

cartman
06-01-2009, 01:27 PM
It works fine now, thanks!

smirando
06-01-2009, 04:31 PM
I wonder if it might be possible to have an update to the Globe and Mail recipe. The site was recently radically redesigned, and so the existing recipe no longer works. I'm not a programmer, and can only handle the most basic full-content RSS feed sites. Thanks in advance.

kovidgoyal
06-02-2009, 03:38 PM
Globe and mail will be fixed in the next release

smirando
06-02-2009, 04:10 PM
Thank you very very much.

sherryk_us
06-02-2009, 05:46 PM
I don't know if it's possible, but I'd like a recipe for the following:

http://www.thebudgetfashionista.com/

Thank you for all the wonderful rss feeds through calibre!

Sherry King

Spankypoo
06-02-2009, 07:04 PM
I'm trying to make a recipe for Autosport's Features feed: http://www.autosport.com/rss/features.xml

I'm not quite sure how this works. It requires credentials, which I have, but I don't know how to get Calibre to log in to the site. And, unfortunately, the feed doesn't include the full articles, so I'm not quite sure how to get it to download the full articles...

Thanks!

kovidgoyal
06-02-2009, 07:54 PM
Look at the Wall Street Journal recipe for an example of a recipe that uses logins. If your RSS feed contains a link to the full article, calibre should follow that automatically.
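
(A minimal sketch of that login pattern; the login URL and form field names below are hypothetical placeholders, not Autosport's actual ones, and would have to be read from the site's login page:)

class AutosportFeatures(BasicNewsRecipe):
    title = 'Autosport Features'
    needs_subscription = True

    feeds = [('Features', 'http://www.autosport.com/rss/features.xml')]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            # Hypothetical login URL and form field names
            br.open('http://www.autosport.com/login')
            br.select_form(nr=0)
            br['email'] = self.username
            br['password'] = self.password
            br.submit()
        return br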

joshdu1125
06-02-2009, 09:58 PM
seems like the built-in washington post doesn't work properly these days,

Politics
The Washington Post Politics section is the top political news site to better understand the intersection of business and politics and information from the White House and the federal government. Our Politics reports and video include information for government employees,federal agency workers,and discussions and blogs on political and economic issues. washingtonpost.com.

that's what I got for almost every section

emale07
06-03-2009, 10:58 AM
That's what I've gotten for about a week. Just upgraded, thinking that would fix it, but no dice.

kiklop74
06-03-2009, 03:05 PM
I don't know if it's possible, but I'd like a recipe for the following:

http://www.thebudgetfashionista.com/

Thank you for all the wonderful rss feeds through calibre!

Sherry King

This site has broken HTML, which confuses calibre. Anyway, here is a recipe that works:

sherryk_us
06-04-2009, 12:02 AM
Thanks!

Sherry

kovidgoyal
06-05-2009, 01:33 AM
Washington post will be fixed in the next release

lokilech
06-08-2009, 04:12 PM
Hello,

i would like to read: http://www.scilogs.de/

Any chance to build a recipe?

kiklop74
06-08-2009, 04:19 PM
hi,
I would like to ask for some recipes for spanish newspapers:


Done. See http://calibre.kovidgoyal.net/ticket/2564

O.S.F
06-11-2009, 01:39 AM
I'm just starting to learn how to build recipes. I'm currently downloading feeds from multiple sites into one LRF and it works great. What I would like to do is combine articles on the same subject from multiple sites into one LRF. For example, my Tech News book would contain the Tech section feed from the Washington Post and from Business Week.

Since the clean-up and processing are different for each source, I thought the best way is to retrieve the individual section from each source and combine them into a single LRF.

Is there a way to do this using the current download recipe code? Or does it have to be a "post-download" job? Any way to automate this?

Thanks,

kovidgoyal
06-11-2009, 03:12 AM
Not really. The recipe framework isn't built to post-process downloads from multiple sources.

eLeseratte
06-11-2009, 09:58 AM
I'd like a good recipe for Tagesschau (http://www.tagesschau.de/xml/rss2) Golem (http://rss.golem.de/rss.php?feed=RSS2.0) and Handelsblatt (http://www.handelsblatt.com/rss/hb.xml)

Thank you for your help

oroig
06-12-2009, 07:09 AM
Done. See http://calibre.kovidgoyal.net/ticket/2564

Thanks a lot! I know a lot of Spanish people who use the PRS-505 who will thank you.
:thanks:

rbarhat
06-13-2009, 01:19 AM
Can anyone provide with the custom recipe for Economic Times India...
http://economictimes.indiatimes.com/

Thanx in advance...
And great work Kovid and kiklop... just amazing

kiklop74
06-13-2009, 09:26 AM
The economic times - India:

moosejons_dad
06-14-2009, 11:58 AM
Can anyone check out the NY Times subscription RSS feed? The sports feed is not shown, as well as a few others such as the obits column.
Also, Time magazine does not work, as only the headings are shown..

Thanks in advance..

moosejons_dad
06-14-2009, 05:23 PM
Can anyone check out the NY Times subscription RSS feed? The sports feed is not shown, as well as a few others such as the obits column.
Also, Time magazine does not work, as only the headings are shown..

Thanks in advance..

These are the feeds:

http://www.nytimes.com/services/xml/rss/index.html

http://www.time.com/time/rss

rbarhat
06-16-2009, 06:51 AM
Wow thank you Kiklop.. So fast you responded.. Thank you so much..

kiklop74
06-16-2009, 08:40 AM
These are the feeds:

http://www.nytimes.com/services/xml/rss/index.html

http://www.time.com/time/rss

Open a ticket at calibre.kovidgoyal.net

rbarhat
06-16-2009, 10:26 AM
Encouraged by your previous response, here goes another of my wishes...

http://www.flonnet.com/

kiklop74
06-16-2009, 11:07 AM
For this one you will have to wait. Perhaps in the next two weeks.

rbarhat
06-16-2009, 12:49 PM
No probs.. Already glad to have EconomicTimes on my hands... Thank u man..

GPThomson
06-19-2009, 06:26 PM
Hello,

I would be interested by the following recipes. Those are newspapers and magazines from Belgium in Dutch.

De Gentenaar
http://www.gentenaar.be/

Het Nieuwsblad
http://www.nieuwsblad.be/

Gazet van Antwerpen
http://www.gva.be/

De Tijd
http://www.tijd.be/

Het Laatste Nieuws
http://www.hln.be/

Het Belang Van Limburg
http://www.hbvl.be/

Humo
http://www.humo.be/

Knack
http://www.knack.be/

I hope it's not too difficult.

Thanks!

emale07
06-21-2009, 04:44 PM
Would anyone mind helping me clean up this recipe for the Kansas City Star?

there's a ton of white space and I can't figure out how to remove the outline that shows up in each article.

thx

kiklop74
06-21-2009, 08:55 PM
Hello,

I would be interested by the following recipes. Those are newspapers and magazines from Belgium in Dutch.


Part of that list is covered here:

http://calibre.kovidgoyal.net/ticket/2690

trektech
06-27-2009, 09:23 AM
can anybody make a recipe for me?
i would love to have a recipe for http://www.inquirer.net/

thank you very much in advance...

rbarhat
06-29-2009, 02:13 AM
http://www.uncrate.com/

The RSS feed is not able to get the full story, just a few lines and then "..."
Please help.

kiklop74
06-29-2009, 10:07 AM
can anybody make a recipe for me?
i would love to have a recipe for http://www.inquirer.net/

thank you very much in advance...

New recipe for inquirer.net:

kiklop74
06-29-2009, 10:08 AM
http://www.uncrate.com/

The RSS feed is not able to get the full story, just a few lines and then "..."
Please help.

Uncrate recipe:

trektech
06-29-2009, 12:45 PM
New recipe for inquirer.net:


thank you very very much..:thanks:

rbarhat
06-29-2009, 08:06 PM
Thank you so much Kiklop...

JIGACE
06-30-2009, 04:56 PM
I've been trying to do a recipe to download a blog... the xxx.blogspot.com type, but I can't do it; it always downloads only a portion of the blog. Is there a way to download the entire blog? The blog is http://globaliciousworld.blogspot.com/
thanks in advance:help:

SQMS
07-02-2009, 08:32 AM
Is it possible for someone to create a recipe for Accountancy Age for me please.

http://www.accountancyage.com/

Many thanks

omfosm
07-02-2009, 04:02 PM
looking for a fastcompany.com custom receipt please! thx

kiklop74
07-03-2009, 09:06 AM
I've been trying to do a recipe to download a blog... the xxx.blogspot.com type, but I can't do it; it always downloads only a portion of the blog. Is there a way to download the entire blog? The blog is http://globaliciousworld.blogspot.com/
thanks in advance:help:

See this image for how to start the recipe:

http://img170.imageshack.us/img170/1807/first.th.png (http://img170.imageshack.us/i/first.png/)

After that, click on 'Switch to advanced mode' and add these two lines:


use_embedded_content = True
encoding = 'utf-8'


So the result should be something like this:


class AdvancedUserRecipe1246622332(BasicNewsRecipe):
    title = u'Globalicious'
    oldest_article = 15
    max_articles_per_feed = 100
    use_embedded_content = True
    encoding = 'utf-8'
    feeds = [(u'All articles', u'http://globaliciousworld.blogspot.com/feeds/posts/default?alt=rss')]


That is all

JIGACE
07-03-2009, 11:41 AM
See this image for how to start the recipe:

http://img170.imageshack.us/img170/1807/first.th.png (http://img170.imageshack.us/i/first.png/)

After that, click on 'Switch to advanced mode' and add these two lines:


use_embedded_content = True
encoding = 'utf-8'


So the result should be something like this:


class AdvancedUserRecipe1246622332(BasicNewsRecipe):
    title = u'Globalicious'
    oldest_article = 15
    max_articles_per_feed = 100
    use_embedded_content = True
    encoding = 'utf-8'
    feeds = [(u'All articles', u'http://globaliciousworld.blogspot.com/feeds/posts/default?alt=rss')]


That is all

Thanks man, it works, but it doesn't download all the posts... I don't know why... could it be the feed? So far it only downloads about a month of posts... is there a way to download the page and not the feed?:thumbsup:

kiklop74
07-03-2009, 12:15 PM
Thanks man, it works, but it doesn't download all the posts... I don't know why... could it be the feed? So far it only downloads about a month of posts... is there a way to download the page and not the feed?:thumbsup:

You should specify how far back you want articles to go. Change the number in:


oldest_article = 15


to for example:


oldest_article = 80



The number represents how many days in the past, counting from now, the oldest article you will accept can be.

JIGACE
07-03-2009, 01:04 PM
You should specify how far back you want articles to go. Change the number in:


oldest_article = 15


to for example:


oldest_article = 80



The number represents how many days in the past, counting from now, the oldest article you will accept can be.

I already did that; I increased the number of days up to 365, but the results are always the same: it downloads from today back to May 4th, and that's it. No matter what the number is, the result is the same. I'm intrigued and frustrated.. :blink:

kiklop74
07-03-2009, 02:06 PM
I already did that; I increased the number of days up to 365, but the results are always the same: it downloads from today back to May 4th, and that's it. No matter what the number is, the result is the same. I'm intrigued and frustrated.. :blink:

No need to be frustrated. You can only download what is published in the RSS feed, and currently the oldest article in it is from May 23. That's it. You cannot get anything older than that.

For anything else a special custom recipe is needed.

JIGACE
07-03-2009, 02:33 PM
No need to be frustrated. You can only download what is published in the RSS feed, and currently the oldest article in it is from May 23. That's it. You cannot get anything older than that.

For anything else a special custom recipe is needed.

I was afraid of that... :smack: thank you for the help...:thanks:

kiklop74
07-04-2009, 12:34 PM
Is it possible for someone to create a recipe for Accountancy Age for me please.

http://www.accountancyage.com/

Many thanks

Here goes:

kiklop74
07-04-2009, 12:34 PM
looking for a fastcompany.com custom receipt please! thx

Here goes:

ModileReader
07-05-2009, 12:31 AM
Would it be possible for someone to create a custom recipe for http://www.gulli.com/news/ and for http://www.fayerwayer.com/??
I would appreciate it much...

trektech
07-05-2009, 04:29 AM
hi kiklop. just wondering what software you are using to make these recipes. sorry for the noob question. :)

kiklop74
07-05-2009, 01:00 PM
hi kiklop. just wondering what software you are using to make these recipes. sorry for the noob question. :)

Any text editor will do. I use notepad++ . That is all.

SQMS
07-05-2009, 02:56 PM
Thank you Kiklop for the Accountancy Age recipe

trektech
07-05-2009, 07:43 PM
Any text editor will do. I use notepad++ . That is all.

thank you for the reply.... :)

ModileReader
07-06-2009, 12:12 AM
My problem with the feeds mentioned above is that I don't know how to get the right URL to put in the script. But both webpages have RSS feeds.

kiklop74
07-06-2009, 08:53 AM
My problem with the feeds mentioned above is that I don't know how to get the right URL to put in the script. But both webpages have RSS feeds.

The first one's RSS URL is http://ticker.gulli.com/rss
and the other one's RSS URL is http://feeds.feedburner.com/fayerwayer

What exactly is your problem?

ModileReader
07-06-2009, 04:09 PM
To get these urls... But it works with them, so thank you very much!:thanks:

d1stewart
07-09-2009, 08:25 AM
Trying to download the day's articles on National Review Online (www.nationalreview.com). I followed the "how-to" online to do the custom script, and came up with this:

class AdvancedUserRecipe1247136264(BasicNewsRecipe):
    title = u'National Review Online'
    oldest_article = 2
    max_articles_per_feed = 30

    feeds = [(u'National Review Online', u'http://www.nationalreview.com/index.xml')]

    def print_version(self, url):
        return url.replace('http://article.nationalreview.com/', 'http://article.nationalreview.com/print/')


But all that gives me is, basically, a table of contents. Before adding the two last lines -- "def print_version etc." -- it did return articles, but only the first pages of multipage articles, and not the print pages (which are the complete articles).

Where am I making my mistakes?

trektech
07-11-2009, 08:21 AM
can you make a recipe for http://www.todayonline.com/RSS? thank you..

moosejons_dad
07-12-2009, 02:14 PM
Can someone make a new recipe for the Sunday NYTimes because the current one does not contain the Sports or Business section for Sunday only?
http://www.nytimes.com/services/xml/rss/index.html

Thank you.

p3aul
07-13-2009, 02:44 PM
Can I get a custom recipe for the NOAA in my area. here is the link to the webpage:

http://forecast.weather.gov/MapClick.php?CityName=Warner+Robins&state=GA&site=FFC&textField1=32.6127&textField2=-83.6313&e=0

I tried to do this myself by adding a feed but it didn't work.

Do I just add the script you create in the recipe box in advanced mode?
Thanks,
Paul

kiklop74
07-13-2009, 04:32 PM
can you make a recipe for http://www.todayonline.com/RSS? thank you..

You can do it yourself. See this post: http://www.mobileread.com/forums/showpost.php?p=511193&postcount=578

Start in the same manner, add all the feeds you want, and then click on the advanced mode button and add this at the bottom:


encoding = 'utf-8'
def print_version(self, url):
    return url.replace('www.todayonline.com/', 'www.todayonline.com/Print/')


It should work like that

trektech
07-14-2009, 10:46 AM
You can do it yourself. See this post: http://www.mobileread.com/forums/showpost.php?p=511193&postcount=578

Start in the same manner, add all the feeds you want, and then click on the advanced mode button and add this at the bottom:


encoding = 'utf-8'
def print_version(self, url):
    return url.replace('www.todayonline.com/', 'www.todayonline.com/Print/')


It should work like that


thank you, I tried to make my own recipe but was unsuccessful, so I followed your advice, but then it downloads only the first paragraph of each news item.. I don't know what to do after that. :(

kiklop74
07-14-2009, 10:58 AM
thank you, I tried to make my own recipe but was unsuccessful, so I followed your advice, but then it downloads only the first paragraph of each news item.. I don't know what to do after that. :(

I forgot to add this :)


use_embedded_content = False


With that it will work
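
(Putting those pieces together, a sketch of the whole recipe might look like the following; the feed list is left as a placeholder, since the actual feed URLs have to be picked from http://www.todayonline.com/RSS:)

class AdvancedUserRecipeTodayOnline(BasicNewsRecipe):
    title = u'Today Online'
    oldest_article = 2
    max_articles_per_feed = 100
    encoding = 'utf-8'
    use_embedded_content = False

    # Add the feeds chosen from http://www.todayonline.com/RSS here,
    # e.g. feeds = [(u'Section name', u'feed url'), ...]
    feeds = []

    def print_version(self, url):
        return url.replace('www.todayonline.com/', 'www.todayonline.com/Print/')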

f1doc
07-15-2009, 03:46 AM
Hello everyone. I'm new to this fabulous program. I'm an official paid subscriber to the New England Journal of Medicine. I made a recipe for Calibre to pick up each week's issue, using the RSS feed. This works fine . . . EXCEPT that the links to the articles go to an article summary, and not the full text. How can I tweak the recipe to have calibre download the full text and not just the article summary?

If necessary I can provide my details so this can be tested.

Thanks SO much for this great program!
-Gary

f1doc
07-15-2009, 08:12 AM
I added the print_version for the full version of the articles, but now I need to enter the username and password for my subscription in order to access the articles that are only available to subscribers. Where do I do this in the recipe? What are the necessary Python commands?

thanks in advance

kiklop74
07-15-2009, 09:27 AM
See here the documentation:

http://calibre.kovidgoyal.net/user_manual/news.html

f1doc
07-15-2009, 09:40 AM
Hi kiklop!

I've read over this lots of times, and I admit it's a bit over my head. I can see that I need to add the log-in stuff, but honestly, I'm not quite tech-savvy enough for this. I'm an anesthetist and emergency doctor, so I'm not stupid, but this stuff DOES get complicated :)

So any advice you can give me, stuff I can almost copy/paste, would be SO appreciated.

thanks again!
-Gary

kiklop74
07-15-2009, 09:50 AM
Hi kiklop!

I've read over this lots of times, and I admit it's a bit over my head. I can see that I need to add the log-in stuff, but honestly, I'm not quite tech-savvy enough for this. I'm an anesthetist and emergency doctor, so I'm not stupid, but this stuff DOES get complicated :)

So any advice you can give me, stuff I can almost copy/paste, would be SO appreciated.


Add this to your recipe and it should work:


needs_subscription = True

def get_browser(self):
    br = BasicNewsRecipe.get_browser()
    if self.username is not None and self.password is not None:
        br.open('http://content.nejm.org/cgi/login?uri=%2F')
        br.select_form(nr=0)
        br['username'] = self.username
        br['code'] = self.password
        br.submit()
    return br

f1doc
07-15-2009, 10:41 AM
Thanks kiklop, but this doesnt do the trick. oh well! thanks for the work and time - it would be great to get this to work, but seems rather difficult!
-Gary

trektech
07-16-2009, 12:31 PM
I forgot to add this :)


use_embedded_content = False


With that it will work

it does not work.... :o

GetSpiffed
07-17-2009, 04:21 AM
Hello,

Is there someone who can help me with a recipe for the various sections of the Dutch newspaper NRC:

http://www.nrc.nl/rss/binnenland
http://www.nrc.nl/rss/buitenland
http://www.nrc.nl/rss/economie
http://www.nrc.nl/rss/sport
http://www.nrc.nl/rss/wetenschap
http://www.nrc.nl/rss/kunst
http://www.nrc.nl/rss/media

I can't get them to work; I'm getting a lot of unwanted garbage.

Thanks in advance...

Pepijn

kiklop74
07-19-2009, 08:47 AM
Hello,

Is there someone who can help me with a recipe for the various sections of the Dutch newspaper NRC:

http://www.nrc.nl/rss/binnenland
http://www.nrc.nl/rss/buitenland
http://www.nrc.nl/rss/economie
http://www.nrc.nl/rss/sport
http://www.nrc.nl/rss/wetenschap
http://www.nrc.nl/rss/kunst
http://www.nrc.nl/rss/media

I can't get them to work; I'm getting a lot of unwanted garbage.

Thanks in advance...

Pepijn


Create the recipe in calibre in the standard way, add all the feeds to it, and then click on the advanced button.

Add this code:


encoding = 'cp1252'

keep_only_tags = [dict(name='div', attrs={'class':'article clearfix'})]

def print_version(self, url):
    return url + '?service=Print'

Alc
07-22-2009, 08:12 PM
First off, Calibre is amazing - many thanks to kovidgoyal and all who've helped with it.

It's worked pretty much fine for most of what I've wanted to do with it so far (minor layout issues here and there, nothing major), but the RSS feed for Wired seems broken. It took 80 minutes (!) to come down, and then when opened each article only shows a login prompt and the comments. I wouldn't know where to start with patching the python code or whatever it is. I don't suppose some kind soul could point me in the direction of a fixed recipe thing, or tell me how to amend the one I've got?

Many thanks again.

kiklop74
07-22-2009, 08:25 PM
First off, Calibre is amazing - many thanks to kovidgoyal and all who've helped with it.

It's worked pretty much fine for most of what I've wanted to do with it so far (minor layout issues here and there, nothing major), but the RSS feed for Wired seems broken. It took 80 minutes (!) to come down, and then when opened each article only shows a login prompt and the comments. I wouldn't know where to start with patching the python code or whatever it is. I don't suppose some kind soul could point me in the direction of a fixed recipe thing, or tell me how to amend the one I've got?

Many thanks again.

That is Kovid's recipe. Open a ticket for that in calibre's trac (calibre.kovidgoyal.net).

JIGACE
07-24-2009, 12:51 PM
Hi kiklop! I was wondering if you could help me... I've been trying to retrieve a blog, not the feed (because it's incomplete) but the actual blog, but I can't do it; I'm not very skilled at the programming stuff... could you help me? I need a recipe for a blog... a blogspot.com blog...

please?

kiklop74
07-25-2009, 06:16 AM
I'm too occupied with my daily job right now to do anything else. It will stay that way for some time.

amerryman
07-25-2009, 10:06 PM
Just want to verify whether I'm the only one having a problem with the Scientific American recipe on my PRS-505 (it causes a reboot in .LRF format).

Add Wired Mag to that as well..

kovidgoyal
07-26-2009, 01:42 AM
http://calibre.kovidgoyal.net/user_manual/faq.html#my-downloaded-news-content-causes-the-reader-to-reset

OnwardAhead
07-26-2009, 02:42 AM
Am trying to run a test with the ForeignPolicy.com main feed. Have tried a number of variations, and think that this following snippet would be closest to what I want to get, but no love.

class AdvancedUserRecipe1248523694(BasicNewsRecipe):
    title = u'Foreign Policy Test'
    oldest_article = 15
    max_articles_per_feed = 100
    keep_only_tags = [dict(name='div', attrs={'id':'art-mast'}), dict(name='div', attrs={'id':'art-body'})]

    feeds = [(u'Main', u'http://www.foreignpolicy.com/node/feed')]

    def print_version(self, url):
        return url + '?print=yes&hidecomments=yes&page=full'

Not sure if my syntax is off, but I continually get 'IndexError: list index out of range' (see attached logs).

Any thoughts?

kovidgoyal
07-26-2009, 02:53 AM
That's a weird set of errors. Off the top of my head I'd say the pages being downloaded are not even HTML

The IndexErrors basically come from finding no html tags in the files

amerryman
07-27-2009, 12:56 AM
Thanks Kovid -

Okay, just to be sure, I re-read the FAQ and didn't see anything there about this; maybe you can shed some light. Is there a way (BESIDES creating a custom feed for each one) to adjust the tags that are associated with recipes?

For instance, instead of having each one tagged as News AND with its title (i.e., under collections I have 8 tagged News and each one self-titled Washington Post, Engadget, etc.), is there an easy way of omitting the self title and just tagging it as News for the Collections?

kovidgoyal
07-27-2009, 02:43 AM
Not at the moment

joxxon
07-27-2009, 11:14 AM
I would really love to be able to read the Swedish newspaper Dagens Nyheter on my PRS-505. Has anyone cooked up a recipe for this one?

www.dn.se

feeds:

http://www.dn.se/toppnyheter-rss
http://www.dn.se/ekonomi-rss
http://www.dn.se/sport-rss
http://www.dn.se/debatt-rss
http://www.dn.se/ledare-rss
http://www.dn.se/kultur-rss

BTW, I'm running a news website (free and ad-free, so this is not a pitch ;) ) that collects news feeds from various newspapers etc. Everything from CNN to BBC. It's designed for mobile devices. Feel free to have a look.

http://getnews.mine.nu

samgler
07-27-2009, 05:36 PM
Collection of Thai Newspapers.

http://www.norsorpor.com/chooseRSS.php

When I browse within each category, the news is there. However, I cannot turn them into LRFs.

My code is:

class AdvancedUserRecipe1248726179(BasicNewsRecipe):
    title = u'NorSorPor'
    oldest_article = 7
    max_articles_per_feed = 100
    encoding = 'utf_8'
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True

    remove_tags = [
        dict(name='td', attrs={'align':'right'}),
        dict(name='td', attrs={'align':'left'})
    ]

    html2lrf_options = ['--ignore-tables']
    html2epub_options = 'linearize_tables = True'

    feeds = [
        (u'Hot News', u'http://www.norsorpor.com/rss.php?category=1'),
        (u'Business', u'http://www.norsorpor.com/rss.php?category=3'),
        (u'Entertainment', u'http://www.norsorpor.com/rss.php?category=4'),
        (u'Around The World', u'http://www.norsorpor.com/rss.php?category=5'),
        (u'Sports', u'http://www.norsorpor.com/rss.php?category=6'),
        (u'Technology', u'http://www.norsorpor.com/rss.php?category=9'),
        (u'Premiere League', u'http://www.norsorpor.com/rss.php?category=21')
    ]


Each category will have its own blank page with only the title inside. Am I missing something? TIA

ssimon2000
07-28-2009, 01:31 AM
Has anyone developed a recipe for Our Daily Bread? The website is http://www.rbc.org/odb/odb.shtml, and the RSS feed is http://www.rbc.org/rss.ashx?id=50398.
I've tried loading the RSS address into a custom news source, but all I get is a title page...

Thanks!

cartesio
07-28-2009, 06:38 PM
Please help me with a recipe for Project Syndicate:
http://www.project-syndicate.org
http://www.project-syndicate.org/about_us/rss

Thanks!

OnwardAhead
07-29-2009, 05:20 AM
On the Foreign Policy feeds, very odd. Initially I thought there may have been an issue due to the javascript embedded in the print-friendly URLS. That said, if you try the feed direct, without rewriting the links (as my example did), you still get the same errors.

Has anyone tried this off of the ForeignPolicy Main feed (http://www.foreignpolicy.com/node/feed)?? Does not seem to be a problem with their other feeds. Would really like to get this working so any input is appreciated.

Cheers

scwehrl
07-29-2009, 08:28 AM
It seems that after the upgrade to 6.0, the first couple of times I downloaded Newsweek it worked fine; now I get the attached parse error. Something on my end? Windows Vista 32-bit. Thanks for any help.

kovidgoyal
07-29-2009, 01:10 PM
Yeah newsweek started embedding some invalid XML content that is causing that error. Will be fixed in the next release.

dieterpops
07-30-2009, 03:03 PM
I would like to get a recipe for the Minneapolis Star Tribune. Thanks!

scwehrl
07-30-2009, 11:07 PM
Thanks very much Kovid

Krapmeister
07-31-2009, 02:05 AM
If anyone with advanced recipe knowledge can do a recipe for http://www.mcsweeneys.net/ I'd appreciate it.

Basic mode gets the content but also pages and pages of links to archive content which isn't in the feed.

The site adds about 10 items every day, so a recipe that gets the most recent 30 would suit me, and then I can just schedule it for every 3 days.

Thanks

K

hackettt
08-01-2009, 02:51 PM
Kovid and Darko —

I am not skilled in recipes, and I believe I am doing something incorrect. What I wish to do is edit the UK papers' recipes to omit all categories except sport. (I am a cricket and football fan living in the States.) However, when I try to edit the recipes, Calibre does not download anything. I first tried this with the Daily Mail with no success.

Is there something I must do besides eliminating the code for other sections I do not want?

This is what I reduced the code to:

from calibre.web.feeds.news import BasicNewsRecipe

class TheDailyMail(BasicNewsRecipe):
    title = u'The Daily Mail'
    oldest_article = 2
    language = _('English')
    author = 'RufusA'
    simultaneous_downloads = 1
    max_articles_per_feed = 50

    extra_css = 'h1 {text-align: left;}'

    remove_tags = [dict(name='ul', attrs={'class':'article-icons-links'})]
    remove_tags_after = dict(name='h3', attrs={'class':'social-links-title'})
    remove_tags_before = dict(name='div', attrs={'id':'content'})
    no_stylesheets = True

    feeds = [(u'Sport', u'http://www.dailymail.co.uk/sport/index.rss')]

    def print_version(self, url):
        main = url.partition('?')[0]
        return main + '?printingPage=true'

However, nothing occurs when I try to download the information.

Thanks. Cheers.

kovidgoyal
08-01-2009, 02:52 PM
No, you should just need to remove the feeds you don't want.

kiklop74
08-01-2009, 04:24 PM
Kovid and Darko —

Is there something I must do besides eliminating the code for other sections I do not want?



You are complicating things without real need. This is what you need to change in your recipe:



class AdvancedUserRecipe1249153260(BasicNewsRecipe):
    title = u'DailyMail'
    oldest_article = 2
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'cp1252'

    keep_only_tags = [dict(name='div', attrs={'id':'js-article-text'})]

    remove_tags = [dict(name='div', attrs={'class':['relatedItems','article-icon-links-container']})]

    remove_tags_after = dict(name='h3', attrs={'class':'social-links-title'})

    feeds = [(u'Sports', u'http://www.dailymail.co.uk/sport/index.rss')]

    def print_version(self, url):
        main = url.partition('?')[0]
        return main + '?printingPage=true'

fogus
08-03-2009, 05:45 PM
I did a search but I didn't find anything for the following idea:

I read a lot of fixed width (80 character often) texts. Does anyone have a script to turn these into paragraphized texts?

Some examples:
http://www.ietf.org/rfc/rfc793.txt (RFC: TCP)
http://www.gutenberg.org/files/345/345.txt (Dracula from Gutenberg) (Yes, I know there is an HTML version of that one.)

Malakai
08-03-2009, 09:45 PM
Hey guys, I'm trying to grab a print version using the advanced mode, but I need url.replace to change two things in the URL.

Here is an example of the original URL:

http://www.dpreview.com/reviews/olympusep1/?from=rss

This is the URL for the print version:

http://www.dpreview.com/reviews/print.asp?review=OlympusEP1

How do I get it to remove the /?from=rss at the end?

This is what I currently have:

def print_version(self, url):
    return url.replace('http://www.dpreview.com/reviews/', 'http://www.dpreview.com/reviews/print.asp?review=')

kiklop74
08-04-2009, 10:26 AM
Hey guys, I'm trying to grab a print version using the advanced mode, but I need url.replace to change two things in the URL.

Here is an example of the original URL:

http://www.dpreview.com/reviews/olympusep1/?from=rss

This is the URL for the print version:

http://www.dpreview.com/reviews/print.asp?review=OlympusEP1

How do I get it to remove the /?from=rss at the end?

This is what I currently have:

def print_version(self, url):
    return url.replace('http://www.dpreview.com/reviews/', 'http://www.dpreview.com/reviews/print.asp?review=')

Try this:


def print_version(self, url):
    baseurl = url.rpartition('/?')[0]
    turl = baseurl.partition('/reviews/')[2]
    return 'http://www.dpreview.com/reviews/print.asp?review=' + turl
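
(For example, that should turn http://www.dpreview.com/reviews/olympusep1/?from=rss into http://www.dpreview.com/reviews/print.asp?review=olympusep1, assuming the server accepts the lower-case review name in the query string.)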

Malakai
08-04-2009, 12:39 PM
Thank you so much for that; the only thing missing is the pictures, lol.
How do I retain those in the finished EPUB?

jbambridge
08-06-2009, 06:47 AM
Problem parsing guardian rss feed:

I have tried to update the Guardian Recipe to fix some problems with changes in the web site etc. I am almost there, but I am hitting the odd article that causes the following errors in ebook-convert:

Parsing feed_0/article_7/index.html ...
Traceback (most recent call last):
File "cli.py", line 254, in <module>
File "cli.py", line 246, in main
File "calibre\ebooks\conversion\plumber.pyo", line 657, in run
File "calibre\ebooks\conversion\plumber.pyo", line 761, in create_oebbook
File "calibre\ebooks\oeb\reader.pyo", line 72, in __call__
File "calibre\ebooks\oeb\reader.pyo", line 588, in _all_from_opf
File "calibre\ebooks\oeb\reader.pyo", line 243, in _manifest_from_opf
File "calibre\ebooks\oeb\reader.pyo", line 176, in _manifest_add_missing
File "calibre\ebooks\oeb\base.pyo", line 988, in fget
File "calibre\ebooks\oeb\base.pyo", line 917, in _parse_xhtml
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'


The modified recipe is as follows:

#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'

'''
www.guardian.co.uk
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Guardian(BasicNewsRecipe):

    title = u'My Guardian'
    language = _('English')
    encoding = 'utf-8'
    oldest_article = 7
    max_articles_per_feed = 20
    remove_javascript = True
    simultaneous_downloads = 1
    use_embedded_content = False
    recursions = 0
    filter_regexps = [r'\.g\.doubleclick\.net']

    timefmt = ' [%a, %d %b %Y]'

    keep_only_tags = [dict(id=['article-wrapper', 'main-article-info'])]

    no_stylesheets = True
    extra_css = 'h2 {font-size: medium;} \n h1 {text-align: left;}'

    feeds = [
        ('Front Page', 'http://feeds.guardian.co.uk/theguardian/rss'),
        # ('UK', 'http://feeds.guardian.co.uk/theguardian/uk/rss'),
        # ('Business', 'http://www.guardian.co.uk/business/rss'),
        # ('Politics', 'http://feeds.guardian.co.uk/theguardian/politics/rss'),
        # ('Culture', 'http://feeds.guardian.co.uk/theguardian/culture/rss'),
        # ('Money', 'http://feeds.guardian.co.uk/theguardian/money/rss'),
        # ('Life & Style', 'http://feeds.guardian.co.uk/theguardian/lifeandstyle/rss'),
        # ('Travel', 'http://feeds.guardian.co.uk/theguardian/travel/rss'),
        # ('Environment', 'http://feeds.guardian.co.uk/theguardian/environment/rss')
    ]

    def print_version(self, url):
        return url + '/print'


Any ideas what the error means?

John

jbambridge
08-06-2009, 11:11 AM
One extra thought:

Checking:
File "calibre\ebooks\oeb\base.pyo", line 917, in _parse_xhtml

in the source code shows that this is a part of the code that removes empty <a></a> tags. This is indeed the case on the example I gave where the publisher has left a strange link in the text.

Adding remove_tags = [dict(name='a')] is a workaround, although this also destroys valid <a> tags.

My Python is not up to fixing the _parse_xhtml code myself, though.

Can anyone suggest a better workaround (that doesn't delete any valid content) or a fix to the Python code?

John

P.S. I've attached the offending article as an example of the empty <a> tags. index.txt is after processing by the recipe and problem.txt is the original html file.
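
(A minimal sketch of a more targeted workaround, using the recipe's preprocess_html hook to strip only anchors that are completely empty; this is illustrative only, not the fix that went into calibre:)

    def preprocess_html(self, soup):
        # Remove only the empty <a></a> tags that trip up the converter,
        # leaving anchors with text or children intact.
        for a in soup.findAll('a'):
            if not a.contents:
                a.extract()
        return soup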

kovidgoyal
08-06-2009, 11:41 AM
Will be fixed in next release.

kiklop74
08-06-2009, 12:45 PM
I tried creating a recipe for the new version and was not able to make conversion_options work.

Is it operational at all?

This is what I tried:


conversion_options = {
    'tags'      : 'aa,bb',
    'publisher' : 'pub',
    'comments'  : 'desc',
    'language'  : 'en'
}

kovidgoyal
08-06-2009, 12:55 PM
EDIT: Actually, looking at the code, it should be.

kiklop74
08-06-2009, 01:02 PM
EDIT: Actually, looking at the code, it should be.

Well, it does not work. Do you want an issue opened for this?

kovidgoyal
08-06-2009, 01:24 PM
Well, it does not work. Do you want an issue opened for this?

Never mind I'm looking at it now.

jj2me
08-06-2009, 11:57 PM
Since I couldn't find the Smithsonian Magazine in a search of this thread, and it's my sister's favorite magazine, I humbly submit this bare minimum effort (don't know Python) in case anyone else might like it and doesn't mind skipping over some poor formatting.

It's merely the RSS assembling from this page (http://www.smithsonianmag.com/RSS.html). Note that I set oldest_article = 30 for this monthly magazine. Change as you see fit.

AprilHare
08-07-2009, 04:39 AM
Attached are the errors I got when I tried to download the Sydney Morning Herald - too long for a simple post..

kiklop74
08-07-2009, 09:22 AM
Here is my take on SMH website:

kovidgoyal
08-07-2009, 01:06 PM
@AprilHare: Fixed in next release.

Malakai
08-08-2009, 12:33 PM
Is there any way to modify the downloaded news feed so that the tags are removed and it only shows News as the tag?

kovidgoyal
08-10-2009, 01:30 PM
Not at this time.

scwehrl
08-10-2009, 03:24 PM
Occurs in Vista32 and XP at least the past two days.

rkinsella
08-12-2009, 01:48 PM
Hi,

I updated "The Irish Times" recipe, as the RSS feed has changed to use a feed portal, so the recipe was pulling down a lot more than it needed to. Find attached the new recipe; it uses a regex to pull the correct URL out of the article's summary.

Thanks
(keep up the good work)

Ray Kinsella

kovidgoyal
08-12-2009, 02:12 PM
@rkinsella: Thanks
@schwerl: Should be fixed in 0.6.6

scwehrl
08-12-2009, 09:32 PM
Kovid, again, and as always, thank you for this software and for keeping one step ahead of others who break their html, xml, code, etc, etc

kovidgoyal
08-13-2009, 01:26 AM
Actually this time, I was the one who broke things :) But, thanks

ssimon2000
08-13-2009, 01:56 AM
Has anyone developed a recipe for Our Daily Bread? The website is http://www.rbc.org/odb/odb.shtml, and the RSS feed is http://www.rbc.org/rss.ashx?id=50398.
I've tried loading the RSS address into a custom news source, but all I get is a title page...

Thanks!

Surely someone can help me with this? :blink:

:help:

acidzebra
08-13-2009, 08:31 AM
Here is one I use for the Volkskrant (Dutch newspaper) which works well:

class AdvancedUserRecipe1249039563(BasicNewsRecipe):
    title = u'De Volkskrant'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    keep_only_tags = [dict(name='div', attrs={'id':'leftColumnArticle'})]
    remove_tags = [
        dict(name='div', attrs={'class':'article_tools'}),
        dict(name='div', attrs={'id':'article_tools'}),
        dict(name='div', attrs={'class':'articletools'}),
        dict(name='div', attrs={'id':'articletools'}),
        dict(name='div', attrs={'id':'myOverlay'}),
        dict(name='div', attrs={'id':'trackback'}),
        dict(name='div', attrs={'id':'googleBanner'}),
        dict(name='div', attrs={'id':'article_headlines'}),
    ]
    extra_css = '''
        body{font-family:Arial,Helvetica,sans-serif; font-size:small;}
        h1{font-size:large;}
    '''

    feeds = [
        (u'Laatste Nieuws', u'http://volkskrant.nl/rss/laatstenieuws.rss'),
        (u'Binnenlands nieuws', u'http://volkskrant.nl/rss/nederland.rss'),
        (u'Buitenlands nieuws', u'http://volkskrant.nl/rss/internationaal.rss'),
        (u'Economisch nieuws', u'http://volkskrant.nl/rss/economie.rss'),
        (u'Sportnieuws', u'http://volkskrant.nl/rss/sport.rss'),
        (u'Kunstnieuws', u'http://volkskrant.nl/rss/kunst.rss'),
        (u'Wetenschapsnieuws', u'http://feeds.feedburner.com/DeVolkskrantWetenschap'),
        (u'Technologienieuws', u'http://feeds.feedburner.com/vkmedia')
    ]

kiklop74
08-14-2009, 11:31 AM
Surely someone can help me with this? :blink:

:help:

Here goes:

geneven
08-14-2009, 01:57 PM
I'm newish to this, so please let me know if I've missed obvious information.

I am currently an Economist subscriber, have been for years. So I want to download the Economist info (each issue entire, if possible), for my new Kindle 2. I see a script that downloads some Economist info, and it works ok for me. How do I download the whole thing, as a legitimate subscriber?

kovidgoyal
08-14-2009, 01:58 PM
The Economist recipe should be downloading the entire magazine (economist.com decided a few months ago to make their entire magazine available for free online)

jament
08-14-2009, 04:30 PM
Calibre, of course, is amazing.

The Economist recipe is fantastic and the display on the Sony 505 is outstanding. Great work.

ssimon2000
08-14-2009, 08:30 PM
Here goes:

Thanks, Darko! Works perfectly! :thumbsup:

Greatly appreciated!!

cix3
08-19-2009, 05:23 PM
Hello all,

I would love to request a Calibre recipe for The New Republic: A Journal of Politics and the Arts. The website is: http://www.tnr.com/

Would a helpful fellow user be able to build this for the Calibre community?

bhandarisaurabh
08-21-2009, 10:38 PM
I require recipe for www.livemint.com.:help:

ddavtian
08-22-2009, 08:15 PM
I require recipe for www.livemint.com.:help:

You may request, not require.

bhandarisaurabh
08-22-2009, 11:32 PM
I AM REQUESTING A RECIPE FOR www.livemint.com. Please can anyone help

doreenjoy
08-23-2009, 12:37 AM
You may request, not require.

Well, maybe we can be a little understanding of folks who may not have English as a native language.

doreenjoy
08-23-2009, 05:11 AM
:help:

I've been working on my first recipe, a simple download of a blog through Feedburner.

I'd like to remove the blog photos and the footer at the bottom of each blog entry (buttons to Share the post, etc). Every customization I've tried makes the recipe fail completely.

I'm a complete noob and would appreciate any help.

GRiker
08-23-2009, 09:37 AM
This revised recipe removes most of the cruft trailing the blog entry. In order to remove portions of the content, you need to examine the source of the page to find the offending tags, then specify them for removal using 'remove_tags'.

G
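
(As an illustration of that pattern, with purely hypothetical tag names standing in for whatever the blog's page source actually uses:)

    remove_tags = [
        dict(name='div', attrs={'class': 'post-share-buttons'}),  # hypothetical class for the share/footer widget
        dict(name='div', attrs={'class': 'blog-footer'}),         # hypothetical class, likewise
    ]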

GRiker
08-23-2009, 10:22 AM
And this version removes the <img> tags.

G

Gideon
08-23-2009, 02:48 PM
Of limited appeal, but I put this together today: The Tulsa World

Without the Sports section, I should mention.

doreenjoy
08-23-2009, 07:53 PM
And this version removes the <img> tags.

G

You are a fine, kind, upstanding mensche.

Grazie mille!

kiklop74
08-24-2009, 05:54 PM
New recipe for Livemint.com:

bhandarisaurabh
08-24-2009, 10:44 PM
New recipe for Livemint.com:
thanks :) for the recipe

bhandarisaurabh
08-24-2009, 11:04 PM
ERROR: Conversion Error: <b>Failed</b>: Fetch news from Livemint

Fetch news from Livemint
InputFormatPlugin: Recipe Input running Traceback (most recent call last):
File "worker.py", line 103, in <module>
File "worker.py", line 90, in main
File "calibre\gui2\convert\gui_conversion.pyo", line 17, in gui_convert
File "calibre\ebooks\conversion\plumber.pyo", line 656, in run
File "calibre\customize\conversion.pyo", line 217, in __call__
File "calibre\web\feeds\input.pyo", line 60, in convert
File "calibre\web\feeds\news.pyo", line 588, in download
File "calibre\web\feeds\news.pyo", line 741, in build_index
File "c:\docume~1\saurabh\locals~1\temp\calibre_0.6.8_vqdali_recipes\recipe0.py", line 37, in print_version
msoup = self.index_to_soup(link)
File "calibre\web\feeds\news.pyo", line 398, in index_to_soup
RuntimeError: Could not fetch index from http://www.livemint.com/2009/08/25003403/BSE-revamp-exercise-to-begin-i.html


This is the error calibre gives during fetching.

iggysprint
08-25-2009, 03:39 AM
New recipe for Livemint.com:

hi Darko,

Could you please help customize a recipe for http://www.theedgesingapore.com/index.php

Sorry for troubling you; I'm not a very technical person :(

But I would be very appreciative of it.

Iggy

Gomes
08-25-2009, 11:45 AM
Any chance of philly.com?

GRiker
08-25-2009, 01:24 PM
Gomes,
There are RSS feeds in each section of philly.com. Follow the directions to create a custom feed (http://calibre.kovidgoyal.net/user_manual/news.html), then ask for assistance if you get stuck. It's actually pretty simple.

G

Gomes
08-25-2009, 01:37 PM
Thanks - I'll mess around with it tonight...

kiklop74
08-25-2009, 08:47 PM
This is the error calibre gives during fetching.

The recipe is fine. This is a calibre GUI error. If you execute the recipe from the command line like this:


ebook-convert livemint.recipe livemint.epub


everything works just fine.

I will report this in calibre's trac.

kiklop74
08-25-2009, 09:13 PM
New recipe for The Edge Singapore:

iggysprint
08-26-2009, 11:55 AM
New recipe for The Edge Singapore:

Hey Darko,

Thanks a bunch.. much appreciated!

Iggy

indole
08-26-2009, 04:35 PM
I'm having trouble parsing my local newspaper feed; it's able to retrieve the articles, but when I open the EPUB each article page is blank.

In the log it shows that once it gets to "parsing all content..." none of the referenced files are found,
e.g. "Referenced file 'feed_0/article_23/PrintArticle.aspx%3fe%3d1715420' not found", and so on for each article. I've attached the log if that's more helpful.

Also, this is what I have for the recipe:

class AdvancedUserRecipe1251250978(BasicNewsRecipe):
    title = u'Intelligencer'
    oldest_article = 7
    no_stylesheets = True
    max_articles_per_feed = 100

    feeds = [(u'Recent Local News', u'http://www.intelligencer.ca/rss')]

    def print_version(self, url):
        turl = url.replace('ArticleDisplay', 'PrintArticle')
        return turl

Any help would be appreciated, thanks!

edit: I was able to fix the problem by adding in the following:
keep_only_tags = [dict(id=['ctl00_ContentPlaceHolder1_FormView1'])]

kiklop74
08-27-2009, 08:27 PM
The problem is within calibre. For some unknown reason the span tag that contains the entire article is emptied of its content, and the content is moved below the span tag at the same level.

This is the recipe that works well:

indole
08-28-2009, 12:03 AM
The problem is within calibre. For some unknown reason the span tag that contains the entire article is emptied of its content, and the content is moved below the span tag at the same level.

This is the recipe that works well:

Thanks! I was having trouble with my recipe and it not displaying properly on my reader.

badkya
08-28-2009, 09:52 AM
Can I request a recipe for India Today?
http://indiatoday.intoday.in/

cometoluc
08-29-2009, 06:50 AM
Hello, can i please request a recipe for following two?

Het Laatste Nieuws
http://www.hln.be/hln/nl/1441/rss/integration/nmc/frameset/hln_footer/rssFeeds.dhtml

Netties
http://www.netties.be/v10/netties_rss.php

thanks in advance

TMF
08-30-2009, 04:47 AM
Hi, I'm having trouble with my recipe for the leading Swiss newspaper "Le Temps" (http://www.letemps.ch). The recipe is as follows:

class AdvancedUserRecipe1243078936(BasicNewsRecipe):
    title = u'Le Temps'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    # combined into a single list: assigning remove_tags several times keeps only the last assignment
    remove_tags = [dict(name='div', attrs={'id':'footer'}),
                   dict(name='div', attrs={'class':'box links'}),
                   dict(name='script')]
    extra_css = '.heading {font-size: 13px; line-height: 15px; margin: 20px 0;} \n h2 {font-size: 24px; line-height: 25px; margin-bottom: 14px;} \n .author {font-size: 11px; margin: 0 0 5px 0;} \n .lead {font-weight: 700; margin: 10px 0;} \n p {margin: 0 0 10px 0;}'
    feeds = [
        ('Actualité', 'http://www.letemps.ch/rss/site/'),
        ('Monde', 'http://www.letemps.ch/rss/site/actualite/monde'),
        ('Suisse & Régions', 'http://www.letemps.ch/rss/site/actualite/suisse_regions'),
        ('Sciences & Environnement', 'http://www.letemps.ch/rss/site/actualite/sciences_environnement'),
        ('Société', 'http://www.letemps.ch/rss/site/actualite/societe'),
        ('Economie & Finance', 'http://www.letemps.ch/rss/site/economie_finance'),
        ('Economie & Finance - Finance', 'http://www.letemps.ch/rss/site/economie_finance/finance'),
        ('Economie & Finance - Fonds de placement', 'http://www.letemps.ch/rss/site/economie_finance/fonds_placement'),
        ('Economie & Finance - Carrières', 'http://www.letemps.ch/rss/site/economie_finance/carrieres'),
        ('Culture', 'http://www.letemps.ch/rss/site/culture'),
        ('Culture - Cinéma', 'http://www.letemps.ch/rss/site/culture/cinema'),
        ('Culture - Musiques', 'http://www.letemps.ch/rss/site/culture/musiques'),
        ('Culture - Scènes', 'http://www.letemps.ch/rss/site/culture/scenes'),
        ('Culture - Arts plastiques', 'http://www.letemps.ch/rss/site/culture/arts_plastiques'),
        ('Livres', 'http://www.letemps.ch/rss/site/culture/livres'),
        ('Opinions', 'http://www.letemps.ch/rss/site/opinions'),
        ('Opinions - Editoriaux', 'http://www.letemps.ch/rss/site/opinions/editoriaux'),
        ('Opinions - Invités', 'http://www.letemps.ch/rss/site/opinions/invites'),
        ('Opinions - Chroniques', 'http://www.letemps.ch/rss/site/opinions/chroniques'),
        ('LifeStyle', 'http://www.letemps.ch/rss/site/lifestyle'),
        ('LifeStyle - Luxe', 'http://www.letemps.ch/rss/site/lifestyle/luxe'),
        ('LifeStyle - Horlogerie & Joaillerie', 'http://www.letemps.ch/rss/site/lifestyle/horlogerie_joaillerie'),
        ('LifeStyle - Design', 'http://www.letemps.ch/rss/site/lifestyle/design'),
        ('LifeStyle - Voyages', 'http://www.letemps.ch/rss/site/lifestyle/voyages'),
        ('LifeStyle - Gastronomie', 'http://www.letemps.ch/rss/site/lifestyle/gastronomie'),
        ('LifeStyle - Architecture & Immobilier', 'http://www.letemps.ch/rss/site/lifestyle/architecture_immobilier'),
        ('LifeStyle - Automobile', 'http://www.letemps.ch/rss/site/lifestyle/automobile'),
        ('Sports', 'http://www.letemps.ch/rss/site/actualite/sports'),
    ]

    def print_version(self, url):
        return url.replace('Page', 'Facet/print')

If I try to download it, I get this error message:

ERROR: Conversion Error: <b>Failed</b>: Fetch news from Le Temps

Fetch news from Le Temps
InputFormatPlugin: Recipe Input running Traceback (most recent call last):
File "worker.py", line 103, in <module>
File "worker.py", line 90, in main
File "calibre\gui2\convert\gui_conversion.pyo", line 19, in gui_convert
File "calibre\ebooks\conversion\plumber.pyo", line 717, in run
File "calibre\customize\conversion.pyo", line 208, in __call__
File "calibre\web\feeds\input.pyo", line 57, in convert
ValueError: u'C:\\Program Files\\calibre0.6\\Le Temps.recipe' is not a valid recipe file or builtin recipe


What am I doing wrong?

I submitted a ticket about it, http://calibre.kovidgoyal.net/ticket/2683, but unfortunately I do not understand the reply.

kiklop74
08-30-2009, 08:02 AM
In your recipe you have extra_css like this:


extra_css = ' <css stuff> \n <css stuff> '


Note the \n end-of-line characters. What Kovid is telling you is that you cannot have end-of-line characters in extra_css. Remove all of them and you will be fine.

TMF
08-30-2009, 05:52 PM
Kiklop74, thanks! But although I've now removed these newlines from the recipe, producing this line:

extra_css = '.heading {font-size: 13px; line-height: 15px; margin: 20px 0;} h2 {font-size: 24px; line-height: 25px; margin-bottom: 14px;} .author {font-size: 11px; margin: 0 0 5px 0;} .lead {font-weight: 700; margin: 10px 0;} p {margin: 0 0 10px 0;}'

I still get the same error message.

kovidgoyal
08-30-2009, 06:55 PM
Add the following to the top of the recipe

from calibre.web.feeds.news import BasicNewsRecipe

TMF
08-31-2009, 05:38 PM
Kovid, sorry; I've added this as the first line of the recipe, but it still produces the same error message.

twiz
08-31-2009, 05:54 PM
Hello!

I appreciate any help with this: I have a fairly simple blog I'd like to follow: The Old Foodie. http://www.theoldfoodie.com. A nice blog exploring world culinary history. I've been using calibre since 0.5.x without any major issues on Sabayon Gentoo with a Kindle 2.

I've tried using the blog's FeedBurner, Atom and XML feeds, and all of them error out after a few seconds with the output I've pasted at the bottom. After erroring out, calibre refuses to try downloading the feed again until it is restarted, but it will run other recipes without issue. I've tested the built-in recipes, and they work fine and create readable MOBI files. I've tried clearing out the temp and settings folders, and running calibre as a superuser to rule out file-write glitches.

Here is the output:

Fetch news from Old Foodie
InputFormatPlugin: Recipe Input running Downloading
Fetching file:///tmp/calibre_0.6.10_4aThWM_feeds2disk.html
Downloading
Fetching file:///tmp/calibre_0.6.10_78agJi_feeds2disk.html
Downloading
Fetching file:///tmp/calibre_0.6.10_ZJ4Mpu_feeds2disk.html
Downloading
Fetching file:///tmp/calibre_0.6.10_BjA_xM_feeds2disk.html
Downloading
Fetching file:///tmp/calibre_0.6.10_DCe6Q9_feeds2disk.html
WARNING: Encoding detection confidence 99%
Processing images...
Fetching https://blogger.googleusercontent.com/tracker/24170237-2580731752951078034?l=www.theoldfoodie.com
WARNING: Encoding detection confidence 99%
Processing images...
Fetching https://blogger.googleusercontent.com/tracker/24170237-101083291130190126?l=www.theoldfoodie.com
WARNING: Encoding detection confidence 99%
Processing images...
WARNING: Encoding detection confidence 99%
Fetching https://blogger.googleusercontent.com/tracker/24170237-3581153949504428956?l=www.theoldfoodie.com
WARNING: Encoding detection confidence 99%
Processing images...
Fetching https://blogger.googleusercontent.com/tracker/24170237-177527187621613644?l=www.theoldfoodie.com
Processing images...
Fetching https://blogger.googleusercontent.com/tracker/24170237-8741733087311750796?l=www.theoldfoodie.com
Recursion limit reached. Skipping links in file:///tmp/calibre_0.6.10_ZJ4Mpu_feeds2disk.html
file:///tmp/calibre_0.6.10_ZJ4Mpu_feeds2disk.html saved to /tmp/calibre_0.6.10_GHwNDb_plumber/feed_0/article_3/calibre_0.6.10_ZJ4Mpu_feeds2disk.xhtml
Downloaded article: An Airship Luncheon. from http://www.theoldfoodie.com/2009/08/airship-luncheon.html
Downloading
/opt/calibre/calibre-parallel: line 7: 10113 Segmentation fault $loader "$@"


Out of desperation I've even tried downloading directly from the home page (www.oldfoodie.com). That job produces a MOBI containing one large, unbroken paragraph with all the HTML tags visible.

If anyone sees an obvious misstep I've made, or knows a quick and dirty way to get this feed going, it would be much appreciated!

kiklop74
08-31-2009, 06:00 PM
New recipe for HLN:

varin44
08-31-2009, 06:09 PM
This was a duplicate post. My original post (under twiz) has been recovered from the spam bin and restored. Sorry about that!

danielc
09-01-2009, 05:03 AM
Hi all, I am new here and I am from HK. I know everyone is using English here. I love calibre, but after a few tries I still have no idea how to make a Chinese newspaper recipe. I know it's hard for you guys, but does anyone here know Chinese and could give it a try for me? I would be very appreciative... cheers ^^

danielc
09-01-2009, 05:05 AM
Here is the link: http://hk.apple.nextmedia.com/template/apple/sec_main.php?iss_id=20090901&sec_id=4104

kiklop74
09-01-2009, 10:18 AM
The Old Foodie blog:

kovidgoyal
09-01-2009, 12:20 PM
@TMF: Your recipe works for me
@twiz: That's a bug with your calibre install, not your recipe. Does conversion of ebooks work in your calibre?

twiz
09-01-2009, 06:18 PM
I convert to MOBI and occasionally to EPUB without issue. I've converted at least 100 books with no problems.

I have seen variations of segmentation faults with the loader before. Mostly in the last two versions before 0.6.8. My calibre would randomly close while idle, and I'd see a seg fault waiting for me at the command line. It didn't seem to be a result of any regular functions of calibre (excluding news feeds; this is the first time I've tried them out). But with version 0.6.8, whatever was causing the random closes stopped. Also, coming out of the last 0.5.x version, it stopped recognizing my Kindle 2 upon hotplug. Mildly disturbing, but I just switched to Save to Disk from calibre directly to my mounted drive. It's only a small hassle for me to save books to my Kindle 2 that way instead.

I can't figure out why a custom recipe for Old Foodie would trigger a seg fault.

Thanks for the recipe for Old Foodie! I'll experiment with other custom recipes and see if they seg fault out also. That might narrow down the issue. Between upgrades I do wipe the calibre personal settings folder to try to prevent corruptions from carrying over.

bhandarisaurabh
09-02-2009, 11:00 PM
I would love to request a recipe for Business Standard, a business newspaper:
http://www.business-standard.com/india/. Thanks in advance.

A4-
09-03-2009, 11:28 PM
I'm working on a recipe for Tweakers.net (Dutch tech news) and I've got it working, but lots of images won't show up in the E-book Viewer.
Here (http://tweakers.net/nieuws/62232/iriver-komt-met-kindle-2-look-a-like.html) is a random news post with some of the images that won't show up.
For now I'm only interested in the thumbnails and not the full-size ones.
Any help is appreciated.

Here's my recipe:
class AdvancedUserRecipe1252025187(BasicNewsRecipe):
    title = u'Tweakers.net'
    oldest_article = 4
    max_articles_per_feed = 40
    no_stylesheets = True
    use_embedded_content = False

    keep_only_tags = [dict(name='div', attrs={'class':'columnwrapper news'})]

    remove_tags = [dict(name='div', attrs={'class':'reacties'}),
                   {'id' : ['utracker']},
                   {'class' : ['sidebar']}
                  ]

    feeds = [(u'Tweakers.net', u'http://tweakers.net/feeds/nieuws.xml')]

macsilber
09-07-2009, 06:43 AM
Hi, I would appreciate help with http://inc.com and http://www.entrepreneur.com/

Thank you!

GRiker
09-07-2009, 09:25 AM
macsilber,

RSS feeds are your friend. Here are the RSS feeds for Entrepreneur (http://www.entrepreneur.com/feeds/index.html), and here is the RSS feed for Inc (http://www.inc.com/about/rss.html).

Follow the instructions for creating a custom recipe (http://calibre.kovidgoyal.net/user_manual/news.html), and if you get stuck, ask for help.

G

macsilber
09-07-2009, 10:33 AM
Sorry, I did not explain it very well. I tried the RSS feeds and they work reasonably well, but my attempts at programming to clean up the output properly have come to nothing.

For example, I want to convert

http://www.entrepreneur.com/marketing/guerillamarketing/article203248.html

to

http://www.entrepreneur.com/article/printthis/203248.html
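
A minimal print_version sketch for that Entrepreneur pattern (an untested guess based only on the two URLs above) might look like this:

def print_version(self, url):
    # assumes 'import re' at the top of the recipe file, and that
    # article URLs end in article<number>.html,
    # e.g. .../guerillamarketing/article203248.html
    match = re.search(r'article(\d+)\.html', url)
    if match is not None:
        return 'http://www.entrepreneur.com/article/printthis/' + match.group(1) + '.html'
    return url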

Also, on another site:
http://goal.com/en/feeds/news?id=1659&fmt=rss

The print option creates a PDF, so I'm not sure how to handle, for example,
http://www.goal.com/en/news/1863/world-cup-2010/2009/09/07/1486062/international-friendly-preview-republic-of-ireland-south

prints to

http://www.goal.com/en/news/1863/world-cup-2010/2009/09/07/1486062/international-friendly-preview-republic-of-ireland-south#p

which is actually a PDF file.

dmendozadmd
09-07-2009, 11:33 AM
Since there have been a lot of custom recipe requests of late, I'm starting a sticky where they can be aggregated. Post requests for custom recipes here. Once you have a custom recipe that works well for you (please test both the LRF and EPUB versions), let me know and I'll include it into calibre so others can benefit from it as well.
Please explain to a newbie: what's a sticky? what's a recipe as used in your message?

GRiker
09-07-2009, 01:39 PM
macsilber: It would be more helpful if you could post the recipe you're using.

dmendozadmd: a 'sticky' is a popular topic that stays at the top of the forum's topic list, so it's easier to find. A recipe is a script that calibre uses to download the contents of a particular website and then format it for your eReader.

G

Gomes
09-07-2009, 03:50 PM
Gomes,
There are RSS feeds in each section of philly.com. Follow the directions to create a custom feed (http://calibre.kovidgoyal.net/user_manual/news.html), then ask for assistance if you get stuck. It's actually pretty simple.

G

I've been trying to get a clean copy for a couple of weeks with no success. Essentially, I am unable to get the print version of the stories. I've tried to go through the directions cited above, but that doesn't seem to help... What I end up with is the article with all the various menus, pictures and comments, which makes it difficult to read at best, and takes forever for calibre to fetch and convert. Can anyone help?

And yes, I realize I'm probably just missing something obvious...

cix3
09-07-2009, 04:23 PM
In a custom recipe, how do I remove multiple div classes?

For example, from this source page (http://www.tnr.com/print/article/politics/rocking-roberts), I want to remove these div classes: print-logo, print-site_name, img-left, and print-source_url.

Probably a simple syntax question, but I'm new to Python. I have tried...


remove_tags = [dict(name='div', attrs={'class':'print-logo'})]
remove_tags = [dict(name='div', attrs={'class':'print-site_name'})]
remove_tags = [dict(name='div', attrs={'class':'img-left'})]
remove_tags = [dict(name='div', attrs={'class':'print-source_url'})]


... which only removes the last div class listed (in this case, print-source_url).

This gives me a syntax error:

remove_tags = [dict(name='div', attrs={'class':'print-logo', 'print-site_name', 'img-left', 'print-source_url'})]


What is the correct syntax?

Thanks

kovidgoyal
09-07-2009, 04:27 PM
remove_tags = [dict(name='div', attrs={'class':['print-logo', 'print-site_name', ..]}]

cix3
09-07-2009, 04:38 PM
Thanks... I knew it must have been something simple like that.

Your snippet as written gave me a syntax error, but adding a ) as the second to last character fixed it.
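
For reference, the corrected line (with the closing parenthesis added and the classes from the question filled in) would read roughly:

remove_tags = [dict(name='div', attrs={'class':['print-logo', 'print-site_name', 'img-left', 'print-source_url']})]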

GRiker
09-07-2009, 06:12 PM
gomes: Post your recipe. You will probably need to use remove_tags as cix3 has learned to get rid of the stuff you don't want.

Basically, this involves going to a sample page, examining the HTML source, isolating the stuff you don't want, then specifying a remove_tags directive as Kovid has described in his post above this one.

If you post your recipe, folks here are better able to help you refine it.

G

cix3
09-07-2009, 06:18 PM
Hello,

Here's my first stab at a recipe for The New Republic (www.tnr.com). It aggregates all articles and blogs, minus the images. Enjoy!


from calibre.web.feeds.news import BasicNewsRecipe

class The_New_Republic(BasicNewsRecipe):
    title = 'The New Republic'
    __author__ = 'cix3'
    description = 'Intelligent, stimulating and rigorous examination of American politics, foreign policy and culture'
    timefmt = ' [%b %d, %Y]'

    oldest_article = 7
    max_articles_per_feed = 100

    remove_tags = [dict(name='div', attrs={'class':['print-logo', 'print-site_name', 'img-left', 'print-source_url']}), dict(name='hr', attrs={'class':'print-hr'}), dict(name='img')]

    feeds = [
        ('Politics', 'http://www.tnr.com/rss/articles/Politics'),
        ('Books and Arts', 'http://www.tnr.com/rss/articles/Books-and-Arts'),
        ('Economy', 'http://www.tnr.com/rss/articles/Economy'),
        ('Environment and Energy', 'http://www.tnr.com/rss/articles/Environment-%2526-Energy'),
        ('Health Care', 'http://www.tnr.com/rss/articles/Health-Care'),
        ('Urban Policy', 'http://www.tnr.com/rss/articles/Urban-Policy'),
        ('World', 'http://www.tnr.com/rss/articles/World'),
        ('Film', 'http://www.tnr.com/rss/articles/Film'),
        ('Books', 'http://www.tnr.com/rss/articles/books'),
        ('The Plank', 'http://www.tnr.com/rss/blogs/The-Plank'),
        ('The Treatment', 'http://www.tnr.com/rss/blogs/The-Treatment'),
        ('The Spine', 'http://www.tnr.com/rss/blogs/The-Spine'),
        ('The Stash', 'http://www.tnr.com/rss/blogs/The-Stash'),
        ('The Vine', 'http://www.tnr.com/rss/blogs/The-Vine'),
        ('The Avenue', 'http://www.tnr.com/rss/blogs/The-Avenue'),
        ('William Galston', 'http://www.tnr.com/rss/blogs/William-Galston'),
        ('Simon Johnson', 'http://www.tnr.com/rss/blogs/Simon-Johnson'),
        ('Ed Kilgore', 'http://www.tnr.com/rss/blogs/Ed-Kilgore'),
        ('Damon Linker', 'http://www.tnr.com/rss/blogs/Damon-Linker'),
        ('John McWhorter', 'http://www.tnr.com/rss/blogs/John-McWhorter')
    ]

    def print_version(self, url):
        return url.replace('http://www.tnr.com/', 'http://www.tnr.com/print/')

bhandarisaurabh
09-09-2009, 10:20 PM
Can anyone help me with a recipe for Business Standard?
If the URL for an article is
http://www.business-standard.com/india/storypage.php?autono=369650
then the print URL is
http://www.business-standard.com/india/printpage.php?autono=369650&tp=
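
Going only by those two URLs, a print_version sketch (untested) could be as simple as:

def print_version(self, url):
    # assumption: every article URL uses storypage.php?autono=<id>
    # and the print view is printpage.php?autono=<id>&tp=
    return url.replace('storypage.php', 'printpage.php') + '&tp='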

cutterjohn42
09-10-2009, 10:38 AM
It seems that the most recent version of the /. recipe in Calibre may have caused an auto-ban to be triggered for my IP address.

I noticed the last time that it seemed to be downloading more of the site than before, i.e. I got the article plus comments, and I think the way the site is set up leads to recursively downloading most of the site unless it is strictly limited. I used to have that problem with sitescooper and plucker, and had to be very careful about limiting how much of /. was spidered to create a document for offline reading.

(This would be the version included with 0.6.11.)
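
For anyone who wants to limit spidering in their own recipes, the usual knobs look roughly like this (not a fix for the built-in /. recipe, and the 'comments' pattern below is hypothetical):

recursions = 0  # do not follow links out of the downloaded article pages
filter_regexps = [r'comments']  # skip any link whose URL matches this pattern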

kovidgoyal
09-10-2009, 11:44 AM
Open a ticket about it, I'll look at it when I have a spare moment.

cix3
09-10-2009, 09:32 PM
Any idea how I can transform an article URL like this (http://www.motherjones.com/politics/2009/09/big-businesss-hidden-hand-smear-job-van-jones) into the print URL (http://www.motherjones.com/print/27151) that I want to use for my recipe?

I'm hoping there's an easy way to find the corresponding print URL (by that five-digit number) for each article, rather than removing all the unwanted HTML from the actual article page...

Any ideas?

Edit: I should also note that the original article page actually splits the article into multiple pages (which I would want to combine into one article for my recipe). The print version contains the entire article.

kovidgoyal
09-10-2009, 09:43 PM
Just fetch the HTML and parse it looking for the print link

cix3
09-10-2009, 09:49 PM
Just fetch the HTML and parse it looking for the print link

Can you give me an example of a built-in recipe that does this?

kovidgoyal
09-10-2009, 10:21 PM
Can't think of one offhand, but basically it's something like this:


def get_article_url(self, article):
    url = ... # (from article, as before)
    soup = self.index_to_soup(url)
    # do some processing on soup to find the full article link
    a = soup.find(name='a', href=True, text=re.compile(r'Full\s*Article'))
    if a is not None:
        return a['href']
    return url


Stick a few print statements in there to debug things

cix3
09-11-2009, 12:28 AM
Can't think of one offhand, but basically it's something like this:


def get_article_url(self, article):
    url = ... # (from article, as before)
    soup = self.index_to_soup(url)
    # do some processing on soup to find the full article link
    a = soup.find(name='a', href=True, text=re.compile(r'Full\s*Article'))
    if a is not None:
        return a['href']
    return url


Stick a few print statements in there to debug things


Hmmm... that's beyond my level of expertise. I'm going to have to wait for someone else to recommend a pre-built recipe that I can copy from.

Thanks!
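
For Mother Jones specifically, a rough application of Kovid's approach might look like the sketch below. It is untested and assumes the article page links to its print version with an href containing '/print/' followed by the article number; that href may be relative, so the site root is prepended when needed.

import re
from calibre.web.feeds.news import BasicNewsRecipe

class MotherJonesSketch(BasicNewsRecipe):
    title = 'Mother Jones (sketch)'
    # feeds = [...]  # whichever Mother Jones RSS feeds you want to pull from

    def get_article_url(self, article):
        url = article.get('link', None)
        if url is None:
            return None
        soup = self.index_to_soup(url)
        # assumption: the article page carries a link to its print view
        a = soup.find('a', href=re.compile(r'/print/\d+'))
        if a is not None:
            href = a['href']
            if href.startswith('/'):
                href = 'http://www.motherjones.com' + href
            return href
        return url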

jeremynpross
09-11-2009, 03:50 PM
Dear Recipe Creators,
I am working with the publisher of "The Kingdom Experiment" to convert it from PDF to Kindle and other eBook formats.

Here's the book:
http://www.amazon.com/Kingdom-Experiment-Bruce-Nuffer/dp/0834124742/ref=sr_1_1?ie=UTF8&qid=1252693916&sr=1-1#

The book is highly designed, with supergraphics type text, lots of white space, and a few images. We want to preserve the look of each page.

You can see the page layout via Amazon's "look Inside" here:

http://www.amazon.com/Kingdom-Experiment-Bruce-Nuffer/dp/0834124742/ref=sr_1_1?ie=UTF8&qid=1252693916&sr=1-1#reader

I've tried Calibre conversion from the PDF and from a zip archive of 300 dpi JPG exports renamed as CBZ. I tried output as ePub and as Mobi.

Neither approach preserves the page layout perfectly; text gets stretched and misplaced. Using the free mobi tool to convert preserves the look, but makes the text far too blurry.

Do you have any suggestions as to how I can do this conversion via the GUI in Calibre?

Forgive my ignorance, but I don't know the first thing about using a command line, right down to where the commands are entered, what syntax to use and where there are spaces.

I'm on deadline to get this figured out and will appreciate any help you can give me.

Thanks,
Jeremy Ross
jeremyr@earthlink.net

radikaldissent
09-12-2009, 02:04 PM
I've read through this thread and I've heard someone request a philly.com recipe. I'm not sure if someone got around to it but I created a recipe for the Philadelphia Inquirer (http://www.philly.com/inquirer/). It seems to work okay and I've attached it to this post. I'm always open to suggestions or advice to improve it.

radikaldissent
09-12-2009, 02:30 PM
I've been working on a recipe for mother jones, it's not totally functional though. I've got it to point to the print url but on some articles the content is cut off. If you run it, the first article works but the second one titled "The Melting Climate Change Deadline" at http://www.motherjones.com/print/27210 is cut off. Only the first paragraph is displayed. This problem also shows up for other articles in the feed. I'd appreciate any help to fix this.

cix3
09-12-2009, 11:48 PM
Salon, the award-winning online news and entertainment Web site, combines original investigative stories, breaking news, provocative personal essays and highly respected criticism along with popular staff-written blogs about politics, technology and culture.

http://www.salon.com


from calibre.web.feeds.news import BasicNewsRecipe

class Salon_com(BasicNewsRecipe):
    title = 'Salon.com'
    __author__ = 'cix3'
    description = 'Salon.com - Breaking news, opinion, politics, entertainment, sports and culture.'
    timefmt = ' [%b %d, %Y]'

    oldest_article = 7
    max_articles_per_feed = 100

    remove_tags = [dict(name='div', attrs={'class':['ad_content', 'clearfix']}), dict(name='hr'), dict(name='img')]

    remove_tags_before = dict(name='h2')

    feeds = [
        ('All News & Politics', 'http://feeds.salon.com/salon/news'),
        ('War Room', 'http://feeds.salon.com/salon/war_room'),
        ('All Arts & Entertainment', 'http://feeds.salon.com/salon/ent'),
        ('I Like to Watch', 'http://feeds.salon.com/salon/iltw'),
        ('Book Reviews', 'http://feeds.salon.com/salon/books'),
        ('All Life stories', 'http://feeds.salon.com/salon/mwt'),
        ('Broadsheet', 'http://feeds.salon.com/salon/broadsheet'),
        ('All Opinion', 'http://feeds.salon.com/salon/opinion'),
        ('All Sports', 'http://feeds.salon.com/salon/sports'),
        ('All Tech & Business', 'http://feeds.salon.com/salon/tech'),
        ('Ask the Pilot', 'http://feeds.salon.com/salon/ask_the_pilot'),
        ('How the World Works', 'http://feeds.salon.com/salon/htww')
    ]

    def print_version(self, url):
        return url.replace('/index.html', '/print.html')


Enjoy!

olaf
09-14-2009, 01:40 AM
I am a total newbie to this and can't figure out my local newspaper RSS feeds. I tried some others, like BBC and they worked ok, but the feeds listed on this page don't come up with any results. Do I have to write special code to access these particular feeds?

http://www.telegram.com/apps/pbcs.dll/section?Category=rss_main

danielc
09-14-2009, 06:20 AM
Hi, I would like to ask for help too.
I went to Jamie Oliver's website and I want to make an RSS recipe for his daily recipe feed. Can anyone give me a hand? The pictures won't show up. It would be great, as I could take it with me and cook a different dish of his every day. Ha ha, thanks!

http://rss.feedsportal.com/c/32402/f/467087/index.rss

GRiker
09-14-2009, 01:06 PM
olaf - I followed the instructions and was able to download a readable paper. Did you copy the links to the individual feeds, as opposed to using the main page? Here's the recipe I ended up with for 3 sections, 5 articles per section:

class AdvancedUserRecipe1252944207(BasicNewsRecipe):
    title = u'Telegram - Worcester MA'
    oldest_article = 7
    max_articles_per_feed = 5

    feeds = [(u'Front Page News', u'http://www.telegram.com/apps/pbcs.dll/section?Category=RSS03&MIME=xml'), (u'Local News', u'http://www.telegram.com/apps/pbcs.dll/section?Category=rss01&MIME=xml&profile=1101'), (u'Business', u'http://www.telegram.com/apps/pbcs.dll/section?Category=rss01&MIME=xml&profile=1002')]

GRiker
09-14-2009, 01:15 PM
danielc:

The problem is that calibre doesn't recognize 'feed://' as a valid url source. I replaced it with 'http://' and the recipe correctly downloads content from the site.

class AdvancedUserRecipe1252944534(BasicNewsRecipe):
    title = u"Jamie's Recipes"
    oldest_article = 7
    max_articles_per_feed = 5

    feeds = [(u"Jamie's Daily Recipe", u'http://rss.feedsportal.com/c/32402/f/467087/index.rss')]


Kovid, is it possible to recognize 'feed://' as a valid url prefix?

G

kovidgoyal
09-14-2009, 04:36 PM
Kovid, is it possible to recognize 'feed://' as a valid url prefix?

G

I can basically have it replace feed:// with http://.

ccowie
09-14-2009, 06:35 PM
Is it possible to get a recipe for the Globe and Mail?

kovidgoyal
09-14-2009, 07:23 PM
There already is one

olaf
09-14-2009, 09:33 PM
olaf - I followed the instructions and was able to download a readable paper. Did you copy the links to the individual feeds, as opposed to using the main page? Here's the recipe I ended up with for 3 sections, 5 articles per section:

class AdvancedUserRecipe1252944207(BasicNewsRecipe):
    title = u'Telegram - Worcester MA'
    oldest_article = 7
    max_articles_per_feed = 5

    feeds = [(u'Front Page News', u'http://www.telegram.com/apps/pbcs.dll/section?Category=RSS03&MIME=xml'), (u'Local News', u'http://www.telegram.com/apps/pbcs.dll/section?Category=rss01&MIME=xml&profile=1101'), (u'Business', u'http://www.telegram.com/apps/pbcs.dll/section?Category=rss01&MIME=xml&profile=1002')]
GRiker - thanks for the help. Something must be off with my program configuration: I had essentially the same recipe and was getting address failures. I copied your recipe and got the same thing. I will re-install and see if that makes a difference. Thanks again for looking at that for me.

Re-installed and used your recipe, but I get an error on each of the three feeds: (11001, 'getaddrinfo failed'). Not sure what the difference is, or why it works for you and not for me.

danielc
09-15-2009, 02:51 AM
Thanks GRiker,

The problem now is that it only shows the article list; after clicking into the pages, the pictures don't show and I only get the text.

Can you give me a lesson on this too? Thanks...

Daniel


danielc:

The problem is that calibre doesn't recognize 'feed://' as a valid url source. I replaced it with 'http://' and the recipe correctly downloads content from the site.

class AdvancedUserRecipe1252944534(BasicNewsRecipe):
    title = u"Jamie's Recipes"
    oldest_article = 7
    max_articles_per_feed = 5

    feeds = [(u"Jamie's Daily Recipe", u'http://rss.feedsportal.com/c/32402/f/467087/index.rss')]


Kovid, is it possible to recognize 'feed://' as a valid url prefix?

G

ccowie
09-15-2009, 03:07 PM
I'm not sure if this is possible, but the more I read on these forums the more it seems everything is possible.
I've purchased a few issues of Orson Scott Card's Intergalactic Medicine Show electronic magazine. Is there any way to get a recipe that can deliver the issues I've purchased?

Andreiko
09-15-2009, 05:19 PM
First of all, thank you guys for the great job you have been doing for those of us who cannot create a normal, functional RSS recipe.

I have been trying to create my own advanced recipe, with full downloadable articles (printed versions) so I can use it on my Kindle DX, but it's above my knowledge. I tried to build the recipe from the examples, but couldn't manage it.

I kindly ask you guys: please create one for http://www.inosmi.ru
with full articles, from this RSS: http://www.inosmi.ru/misc/export/xml/rss/translation.xml

It's a Russian news site which provides articles from all over the world translated into Russian. It would be really cool to have this one. Also, I noticed that we don't have any pre-installed Russian-language (or Ukrainian) RSS feeds in calibre. I think I shall also ask you guys to add some 10-15 RSS feeds there, but first I have to make up my mind about which are the really reliable, major news and media sites.

But again, thank you guys for the great job.

rainbowworrier
09-17-2009, 06:00 AM
I would like to second Andreiko's vote of thanks! I'm a newbie and have only just learned what an RSS feed is, but I'm thrilled to be able to load them onto my new ereader!

I would also like some Russian-language news feeds, please; the general news site that Andreiko recommends looks good. I would also like a science news site, maybe http://www.gazeta.ru/science/ or http://www.nkj.ru/?

Once again many thanks!

highwaykind
09-17-2009, 12:48 PM
Could someone help me figure out how to get the Smashing Magazine feed into Calibre?
http://www.smashingmagazine.com/wp-rss.php

I'm not that good with code and don't know any Python, so the explanation in the manual is a bit too complicated for me.
As long as the feed has the pictures and the text, I don't much care what it looks like.

Thanks!

strico
09-17-2009, 02:37 PM
I know there's already a custom feed for The Guardian.
However, it would be nice if I could have the print edition of the same newspaper.

I believe it could be done by slightly modifying the current Guardian recipe.
The print edition fetching is done perfectly for The Economist.

The URL:

http://www.guardian.co.uk/theguardian

kovidgoyal
09-17-2009, 04:02 PM
@strico: Open a ticket for that, or it will get lost

strico
09-17-2009, 04:26 PM
@strico: Open a ticket for that, or it will get lost.

I will do that, thanks.

L4ur3nt
09-18-2009, 04:51 PM
Hello all,

I have a problem with this RSS feed:

http://rss.futura-sciences.com/packfs

I get some strange things in the text, like this:

.');" onmouseout="killlink()">

Does somebody know why? Thank you very much!

dazzla
09-18-2009, 10:05 PM
On the xkcd feed it doesn't display quite right on the PRS-300. The comics seem to be cut off on the right hand side (about 5-10 pixels). Just enough to miss a few letters on wide comics.

Anyone else have this problem? Anyone know of a way to customise the conversion to narrow it up a bit?

olaf
09-20-2009, 10:29 AM
A question - maybe this is in the manual, but I didn't find it, or didn't understand it if it was mentioned. I am trying to set up a local newspaper download from RSS feeds, but I get a wealth of stuff I don't want included in the final output. Is there a way for me to output the pre-formatted HTML to a file, so I can actually see which HTML tags I need to get rid of (filter out with the remove_tags option, or other logic)?

GRiker
09-20-2009, 10:38 AM
Easiest way is to navigate to the subject page in your browser, then use View Source to see the HTML. The 'View Source' menu option will differ based upon your browser, but all the major browsers offer it. Firefox does a nice job of formatting the output.

G

kovidgoyal
09-20-2009, 12:36 PM
I find the Firebug Firefox extension to be very useful for figuring out what content to exclude from the page.

kiklop74
09-22-2009, 12:01 PM
Could someone help me figure out how to get the Smashing Magazine feed into Calibre?
http://www.smashingmagazine.com/wp-rss.php

I'm not that good with code and don't know any Python, so the explanation in the manual is a bit too complicated for me.
As long as the feed has the pictures and the text, I don't much care what it looks like.

Thanks!

Here goes:

kiklop74
09-22-2009, 02:58 PM
The Toronto Star:

MichaelMSeattle
09-22-2009, 10:20 PM
Hi all. Great app, great forum!

I've struggled for some time trying to get a simple recipe for the New York Times Magazine. (This is a separate feed from the New York Times subscription feed.)
I've Googled and studied these pages and experimented, but no luck.

Here's what works:

class NYTimesMagazine(BasicNewsRecipe):
    title = u'The New York Times Magazine'
    __author__ = 'calibre'
    language = 'en'

    description = 'New York Times Magazine'
    timefmt = ''
    oldest_article = 7
    max_articles_per_feed = 300
    use_embedded_content = False
    no_stylesheets = True
    encoding = 'utf-8'

    feeds = [(u'Magazine', u'http://feeds.nytimes.com/nyt/rss/Magazine'),
             (u'The Ethicist', u'http://ethicist.blogs.nytimes.com/feed/'),
             (u'Motherload', u'http://parenting.blogs.nytimes.com/feed/'),
             (u'Medium', u'http://themedium.blogs.nytimes.com/feed/')]

-----------------------------------

The problem is that the results only return one page of each article. I've tried adding "recursions = 2", but that just slows the process down exponentially. I know the answer is to use the print page, but I can't figure out how to do this.

I have noticed that the difference between the regular NYT Magazine article URLs and the print URLs is an additional parameter:

normal:
http://www.nytimes.com/2009/08/30/magazine/30FOB-medium-t.html
print:
http://www.nytimes.com/2009/08/30/magazine/30FOB-medium-t.html?pagewanted=print

Can someone please assist with this?
Thanks big time!
-Mike
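
Based on the URL difference Mike describes, a minimal print_version sketch (untested) might be all that is needed:

def print_version(self, url):
    # assumption: appending ?pagewanted=print to an article URL
    # returns the single-page print view
    return url + '?pagewanted=print'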

MichaelMSeattle
09-22-2009, 10:25 PM
Oops - the links didn't show as text, just as links. Here are the differences I meant:

normal:
"h t t p://w w w.nytimes.com/2009/08/30/magazine/30FOB-medium-t.html"
print:
"h t t p://w w w.nytimes.com/2009/08/30/magazine/30FOB-medium-t.html?pagewanted=print"

Gomes
09-22-2009, 11:02 PM
I've read through this thread and I've heard someone request a philly.com recipe. I'm not sure if someone got around to it but I created a recipe for the Philadelphia Inquirer (http://www.philly.com/inquirer/). It seems to work okay and I've attached it to this post. I'm always open to suggestions or advice to improve it.

That was me. Thanks so much!