Custom recipes (archive, read-only) - Page 157

Starson17 · 07-22-2010, 06:01 PM

Quote:

Originally Posted by mohmedic

Thank you, Starson17. This works fine, and I have more to fix. I'm sure I will post more questions.

You're welcome. I happened to have just finished fixing the BBC recipe, and saw your post, so I just pasted your recipe into my test recipe to see what was wrong with it.

Let us know if/when you have more questions.

caman · 07-24-2010, 03:28 AM

Hello, can anyone help me with this one?

http://sanduan.org/home/type.asp?iCa...&nChannel=News

Thank you very much

significance · 07-24-2010, 04:59 AM

I have put together a basic recipe to download new articles (or abstracts, if you aren't logged in) from Science Direct. Could someone help improve it? Currently, it does not bold or otherwise highlight the article titles, there seems to be a left indent that I'd prefer to get rid of, and it is downloading the versions of articles with small, grainy images instead of full-sized images. (To get larger images, I need to append "&artImgPref=F" to the URL, but my attempt below doesn't work).

Quote:

Originally Posted by code

class AdvancedUserRecipe1279948676(BasicNewsRecipe):
    title = u'Science Direct'
    __author__ = u'Barbara Robson'
    description = u'New journal articles from my favourite journals on Science Direct. Edit to choose your own favourites. Full text if you have an institutional login; abstracts otherwise.'
    oldest_article = 8
    max_articles_per_feed = 40
    no_stylesheets = True

    feeds = [(u'Environmental Modelling and Software', u'http://rss.sciencedirect.com/publication/science/6063'),
             (u'Ecological Modelling', u'http://rss.sciencedirect.com/publication/science/5934'),
             (u'Estuarine, Coastal and Shelf Science', u'http://rss.sciencedirect.com/publication/science/6776'),
             (u'Water Research', u'http://rss.sciencedirect.com/publication/science/5831')]

    def full_images(self, url):
        return url.append("&artImgPref=F")

    remove_tags_before = dict(id='articleContent')
    remove_tags_after = [dict(attrs={'class':'SDTxtSmallBold'})]
    remove_tags = [dict(attrs={'class':'SDTxtSmallBold'})]
    remove_attributes = ['width','height']

Thanks for any help!

significance · 07-24-2010, 10:36 PM

Following up my own message, I now have the article titles highlighted appropriately, though I would still appreciate help in getting the versions of articles with full-sized images and getting rid of the left margin if possible. My code at this point:

Code:

import re
from calibre.web.feeds.news import BasicNewsRecipe

class ScienceDirect(BasicNewsRecipe):
    title          = u'Science Direct'
    __author__ = u'Barbara Robson'
    description = u'New journal articles from my favourite journals on Science Direct. Edit to choose your own favourites. Full text if you have an institutional login; abstracts otherwise.'
    oldest_article = 10
    max_articles_per_feed = 40
    no_stylesheets = True
    cover_url = 'http://rss.sciencedirect.com/images/logo_scid.gif'

    feeds          = [(u'Environmental Modelling and Software', u'http://rss.sciencedirect.com/publication/science/6063'),
                          (u'Ecological Modelling',u'http://rss.sciencedirect.com/publication/science/5934'),
                          (u'Estuarine, Coastal and Shelf Science',u'http://rss.sciencedirect.com/publication/science/6776'),
                          (u'Water Research',u'http://rss.sciencedirect.com/publication/science/5831')]
    
    def full_images(self, url):
          return url.append("&artImgPref=F")

    remove_tags_before = dict(id='articleContent')
    # highlight article title
    preprocess_regexps = [
        (re.compile(r'(<div.class="articleTitle">)([^<]+)(<)'),
         lambda m: '%s<h2 class="h2">%s</h2>%s' % (m.group(1), m.group(2), m.group(3)))
    ]
    
    remove_tags_after = [dict(attrs={'class':'SDTxtSmallBold'})]
    remove_tags = [dict(attrs={'class':'SDTxtSmallBold'})]
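
A possible explanation for why the `full_images` attempt above has no effect, offered as a sketch rather than a tested fix: Python strings have no `append` method (concatenation uses `+`), and calibre only calls the hook methods it knows about, such as `print_version(self, url)`, so a method named `full_images` is never invoked. The helper name `with_full_images` below is hypothetical and only illustrates the URL handling. Whether Science Direct honours `artImgPref=F` on the URLs the feed actually delivers is an assumption taken from the post, not something verified here.

```python
def with_full_images(url):
    """Return url with the artImgPref=F parameter added exactly once."""
    param = 'artImgPref=F'
    if param in url:
        return url  # already requests full-size images
    # Use '&' when the URL already has a query string, '?' otherwise.
    separator = '&' if '?' in url else '?'
    return url + separator + param


# Inside a recipe, the hook calibre calls for each article is print_version:
#
#     def print_version(self, url):
#         return with_full_images(url)

print(with_full_images('http://example.com/article?id=1'))
```

The `example.com` address is a placeholder. Choosing the separator from the URL itself avoids producing a malformed address when an article link arrives without a query string.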

significance · 07-25-2010, 07:52 PM

Is it possible to create news recipes for journals that publish the articles only as PDFs? If so, I'd love one for Limnology and Oceanography (http://www.aslo.org/lo/toc/index.html), if someone has the time. Things that might make this difficult:
1) Articles are published as PDFs
2) The main page has links to issues that are not yet available and issues that are in progress, with some articles available, but not all. I'd want to download only the latest complete issue.
3) Some articles are locked and available for purchase, while others are free to download. If your institution has a subscription, you can download even those available for purchase without paying again.

Too complicated?

kiklop74 · 07-25-2010, 11:17 PM

Quote:

Originally Posted by significance

Is it possible to create news recipes for journals that publish the articles only as PDFs?

In programming almost anything is possible with the right amount of effort.

Quote:

Originally Posted by significance

Too complicated?

Yes. I doubt anybody would undertake this task. At least not for free.

significance · 07-26-2010, 02:25 AM

Quote:

Originally Posted by kiklop74

In programming almost anything is possible with the right amount of effort.

True. But is it possible to include the articles in their original PDF format, or would the recipe need to run them through a converter? I find converted PDFs of scientific articles generally unsatisfactory, so I won't bother if they can't be somehow embedded as-is. As is no doubt obvious, I don't yet know much about the formats used for news feeds.

Quote:

Originally Posted by kiklop74

Yes. I doubt anybody would undertake this task. At least not for free.

If the PDFs don't need to be converted, I'll probably give it a go myself sooner or later. But thanks for your evaluation. I've only spent an afternoon with Calibre so far and I don't know Python, so I'm on a learning curve here.

sark666 · 07-26-2010, 04:30 PM

I just got my first ebook reader and I'm looking at this thread for some interesting rss feeds.

However, what are the chances of a newsfeed posted in 2008 still working? I'm not too familiar with Calibre yet, but I understand it has gone through some changes, and I wasn't sure if I should attempt to use older recipes.

Starson17 · 07-26-2010, 04:41 PM

Quote:

Originally Posted by sark666

I just got my first ebook reader and I'm looking at this thread for some interesting rss feeds.

However, what are the chances of a newsfeed posted in 2008 still working? I'm not too familiar with Calibre yet, but I understand it has gone through some changes, and I wasn't sure if I should attempt to use older recipes.

Are you aware that there are hundreds of built-in recipes? To answer your question, most older recipes will work fine if the website hasn't changed, though I'd bet most sites have.

Starson17 · 07-26-2010, 04:51 PM

Quote:

Originally Posted by significance

1) Articles are published as PDFs

It's worse than that. They're multi-column PDFs. All recipes currently create EPUBs, and Calibre can't convert multi-column PDFs (it's on the to-do list).

I think you'd be better off with an automated website downloader, such as wget (possibly Web2Disk could also do it, but I'm more familiar with wget). You could restrict wget to grabbing the PDFs, then use a batch file or script to add the unconverted PDF files to Calibre. It should be possible to automate the whole thing, but not via the recipe system. I assume you have a PDF reader available that will read the unmodified files.
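
The link-gathering step of that idea can also be sketched in Python instead of wget. This is a stand-in for illustration, not the wget approach described above: it only extracts PDF links from a page's HTML. The names `PdfLinkParser` and `pdf_links`, the sample HTML, and the `issue_1/0001.pdf` path are all made up for the example; the real table-of-contents page may be structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect the href of every <a> tag that points at a .pdf file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href and href.lower().endswith('.pdf'):
            # Resolve relative links against the page they came from.
            self.links.append(urljoin(self.base_url, href))


def pdf_links(html, base_url):
    """Return absolute URLs for all PDF links found in html."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Made-up sample markup; a real run would fetch the page first.
sample = '<a href="issue_1/0001.pdf">Article</a> <a href="about.html">About</a>'
print(pdf_links(sample, 'http://www.aslo.org/lo/toc/index.html'))
```

Each URL found this way could then be saved with `urllib.request.urlretrieve` and added to the library unconverted with `calibredb add`, calibre's command-line tool. Picking only the latest complete issue and handling locked articles would still need logic specific to the site.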

significance · 07-26-2010, 07:06 PM

Quote:

Originally Posted by Starson17

It's worse than that. They're multi-column PDFs. All recipes currently create EPUBs, and Calibre can't convert multi-column PDFs (it's on the to-do list).

I think you'd be better off with an automated website downloader, such as wget (possibly Web2Disk could also do it, but I'm more familiar with wget). You could restrict wget to grabbing the PDFs, then use a batch file or script to add the unconverted PDF files to Calibre. It should be possible to automate the whole thing, but not via the recipe system. I assume you have a PDF reader available that will read the unmodified files.

Oh, good idea. Cheers for that. Yes, I have a Kindle 2 that does quite well with unconverted PDFs these days, and my new Kindle DX (which I bought for just this purpose) is in the mail.

sark666 · 07-26-2010, 09:19 PM

Anyone want to try 24h Toronto?

They have other Canadian RSS feeds but I'm interested in the Toronto one.

http://eedition.toronto.24hrs.ca/epa...6225&type=full

http://eedition.toronto.24hrs.ca/epaper/viewer.aspx

tayseidel · 07-27-2010, 06:16 PM

Perhaps I didn't ask nicely enough the first time around. I'd really like to have someone take a look at making a recipe for http://www.columbian.com/. I mentioned in my previous post that I'd like to have the recipe fetch the print edition. This is done by adding /?print to the end of the url.

I'd very much appreciate a recipe.

Starson17 · 07-28-2010, 09:31 AM

Quote:

Originally Posted by tayseidel

Perhaps I didn't ask nicely enough the first time around. I'd really like to have someone take a look at making a recipe for http://www.columbian.com/.

While asking nicely is important, you have to remember that people write recipes only when they have time, or when they want the recipe themselves, or when they think there's a big demand, etc. It's a volunteer effort.

The easier you make it, the more likely that someone will pick it up. In your case, you haven't given a link to the feed, just the main page.

It's not that hard to write a recipe, and personally, I prefer to help those who've tried to write it and just need a bit of help. Sometimes, a user I've helped on one recipe has gone on to write a dozen more.

If you want to try it yourself, try putting your feed(s) into the basic recipe option. Then hit the Advanced button and add this code:

Code:

    def print_version(self, url):
        return url + '?print'

mwheinz · 07-28-2010, 11:10 AM

Quote:

Originally Posted by tayseidel

Perhaps I didn't ask nicely enough the first time around. I'd really like to have someone take a look at making a recipe for http://www.columbian.com/. I mentioned in my previous post that I'd like to have the recipe fetch the print edition. This is done by adding /?print to the end of the url.

I'd very much appreciate a recipe.

As far as I can tell, that website does not have a single newsfeed (actually, it appears to have 19 different ones), so it doesn't matter how nicely you ask if you don't specify which feeds you want.
