Custom recipes (archive, read-only) - Page 112

gambarini · 03-27-2010, 01:51 PM

Does anyone have a recipe for this RSS feed?

http://feeds.punto-informatico.it/c/...8866/index.rss

Thanks in advance.

dhiru · 03-27-2010, 02:14 PM

Is it possible to make a recipe for Business & Economy magazine? It does not have an RSS feed.
Thanks
http://www.businessandeconomy.org/04032010/default.asp

olaf · 03-28-2010, 12:37 PM

I cannot for the life of me figure out how to remove an image file at the top of each article of this newspaper. The image file has "Share - Larger Text - Smaller Text - Print" at the top of each article, pushing the main picture off to the next page and leaving the current page mostly blank. Any advice on how to get rid of that image? It seems to be embedded in code I can't get at.

import re
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1252944207(BasicNewsRecipe):
    title = u'Telegram & Gazette'
    oldest_article = 1
    max_articles_per_feed = 50
    timefmt = ''
    no_stylesheets = True

    keep_only_tags = [dict(id=['frontpage_section', 'articleWell', 'headline', 'subheadline', 'SuperHeading', 'byline', 'articleBody', 'zoom1'])]
    remove_tags = [dict(id=['factBoxes'])]
    # Both patterns live in one list: assigning preprocess_regexps twice
    # would silently discard the first assignment.
    preprocess_regexps = [
        (re.compile(r'<!-- This code displays columnist headshots: -->.*?<p>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'<div class="verdana11">.*?<!-- END ARTICLE COMMENTS -->', re.DOTALL|re.IGNORECASE), lambda match: ''),
    ]
    encoding = 'cp1252'
    remove_tags_after = [dict(id='leaderboardBot')]

    feeds = [
        (u'Front Page News', u'http://www.telegram.com/apps/pbcs.dll/section?Category=RSS03&MIME=xml'),
        (u'World & Regional', u'http://www.telegram.com/apps/pbcs.dll/section?Category=rss01&MIME=xml&profile=1052'),
        (u'Living', u' http://www.telegram.com/apps/pbcs.dl...l&profile=1011'),
        (u'Local News', u' http://www.telegram.com/apps/pbcs.dl...l&profile=1101'),
        (u'Business', u'http://www.telegram.com/apps/pbcs.dll/section?Category=rss01&MIME=xml&profile=1002'),
        (u'Opinion', u'http://www.telegram.com/apps/pbcs.dll/section?Category=rss01&MIME=xml&profile=1017'),
        (u'Deaths', u'http://www.telegram.com/apps/pbcs.dll/section?Category=rss01&MIME=xml&profile=1001'),
        (u'As I See It', u'http://www.telegram.com/apps/pbcs.dll/section?Category=rss01&MIME=xml&profile=1054'),
    ]

Starson17 · 03-28-2010, 01:32 PM

Quote:

Originally Posted by olaf

I cannot for the life of me figure out how to remove an image file at the top of each article of this newspaper. The image file has "Share - Larger Text - Smaller Text - Print" at the top of each article, pushing the main picture off to the next page and leaving the current page mostly blank.

Try this:
remove_tags = [dict(name='div', attrs={'id':'article_tools'})]
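
Keep in mind that a second remove_tags assignment in the same class replaces the first one. A minimal sketch of how this rule might be merged with the factBoxes rule from the quoted recipe, assuming that rule should stay in place:

```python
# Each dict is one match rule; a tag that matches any rule is removed.
remove_tags = [
    dict(id=['factBoxes']),                           # rule already in the quoted recipe
    dict(name='div', attrs={'id': 'article_tools'}),  # the div suggested above
]
```

Inside the recipe this list would sit in the class body, in place of the existing remove_tags line.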

Semonski · 03-28-2010, 01:53 PM

Thank you so much..... I'm trying it out now.....

Kos

Quote:

Originally Posted by kiklop74

New recipe for New York Post:

kiklop74 · 03-28-2010, 04:26 PM

Quote:

Originally Posted by Semonski

Thank you so much..... I'm trying it out now.....

Kos

Your recipe is too complicated. Here is a simplified, cleaned-up version (add more feeds; this is just an example):

Code:

class Telegram(BasicNewsRecipe):
    title                 = 'Telegram'
    oldest_article        = 2
    max_articles_per_feed = 100
    no_stylesheets        = False
    use_embedded_content  = False
    encoding              = 'cp1252'
    publication_type      = 'newspaper'
    remove_empty_feeds    = True
    extra_css             = ' body{font-family: Verdana,sans-serif} .headline{font-size: xx-large; font-weight: bold} .mainPhotoCaption{font-size: x-small} '

    keep_only_tags     = [dict(name='div', attrs={'id':'articleWell'})]
    remove_tags_before = dict(attrs={'class':'headline'})
    remove_tags_after  = dict(attrs={'id':'zoom1'})
    remove_tags = [
                     dict(name='div', attrs={'class':'relatedContent'})
                    ,dict(name=['object','link','iframe'])
                  ]

    feeds          = [ 
                        (u'Front page' , u'http://www.telegram.com/apps/pbcs.dll/section?Category=RSS03&MIME=xml')
                     ]

    def preprocess_html(self, soup):
        return self.adeify_images(soup)

gambarini · 03-28-2010, 07:49 PM

The Apple Lounge, an Italian Apple blog.

Any suggestions?

from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre.web.feeds.news import BasicNewsRecipe

class Informatica(BasicNewsRecipe):
    title = u'Informatica'
    __author__ = 'Gabriele Marini'
    oldest_article = 15
    max_articles_per_feed = 100
    use_embedded_content = False
    remove_tags_after = dict(name='div', attrs={'id':'greet_block'})
    no_stylesheets = True
    feeds = [(u'The Apple Lounge', u'http://feeds.feedburner.com/Theapplelounge?format=xml')]

    def print_version(self, url):
        # Fetch the article page and look for its "print this article" link
        # ('Stampa questo articolo' is Italian for 'Print this article').
        raw = self.browser.open(url).read()
        soup = BeautifulSoup(raw.decode('utf8', 'replace'))
        print_link = soup.find('a', {'title':'Stampa questo articolo'})
        if print_link is None:
            return url
        return print_link['href']

kiklop74 · 03-28-2010, 09:25 PM

Quote:

Originally Posted by gambarini

The Apple Lounge, an Italian Apple blog.

Any suggestions?

You are overcomplicating this. Calibre already extracts the appropriate link from the feed (feedburner:origLink). You just need to append the print suffix, which is 'print/'. So the correct code would be:

Code:

def print_version(self, url):
     return url + 'print/'
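
One caveat with the snippet above: plain concatenation assumes the article URL already ends in a slash. A minimal sketch of a slightly more defensive variant, written as a standalone helper so it can be tried outside Calibre. The name to_print_url and the example.com URLs are made up for illustration, and URLs with query strings are not handled:

```python
def to_print_url(url, suffix='print/'):
    """Append the print suffix, ensuring exactly one slash before it."""
    return url.rstrip('/') + '/' + suffix


# Works whether or not the article URL ends with a slash.
print(to_print_url('http://example.com/some-article/'))
print(to_print_url('http://example.com/some-article'))
```

In the recipe, print_version would then simply return to_print_url(url).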

Starson17 · 03-29-2010, 10:43 AM

Quote:

Originally Posted by MichaelMSeattle

I've tried running the GoComics reversed recipe for only about 5 comics/7 days. When I run it, it first seems to hang

Over the weekend I ran all comics of the GoComics.com recipe at size 1200 and 4 strips from each. I have the 200+ comics available broken up into four groups (four recipes) A-F, G-M, N-Z and Editorial comics. They all ran fine. However, I ran them at 8 hour intervals, not in sequence, and I set the delay option to 2 and the simultaneous connections option to 1 to minimize server load. I have seen occasional failures in the past that may be related to server load or anti-scraping tools on their server.
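
The two options mentioned here presumably correspond to the BasicNewsRecipe attributes delay and simultaneous_downloads. A minimal sketch with the values from this post; the class name and title are hypothetical, and the fallback base class is there only so the snippet loads outside Calibre:

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Not running inside calibre: stand-in base class so the sketch still loads.
    class BasicNewsRecipe(object):
        pass


class ThrottledComics(BasicNewsRecipe):  # hypothetical recipe name
    title = 'GoComics A-F (throttled)'   # hypothetical title
    delay = 2                    # seconds to wait between consecutive downloads
    simultaneous_downloads = 1   # fetch one item at a time
```

As far as I know, Calibre lowers simultaneous_downloads to 1 on its own whenever delay is greater than zero, so the second line is mostly there to make the intent explicit.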

Starson17 · 03-29-2010, 10:45 AM

Quote:

Originally Posted by gambarini

can you give me an example of the print statement?

Sorry, I missed this post.

Code:

print('The contents of the variable site_url is: %s' % site_url)

One of my favorites is to print soup variables.
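
A sketch of what printing a soup variable might look like. The helper name dump_soup and the 500-character limit are arbitrary choices for illustration; in a recipe it would typically be called from preprocess_html, which receives the parsed page:

```python
def dump_soup(soup, limit=500):
    """Print the first `limit` characters of a parsed page and return it unchanged."""
    text = str(soup)
    print('Soup starts with: %s' % text[:limit])
    return soup


# Inside a recipe class this might be used as:
#     def preprocess_html(self, soup):
#         return dump_soup(soup)
```

The printed text should show up in the log of the download job, which makes it easy to check which tags and ids are actually present before writing keep_only_tags or remove_tags rules.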

olaf · 03-29-2010, 11:48 AM

(message sent in error)

olaf · 03-29-2010, 12:09 PM

Quote:

Originally Posted by kiklop74

Your recipe is too complicated. Here is a simplified, cleaned-up version (add more feeds; this is just an example):

Code:

class Telegram(BasicNewsRecipe):
    title                 = 'Telegram'
    oldest_article        = 2
    max_articles_per_feed = 100
    no_stylesheets        = False
    use_embedded_content  = False
    encoding              = 'cp1252'
    publication_type      = 'newspaper'
    remove_empty_feeds    = True
    extra_css             = ' body{font-family: Verdana,sans-serif} .headline{font-size: xx-large; font-weight: bold} .mainPhotoCaption{font-size: x-small} '

    keep_only_tags     = [dict(name='div', attrs={'id':'articleWell'})]
    remove_tags_before = dict(attrs={'class':'headline'})
    remove_tags_after  = dict(attrs={'id':'zoom1'})
    remove_tags = [
                     dict(name='div', attrs={'class':'relatedContent'})
                    ,dict(name=['object','link','iframe'])
                  ]

    feeds          = [ 
                        (u'Front page' , u'http://www.telegram.com/apps/pbcs.dll/section?Category=RSS03&MIME=xml')
                     ]

    def preprocess_html(self, soup):
        return self.adeify_images(soup)

Kiklop - that did the trick - thank you!

olaf · 03-29-2010, 12:33 PM

Quote:

Originally Posted by Starson17

Try this:
remove_tags = [dict(name='div', attrs={'id':'article_tools'})]

Starson - this worked - thank you!

olaf · 03-29-2010, 12:34 PM

Question regarding the Calibre online Help page. Is the 'Edit Metadata' page blank, or is my browser missing something?

Starson17 · 03-29-2010, 01:10 PM

Quote:

Originally Posted by olaf

Starson - this worked - thank you!

I'm glad to hear it. I didn't test your recipe. I just popped open Firebug, found your problem content and gave you the necessary line for that single problem.


