Old 04-01-2010, 11:18 AM   #1711
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Quote:
Originally Posted by kiklop74 View Post
There is a problem with this feed. You should post a bug report in calibre trac.
I opened a ticket. They don't debug individual feeds, but they gave me an excellent suggestion:
"probably your remove_tags is too aggressive or the html has some problems..."
Now, with the correct remove_tags, every article displays perfectly!
So the recipe is now complete.
Thanks to all!

Code:
#!/usr/bin/env  python
__license__   = 'GPL v3'
__author__    = '^^^^^^'
__copyright__ = '******'
__description__ = 'Punto Informatico'

'''
http://www.punto-informatico.it/
'''

from calibre.web.feeds.news import BasicNewsRecipe


class ilsole(BasicNewsRecipe):
    author        = '***'
    description   = 'Punto Informatico: Internet dal 1996'

    cover_url      = ' '
    title          = u'Punto Informatico '
    publisher      = 'italiaNews High Tech'
    category       = 'News, finance, economy, politics'

    language       = 'it'
    timefmt        = '[%a, %d %b, %Y]'

    oldest_article = 15
    max_articles_per_feed = 50
    use_embedded_content  = False

    remove_javascript  = True
    no_stylesheets     = True
    keep_only_tags     = [dict(name='div', attrs={'class':'box'})]
    remove_tags        = [dict(name='div', attrs={'class':'boxadv'})]

    # Use the feed entry's id (or guid) as the article URL.
    def get_article_url(self, article):
        return article.get('id', article.get('guid', None))

    feeds              = [(u'Punto Informatico',u'http://feeds.punto-informatico.it/c/32288/f/438866/index.rss')]
gambarini is offline  
Old 04-01-2010, 11:28 AM   #1712
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Does "keep_only_tags" support more than one condition?
gambarini is offline  
Old 04-01-2010, 11:40 AM   #1713
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by gambarini View Post
Does "keep_only_tags" support more than one condition?
It doesn't support conditionals at all. It will allow you to define extensive lists of tags:

Code:
keep_only_tags = [dict(name='div', attrs={'class':['feature','banner']}),
                  dict(name='a', attrs={'id':['first','second','third']}),
                  dict(name='span', attrs={'potato':['idaho', 'red']})
                 ]
Conditionals can be supported with other methods.

Edit: actually, it may support conditionals, I've just never used it.
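
As a rough illustration of how such a list is read: each dict is one alternative, and a tag is kept if it matches any of them; inside a dict, a list of attribute values means "any of these values". The two helpers below are hypothetical stand-ins written only for this example. calibre itself hands these dicts to BeautifulSoup, whose handling of multi-valued class attributes may differ, so treat this as a simplified model rather than the real implementation.

```python
# Simplified model of how a keep_only_tags list is interpreted.
# matches_spec() and is_kept() are hypothetical helpers, not calibre APIs.

keep_only_tags = [
    dict(name='div', attrs={'class': ['feature', 'banner']}),
    dict(name='a', attrs={'id': ['first', 'second', 'third']}),
]


def matches_spec(tag_name, tag_attrs, spec):
    """Return True if a tag (name + attribute dict) satisfies one spec dict."""
    if 'name' in spec and spec['name'] != tag_name:
        return False
    for attr, wanted in spec.get('attrs', {}).items():
        # A plain string is treated as a one-element list of allowed values.
        allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
        if tag_attrs.get(attr) not in allowed:
            return False
    return True


def is_kept(tag_name, tag_attrs, specs):
    """A tag is kept if ANY spec in the list matches (logical OR)."""
    return any(matches_spec(tag_name, tag_attrs, s) for s in specs)


print(is_kept('div', {'class': 'banner'}, keep_only_tags))   # True
print(is_kept('a', {'id': 'second'}, keep_only_tags))        # True
print(is_kept('div', {'class': 'sidebar'}, keep_only_tags))  # False
```

So "more than one condition" works in the OR sense: add another dict, or another value to a list.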
Starson17 is offline  
Old 04-01-2010, 01:01 PM   #1714
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Quote:
Originally Posted by Starson17 View Post
It doesn't support conditionals at all. It will allow you to define extensive lists of tags: [...]
thanks!!!!!!
gambarini is offline  
Old 04-01-2010, 01:02 PM   #1715
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
With this feed

http://www.lastampa.it/cmstp/rubrich...asp?ID_blog=25

I get this message:

httperror_seek_wrapper: HTTP Error 403: Forbidden

It's strange; the link doesn't need a username or password.

Last edited by gambarini; 04-01-2010 at 01:26 PM.
gambarini is offline  
Old 04-01-2010, 02:09 PM   #1716
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by gambarini View Post
thanks!!!!!!
Glad to help.
Starson17 is offline  
Old 04-01-2010, 10:48 PM   #1717
kovidgoyal
creator of calibre
Posts: 45,378
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@gambarini: Some sites give you Forbidden errors if you try to download too many files from them in a short time. Use the delay setting to add some delay between downloads.
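
A minimal sketch of what that could look like in a recipe. The class name, title, and the delay value of 2 seconds are assumptions chosen for illustration; `delay` and `simultaneous_downloads` are standard BasicNewsRecipe attributes. The import fallback is only there so the snippet can be checked outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the sketch can be read and checked outside calibre.
    class BasicNewsRecipe(object):
        pass


class LaStampaBlog(BasicNewsRecipe):   # hypothetical class name
    title = u'La Stampa (blog)'        # hypothetical title
    language = 'it'
    oldest_article = 15
    max_articles_per_feed = 50

    # Wait this many seconds between consecutive downloads.
    delay = 2                          # assumed value; tune per site
    # Fetch one file at a time so requests do not arrive in bursts.
    simultaneous_downloads = 1

    # Feed URL exactly as posted above; the forum truncated it ("...").
    feeds = [(u'La Stampa',
              u'http://www.lastampa.it/cmstp/rubrich...asp?ID_blog=25')]
```

If the 403 persists even with a generous delay, the cause is probably something other than rate limiting.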
kovidgoyal is online now  
Old 04-02-2010, 04:03 AM   #1718
andifink
Junior Member
andifink began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Apr 2010
Device: Hexaglot/Hanvon N518
20minuten.ch and tagesanzeiger.ch recipe

Hello,

I put together some custom recipes for German-language Swiss news sites and would like to share them. They might not be perfect, but for me they're OK. Maybe someone can improve them.

Code:

'''
www.20min.ch
'''

from calibre.web.feeds.recipes import BasicNewsRecipe

class ZwanzigMinuten(BasicNewsRecipe):
    title = '20min Online'
    category = 'news, politics, nachrichten, Switzerland'
    oldest_article = 1
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    language = 'de'

    # Keep only the headline, media and body blocks of each story.
    keep_only_tags = [dict(name='div', attrs={'class':['story_titles','story_media','story_text']})]

    # remove_tags = [
    #     dict(name='script'),
    #     dict(name='div', attrs={'id':['footerAd', 'footerBottom','contentFooter','googleAdSense','singleSmallRight','horizontalNavigation','contentbox','singleLogo']}),
    #     dict(name='div', attrs={'class':['boxNews','boxExclusiv','boxExclusiv ad']}),
    # ]

    feeds = [
        (u'Front', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=1'),
        (u'News', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=4'),
        (u'Ausland', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=3'),
        (u'Schweiz', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=2'),
        (u'Wirtschaft & Börse', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=8'),
        (u'Zürich', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=19'),
        (u'Bern', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=20'),
        (u'Mittelland', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=2087'),
        (u'Basel', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=21'),
        (u'Zentralschweiz', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=112'),
        (u'Ostschweiz', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=126'),
        (u'Panorama', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=13'),
        (u'People', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=28'),
        (u'Sport', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=9'),
        (u'Digital', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=10'),
        (u'Auto', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=11'),
        (u'Life', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=25'),
    ]



tages-anzeiger.ch

'''
www.tagesanzeiger.ch
'''

from calibre.web.feeds.recipes import BasicNewsRecipe

class Tagesanzeiger(BasicNewsRecipe):
    title = 'Tages-Anzeiger Online'
    category = 'news, politics, nachrichten, Switzerland'
    oldest_article = 2
    max_articles_per_feed = 100
    # no_stylesheets = True
    use_embedded_content = False
    language = 'de'

    # keep_only_tags = [dict(name='div', attrs={'class':'article'})]

    # Strip scripts, ads, navigation and footer boxes.
    remove_tags = [
        dict(name='script'),
        dict(name='div', attrs={'id':['footerAd', 'footerBottom','contentFooter','googleAdSense','singleSmallRight','horizontalNavigation','contentbox','singleLogo']}),
        dict(name='div', attrs={'class':['boxNews','boxExclusiv','boxExclusiv ad']}),
    ]

    feeds = [
        (u'Zuerich', u'http://www.tagesanzeiger.ch/zuerich/rss.html'),
        (u'Schweiz', u'http://www.tagesanzeiger.ch/schweiz/rss.html'),
        (u'Ausland', u'http://www.tagesanzeiger.ch/ausland/rss.html'),
        (u'Wirtschaft', u'http://www.tagesanzeiger.ch/wirtschaft/rss.html'),
        (u'Sport', u'http://www.tagesanzeiger.ch/sport/rss.html'),
        (u'Kultur', u'http://www.tagesanzeiger.ch/kultur/rss.html'),
        (u'Panorama', u'http://www.tagesanzeiger.ch/panorama/rss.html'),
        (u'Leben', u'http://www.tagesanzeiger.ch/leben/rss.html'),
        (u'Digital', u'http://www.tagesanzeiger.ch/digital/rss.html'),
        (u'Auto', u'http://www.tagesanzeiger.ch/auto/rss.html'),
        (u'Dossiers', u'http://www.tagesanzeiger.ch/dossiers/rss.html'),
    ]

    # Fetch the print-friendly version of each article.
    def print_version(self, url):
        return url + '/print.html'
andifink is offline  
Old 04-02-2010, 04:11 AM   #1719
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Quote:
Originally Posted by kovidgoyal View Post
@gambarini: Some sites give you Forbidden errors if you try to download too many files from them in a short time. Use the delay setting to add some delay between downloads.
Thanks very much! I'll try it immediately.

Now I have only one feed left to complete: La Stampa, an Italian newspaper (the newspaper of Torino).

gambarini is offline  
Old 04-02-2010, 09:01 AM   #1720
KlJue
Junior Member
KlJue began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Mar 2010
Location: Germany
Device: DR1000S
Hi kiklop74,
thanks for your info. My first test with Sueddeutsche Zeitung stops with the following error:

ERROR: Konvertierungsfehler: <b>Misslungen</b>: Nachrichten abrufen von Sueddeutche Zeitung

Nachrichten abrufen von Sueddeutche Zeitung
Resolved conversion options
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0,
'book_producer': None,
'chapter': None,
'chapter_mark': 'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': None,
'disable_font_rescaling': False,
'dont_download_recipe': False,
'dont_justify': False,
'dont_split_on_page_breaks': True,
'extra_css': None,
'extract_to': None,
'flow_size': 260,
'font_size_mapping': None,
'footer_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s* <a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)' ,
'header_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s* <a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)' ,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x03A88310>,
'insert_blank_line': False,
'insert_metadata': False,
'isbn': None,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0,
'linearize_tables': False,
'lrf': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'max_toc_links': 50,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_navbars': False,
'output_profile': <calibre.customize.profiles.OutputProfile object at 0x03A884F0>,
'page_breaks_before': None,
'password': 'Mxxxxxxxxxl',
'prefer_metadata_cover': False,
'preprocess_html': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': None,
'remove_first_image': False,
'remove_footer': False,
'remove_header': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'series': None,
'series_index': None,
'tags': None,
'test': False,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'use_auto_toc': False,
'username': 'KJLieb',
'verbose': 2}
InputFormatPlugin: Recipe Input running
Python function terminated unexpectedly
(Error Code: 1)
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 99, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 24, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 787, in run
File "site-packages\calibre\customize\conversion.py", line 211, in __call__
File "site-packages\calibre\web\feeds\input.py", line 100, in convert
File "site-packages\calibre\web\feeds\news.py", line 554, in __init__
File "c:\dokume~1\klausl~1\lokale~1\temp\calibre_0.6.46 _sxttki_recipes\recipe0.py", line 46, in get_browser
br.open(self.INDEX)
File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_mechanize.py", line 209, in open
File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_mechanize.py", line 236, in _mech_open
File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_opener.py", line 202, in open
File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_http.py", line 612, in http_response
File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_opener.py", line 219, in error
File "urllib2.py", line 367, in _call_chain
File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_http.py", line 146, in http_error_302
File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_mechanize.py", line 209, in open
File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_mechanize.py", line 236, in _mech_open
File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_opener.py", line 191, in open
File "urllib2.py", line 407, in _open
File "urllib2.py", line 367, in _call_chain
File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_http.py", line 729, in http_open
File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_http.py", line 704, in do_open
File "httplib.py", line 974, in getresponse
File "httplib.py", line 391, in begin
File "httplib.py", line 355, in _read_status
httplib.BadStatusLine

Do you have any solution for this problem?
Thanks again for your help.
KlJue
KlJue is offline  
Old 04-02-2010, 12:22 PM   #1721
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by KlJue View Post
Hi kiklop74,
thanks for your info. My first test with Sueddeutsche Zeitung stops with the following error:
They changed the login procedure. In order to develop a new version, I will need a valid username and password for paid access. If you have that, send it to me in a private message.
kiklop74 is offline  
Old 04-02-2010, 07:01 PM   #1722
chensl316
Junior Member
chensl316 began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Apr 2010
Device: nook
Can someone please make this webpage into a recipe?
http://www.smh.com.au/todays-paper
Thanks in advance.
chensl316 is offline  
Old 04-02-2010, 10:10 PM   #1723
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
What is wrong with the existing Sydney Morning Herald recipe?
kiklop74 is offline  
Old 04-02-2010, 11:16 PM   #1724
chensl316
Junior Member
chensl316 began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Apr 2010
Device: nook
Thanks for your reply.
The existing Sydney Morning Herald recipe includes the articles from its RSS feeds, and it's fine.
I would like to have a recipe that fetches articles only from the "today's paper" page.
It would be great if you could post the code here.
Thanks again.
chensl316 is offline  
Old 04-03-2010, 08:52 AM   #1725
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
New recipe for The Sydney Morning Herald - printed edition:
Attached Files
File Type: zip smh_au_print.zip (1.5 KB, 249 views)
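
The attachment's contents are not shown in the thread. As a rough sketch of how a page-based (non-RSS) recipe is usually structured, a recipe can override parse_index() and return a list of (section title, list of article dicts). Everything about the page layout below (the `section` class, the `h2` heading) is an assumption for illustration, as are the class name and the `build_index` helper; `index_to_soup` and `tag_to_string` are standard BasicNewsRecipe methods.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the sketch can be checked outside calibre.
    class BasicNewsRecipe(object):
        pass


def build_index(sections):
    """Turn [(section, [(title, url), ...]), ...] into parse_index() output."""
    index = []
    for section, links in sections:
        articles = [{'title': t, 'url': u, 'description': '', 'date': ''}
                    for t, u in links]
        if articles:  # skip sections without any article links
            index.append((section, articles))
    return index


class SMHPrintSketch(BasicNewsRecipe):      # hypothetical name
    title = u"SMH - today's paper (sketch)"
    language = 'en_AU'
    no_stylesheets = True
    INDEX = 'http://www.smh.com.au/todays-paper'

    def parse_index(self):
        soup = self.index_to_soup(self.INDEX)
        sections = []
        # ASSUMPTION: each section is a <div class="section"> holding an
        # <h2> heading and <a href> article links. Adjust to the real page.
        for div in soup.findAll('div', attrs={'class': 'section'}):
            heading = div.find('h2')
            name = self.tag_to_string(heading) if heading else u'News'
            links = [(self.tag_to_string(a), a['href'])
                     for a in div.findAll('a', href=True)]
            sections.append((name, links))
        return build_index(sections)
```

Relative links would still need to be joined with the site's base URL before being returned.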
kiklop74 is offline  