#1711
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Quote:
They say: "probably your remove_tags is too aggressive or the html has some problems..." Now, with the correct remove_tags, every article displays perfectly! So the recipe is now complete. Thanks to all!
Code:
#!/usr/bin/env python
__license__ = 'GPL v3'
__author__ = '^^^^^^'
__copyright__ = '******'
__description__ = 'Punto Informatico'
'''
http://www.punto-informatico.it/
'''
from calibre.web.feeds.news import BasicNewsRecipe

class ilsole(BasicNewsRecipe):
    author = '***'
    description = 'Punto Informatico: Internet dal 1996'
    cover_url = ' '
    title = u'Punto Informatico '
    publisher = 'italiaNews High Tech'
    category = 'News, finance, economy, politics'
    language = 'it'
    timefmt = '[%a, %d %b, %Y]'
    oldest_article = 15
    max_articles_per_feed = 50
    use_embedded_content = False
    remove_javascript = True
    no_stylesheets = True
    # Keep only the article box and drop the advertising box inside it.
    keep_only_tags = [dict(name='div', attrs={'class':'box'})]
    remove_tags = [dict(name='div',attrs={'class':'boxadv'})]

    # Use the feed entry's id (falling back to guid) as the article URL.
    def get_article_url(self, article):
        return article.get('id', article.get('guid', None))

    feeds = [(u'Punto Informatico',u'http://feeds.punto-informatico.it/c/32288/f/438866/index.rss')]
#1712
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Does "keep_only_tags" support more than one condition?
#1713
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
It doesn't support conditionals at all. It does let you define extensive lists of tags:
Code:
keep_only_tags = [
    dict(name='div', attrs={'class':['feature','banner']}),
    dict(name='a', attrs={'id':['first','second','third']}),
    dict(name='span', attrs={'potato':['idaho', 'red']}),
]
Edit: actually, it may support conditionals; I've just never used them.
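To make the "more than one condition" question concrete, here is a toy model of how a list like this behaves as I understand it: entries in the list are alternatives (a tag is kept if any entry matches), and a list of values for one attribute is also a set of alternatives. The helper names `matches` and `kept` are made up for this illustration and are not calibre functions; the real matching is done by BeautifulSoup inside calibre, and the exact string comparison below is a simplification.

```python
def matches(tag, spec):
    """Return True if a tag (a dict with 'name' and 'attrs') satisfies one entry."""
    # The tag name must agree when the entry specifies one.
    if spec.get('name') is not None and tag['name'] != spec['name']:
        return False
    # Every attribute named in the entry must match; a list means "any of these".
    for attr, wanted in spec.get('attrs', {}).items():
        allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
        if tag['attrs'].get(attr) not in allowed:
            return False
    return True


def kept(tag, keep_only_tags):
    """A tag is kept if at least one entry in the list matches it."""
    return any(matches(tag, spec) for spec in keep_only_tags)


keep_only_tags = [
    dict(name='div', attrs={'class': ['feature', 'banner']}),
    dict(name='a', attrs={'id': ['first', 'second', 'third']}),
]

print(kept({'name': 'div', 'attrs': {'class': 'banner'}}, keep_only_tags))
print(kept({'name': 'div', 'attrs': {'class': 'sidebar'}}, keep_only_tags))
print(kept({'name': 'a', 'attrs': {'id': 'second'}}, keep_only_tags))
```

So several conditions can be expressed as several `dict(...)` entries, without any explicit conditional syntax.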
#1714
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Quote:
#1715
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
With this feed
http://www.lastampa.it/cmstp/rubrich...asp?ID_blog=25
I get this message:
httperror_seek_wrapper: HTTP Error 403: Forbidden
It's strange; the link doesn't need any username or password.
Last edited by gambarini; 04-01-2010 at 01:26 PM.
#1716
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
#1717
creator of calibre
Posts: 45,378
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@gambarini: Some sites return Forbidden errors if you try to download too many files from them in a short time. Use the delay setting to add some delay between downloads.
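As a rough sketch of that advice: `delay` and `simultaneous_downloads` are class attributes of `BasicNewsRecipe`; everything else here (the class name, the values, and the feed URL) is a placeholder chosen for illustration, not taken from the La Stampa recipe. The `try`/`except` is only there so the snippet also loads outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be loaded outside calibre; not needed in a real recipe.
    class BasicNewsRecipe(object):
        pass


class SlowFetchExample(BasicNewsRecipe):  # hypothetical recipe name
    title = u'Slow fetch example'
    oldest_article = 7
    max_articles_per_feed = 20
    use_embedded_content = False
    delay = 2                    # seconds to wait between consecutive downloads
    simultaneous_downloads = 1   # fetch one file at a time
    feeds = [(u'Example', u'http://example.com/rss.xml')]  # placeholder feed URL
```

If the 403 persists even with a generous delay, the cause is probably something other than request rate.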
#1718
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2010
Device: Hexaglot/Hanvon N518
20minuten.ch and tagesanzeiger.ch recipes
Hello,
I put together some custom recipes for German-speaking Switzerland and would like to share them. They might not be perfect, but for me they're OK... maybe someone can improve them.
Code:
'''
www.20min.ch
'''
from calibre.web.feeds.recipes import BasicNewsRecipe

class ZwanzigMinuten(BasicNewsRecipe):
    title = '20min Online'
    category = 'news, politics, nachrichten, Switzerland'
    oldest_article = 1
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    language = 'de'
    keep_only_tags = [dict(name='div', attrs={'class':['story_titles','story_media','story_text']})]
    # remove_tags = [
    #     dict(name='script'),
    #     dict(name='div', attrs={'id':['footerAd', 'footerBottom','contentFooter','googleAdSense','singleSmallRight','horizontalNavigation','contentbox','singleLogo']}),
    #     dict(name='div', attrs={'class':['boxNews','boxExclusiv','boxExclusiv ad']}),
    # ]
    feeds = [
        (u'Front', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=1'),
        (u'News', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=4'),
        (u'Ausland', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=3'),
        (u'Schweiz', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=2'),
        (u'Wirtschaft & Börse', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=8'),
        (u'Zürich', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=19'),
        (u'Bern', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=20'),
        (u'Mittelland', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=2087'),
        (u'Basel', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=21'),
        (u'Zentralschweiz', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=112'),
        (u'Ostschweiz', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=126'),
        (u'Panorama', u'http://www.20min.ch/rss/rss.tmpl?type=rubrik&get=13'),
        (u'People', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=28'),
        (u'Sport', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=9'),
        (u'Digital', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=10'),
        (u'Auto', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=11'),
        (u'Life', u'http://www.20min.ch/rss/rss.tmpl?type=channel&get=25'),
    ]
tages-anzeiger.ch
Code:
'''
www.tagesanzeiger.ch
'''
from calibre.web.feeds.recipes import BasicNewsRecipe

class Tagesanzeiger(BasicNewsRecipe):
    title = 'Tages-Anzeiger Online'
    category = 'news, politics, nachrichten, Switzerland'
    oldest_article = 2
    max_articles_per_feed = 100
    # no_stylesheets = True
    use_embedded_content = False
    language = 'de'
    # keep_only_tags = [dict(name='div', attrs={'class':'article'})]
    remove_tags = [
        dict(name='script'),
        dict(name='div', attrs={'id':['footerAd', 'footerBottom','contentFooter','googleAdSense','singleSmallRight','horizontalNavigation','contentbox','singleLogo']}),
        dict(name='div', attrs={'class':['boxNews','boxExclusiv','boxExclusiv ad']}),
    ]
    feeds = [
        (u'Zuerich', u'http://www.tagesanzeiger.ch/zuerich/rss.html'),
        (u'Schweiz', u'http://www.tagesanzeiger.ch/schweiz/rss.html'),
        (u'Ausland', u'http://www.tagesanzeiger.ch/ausland/rss.html'),
        (u'Wirtschaft', u'http://www.tagesanzeiger.ch/wirtschaft/rss.html'),
        (u'Sport', u'http://www.tagesanzeiger.ch/sport/rss.html'),
        (u'Kultur', u'http://www.tagesanzeiger.ch/kultur/rss.html'),
        (u'Panorama', u'http://www.tagesanzeiger.ch/panorama/rss.html'),
        (u'Leben', u'http://www.tagesanzeiger.ch/leben/rss.html'),
        (u'Digital', u'http://www.tagesanzeiger.ch/digital/rss.html'),
        (u'Auto', u'http://www.tagesanzeiger.ch/auto/rss.html'),
        (u'Dossiers', u'http://www.tagesanzeiger.ch/dossiers/rss.html'),
    ]

    # Fetch the print version of each article instead of the regular page.
    def print_version(self, url):
        return url + '/print.html'
#1719
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Quote:
Now I have only one feed left to complete: La Stampa, an Italian newspaper (the newspaper of Torino).
#1720
Junior Member
Posts: 2
Karma: 10
Join Date: Mar 2010
Location: Germany
Device: DR1000S
Hi kiklop74,
thanks for your info. My first test with Sueddeutsche Zeitung stops with the following error (the German text reads "Conversion error: Failed: Fetch news from Sueddeutche Zeitung"):
ERROR: Konvertierungsfehler: <b>Misslungen</b>: Nachrichten abrufen von Sueddeutche Zeitung
Nachrichten abrufen von Sueddeutche Zeitung
Resolved conversion options
{'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0, 'book_producer': None, 'chapter': None, 'chapter_mark': 'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'disable_font_rescaling': False, 'dont_download_recipe': False, 'dont_justify': False, 'dont_split_on_page_breaks': True, 'extra_css': None, 'extract_to': None, 'flow_size': 260, 'font_size_mapping': None, 'footer_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s* <a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)' , 'header_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s* <a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)' , 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x03A88310>, 'insert_blank_line': False, 'insert_metadata': False, 'isbn': None, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0, 'linearize_tables': False, 'lrf': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'max_toc_links': 50, 'no_chapters_in_toc': False, 'no_default_epub_cover': False, 'no_inline_navbars': False, 'output_profile': <calibre.customize.profiles.OutputProfile object at 0x03A884F0>, 'page_breaks_before': None, 'password': 'Mxxxxxxxxxl', 'prefer_metadata_cover': False, 'preprocess_html': False, 'pretty_print': True, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': None, 'remove_first_image': False, 'remove_footer': False, 'remove_header': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'series': None, 'series_index': None, 'tags': None, 'test': False, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'use_auto_toc': False, 'username': 'KJLieb', 'verbose': 2}
InputFormatPlugin: Recipe Input running
Python function terminated unexpectedly (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 103, in main
  File "site.py", line 85, in run_entry_point
  File "site-packages\calibre\utils\ipc\worker.py", line 99, in main
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 24, in gui_convert
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 787, in run
  File "site-packages\calibre\customize\conversion.py", line 211, in __call__
  File "site-packages\calibre\web\feeds\input.py", line 100, in convert
  File "site-packages\calibre\web\feeds\news.py", line 554, in __init__
  File "c:\dokume~1\klausl~1\lokale~1\temp\calibre_0.6.46 _sxttki_recipes\recipe0.py", line 46, in get_browser
    br.open(self.INDEX)
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_mechanize.py", line 209, in open
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_mechanize.py", line 236, in _mech_open
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_opener.py", line 202, in open
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_http.py", line 612, in http_response
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_opener.py", line 219, in error
  File "urllib2.py", line 367, in _call_chain
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_http.py", line 146, in http_error_302
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_mechanize.py", line 209, in open
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_mechanize.py", line 236, in _mech_open
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_opener.py", line 191, in open
  File "urllib2.py", line 407, in _open
  File "urllib2.py", line 367, in _call_chain
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_http.py", line 729, in http_open
  File "site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_http.py", line 704, in do_open
  File "httplib.py", line 974, in getresponse
  File "httplib.py", line 391, in begin
  File "httplib.py", line 355, in _read_status
httplib.BadStatusLine
Do you have any solution for this problem?
Thanks again for your help,
KlJue
#1721
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
They changed the login procedure. To develop a new version of the recipe I will need a valid username and password for paid access. If you have them, send them to me in a private message.
#1722
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2010
Device: nook
Can someone please make this webpage into a recipe?
http://www.smh.com.au/todays-paper
Thanks in advance.
#1723
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
What is wrong with the existing Sydney Morning Herald recipe?
#1724
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2010
Device: nook
Thanks for your reply.
The existing Sydney Morning Herald recipe fetches the articles from the paper's RSS feeds, and that works fine. I would like a recipe that fetches only the articles on the 'today's newspaper' page. It would be great if you could post the code here. Thanks again.
#1725
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
New recipe for The Sydney Morning Herald - printed edition: