Custom recipes (archive, read-only) - Page 92

Felipe · 02-05-2010, 09:07 AM

Hi,

I would like to be able to download the articles from http://www.elcomercio.com/ which is an Ecuadorian newspaper.

Thank you very much.

Best regards,
Felipe

XanthanGum · 02-05-2010, 12:49 PM

Quote:

Originally Posted by kiklop74

New recipe for Read It Later website:

kiklop74,

Thanks for letting us know about that instapaper.com site. Very interesting.

I visited the site, created an account, and saved some articles to read later. However, when I crank up your recipe in Calibre, I'm getting an error. Here it is:

ERROR: Conversion Error: Failed: Fetch news from Read It Later

Fetch news from Read It Later
Resolved conversion options
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0,
'book_producer': None,
'chapter': None,
'chapter_mark': 'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': None,
'disable_font_rescaling': False,
'dont_download_recipe': False,
'dont_justify': True,
'enable_autorotation': False,
'extra_css': None,
'font_size_mapping': None,
'footer_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)* \\s*)?\\d+ \\s*.*?\\s*)|(\\s* <a name=\\d+></a>((<img.+?>)* \\s*)?.*? \\s*\\d+))(?= )' ,
'header': False,
'header_format': '%t by %a',
'header_regex': '(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)* \\s*)?\\d+ \\s*.*?\\s*)|(\\s* <a name=\\d+></a>((<img.+?>)* \\s*)?.*? \\s*\\d+))(?= )' ,
'header_separation': 0,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x02BCF970>,
'insert_blank_line': False,
'insert_metadata': False,
'isbn': None,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0,
'linearize_tables': False,
'lrf': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'max_toc_links': 50,
'minimum_indent': 0,
'mono_family': None,
'no_chapters_in_toc': False,
'no_inline_navbars': False,
'output_profile': <calibre.customize.profiles.SonyReaderOutput object at 0x02BCFB50>,
'page_breaks_before': None,
'password': '',
'prefer_metadata_cover': False,
'preprocess_html': False,
'pretty_print': False,
'publisher': None,
'rating': None,
'read_metadata_from_opf': None,
'remove_first_image': False,
'remove_footer': False,
'remove_header': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'render_tables_as_images': False,
'sans_family': None,
'series': None,
'series_index': None,
'serif_family': None,
'tags': None,
'test': False,
'text_size_multiplier_for_rendered_tables': 1.0,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'use_auto_toc': False,
'username': '',
'verbose': 2,
'wordspace': 2.5}
InputFormatPlugin: Recipe Input running
Python function terminated unexpectedly
'NoneType' object has no attribute 'findAll' (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 99, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 24, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 745, in run
File "site-packages\calibre\customize\conversion.py", line 211, in __call__
File "site-packages\calibre\web\feeds\input.py", line 92, in convert
File "site-packages\calibre\web\feeds\news.py", line 634, in download
File "site-packages\calibre\web\feeds\news.py", line 751, in build_index
File "c:\docume~1\hp_adm~1\locals~1\temp\calibre_0.6.37 _3vz3rn_recipes\recipe0.py", line 52, in parse_index
for item in ritem.findAll('li'):
AttributeError: 'NoneType' object has no attribute 'findAll'

I took out my username and password but am positive both were correct.

Hope you can help.

XG
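Note on the traceback: the AttributeError means the object the recipe calls findAll on was None, which is what a parser's find call returns when the page does not contain the expected element (for example, when the site serves a login page instead of the reading list). How the recipe obtains `ritem` is not shown in this thread, so the following is only a minimal sketch of the guard, using the standard library's ElementTree as a stand-in for the recipe's parser. The function name `list_items`, the `id` value and the sample markup are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def list_items(page_source, list_id):
    """Return the text of each <li> inside the <ul> with the given id.

    Returns an empty list instead of raising when the <ul> is missing.
    """
    root = ET.fromstring(page_source)

    # Look for the container; it stays None if the page lacks it.
    container = None
    for ul in root.iter('ul'):
        if ul.get('id') == list_id:
            container = ul
            break

    # Guard: without this check, container.findall would raise
    # AttributeError on None, just like the traceback above.
    if container is None:
        return []

    return [''.join(li.itertext()).strip() for li in container.findall('li')]


# Hypothetical pages: one with the list, one that is only a login form.
reading_list = '<html><body><ul id="list"><li>A</li><li>B</li></ul></body></html>'
login_page = '<html><body><form><input name="username"/></form></body></html>'
```

Returning an empty list lets the caller report "no articles found" rather than crash. It does not address why the expected element was missing.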

XanthanGum · 02-05-2010, 12:55 PM

kiklop74,

Oops!! I thought you were accessing the instapaper.com site. I see now in your recipe it's another site, readitlater.com.

My apologies.

Where can I find the recipe that accesses the instapaper.com site?

XG

Denny_ · 02-05-2010, 02:17 PM

XG,

In the list of recipes by language it is under Unknown near the bottom.

Denny

XanthanGum · 02-05-2010, 02:21 PM

Quote:

Originally Posted by Denny_

XG,

In the list of recipes by language it is under Unknown near the bottom.

Denny

Denny,

Thanks. I'll check it out.

XG

kiklop74 · 02-05-2010, 03:45 PM

New recipe for El Comercio:

Felipe · 02-05-2010, 04:13 PM

Quote:

Originally Posted by kiklop74

New recipe for El Comercio:

kiklop74, you rock!

exdream · 02-05-2010, 05:41 PM

Please help!
I'm trying to write a recipe for http://szmobil.sueddeutsche.de/. I've been working on it for quite a while now, and after a brief success parsing one section, I can't get the login working with calibre's browser instance.

Every downloaded article page is the login form. Does anybody have an idea? Thanks for your help!

from calibre.web.feeds.recipes import BasicNewsRecipe

class SzMobilRecipe(BasicNewsRecipe):
    title = u'S\xfcddeutsche Zeitung'
    oldest_article = 7
    max_articles_per_feed = 100
    description = 'Sueddeutsche Zeitung Mobile Ausgabe'
    language = 'de'

    needs_subscription = True

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open('http://szmobil.sueddeutsche.de/login.php')
            br.select_form(nr=0)
            br['username'] = self.username
            br['password'] = self.password
            br.submit()
        return br

    # feeds = [(u'Streiflicht', u'http://szmobil.sueddeutsche.de/show.php?id=streif')]

    def parse_index(self):
        feeds = []
        for title, url in [('Politik', 'http://szmobil.sueddeutsche.de/show.php?section=Politik'),
                # ('Seite Drei', 'http://szmobil.sueddeutsche.de/show.php?section=Seite+drei'),
                # ('Meinungsseite', 'http://szmobil.sueddeutsche.de/show.php?section=Meinungsseite'),
                # ('Panorama', 'http://szmobil.sueddeutsche.de/show.php?section=Panorama'),
                # ('Feuilleton', 'http://szmobil.sueddeutsche.de/show.php?section=Feuilleton'),
                # ('Medien', 'http://szmobil.sueddeutsche.de/show.php?section=Medien'),
                # ('Wissen', 'http://szmobil.sueddeutsche.de/show.php?section=Wissen'),
                # ('Wirtschaft', u'http://szmobil.sueddeutsche.de/show.php?section=Wirtschaft'),
                # ('Sport', u'http://szmobil.sueddeutsche.de/show.php?section=Sport'),
                # ('Muenchen-Bayern', u'http://szmobil.sueddeutsche.de/show.php?section=M%FCnchen%2FBayern')
                ]:
            articles = self.nz_parse_section(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def nz_parse_section(self, url):
        soup = self.index_to_soup(url)
        current_articles = []
        for li in soup.findAll('li'):
            a = li.find('a', href = True)
            if a is None:
                continue
            title = self.tag_to_string(a)
            url = a.get('href', False)
            if not url or not title:
                continue
            current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})
        return current_articles
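Note on debugging this: when every article comes back as the login form, it can help to check right after the login attempt whether the returned page still contains a password field. The following is a minimal sketch in standard-library Python 3, independent of calibre. The helper name `looks_like_login_page` and the sample markup are assumptions, and a page could of course contain a password field for other reasons.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Sets self.found to True when an <input type="password"> is seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        # Attribute values can be None for bare attributes, hence the "or ''".
        input_type = (dict(attrs).get('type') or '').lower()
        if input_type == 'password':
            self.found = True


def looks_like_login_page(html):
    """Heuristic: a page with a password input is probably still the login form."""
    finder = _PasswordFieldFinder()
    finder.feed(html)
    return finder.found


# Hypothetical samples: a login form and an article page.
login_html = ('<form><input type="text" name="username">'
              '<input type="password" name="password"></form>')
article_html = '<h1>Politik</h1><p>Article text</p>'
```

Running the check on the response of the login request, and again on the first fetched article, would show whether the login itself fails or the session is lost afterwards.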

kiklop74 · 02-06-2010, 07:22 AM

There is also one hidden field in that form. Try this:

Code:

def get_browser(self):
    br = BasicNewsRecipe.get_browser()
    if self.username is not None and self.password is not None:
        br.open('http://szmobil.sueddeutsche.de/login.php')
        br.select_form(nr=0)
        br['username'] = self.username
        br['password'] = self.password
        br['id'] = 'streif'
        br.submit()
    return br

exdream · 02-06-2010, 08:06 AM

Quote:

Originally Posted by kiklop74

There is also one hidden field in that form. Try this: ...

Hi kiklop74,
many thanks for your reply. I tried it, but it didn't work; I got a ValueError: control 'id' is readonly. So I then tried this:

def get_browser(self):
    br = BasicNewsRecipe.get_browser()
    if self.username is not None and self.password is not None:
        br.open('http://szmobil.sueddeutsche.de/login.php')
        br.select_form(nr=0)
        ctl_1 = br.find_control(type = 'hidden', name = 'id')
        ctl_1.readonly = False
        ctl_1.value = 'streif'  # 1st try
        br['username'] = self.username
        br['password'] = self.password
        br['id'] = 'streif'  # 2nd try
        br.submit()
    return br

Both attempts gave the same result. The ValueError disappeared, but the downloaded article pages were the login page again.

I'm more and more at a loss with this.

I'd be thankful for any ideas.

Regards,
Gero
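Note on the hidden field: as the ValueError shows, the browser treats the hidden control as read-only. A hidden input normally already carries the value the server put into the page, and that value is sent together with the visible fields when the form is submitted. The following minimal sketch in standard-library Python 3 (not calibre or its browser) illustrates what such a submission contains. The form markup is hypothetical; only the field names `username`, `password` and `id` and the value `streif` come from the posts above.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class _FormFieldCollector(HTMLParser):
    """Collects name/value pairs of every <input>, hidden ones included."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)
        name = attrs.get('name')
        if name:
            # Inputs without a value attribute are submitted as empty strings.
            self.fields[name] = attrs.get('value') or ''


def build_login_payload(form_html, username, password):
    """Return the URL-encoded body a login submission would carry."""
    collector = _FormFieldCollector()
    collector.feed(form_html)
    payload = dict(collector.fields)   # hidden fields keep their page values
    payload['username'] = username     # visible fields are filled in by us
    payload['password'] = password
    return urlencode(payload)


# Hypothetical login form with one hidden field.
form_html = ('<form action="login.php" method="post">'
             '<input type="hidden" name="id" value="streif">'
             '<input type="text" name="username">'
             '<input type="password" name="password">'
             '</form>')
payload = build_login_payload(form_html, 'gero', 'secret')
# payload == 'id=streif&username=gero&password=secret'
```

If the hidden value is already transmitted like this, overriding it would not be expected to change the outcome, which matches the result reported above. The cause of the failed login would then lie elsewhere.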

Starson17 · 02-06-2010, 11:21 AM

My wife reads the Discover Magazine feed and tells me that the Main menu, Section menu and Next links at the top of each article page (deepest pages) are all linking to external locations. Looking at the epub, I see that those links are really relative links, but the html code for each article page includes a base tag of the form:
<base href="http://discovermagazine.com ...>
Removing the base tag seems to fix the problem. Do recipe bugs belong here or in the bug tracker? Thanks.
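Note on why the base tag has this effect: relative links are resolved against the base href when one is present, rather than against the location of the page itself. The following minimal sketch in standard-library Python 3 shows both the resolution and a simple way to strip the tag. The URLs, the file layout and the helper names are assumptions for illustration, not calibre's own code.

```python
import re
from urllib.parse import urljoin

# Matches a complete <base ...> tag, case-insensitively.
BASE_TAG = re.compile(r'<base\b[^>]*>', re.IGNORECASE)


def strip_base_tags(html):
    """Remove every <base> tag so relative links resolve locally again."""
    return BASE_TAG.sub('', html)


def resolve(link, document_url, base_href=None):
    """Resolve a relative link the way a reader would: against the
    base href if the page declares one, else against the page's own URL."""
    return urljoin(base_href or document_url, link)


# Hypothetical location of an article page inside the generated book.
page_url = 'file:///book/feed_0/article_1/page.html'

with_base = resolve('index.html', page_url, 'http://example.com/')
without_base = resolve('index.html', page_url)
cleaned = strip_base_tags('<head><base href="http://example.com/"><title>t</title></head>')
```

With the base tag, the navigation link resolves to the external site. Without it, the link stays inside the book.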

Abelturd · 02-06-2010, 01:14 PM

Question: When I download a recipe, calibre adds the "next", "previous" and "section menu" links itself, right? My problem is that the "section menu" link doesn't point to the table of contents but to a nonexistent label, e.g. index.html#article_0. Is there some way I can make it point to the table of contents of the given feed? Please.

kovidgoyal · 02-06-2010, 02:16 PM

Ah yes, <base> tags will screw things up. I'll add some code to strip them automatically in the next release.

@Abelturd: Section Menu links only work in recipes that have multiple sections.
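Note for reference: the SzMobilRecipe posted earlier in this thread builds its index as a list of (section title, article list) pairs, and a recipe whose index has more than one such pair is presumably what "multiple sections" refers to here. The following is a minimal sketch of that structure in plain Python, outside calibre, with made-up titles and URLs. The helper names `build_index` and `article` are hypothetical.

```python
def article(title, url):
    """Build an article entry with the keys used in the recipe above."""
    return {'title': title, 'url': url, 'description': '', 'date': ''}


def build_index(sections):
    """Return a list of (section title, articles) pairs, skipping empty sections."""
    index = []
    for title, articles in sections:
        if articles:
            index.append((title, articles))
    return index


# Made-up sections; the last one is empty and is dropped.
sections = [
    ('Politik', [article('Example A', 'http://example.com/a')]),
    ('Sport', [article('Example B', 'http://example.com/b')]),
    ('Wissen', []),
]
index = build_index(sections)
```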

Abelturd · 02-06-2010, 02:46 PM

Custom recipe for ŽIVÉ.sk (zive.sk), a Slovak IT news website.

Starson17 · 02-06-2010, 02:57 PM

Quote:

Originally Posted by kovidgoyal

Ah yes, <base> tags will screw things up. I'll add some code to strip them automatically in the next release.

Great!



