Recipe for Sächsische Zeitung

jande · 04-21-2014, 08:54 PM

Hello,

I’m looking for a recipe for the German newspaper „Sächsische Zeitung“ (http://www.sz-online.de). It offers some RSS feeds, but they don’t include the paid content, so I’m looking for a way to convert the whole website to EPUB.

I’m not quite sure, but it seems that I managed to get the login for this site working with the following code:

LOGIN = 'https://secure.sz-online.de/Customers.v3/login.asp'
needs_subscription = True  # lets calibre ask for the username and password

def get_browser(self):
    br = BasicNewsRecipe.get_browser(self)
    br.open(self.INDEX)
    if self.username is not None and self.password is not None:
        # log in via the form named 'loginform' on the login page
        br.open(self.LOGIN)
        br.select_form(name='loginform')
        br['Loginname'] = self.username
        br['LoginPassword'] = self.password
        br.submit(label='Anmelden')  # click the "Anmelden" (log in) button
    return br

But I don’t know how to continue. How can I get the content of, for example, http://www.sz-online.de/nachrichten/politik and http://www.sz-online.de/nachrichten/wissen with just one recipe?

I would be very glad if anybody could help me.

Thanks a lot,

Jan
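One way to cover several sections with a single recipe is to let parse_index visit each section page and return one feed per section. The sketch below shows only that collection step, in plain Python 3 so it runs without calibre. SECTIONS, build_feeds and the extract_links callback are hypothetical names, and treating every link that starts with "/" as an article is an assumption, since the site's markup is not known here.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.sz-online.de'

# Hypothetical section list: (section title, section page URL).
SECTIONS = [
    ('Politik', BASE_URL + '/nachrichten/politik'),
    ('Wissen', BASE_URL + '/nachrichten/wissen'),
]


def build_feeds(sections, extract_links):
    """Build the list that calibre's parse_index() returns:
    [(section_title, [{'title': ..., 'url': ...}, ...]), ...]

    extract_links(url) is a hypothetical callback that returns
    (link text, href) pairs for every link on one section page.
    """
    feeds = []
    for section_title, section_url in sections:
        articles = []
        for text, href in extract_links(section_url):
            # Treat only site-relative links ("/...", but not the
            # protocol-relative "//...") as articles and make them absolute.
            if href.startswith('/') and not href.startswith('//'):
                articles.append({'title': text.strip(),
                                 'url': urljoin(BASE_URL, href)})
        # Skip sections in which no article links were found.
        if articles:
            feeds.append((section_title, articles))
    return feeds
```

Inside a recipe, parse_index could return build_feeds(SECTIONS, extract_links), with extract_links fetching the page via self.index_to_soup(url) and returning (self.tag_to_string(a), a['href']) for each a in soup.findAll('a', href=True). Calibre 1.x runs recipes under Python 2.7, where urljoin is imported from urlparse instead of urllib.parse. Navigation links also start with "/", so a real recipe would probably need to restrict the search to the page's article containers.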

jande · 04-22-2014, 02:15 PM

Okay, I’ve continued working on my recipe. This is the latest version:

Code:

#!/usr/bin/env  python

import re

# (1) import the basic recipe and needed parts from BeautifulSoup

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

# (2) declare your class, derived from BasicNewsRecipe, and set the variable INDEX to the url for the site page with links

class SZOnline(BasicNewsRecipe):

    title      = 'SZ Test'
    __author__ = 'Jan Nikolas Dicke'
    description = 'none'
    INDEX = 'http://www.sz-online.de/'
    language = 'de'


# (5) you will probably want to remove javascript, and may also want to disable loading of stylesheets with no_stylesheets = True (not set here)

    remove_javascript = True

# (6) parse_index fetches the INDEX page and looks for article links
# inside each <header> element, using its <h2> as the section title.
# No cover image is specified. All subsequent lines here are part of
# parse_index; see the code for the correct indentation structure.

    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
#        ts = soup.find(id='magazineTopStories')
#        ds = self.tag_to_string(ts.find('h1')).split(':')[-1]
#        self.timefmt = ' [%s]'%ds
        cover = None
        feeds = []
#        for section in soup.findAll('div', attrs={'class':'magazineSection'}):
        for section in soup.findAll('header'):
            section_title = self.tag_to_string(section.find('h2'))
            articles = []

# (7) all article links have a "href" attribute
#            for post in section.findAll('h3', attrs={'class':'headline'}):
            for post in section.findAll('a', href=True):
                url = post['href']

# (8) other links may also have a "href" attribute, but article links
# start with "/" and need the base url prepended

                if url.startswith('/'):
                    url = 'http://www.sz-online.de'+url
                    title = self.tag_to_string(post)


#                    self.log('\t\t', desc)

# (11) build the list of article links
                articles.append({'title':title, 'url':url})


# (12) and if any article links have been found, append the article list to the feed list, which is finally returned
            if articles:
                feeds.append((section_title, articles))

        return feeds

Unfortunately, fetching the news fails with the following error:

Code:

calibre, version 1.33.0 (darwin, isfrozen: True)
Conversion error: Failed: Fetch news from SZ Test

Fetch news from SZ Test
Resolved conversion options
calibre version: 1.33.0
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0,
 'book_producer': None,
 'change_justification': 'original',
 'chapter': None,
 'chapter_mark': 'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'dont_compress': False,
 'dont_download_recipe': False,
 'duplicate_links_in_toc': False,
 'embed_all_fonts': False,
 'embed_font_family': None,
 'enable_heuristics': False,
 'expand_css': False,
 'extra_css': None,
 'extract_to': None,
 'filter_css': None,
 'fix_indents': True,
 'font_size_mapping': None,
 'format_scene_breaks': True,
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x1091d9110>,
 'insert_blank_line': False,
 'insert_blank_line_size': 0.5,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0,
 'linearize_tables': False,
 'lrf': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'markup_chapter_headings': True,
 'max_toc_links': 50,
 'minimum_line_height': 120.0,
 'mobi_file_type': 'old',
 'mobi_ignore_margins': False,
 'mobi_keep_original_images': False,
 'mobi_toc_at_start': False,
 'no_chapters_in_toc': False,
 'no_inline_navbars': False,
 'no_inline_toc': False,
 'output_profile': <calibre.customize.profiles.OutputProfile object at 0x1091d94d0>,
 'page_breaks_before': None,
 'personal_doc': '[PDOC]',
 'prefer_author_sort': False,
 'prefer_metadata_cover': False,
 'pretty_print': False,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': None,
 'remove_fake_margins': True,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': '',
 'search_replace': None,
 'series': None,
 'series_index': None,
 'share_not_sync': False,
 'smarten_punctuation': False,
 'sr1_replace': '',
 'sr1_search': '',
 'sr2_replace': '',
 'sr2_search': '',
 'sr3_replace': '',
 'sr3_search': '',
 'start_reading_at': None,
 'subset_embedded_fonts': False,
 'tags': None,
 'test': False,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'toc_title': None,
 'unsmarten_punctuation': False,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'verbose': 2}
Python function terminated unexpectedly: local variable 'title' referenced before assignment
InputFormatPlugin: Recipe Input running
Using custom recipe
Traceback (most recent call last):
  File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.7/site.py", line 208, in main
    return run_entry_point()
  File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.7/site.py", line 114, in run_entry_point
    return getattr(pmod, func)()
  File "site-packages/calibre/utils/ipc/worker.py", line 195, in main
  File "site-packages/calibre/gui2/convert/gui_conversion.py", line 25, in gui_convert
  File "site-packages/calibre/ebooks/conversion/plumber.py", line 1038, in run
  File "site-packages/calibre/customize/conversion.py", line 241, in __call__
  File "site-packages/calibre/ebooks/conversion/plugins/recipe_input.py", line 117, in convert
  File "site-packages/calibre/web/feeds/news.py", line 982, in download
  File "site-packages/calibre/web/feeds/news.py", line 1147, in build_index
  File "<string>", line 59, in parse_index
UnboundLocalError: local variable 'title' referenced before assignment

Can anybody help me, please?

Thanks, Jan

skoll1975 · 04-26-2014, 06:36 AM

Hi Jan,

Python is a bit "different" if you're not used to it.

But notice the last two lines of your error message:

File "<string>", line 59, in parse_index
UnboundLocalError: local variable 'title' referenced before assignment

This refers to your script. In line 59 you use the variable "title", and Python complains that no value has been assigned to it. You only assign it a few lines earlier, inside the "if" block, so my guess is that the "if" condition was actually false (the link did not start with "/"). Try to start debugging there.

Best regards,

Bernd
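Bernd's diagnosis can be checked without calibre. The sketch below reduces the posted loop to plain Python: links stands in for the (self.tag_to_string(post), post['href']) pairs of one section, and collect_buggy and collect_fixed are hypothetical names. Because articles.append(...) sits outside the if block, the loop raises UnboundLocalError when the first link is not relative, and later non-relative links silently reuse the previous title.

```python
BASE_URL = 'http://www.sz-online.de'


def collect_buggy(links):
    """Mirrors the posted loop: the append is outside the if block."""
    articles = []
    for text, url in links:
        if url.startswith('/'):
            url = BASE_URL + url
            title = text
        # Runs for every link, even when the if block was skipped,
        # so 'title' may be unassigned or left over from an earlier link.
        articles.append({'title': title, 'url': url})
    return articles


def collect_fixed(links):
    """Same loop with the append moved inside the if block."""
    articles = []
    for text, url in links:
        if url.startswith('/'):
            url = BASE_URL + url
            title = text
            articles.append({'title': title, 'url': url})
    return articles
```

In the recipe itself the fix would be the same: indent articles.append({'title':title, 'url':url}) one level further so that it belongs to the if url.startswith('/') block.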

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
Recipe new: "Neue Osnabrücker Zeitung"	VoHegg	Recipes	0	09-28-2013 08:21 AM
Update request for Sueddeutsche Zeitung News Recipe	Divingduck	Recipes	14	12-05-2012 03:46 PM
Recipe for german newspaper "Berliner Zeitung"	a.peter	Recipes	1	12-13-2011 04:02 PM
recipe for Neuss-Grevenbroicher-Zeitung (NGZ) - german	schuster	Recipes	0	05-14-2011 01:50 PM
Problem with recipe for Sueddeutsche Zeitung	amontiel69	Recipes	0	02-25-2011 12:05 PM
