|
|
#1 |
|
Enthusiast
Posts: 43
Karma: 136
Join Date: Mar 2011
Device: Kindle Paperwhite
|
Minor issue with padding of Sueddeutsche recipe
Hello,
First of all, thanks to all the contributors to Calibre and its recipes. This is really great and makes my Kindle 3 so much more useful. I have a subscription to the Sueddeutsche newspaper, and when downloading it with Kindle (MOBI) as the target, the article bodies get extra left padding, while the heading of each article is perfectly left-aligned. I tried to get rid of the extra padding to increase readability.

The online source looks as follows:

Code:
<p style="padding-left:4px;">Die EU-Kommission plant ....</p><br class="br5">
<p style="padding-left:4px;">EU-Kommissar Algirdas Semeta ...</p><br class="br5">

After conversion each paragraph ends up wrapped in

Code:
<blockquote class="calibre_9">...</blockquote>

so I tried changing the generated CSS, first

Code:
blockquote { margin: 0em 0em 0em 2em; }

and then

Code:
.calibre_9 { margin-top: 1em; text-indent: 0pt }

None of those worked. Can anyone point me in the right direction? Do I need to change the CSS, adjust the recipe (pre-filter the padding-left:4px?), or is there a totally different solution? Thanks in advance.

Cheers,
- aerodynamik

Last edited by aerodynamik; 04-08-2011 at 05:45 PM. |
|
|
|
|
|
#2 |
|
creator of calibre
Posts: 45,597
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Add

Code:
remove_attributes = ['style']

to the recipe
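
remove_attributes is a standard BasicNewsRecipe field and goes at class level next to the other recipe options. A minimal sketch of where the line lives (the class name and feed URL are placeholders, not the real SZ recipe):

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class ExampleRecipe(BasicNewsRecipe):
    title = 'Example'
    # Strip inline style attributes (e.g. padding-left:4px) from every tag,
    # so the inline padding never reaches the converted book:
    remove_attributes = ['style']
    feeds = [('Example feed', 'http://example.com/feed/')]

The pre-filtering idea from the first post would also work (e.g. via preprocess_regexps), but dropping the attribute wholesale is simpler. |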
|
|
|
|
|
|
|
#3 |
|
Enthusiast
Posts: 43
Karma: 136
Join Date: Mar 2011
Device: Kindle Paperwhite
|
Changed the existing

Code:
remove_attributes = ['height','width']

to

Code:
remove_attributes = ['height','width','style']

and the padding is gone. I will start looking into the documentation on recipes and hope to tweak this recipe some more, e.g. removing extra line breaks and downloading additional sections that are not online every day. Thanks again! |
|
|
|
|
|
#4 | |
|
Enthusiast
Posts: 43
Karma: 136
Join Date: Mar 2011
Device: Kindle Paperwhite
|
Quote:
To remove extra line breaks add 'br' to the 2nd dictionary in remove_tags:

Code:
remove_tags = [
     dict(attrs={'class':'hidePrint'})
    ,dict(name=['link','object','embed','base','iframe','br'])
]

To download additional sections, add them to the feeds list, for example:

Code:
,(u'Muenchen City'    , INDEX + 'M%FCnchen+City/'   )
,(u'Wochenende'       , INDEX + 'SZ+am+Wochenende/' )

This is great. I'll test some more, complete the list of feeds and will then post the complete updated recipe here. Hope this helps someone, have a good weekend.

- aerodynamik |
|
|
|
|
|
|
#5 |
|
Enthusiast
Posts: 43
Karma: 136
Join Date: Mar 2011
Device: Kindle Paperwhite
|
Updated recipe
Here we go.

Changes: the style attribute is now removed, extra line breaks ('br' tags) are stripped, and further sections have been added to the feeds list.

I only tested the updated recipe on a Kindle 3 with direct output to MOBI. It would be good if someone with other target formats and devices could give it a test. Not sure if I should have updated the author section somehow...

Code:
__license__   = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'
'''
www.sueddeutsche.de/sz/
'''

from calibre.web.feeds.news import BasicNewsRecipe
from calibre import strftime

class SueddeutcheZeitung(BasicNewsRecipe):
    title                = 'Sueddeutche Zeitung Ext'
    __author__           = 'Darko Miletic'
    description          = 'News from Germany. Access to paid content.'
    publisher            = 'Sueddeutche Zeitung'
    category             = 'news, politics, Germany'
    no_stylesheets       = True
    oldest_article       = 2
    encoding             = 'cp1252'
    needs_subscription   = True
    remove_empty_feeds   = True
    delay                = 1
    PREFIX               = 'http://www.sueddeutsche.de'
    INDEX                = PREFIX + '/app/epaper/textversion/'
    use_embedded_content = False
    masthead_url         = 'http://pix.sueddeutsche.de/img/layout/header/SZ_solo288x31.gif'
    language             = 'de'
    publication_type     = 'newspaper'
    extra_css            = ' body{font-family: Arial,Helvetica,sans-serif} '

    conversion_options = {
          'comment'          : description
        , 'tags'             : category
        , 'publisher'        : publisher
        , 'language'         : language
        , 'linearize_tables' : True
    }

    # Drop sizing and inline style attributes so the device stylesheet wins
    remove_attributes = ['height','width','style']

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open(self.INDEX)
            br.select_form(name='lbox')
            br['login_name'    ] = self.username
            br['login_passwort'] = self.password
            br.submit()
        return br

    remove_tags = [
         dict(attrs={'class':'hidePrint'})
        ,dict(name=['link','object','embed','base','iframe','br'])
    ]
    keep_only_tags     = [dict(attrs={'class':'artikelBox'})]
    remove_tags_before = dict(attrs={'class':'artikelTitel'})
    remove_tags_after  = dict(attrs={'class':'author'})

    feeds = [
         (u'Politik'          , INDEX + 'Politik/'          )
        ,(u'Seite drei'       , INDEX + 'Seite+drei/'       )
        ,(u'Meinungsseite'    , INDEX + 'Meinungsseite/'    )
        ,(u'Wissen'           , INDEX + 'Wissen/'           )
        ,(u'Panorama'         , INDEX + 'Panorama/'         )
        ,(u'Feuilleton'       , INDEX + 'Feuilleton/'       )
        ,(u'Medien'           , INDEX + 'Medien/'           )
        ,(u'Wirtschaft'       , INDEX + 'Wirtschaft/'       )
        ,(u'Sport'            , INDEX + 'Sport/'            )
        ,(u'Bayern'           , INDEX + 'Bayern/'           )
        ,(u'Muenchen'         , INDEX + 'M%FCnchen/'        )
        ,(u'Muenchen City'    , INDEX + 'M%FCnchen+City/'   )
        ,(u'Jetzt.de'         , INDEX + 'Jetzt.de/'         )
        ,(u'Reise'            , INDEX + 'Reise/'            )
        ,(u'SZ Extra'         , INDEX + 'SZ+Extra/'         )
        ,(u'Wochenende'       , INDEX + 'SZ+am+Wochenende/' )
        ,(u'Stellen-Markt'    , INDEX + 'Stellen-Markt/'    )
        ,(u'Motormarkt'       , INDEX + 'Motormarkt/'       )
        ,(u'Immobilien-Markt' , INDEX + 'Immobilien-Markt/' )
        ,(u'Thema'            , INDEX + 'Thema/'            )
        ,(u'Forum'            , INDEX + 'Forum/'            )
        ,(u'Leute'            , INDEX + 'Leute/'            )
        ,(u'Jugend'           , INDEX + 'Jugend/'           )
        ,(u'Beilage'          , INDEX + 'Beilage/'          )
    ]

    def parse_index(self):
        # The section URLs only work with the current issue id appended;
        # pull it from the table-of-contents ('inhalt') link on the index page.
        src = self.index_to_soup(self.INDEX)
        id = ''
        for itt in src.findAll('a', href=True):
            if itt['href'].startswith('/app/epaper/textversion/inhalt/'):
                id = itt['href'].rpartition('/inhalt/')[2]
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed') + ' %s...' % (feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl + id)
            tbl = soup.find(attrs={'class':'szprintd'})
            for item in tbl.findAll(name='td', attrs={'class':'topthema'}):
                atag = item.find(attrs={'class':'Titel'}).a
                ptag = item.find('p')
                stag = ptag.find('script')
                if stag:
                    stag.extract()
                url         = self.PREFIX + atag['href']
                title       = self.tag_to_string(atag)
                description = self.tag_to_string(ptag)
                articles.append({
                     'title'       : title
                    ,'date'        : strftime(self.timefmt)
                    ,'url'         : url
                    ,'description' : description
                })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds
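
To give the recipe a quick test without the GUI, ebook-convert can run a saved .recipe file directly; the file name here is arbitrary, and --test limits the download to a couple of feeds and articles:

Code:
ebook-convert sz.recipe sz.mobi --username=USER --password=PASS --test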
|
|
|
|
|
|
|
|
#6 |
|
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2011
Device: Nook Color
|
Hi readers of Süddeutsche Zeitung,
I made some modifications to the recipe to pull in a cover and to add the "Thema des Tages" section, which was missing. I hope the cover retrieval holds up; I will test it over the next week.

MB

Code:
__license__   = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'
'''
www.sueddeutsche.de/sz/
'''

from calibre.web.feeds.news import BasicNewsRecipe
from calibre import strftime

class SueddeutcheZeitung(BasicNewsRecipe):
    title                = 'Süddeutsche Zeitung'
    __author__           = 'Darko Miletic'
    description          = 'News from Germany. Access to paid content.'
    publisher            = 'Süddeutsche Zeitung'
    category             = 'news, politics, Germany'
    no_stylesheets       = True
    oldest_article       = 2
    encoding             = 'iso-8859-1'
    needs_subscription   = True
    remove_empty_feeds   = True
    delay                = 1
    cover_source         = 'http://www.sueddeutsche.de/verlag'
    PREFIX               = 'http://www.sueddeutsche.de'
    INDEX                = PREFIX + '/app/epaper/textversion/'
    use_embedded_content = False
    masthead_url         = 'http://pix.sueddeutsche.de/img/layout/header/SZ_solo288x31.gif'
    language             = 'de'
    publication_type     = 'newspaper'
    extra_css            = ' body{font-family: Arial,Helvetica,sans-serif} '

    conversion_options = {
          'comment'          : description
        , 'tags'             : category
        , 'publisher'        : publisher
        , 'language'         : language
        , 'linearize_tables' : True
    }

    remove_attributes = ['height','width','style']

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open(self.INDEX)
            br.select_form(name='lbox')
            br['login_name'    ] = self.username
            br['login_passwort'] = self.password
            br.submit()
        return br

    remove_tags = [
         dict(attrs={'class':'hidePrint'})
        ,dict(name=['link','object','embed','base','iframe','br'])
    ]
    keep_only_tags     = [dict(attrs={'class':'artikelBox'})]
    remove_tags_before = dict(attrs={'class':'artikelTitel'})
    remove_tags_after  = dict(attrs={'class':'author'})

    feeds = [
         (u'Politik'          , INDEX + 'Politik/'          )
        ,(u'Seite drei'       , INDEX + 'Seite+drei/'       )
        ,(u'Thema des Tages'  , INDEX + 'Thema+des+Tages/'  )
        ,(u'Meinungsseite'    , INDEX + 'Meinungsseite/'    )
        ,(u'Wissen'           , INDEX + 'Wissen/'           )
        ,(u'Panorama'         , INDEX + 'Panorama/'         )
        ,(u'Feuilleton'       , INDEX + 'Feuilleton/'       )
        ,(u'Medien'           , INDEX + 'Medien/'           )
        ,(u'Wirtschaft'       , INDEX + 'Wirtschaft/'       )
        ,(u'Sport'            , INDEX + 'Sport/'            )
        ,(u'Bayern'           , INDEX + 'Bayern/'           )
        ,(u'Muenchen'         , INDEX + 'M%FCnchen/'        )
        ,(u'Muenchen City'    , INDEX + 'M%FCnchen+City/'   )
        ,(u'Jetzt.de'         , INDEX + 'Jetzt.de/'         )
        ,(u'Reise'            , INDEX + 'Reise/'            )
        ,(u'SZ Extra'         , INDEX + 'SZ+Extra/'         )
        ,(u'Wochenende'       , INDEX + 'SZ+am+Wochenende/' )
        ,(u'Stellen-Markt'    , INDEX + 'Stellen-Markt/'    )
        ,(u'Motormarkt'       , INDEX + 'Motormarkt/'       )
        ,(u'Immobilien-Markt' , INDEX + 'Immobilien-Markt/' )
        ,(u'Thema'            , INDEX + 'Thema/'            )
        ,(u'Forum'            , INDEX + 'Forum/'            )
        ,(u'Leute'            , INDEX + 'Leute/'            )
        ,(u'Jugend'           , INDEX + 'Jugend/'           )
        ,(u'Beilage'          , INDEX + 'Beilage/'          )
    ]

    def get_cover_url(self):
        # Fetch the front-page preview image from the publisher page
        cover_source_soup = self.index_to_soup(self.cover_source)
        preview_image_div = cover_source_soup.find(attrs={'class':'preview-image'})
        return preview_image_div.div.img['src']

    def parse_index(self):
        # Pull the current issue id from the table-of-contents ('inhalt') link
        src = self.index_to_soup(self.INDEX)
        id = ''
        for itt in src.findAll('a', href=True):
            if itt['href'].startswith('/app/epaper/textversion/inhalt/'):
                id = itt['href'].rpartition('/inhalt/')[2]
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed') + ' %s...' % (feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl + id)
            tbl = soup.find(attrs={'class':'szprintd'})
            for item in tbl.findAll(name='td', attrs={'class':'topthema'}):
                atag = item.find(attrs={'class':'Titel'}).a
                ptag = item.find('p')
                stag = ptag.find('script')
                if stag:
                    stag.extract()
                url         = self.PREFIX + atag['href']
                title       = self.tag_to_string(atag)
                description = self.tag_to_string(ptag)
                articles.append({
                     'title'       : title
                    ,'date'        : strftime(self.timefmt)
                    ,'url'         : url
                    ,'description' : description
                })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds
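
Since the cover retrieval is still untested over time, a defensive variant of get_cover_url (a sketch, under the same page-layout assumption) would return no cover instead of crashing if the 'preview-image' block ever disappears:

Code:
    def get_cover_url(self):
        # Fall back to no cover instead of raising an AttributeError
        # if the publisher page layout changes.
        soup = self.index_to_soup(self.cover_source)
        div = soup.find(attrs={'class':'preview-image'})
        if div is not None and div.div is not None and div.div.img is not None:
            return div.div.img['src']
        return None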
|
|
|
|
|
|
#7 |
|
Junior Member
Posts: 2
Karma: 10
Join Date: Jan 2015
Device: none
|
SZ of next day...
Hi SZ Recipe Community!
first of all - thanks for the great SZ recipe - it is a welcome alternative to Amazon's expensive SZ subscription! Since the next day's paper can already be downloaded from the SZ e-paper website after about 7 pm, I was wondering whether the same is possible with the Calibre recipe.

I tried to work out the id pattern in the download URL, e.g. http://epaper.sueddeutsche.de/app/ep...lt/1422486000/. The ids from 26 Jan. to 29 Jan. all have the fixed shape 1422wxyz00, where wxyz rises each day:

26 Jan.: wxyz = 2268
28 Jan.: wxyz = 3996
29 Jan.: wxyz = 4860

The id difference between two days is 4860 - 3996 = 864. I added my code below. It is only tested by downloading tomorrow's paper (an hour ago), which worked well. I expect the code to work until 30 Jan. - it will be exciting to see what the URL id actually looks like in February :-) Most probably the ID calculation will have to be adjusted...

I also haven't adjusted the date that is entered into the calibre database, as I don't know at the moment how to advance this date by one day:

Code:
,'date' :strftime(self.timefmt)

My additional code in short:

Code:
from datetime import datetime

d   = 864    # id delta between two days
d29 = 4860   # start id @ day 29 (of the year)
now = datetime.now()
dy  = int(strftime('%j'))
dyt = dy + 1   # day of the year tomorrow
dg  = dyt - 29
id_d = d * dg
d_d  = d29 + id_d
id = "1422" + str(d_d) + "00"   # 1422 = fixed prefix in Jan. 2015

feeds = [
     (u'Politik' , INDEX + 'Politik/{}/'.format(id) )
    ...
Code:
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
__license__   = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'
'''
www.sueddeutsche.de/sz/
'''
# History
# 2014.10.02 Fixed url problem, by lala-rob (web@lala-rob.de)

from calibre.web.feeds.news import BasicNewsRecipe
from calibre import strftime
from datetime import datetime

class SueddeutcheZeitung(BasicNewsRecipe):
    title                = u'Süddeutsche Zeitung'
    __author__           = 'Darko Miletic'
    description          = 'News from Germany. Access to paid content.'
    publisher            = u'Süddeutsche Zeitung'
    category             = 'news, politics, Germany'
    no_stylesheets       = True
    oldest_article       = 2
    encoding             = 'iso-8859-1'
    needs_subscription   = True
    remove_empty_feeds   = True
    delay                = 1
    cover_source         = 'http://www.sueddeutsche.de/verlag'
    PREFIX               = 'http://epaper.sueddeutsche.de'
    INDEX                = PREFIX + '/app/epaper/textversion/'
    use_embedded_content = False
    masthead_url         = 'http://pix.sueddeutsche.de/img/layout/header/SZ_solo288x31.gif'
    language             = 'de'
    publication_type     = 'newspaper'
    extra_css            = ' body{font-family: Arial,Helvetica,sans-serif} '

    conversion_options = {
          'comment'          : description
        , 'tags'             : category
        , 'publisher'        : publisher
        , 'language'         : language
        , 'linearize_tables' : True
    }

    remove_attributes = ['height','width','style']

    def get_browser(self):
        browser = BasicNewsRecipe.get_browser(self)
        # Login via fetching of Streiflicht -> fill out the login request
        #url = self.root_url + 'show.php?id=streif'
        url = 'https://id.sueddeutsche.de/login'
        browser.open(url)
        browser.select_form(nr=0)  # select the first form
        browser['login']    = self.username
        browser['password'] = self.password
        browser.submit()
        return browser

    remove_tags = [
         dict(attrs={'class':'hidePrint'})
        ,dict(name=['link','object','embed','base','iframe','br'])
    ]
    keep_only_tags     = [dict(attrs={'class':'artikelBox'})]
    remove_tags_before = dict(attrs={'class':'artikelTitel'})
    remove_tags_after  = dict(attrs={'class':'author'})

    # P.S. 28.01.15
    # BEG: compute tomorrow's issue id from the day of the year
    d   = 864    # id delta between two days
    d29 = 4860   # start id @ day 29 (of the year)
    now = datetime.now()
    dy  = int(strftime('%j'))
    dyt = dy + 1   # day of the year tomorrow
    dg  = dyt - 29
    id_d = d * dg
    d_d  = d29 + id_d
    id = "1422" + str(d_d) + "00"   # 1422 = fixed prefix in Jan. 2015
    # END

    feeds = [
         (u'Politik'          , INDEX + 'Politik/{}/'.format(id)          )
        ,(u'Seite drei'       , INDEX + 'Seite+drei/{}/'.format(id)       )
        ,(u'Thema des Tages'  , INDEX + 'Thema+des+Tages/{}/'.format(id)  )
        ,(u'Meinungsseite'    , INDEX + 'Meinungsseite/{}/'.format(id)    )
        ,(u'Wissen'           , INDEX + 'Wissen/{}/'.format(id)           )
        ,(u'Panorama'         , INDEX + 'Panorama/{}/'.format(id)         )
        ,(u'Feuilleton'       , INDEX + 'Feuilleton/{}/'.format(id)       )
        ,(u'Medien'           , INDEX + 'Medien/{}/'.format(id)           )
        ,(u'Wirtschaft'       , INDEX + 'Wirtschaft/{}/'.format(id)       )
        ,(u'Sport'            , INDEX + 'Sport/{}/'.format(id)            )
        ,(u'Bayern'           , INDEX + 'Bayern/{}/'.format(id)           )
        ,(u'Muenchen'         , INDEX + 'M%FCnchen/{}/'.format(id)        )
        ,(u'Muenchen City'    , INDEX + 'M%FCnchen+City/{}/'.format(id)   )
        ,(u'Jetzt.de'         , INDEX + 'Jetzt.de/{}/'.format(id)         )
        ,(u'Reise'            , INDEX + 'Reise/{}/'.format(id)            )
        ,(u'SZ Extra'         , INDEX + 'SZ+Extra/{}/'.format(id)         )
        ,(u'Wochenende'       , INDEX + 'SZ+am+Wochenende/{}/'.format(id) )
        ,(u'Stellen-Markt'    , INDEX + 'Stellen-Markt/{}/'.format(id)    )
        ,(u'Motormarkt'       , INDEX + 'Motormarkt/{}/'.format(id)       )
        ,(u'Immobilien-Markt' , INDEX + 'Immobilien-Markt/{}/'.format(id) )
        ,(u'Thema'            , INDEX + 'Thema/{}/'.format(id)            )
        ,(u'Forum'            , INDEX + 'Forum/{}/'.format(id)            )
        ,(u'Leute'            , INDEX + 'Leute/{}/'.format(id)            )
        ,(u'Jugend'           , INDEX + 'Jugend/{}/'.format(id)           )
        ,(u'Beilage'          , INDEX + 'Beilage/{}/'.format(id)          )
    ]

    def get_cover_url(self):
        # Fetch the front-page preview image from the publisher page
        cover_source_soup = self.index_to_soup(self.cover_source)
        preview_image_div = cover_source_soup.find(attrs={'class':'preview-image'})
        return preview_image_div.div.img['src']

    def parse_index(self):
        src = self.index_to_soup(self.INDEX)
        # Note: with the issue id already baked into every feed URL above,
        # this lookup matches nothing (id starts out empty, so the prefix
        # ends in '/inhalt//') and id stays '', leaving feedurl unchanged.
        id = ''
        for itt in src.findAll('a', href=True):
            if itt['href'].startswith('/app/epaper/textversion/inhalt/{}/'.format(id)):
                id = itt['href'].rpartition('/inhalt/{}/'.format(id))[2]
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, ('Fetching feed') + ' %s...' % (feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl + id)
            tbl = soup.find(attrs={'class':'szprintd'})
            for item in tbl.findAll(name='td', attrs={'class':'topthema'}):
                atag = item.find(attrs={'class':'Titel'}).a
                ptag = item.find('p')
                stag = ptag.find('script')
                if stag:
                    stag.extract()
                url         = self.PREFIX + atag['href']
                title       = self.tag_to_string(atag)
                description = self.tag_to_string(ptag)
                articles.append({
                     'title'       : title
                    ,'date'        : strftime(self.timefmt)
                    ,'url'         : url
                    ,'description' : description
                })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds
|
|
|
|
|
|
#8 |
|
Junior Member
Posts: 2
Karma: 10
Join Date: Jan 2015
Device: none
|
Hi SZ readers!
The code above was not working anymore today since the ID calculation went wrong. Actually the ID is just built by adding 86400 for each passing day. The corrected code is written below. Bye!

ID examples:
1422486000 = 29 Jan. (d29)
1423004400 = 04 Feb. (d35)

Code:
#P.S. 03.02.15
#BEG
d   = 86400        # id delta between two days
d29 = 1422486000   # start id @ day 29 (of the year)
now = datetime.now()
dy  = int(strftime('%j'))
dyt = dy + 1       # day of the year tomorrow
dg  = dyt - 29
id_d = d * dg
id = d29 + id_d
#END
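
A side note: 86400 is the number of seconds in a day, and 1422486000 is exactly midnight (Munich local time) on 29 Jan. 2015 as a Unix timestamp, so the issue id appears to be simply the epoch timestamp of the issue date. If that holds, tomorrow's id can be computed directly, which also avoids the day-of-year arithmetic breaking at the turn of the year. A sketch (it assumes the machine's local timezone matches the paper's, i.e. Europe/Berlin):

Code:
import time
from datetime import datetime, timedelta

# Midnight (local time) of tomorrow, as a Unix timestamp -> issue id
tomorrow = datetime.now() + timedelta(days=1)
midnight = tomorrow.replace(hour=0, minute=0, second=0, microsecond=0)
id = str(int(time.mktime(midnight.timetuple())))   # e.g. '1423004400' for 04 Feb. 2015
|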
|
|
|
|
|