04-08-2011, 06:41 PM | #1 |
Enthusiast
Posts: 43
Karma: 136
Join Date: Mar 2011
Device: Kindle Paperwhite
Minor issue with padding of Sueddeutsche recipe
Hello,
first of all, thanks to all the contributors to Calibre and its recipes. This is really great and makes my Kindle 3 so much more useful. I have a subscription to the Sueddeutsche newspaper, which, when downloaded with target Kindle (MOBI), ends up with extra left padding on the actual articles. Since the heading of each article is perfectly left-aligned, I tried to get rid of the extra padding to increase readability. The online source looks as follows:
Code:
<p style="padding-left:4px;">Die EU-Kommission plant ....</p>
<br class="br5">
<p style="padding-left:4px;">EU-Kommissar Algirdas Semeta ...</p>
<br class="br5">

In the converted book each paragraph is wrapped as:
Code:
<blockquote class="calibre_9">...</blockquote>

The book's CSS contains the following rules, which I tried tweaking:
Code:
blockquote { margin: 0em 0em 0em 2em; }

and:
Code:
.calibre_9 { margin-top: 1em; text-indent: 0pt }
None of those worked. Can anyone point me in the right direction? Do I need to change the CSS, adjust the recipe (pre-filter the padding-left:4px?), or is there a totally different solution? Thanks in advance. Cheers,
- aerodynamik

Last edited by aerodynamik; 04-08-2011 at 06:45 PM.
04-08-2011, 08:32 PM | #2 |
creator of calibre
Posts: 44,743
Karma: 24967300
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Add
Code:
remove_attributes = ['style']
to the recipe
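A note for later readers: inline style attributes outrank stylesheet rules, which is why the CSS edits in the first post had no effect; removing the attribute itself side-steps that. remove_attributes is a plain class attribute of BasicNewsRecipe, so the line goes directly into the recipe's class body - a minimal sketch (the class name is borrowed from the full recipe posted later in this thread):
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class SueddeutcheZeitung(BasicNewsRecipe):
    # Strip inline style attributes (e.g. padding-left:4px) from every
    # tag during download, so no stylesheet override is needed.
    remove_attributes = ['style']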
04-09-2011, 03:39 AM | #3 |
Enthusiast
Posts: 43
Karma: 136
Join Date: Mar 2011
Device: Kindle Paperwhite
Changed the existing
Code:
remove_attributes = ['height','width']
to
Code:
remove_attributes = ['height','width','style']

and the extra padding is gone. I will start looking into the documentation on recipes and hope to tweak this recipe some more, e.g. remove the extra line breaks and download the additional sections that are not published online every day. Thanks again!
04-09-2011, 07:53 AM | #4 |
Enthusiast
Posts: 43
Karma: 136
Join Date: Mar 2011
Device: Kindle Paperwhite
Quote:
To remove extra line breaks add 'br' to the 2nd dictionary in remove_tags:
Code:
remove_tags = [
                dict(attrs={'class':'hidePrint'})
               ,dict(name=['link','object','embed','base','iframe','br'])
              ]

To pick up the missing sections, add them to feeds:
Code:
,(u'Muenchen City' , INDEX + 'M%FCnchen+City/'   )
,(u'Wochenende'    , INDEX + 'SZ+am+Wochenende/' )

This is great. I'll test some more, complete the list of feeds, and will then post the complete updated recipe here. Hope this helps someone; have a good weekend.
- aerodynamik
04-16-2011, 06:44 AM | #5 |
Enthusiast
Posts: 43
Karma: 136
Join Date: Mar 2011
Device: Kindle Paperwhite
Updated recipe
Here we go.
Changes: 'style' added to remove_attributes, 'br' added to remove_tags, the missing sections added to feeds, and the title changed to 'Sueddeutche Zeitung Ext' (see the posts above and the code below).
I have only tested the updated recipe on a Kindle 3 with direct MOBI output. It would be good if someone with other target formats and devices could give it a test. Not sure if I should have updated the author section somehow...
Code:
__license__   = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'
'''
www.sueddeutsche.de/sz/
'''

from calibre.web.feeds.news import BasicNewsRecipe
from calibre import strftime

class SueddeutcheZeitung(BasicNewsRecipe):
    title                = 'Sueddeutche Zeitung Ext'
    __author__           = 'Darko Miletic'
    description          = 'News from Germany. Access to paid content.'
    publisher            = 'Sueddeutche Zeitung'
    category             = 'news, politics, Germany'
    no_stylesheets       = True
    oldest_article       = 2
    encoding             = 'cp1252'
    needs_subscription   = True
    remove_empty_feeds   = True
    delay                = 1
    PREFIX               = 'http://www.sueddeutsche.de'
    INDEX                = PREFIX + '/app/epaper/textversion/'
    use_embedded_content = False
    masthead_url         = 'http://pix.sueddeutsche.de/img/layout/header/SZ_solo288x31.gif'
    language             = 'de'
    publication_type     = 'newspaper'
    extra_css            = ' body{font-family: Arial,Helvetica,sans-serif} '

    conversion_options = {
                           'comment'          : description
                          ,'tags'             : category
                          ,'publisher'        : publisher
                          ,'language'         : language
                          ,'linearize_tables' : True
                         }

    remove_attributes = ['height','width','style']

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open(self.INDEX)
            br.select_form(name='lbox')
            br['login_name'    ] = self.username
            br['login_passwort'] = self.password
            br.submit()
        return br

    remove_tags = [
                    dict(attrs={'class':'hidePrint'})
                   ,dict(name=['link','object','embed','base','iframe','br'])
                  ]
    keep_only_tags     = [dict(attrs={'class':'artikelBox'})]
    remove_tags_before = dict(attrs={'class':'artikelTitel'})
    remove_tags_after  = dict(attrs={'class':'author'})

    feeds = [
              (u'Politik'          , INDEX + 'Politik/'          )
             ,(u'Seite drei'       , INDEX + 'Seite+drei/'       )
             ,(u'Meinungsseite'    , INDEX + 'Meinungsseite/'    )
             ,(u'Wissen'           , INDEX + 'Wissen/'           )
             ,(u'Panorama'         , INDEX + 'Panorama/'         )
             ,(u'Feuilleton'       , INDEX + 'Feuilleton/'       )
             ,(u'Medien'           , INDEX + 'Medien/'           )
             ,(u'Wirtschaft'       , INDEX + 'Wirtschaft/'       )
             ,(u'Sport'            , INDEX + 'Sport/'            )
             ,(u'Bayern'           , INDEX + 'Bayern/'           )
             ,(u'Muenchen'         , INDEX + 'M%FCnchen/'        )
             ,(u'Muenchen City'    , INDEX + 'M%FCnchen+City/'   )
             ,(u'Jetzt.de'         , INDEX + 'Jetzt.de/'         )
             ,(u'Reise'            , INDEX + 'Reise/'            )
             ,(u'SZ Extra'         , INDEX + 'SZ+Extra/'         )
             ,(u'Wochenende'       , INDEX + 'SZ+am+Wochenende/' )
             ,(u'Stellen-Markt'    , INDEX + 'Stellen-Markt/'    )
             ,(u'Motormarkt'       , INDEX + 'Motormarkt/'       )
             ,(u'Immobilien-Markt' , INDEX + 'Immobilien-Markt/' )
             ,(u'Thema'            , INDEX + 'Thema/'            )
             ,(u'Forum'            , INDEX + 'Forum/'            )
             ,(u'Leute'            , INDEX + 'Leute/'            )
             ,(u'Jugend'           , INDEX + 'Jugend/'           )
             ,(u'Beilage'          , INDEX + 'Beilage/'          )
            ]

    def parse_index(self):
        src = self.index_to_soup(self.INDEX)
        id = ''
        for itt in src.findAll('a', href=True):
            if itt['href'].startswith('/app/epaper/textversion/inhalt/'):
                id = itt['href'].rpartition('/inhalt/')[2]
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed') + ' %s...' % (feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl + id)
            tbl = soup.find(attrs={'class':'szprintd'})
            for item in tbl.findAll(name='td', attrs={'class':'topthema'}):
                atag = item.find(attrs={'class':'Titel'}).a
                ptag = item.find('p')
                stag = ptag.find('script')
                if stag:
                    stag.extract()
                url = self.PREFIX + atag['href']
                title = self.tag_to_string(atag)
                description = self.tag_to_string(ptag)
                articles.append({
                                  'title'       : title
                                 ,'date'        : strftime(self.timefmt)
                                 ,'url'         : url
                                 ,'description' : description
                                })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds
11-06-2011, 09:50 AM | #6 |
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2011
Device: Nook Color
Hi readers of Süddeutsche Zeitung,
I made some modifications to the recipe to pull in a cover and to add "Thema des Tages", which was missing. I hope the cover retrieval holds up; I will test it over the next week. MB
Code:
__license__   = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'
'''
www.sueddeutsche.de/sz/
'''

from calibre.web.feeds.news import BasicNewsRecipe
from calibre import strftime

class SueddeutcheZeitung(BasicNewsRecipe):
    title                = 'Süddeutsche Zeitung'
    __author__           = 'Darko Miletic'
    description          = 'News from Germany. Access to paid content.'
    publisher            = 'Süddeutsche Zeitung'
    category             = 'news, politics, Germany'
    no_stylesheets       = True
    oldest_article       = 2
    encoding             = 'iso-8859-1'
    needs_subscription   = True
    remove_empty_feeds   = True
    delay                = 1
    cover_source         = 'http://www.sueddeutsche.de/verlag'
    PREFIX               = 'http://www.sueddeutsche.de'
    INDEX                = PREFIX + '/app/epaper/textversion/'
    use_embedded_content = False
    masthead_url         = 'http://pix.sueddeutsche.de/img/layout/header/SZ_solo288x31.gif'
    language             = 'de'
    publication_type     = 'newspaper'
    extra_css            = ' body{font-family: Arial,Helvetica,sans-serif} '

    conversion_options = {
                           'comment'          : description
                          ,'tags'             : category
                          ,'publisher'        : publisher
                          ,'language'         : language
                          ,'linearize_tables' : True
                         }

    remove_attributes = ['height','width','style']

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open(self.INDEX)
            br.select_form(name='lbox')
            br['login_name'    ] = self.username
            br['login_passwort'] = self.password
            br.submit()
        return br

    remove_tags = [
                    dict(attrs={'class':'hidePrint'})
                   ,dict(name=['link','object','embed','base','iframe','br'])
                  ]
    keep_only_tags     = [dict(attrs={'class':'artikelBox'})]
    remove_tags_before = dict(attrs={'class':'artikelTitel'})
    remove_tags_after  = dict(attrs={'class':'author'})

    feeds = [
              (u'Politik'          , INDEX + 'Politik/'          )
             ,(u'Seite drei'       , INDEX + 'Seite+drei/'       )
             ,(u'Thema des Tages'  , INDEX + 'Thema+des+Tages/'  )
             ,(u'Meinungsseite'    , INDEX + 'Meinungsseite/'    )
             ,(u'Wissen'           , INDEX + 'Wissen/'           )
             ,(u'Panorama'         , INDEX + 'Panorama/'         )
             ,(u'Feuilleton'       , INDEX + 'Feuilleton/'       )
             ,(u'Medien'           , INDEX + 'Medien/'           )
             ,(u'Wirtschaft'       , INDEX + 'Wirtschaft/'       )
             ,(u'Sport'            , INDEX + 'Sport/'            )
             ,(u'Bayern'           , INDEX + 'Bayern/'           )
             ,(u'Muenchen'         , INDEX + 'M%FCnchen/'        )
             ,(u'Muenchen City'    , INDEX + 'M%FCnchen+City/'   )
             ,(u'Jetzt.de'         , INDEX + 'Jetzt.de/'         )
             ,(u'Reise'            , INDEX + 'Reise/'            )
             ,(u'SZ Extra'         , INDEX + 'SZ+Extra/'         )
             ,(u'Wochenende'       , INDEX + 'SZ+am+Wochenende/' )
             ,(u'Stellen-Markt'    , INDEX + 'Stellen-Markt/'    )
             ,(u'Motormarkt'       , INDEX + 'Motormarkt/'       )
             ,(u'Immobilien-Markt' , INDEX + 'Immobilien-Markt/' )
             ,(u'Thema'            , INDEX + 'Thema/'            )
             ,(u'Forum'            , INDEX + 'Forum/'            )
             ,(u'Leute'            , INDEX + 'Leute/'            )
             ,(u'Jugend'           , INDEX + 'Jugend/'           )
             ,(u'Beilage'          , INDEX + 'Beilage/'          )
            ]

    def get_cover_url(self):
        cover_source_soup = self.index_to_soup(self.cover_source)
        preview_image_div = cover_source_soup.find(attrs={'class':'preview-image'})
        return preview_image_div.div.img['src']

    def parse_index(self):
        src = self.index_to_soup(self.INDEX)
        id = ''
        for itt in src.findAll('a', href=True):
            if itt['href'].startswith('/app/epaper/textversion/inhalt/'):
                id = itt['href'].rpartition('/inhalt/')[2]
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed') + ' %s...' % (feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl + id)
            tbl = soup.find(attrs={'class':'szprintd'})
            for item in tbl.findAll(name='td', attrs={'class':'topthema'}):
                atag = item.find(attrs={'class':'Titel'}).a
                ptag = item.find('p')
                stag = ptag.find('script')
                if stag:
                    stag.extract()
                url = self.PREFIX + atag['href']
                title = self.tag_to_string(atag)
                description = self.tag_to_string(ptag)
                articles.append({
                                  'title'       : title
                                 ,'date'        : strftime(self.timefmt)
                                 ,'url'         : url
                                 ,'description' : description
                                })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds
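One caveat on the cover retrieval (presumably why it still needs a week of testing): if the publisher page ever drops the preview-image div, get_cover_url will raise an AttributeError and the whole download fails. A slightly defensive variant of the same method - my sketch, not from the post - returns None instead, which should make calibre fall back to its generated cover:
Code:
def get_cover_url(self):
    # Drop-in replacement for the method above; same scraping logic,
    # just guarded against layout changes on the publisher page.
    cover_source_soup = self.index_to_soup(self.cover_source)
    preview_image_div = cover_source_soup.find(attrs={'class': 'preview-image'})
    if preview_image_div is None or preview_image_div.div is None:
        return None
    img = preview_image_div.div.img
    return img['src'] if img is not None else None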
01-28-2015, 05:36 PM | #7 |
Junior Member
Posts: 2
Karma: 10
Join Date: Jan 2015
Device: none
SZ of next day...
Hi SZ Recipe Community!
first of all - thanks for the great SZ recipe - it is a welcome alternative to Amazon's expensive SZ subscription! Since the next day's paper can already be downloaded from the SZ e-paper website after about 7 pm, I was wondering whether this is also possible with the Calibre recipe. I tried to work out the ID pattern in the download URL, e.g. http://epaper.sueddeutsche.de/app/ep...lt/1422486000/. The IDs from 26 Jan to 29 Jan all have the fixed form 1422wxyz00, where wxyz rises each day:
26 Jan: wxyz = 2268
28 Jan: wxyz = 3996
29 Jan: wxyz = 4860
The ID difference between two consecutive days is 4860 - 3996 = 864. I added my code below. So far it is tested only by downloading tomorrow's paper (an hour ago), which worked well. I expect the code to work until 30 Jan - it will be exciting to see what the URL ID actually looks like in February :-) Most probably the ID calculation will have to be adjusted... I also have not adjusted the date that is entered into the calibre database, as I don't know at the moment how to raise this date by one day:
Code:
,'date' : strftime(self.timefmt)

My additional code in short:
Code:
from datetime import datetime

d   = 864   # id delta between two days
d29 = 4860  # start id @ day 29
now = datetime.now()
dy  = int(strftime('%j'))
dyt = dy + 1  # day of the year tomorrow
dg  = dyt - 29
id_d = d * dg
d_d  = d29 + id_d
id = "1422" + str(d_d) + "00"  # 1422 = Jan 2015

feeds = [
          (u'Politik' , INDEX + 'Politik/{}/'.format(id) )
          ...
        ]

The complete recipe:
Code:
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
__license__   = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'
'''
www.sueddeutsche.de/sz/
'''
# History
# 2014.10.02 Fixed url problem, by lala-rob (web@lala-rob.de)

from calibre.web.feeds.news import BasicNewsRecipe
from calibre import strftime
from datetime import datetime

class SueddeutcheZeitung(BasicNewsRecipe):
    title                = u'Süddeutsche Zeitung'
    __author__           = 'Darko Miletic'
    description          = 'News from Germany. Access to paid content.'
    publisher            = u'Süddeutsche Zeitung'
    category             = 'news, politics, Germany'
    no_stylesheets       = True
    oldest_article       = 2
    encoding             = 'iso-8859-1'
    needs_subscription   = True
    remove_empty_feeds   = True
    delay                = 1
    cover_source         = 'http://www.sueddeutsche.de/verlag'
    PREFIX               = 'http://epaper.sueddeutsche.de'
    INDEX                = PREFIX + '/app/epaper/textversion/'
    use_embedded_content = False
    masthead_url         = 'http://pix.sueddeutsche.de/img/layout/header/SZ_solo288x31.gif'
    language             = 'de'
    publication_type     = 'newspaper'
    extra_css            = ' body{font-family: Arial,Helvetica,sans-serif} '

    conversion_options = {
                           'comment'          : description
                          ,'tags'             : category
                          ,'publisher'        : publisher
                          ,'language'         : language
                          ,'linearize_tables' : True
                         }

    remove_attributes = ['height','width','style']

    def get_browser(self):
        browser = BasicNewsRecipe.get_browser(self)
        # Login via fetching of Streiflicht -> Fill out login request
        #url = self.root_url + 'show.php?id=streif'
        url = 'https://id.sueddeutsche.de/login'
        browser.open(url)
        browser.select_form(nr=0)  # to select the first form
        browser['login']    = self.username
        browser['password'] = self.password
        browser.submit()
        return browser

    remove_tags = [
                    dict(attrs={'class':'hidePrint'})
                   ,dict(name=['link','object','embed','base','iframe','br'])
                  ]
    keep_only_tags     = [dict(attrs={'class':'artikelBox'})]
    remove_tags_before = dict(attrs={'class':'artikelTitel'})
    remove_tags_after  = dict(attrs={'class':'author'})

    #P.S. 28.01.15
    #BEG
    d   = 864   # id delta between two days
    d29 = 4860  # start id @ day 29
    now = datetime.now()
    dy  = int(strftime('%j'))
    dyt = dy + 1  # day of the year tomorrow
    dg  = dyt - 29
    id_d = d * dg
    d_d  = d29 + id_d
    id = "1422" + str(d_d) + "00"  # 1422 = Jan 2015
    #END

    feeds = [
              (u'Politik'          , INDEX + 'Politik/{}/'.format(id)          )
             ,(u'Seite drei'       , INDEX + 'Seite+drei/{}/'.format(id)       )
             ,(u'Thema des Tages'  , INDEX + 'Thema+des+Tages/{}/'.format(id)  )
             ,(u'Meinungsseite'    , INDEX + 'Meinungsseite/{}/'.format(id)    )
             ,(u'Wissen'           , INDEX + 'Wissen/{}/'.format(id)           )
             ,(u'Panorama'         , INDEX + 'Panorama/{}/'.format(id)         )
             ,(u'Feuilleton'       , INDEX + 'Feuilleton/{}/'.format(id)       )
             ,(u'Medien'           , INDEX + 'Medien/{}/'.format(id)           )
             ,(u'Wirtschaft'       , INDEX + 'Wirtschaft/{}/'.format(id)       )
             ,(u'Sport'            , INDEX + 'Sport/{}/'.format(id)            )
             ,(u'Bayern'           , INDEX + 'Bayern/{}/'.format(id)           )
             ,(u'Muenchen'         , INDEX + 'M%FCnchen/{}/'.format(id)        )
             ,(u'Muenchen City'    , INDEX + 'M%FCnchen+City/{}/'.format(id)   )
             ,(u'Jetzt.de'         , INDEX + 'Jetzt.de/{}/'.format(id)         )
             ,(u'Reise'            , INDEX + 'Reise/{}/'.format(id)            )
             ,(u'SZ Extra'         , INDEX + 'SZ+Extra/{}/'.format(id)         )
             ,(u'Wochenende'       , INDEX + 'SZ+am+Wochenende/{}/'.format(id) )
             ,(u'Stellen-Markt'    , INDEX + 'Stellen-Markt/{}/'.format(id)    )
             ,(u'Motormarkt'       , INDEX + 'Motormarkt/{}/'.format(id)       )
             ,(u'Immobilien-Markt' , INDEX + 'Immobilien-Markt/{}/'.format(id) )
             ,(u'Thema'            , INDEX + 'Thema/{}/'.format(id)            )
             ,(u'Forum'            , INDEX + 'Forum/{}/'.format(id)            )
             ,(u'Leute'            , INDEX + 'Leute/{}/'.format(id)            )
             ,(u'Jugend'           , INDEX + 'Jugend/{}/'.format(id)           )
             ,(u'Beilage'          , INDEX + 'Beilage/{}/'.format(id)          )
            ]

    def get_cover_url(self):
        cover_source_soup = self.index_to_soup(self.cover_source)
        preview_image_div = cover_source_soup.find(attrs={'class':'preview-image'})
        return preview_image_div.div.img['src']

    def parse_index(self):
        src = self.index_to_soup(self.INDEX)
        id = ''
        # Note: since the date ID is already baked into the feed URLs above,
        # this pattern ('.../inhalt//') effectively never matches, id stays ''
        # and feedurl + id below is just feedurl.
        for itt in src.findAll('a', href=True):
            if itt['href'].startswith('/app/epaper/textversion/inhalt/{}/'.format(id)):
                id = itt['href'].rpartition('/inhalt/{}/'.format(id))[2]
        totalfeeds = []
        lfeeds = self.get_feeds()
        for feedobj in lfeeds:
            feedtitle, feedurl = feedobj
            self.report_progress(0, _('Fetching feed') + ' %s...' % (feedtitle if feedtitle else feedurl))
            articles = []
            soup = self.index_to_soup(feedurl + id)
            tbl = soup.find(attrs={'class':'szprintd'})
            for item in tbl.findAll(name='td', attrs={'class':'topthema'}):
                atag = item.find(attrs={'class':'Titel'}).a
                ptag = item.find('p')
                stag = ptag.find('script')
                if stag:
                    stag.extract()
                url = self.PREFIX + atag['href']
                title = self.tag_to_string(atag)
                description = self.tag_to_string(ptag)
                articles.append({
                                  'title'       : title
                                 ,'date'        : strftime(self.timefmt)
                                 ,'url'         : url
                                 ,'description' : description
                                })
            totalfeeds.append((feedtitle, articles))
        return totalfeeds
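On the open question of raising the date stored in the calibre database by one day: a minimal sketch (my suggestion, untested against this recipe) is to format tomorrow's date with the recipe's own time format, instead of calling calibre's strftime, which always uses the current time:
Code:
from datetime import datetime, timedelta

# Inside parse_index, instead of  ,'date' : strftime(self.timefmt)
# datetime.strftime accepts the same format codes as self.timefmt.
tomorrow = datetime.now() + timedelta(days=1)
article_date = tomorrow.strftime(self.timefmt)
# then: articles.append({ ... ,'date' : article_date ... })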
02-03-2015, 04:39 PM | #8 |
Junior Member
Posts: 2
Karma: 10
Join Date: Jan 2015
Device: none
|
Hi SZ readers!
The code above stopped working today because the ID calculation went wrong. Actually, the ID is simply built by adding 86400 for each passing day. The corrected code is below. Bye!
-----
ID examples:
1422486000 = 29 Jan (d29)
1423004400 = 04 Feb (d35)

Code:
#P.S. 03.02.15
#BEG
d   = 86400       # id delta between two days
d29 = 1422486000  # start id @ day 29
now = datetime.now()
dy  = int(strftime('%j'))
dyt = dy + 1      # day of the year tomorrow
dg  = dyt - 29
id_d = d * dg
id = d29 + id_d
#END
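Worth noting: 86400 is the number of seconds in a day, and 1422486000 is the Unix timestamp for 2015-01-29 00:00 CET - so the ID appears to be simply the Unix timestamp of the issue date's midnight, German time. Assuming that observation holds, tomorrow's ID can be computed directly, without the day-of-year bookkeeping (my sketch, untested; it also stays correct across month and year boundaries):
Code:
import time
from datetime import datetime, timedelta

# Unix timestamp of tomorrow's local midnight; this matches the observed
# IDs on a machine whose clock is set to German time (CET/CEST).
tomorrow = datetime.now() + timedelta(days=1)
midnight = tomorrow.replace(hour=0, minute=0, second=0, microsecond=0)
id = int(time.mktime(midnight.timetuple()))  # e.g. 1423004400 for 04 Feb 2015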