|  11-12-2010, 01:17 PM | #1 | 
| Member  Posts: 11 Karma: 14 Join Date: Nov 2010 Device: none |  Handelsblatt 
			
			Hey Kovid, this is ready to be built in ;-) Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe
class Handelsblatt(BasicNewsRecipe):
    title          = u'Handelsblatt'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    cover_url = 'http://www.handelsblatt.com/images/logo/logo_handelsblatt.com.png'
    language = 'de'
    keep_only_tags = []
    keep_only_tags.append(dict(name = 'div', attrs = {'class': 'structOneCol'}))
    keep_only_tags.append(dict(name = 'div', attrs = {'id': 'fullText'}))
    remove_tags    = [dict(name='img', attrs = {'src': 'http://www.handelsblatt.com/images/icon/loading.gif'})]
    feeds          = [
                        (u'Handelsblatt Exklusiv',u'http://www.handelsblatt.com/rss/exklusiv'),
                        (u'Handelsblatt Top-Themen',u'http://www.handelsblatt.com/rss/top-themen'),
                        (u'Handelsblatt Schlagzeilen',u'http://www.handelsblatt.com/rss/ticker/'),
                        (u'Handelsblatt Finanzen',u'http://www.handelsblatt.com/rss/finanzen/'),
                        (u'Handelsblatt Unternehmen',u'http://www.handelsblatt.com/rss/unternehmen/'),
                        (u'Handelsblatt Politik',u'http://www.handelsblatt.com/rss/politik/'),
                        (u'Handelsblatt Technologie',u'http://www.handelsblatt.com/rss/technologie/'),
                        (u'Handelsblatt Meinung',u'http://www.handelsblatt.com/rss/meinung'),
                        (u'Handelsblatt Magazin',u'http://www.handelsblatt.com/rss/magazin/'),
                        (u'Handelsblatt Weblogs',u'http://www.handelsblatt.com/rss/blogs')
                     ]
    extra_css = '''
        h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
        h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
        p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
        body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
        '''
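    def print_version(self, url):
        # Take the first run of digits directly after a ';' as the document ID.
        m = re.search(r'(?<=;)[0-9]+', url)
        if m is None:
            return url  # no ID found: fetch the normal article page instead
        return u'http://www.handelsblatt.com/_b=' + m.group(0) + ',_p=21,_t=ftprint,doc_page=0;printpage'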
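The ID pattern as first posted, `(?<=;)[0-9]*`, has two edge cases. Because `*` also accepts zero digits, a URL with a ';' but no digits after it gives an empty ID (and so a print URL with `_b=,`). A URL with no ';' at all gives no match, and `m.group(0)` then raises `AttributeError`. The standalone sketch below compares that pattern with a stricter `[0-9]+` variant. The sample URLs are hypothetical, since the thread never shows an old-style article link, and `extract_id` is an illustrative helper, not part of calibre.

```python
import re

def extract_id(url, pattern):
    """Return the matched document ID, or None if the pattern does not match."""
    m = re.search(pattern, url)
    return None if m is None else m.group(0)

ORIGINAL = r'(?<=;)[0-9]*'  # pattern from the first post: zero or more digits
STRICT = r'(?<=;)[0-9]+'    # requires at least one digit after the ';'

# Hypothetical URLs, for illustration only.
with_id = 'http://www.handelsblatt.com/politik/artikel;2681234'
no_digits = 'http://www.handelsblatt.com/politik/artikel;seite'
no_semicolon = 'http://www.handelsblatt.com/politik/3862170.html'

for url in (with_id, no_digits, no_semicolon):
    print(repr(extract_id(url, ORIGINAL)), repr(extract_id(url, STRICT)))
```

With the strict pattern, `None` reliably signals "no usable ID", so `print_version` can fall back to the normal article URL instead of building a broken print link.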
|  11-14-2010, 08:03 AM | #2 | 
| Connoisseur  Posts: 57 Karma: 10 Join Date: Nov 2009 Device: Kindle 3 | 
			
			Cool! A small suggestion: some articles are only available to subscribers. Maybe the corresponding login could be added as well ...
|  02-02-2011, 01:51 AM | #3 | 
| Member Retired  Posts: 47 Karma: 10 Join Date: Oct 2010 Device: Kindle 3 | 
			
			Thank you very much! I tried it and it is awesome! If possible, I would also appreciate support for the subscriber login. Is it also possible to include the picture shows on the Kindle?
|  02-19-2011, 03:07 AM | #4 | 
| Member Retired  Posts: 47 Karma: 10 Join Date: Oct 2010 Device: Kindle 3 | 
			
			Oh no, Handelsblatt has updated its homepage... the recipe is not working anymore.
|  02-19-2011, 03:07 PM | #5 | 
| Member Retired  Posts: 47 Karma: 10 Join Date: Oct 2010 Device: Kindle 3 | 
			
			Okay guys, sorry for the double (technically even triple) post. But I really put some effort into this and hope somebody is willing to help. I think Handelsblatt basically changed the way they link the print version of an article. Here is an example:
regular: http://www.handelsblatt.com/politik/...t/3862170.html
print: http://www.handelsblatt.com/politik/...t,3862170.html
So I thought all I had to do was change the print_version method at the very bottom: Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe
class Handelsblatt(BasicNewsRecipe):
    title          = u'Handelsblatt2'
    __author__ = 'malfi'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    cover_url = 'http://www.handelsblatt.com/images/logo/logo_handelsblatt.com.png'
    language = 'de'
    keep_only_tags = []
    keep_only_tags.append(dict(name = 'div', attrs = {'class': 'structOneCol'}))
    keep_only_tags.append(dict(name = 'div', attrs = {'id': 'fullText'}))
    remove_tags    = [dict(name='img', attrs = {'src': 'http://www.handelsblatt.com/images/icon/loading.gif'})]
    feeds          = [
                        (u'Handelsblatt Exklusiv',u'http://www.handelsblatt.com/rss/exklusiv'),
                        (u'Handelsblatt Top-Themen',u'http://www.handelsblatt.com/rss/top-themen'),
                        (u'Handelsblatt Schlagzeilen',u'http://www.handelsblatt.com/rss/ticker/'),
                        (u'Handelsblatt Finanzen',u'http://www.handelsblatt.com/rss/finanzen/'),
                        (u'Handelsblatt Unternehmen',u'http://www.handelsblatt.com/rss/unternehmen/'),
                        (u'Handelsblatt Politik',u'http://www.handelsblatt.com/rss/politik/'),
                        (u'Handelsblatt Technologie',u'http://www.handelsblatt.com/rss/technologie/'),
                        (u'Handelsblatt Meinung',u'http://www.handelsblatt.com/rss/meinung'),
                        (u'Handelsblatt Magazin',u'http://www.handelsblatt.com/rss/magazin/'),
                        (u'Handelsblatt Weblogs',u'http://www.handelsblatt.com/rss/blogs')
                     ]
    extra_css = '''
        h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
        h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
        p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
        body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
        '''
    def print_version(self, url):
        m = re.search(r'[0-9]*(?=\.html)', url)
        n = re.search(r'.(?=[0-9]*\.html)', url)
        return str(n.group(0)) + 'v_detail_tab_print,' + str(m.group(0)) + '.html'
Unfortunately it doesn't work, but I have a feeling that I am pretty close to the solution and have just made a small mistake somewhere! Can anybody spot it? Thank you!
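Running the two regular expressions from this post on a full article URL shows where it goes wrong. `m` does find the numeric ID, but `n` matches only the single character in front of the digits (the '/'), not everything before them. So the method returns a relative path such as '/v_detail_tab_print,3991192.html' instead of a full URL. The sketch below reproduces that. As one possible fix (not the approach used later in the thread), it captures the whole prefix up to the last '/' instead. It uses the full article URL quoted later in this thread, because the example URLs above are truncated.

```python
import re

url = ('http://www.handelsblatt.com/panorama/aus-aller-welt/'
       'lage-in-fukushima-immer-dramatischer/3991192.html')

# The two searches from the post above.
m = re.search(r'[0-9]*(?=\.html)', url)   # finds '3991192'
n = re.search(r'.(?=[0-9]*\.html)', url)  # finds only '/' (one character)
broken = str(n.group(0)) + 'v_detail_tab_print,' + str(m.group(0)) + '.html'
print(broken)

# One possible fix: group 1 is everything up to and including the last '/',
# group 2 is the numeric ID; rebuild the full URL from both.
parts = re.search(r'^(.*/)([0-9]+)\.html$', url)
fixed = parts.group(1) + 'v_detail_tab_print,' + parts.group(2) + '.html'
print(fixed)
```

The greedy `.*/` is what makes group 1 stop at the last slash, so the section path and domain are kept intact.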
|  03-20-2011, 08:24 PM | #6 | 
| creator of calibre            Posts: 45,604 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			A much easier way is:
url = url.split('/')
url[-1] = 'v_detail_tab_print,' + url[-1]
return '/'.join(url)
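The split/join rewrite prefixes the last path segment unconditionally. Below is a slightly more defensive variant, as a sketch. It assumes (this is not confirmed in the thread) that some feed entries, for example from the Weblogs feed, may link to pages whose last segment is not `<digits>.html`, or may carry a query string such as a hypothetical `?utm_source=rss`. It rewrites only the path, leaves non-matching URLs unchanged, and never prefixes a URL twice. The name `to_print_url` is illustrative only.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed shape of an article file name, e.g. '3991192.html'.
ARTICLE_SEGMENT = re.compile(r'[0-9]+\.html')

def to_print_url(url):
    """Insert 'v_detail_tab_print,' before the article file name.

    Same idea as the split/join above, but applied to the path only
    (query and fragment are preserved), and only when the last path
    segment looks like '<digits>.html'; any other URL is returned as-is.
    """
    parts = urlsplit(url)
    segments = parts.path.split('/')
    if not ARTICLE_SEGMENT.fullmatch(segments[-1]):
        return url
    segments[-1] = 'v_detail_tab_print,' + segments[-1]
    return urlunsplit(parts._replace(path='/'.join(segments)))

print(to_print_url('http://www.handelsblatt.com/panorama/aus-aller-welt/'
                   'lage-in-fukushima-immer-dramatischer/3991192.html'))
```

Inside the recipe, the same body would go into a `print_version(self, url)` method on the class.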
|  03-23-2011, 11:13 AM | #7 | 
| Connoisseur  Posts: 57 Karma: 10 Join Date: Feb 2010 Device: Kindle Paperwhite 1 | 
			
			Regrettably, this doesn't work either: it identifies the articles and downloads their descriptions for the contents section, but the body of each article only contains a hyperlink to it.
|  03-23-2011, 11:30 AM | #8 | 
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | 
			
			Kovid's code produces the print link described above; it just does so more simply. If that link doesn't work, then either it isn't the right link or something else is wrong with the recipe.
|  03-23-2011, 07:08 PM | #9 | 
| Enthusiast            Posts: 25 Karma: 4472 Join Date: Jan 2011 Device: Kindle | 
			
			I didn't get the print version to work, but found the standard site surprisingly manageable with the keep_only_tags option. The attached version should be OK. The style sheet is rather messy, though; it would be nice if the print version could be fixed.
|  03-25-2011, 09:59 PM | #10 | 
| Member Retired  Posts: 47 Karma: 10 Join Date: Oct 2010 Device: Kindle 3 | 
			
			I tried Kovid's version as well and also found that it only produces hyperlinks. But when I look at the details of the article fetch, the log says it fetches the standard version of the article, not the print version: Code:
Fetching http://www.handelsblatt.com/panorama/aus-aller-welt/lage-in-fukushima-immer-dramatischer/3991192.html
|  03-26-2011, 04:43 PM | #11 | |
| Member Retired  Posts: 47 Karma: 10 Join Date: Oct 2010 Device: Kindle 3 | Quote: 
|  04-19-2011, 12:40 AM | #12 | 
| Member Retired  Posts: 47 Karma: 10 Join Date: Oct 2010 Device: Kindle 3 | 
			
			Does anybody have any further ideas on how to solve this? With every calibre update I hope to see "Handelsblatt" among the recipe updates, but it never appears. And I am afraid my programming skills have reached their limit...
|  04-20-2011, 06:52 PM | #13 | |
| Enthusiast   Posts: 43 Karma: 136 Join Date: Mar 2011 Device: Kindle Paperwhite | Quote: 
 I replaced remove_tags with remove_tags_before and remove_tags_after, and removed the non-existent logo. Try this:
 I just ran this recipe successfully. It takes rather long (10 minutes), and the MOBI file for the Kindle is about 5 MB, I think mostly because of the many images. Maybe it would also be enough to reduce the number of feeds. Let me know whether you are happy with it as it is or want to change something.
|  04-20-2011, 11:08 PM | #14 | 
| Member Retired  Posts: 47 Karma: 10 Join Date: Oct 2010 Device: Kindle 3 | 
			
			Thank you so much, aerodynamik! It finally seems to work again! Thank you! I really do appreciate it. And at first sight it even seems better than ever before! I don't know what went wrong before...
|  04-23-2011, 01:33 PM | #15 | 
| Connoisseur  Posts: 57 Karma: 10 Join Date: Feb 2010 Device: Kindle Paperwhite 1 | 
			
			Wow! Thank you very much, aerodynamik, it finally works!
|  Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post | 
| Recipe Request for Handelsblatt [GER] | Moik | Recipes | 6 | 10-15-2010 07:13 PM |