#1 |
Member
Posts: 11
Karma: 14
Join Date: Nov 2010
Device: none
|
hey Kovid, this is ready to be built in ;-)
Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe


class Handelsblatt(BasicNewsRecipe):
    title = u'Handelsblatt'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    cover_url = 'http://www.handelsblatt.com/images/logo/logo_handelsblatt.com.png'
    language = 'de'

    # Keep only the article containers
    keep_only_tags = []
    keep_only_tags.append(dict(name='div', attrs={'class': 'structOneCol'}))
    keep_only_tags.append(dict(name='div', attrs={'id': 'fullText'}))

    # Drop the "loading" placeholder image
    remove_tags = [dict(name='img', attrs={'src': 'http://www.handelsblatt.com/images/icon/loading.gif'})]

    feeds = [
        (u'Handelsblatt Exklusiv', u'http://www.handelsblatt.com/rss/exklusiv'),
        (u'Handelsblatt Top-Themen', u'http://www.handelsblatt.com/rss/top-themen'),
        (u'Handelsblatt Schlagzeilen', u'http://www.handelsblatt.com/rss/ticker/'),
        (u'Handelsblatt Finanzen', u'http://www.handelsblatt.com/rss/finanzen/'),
        (u'Handelsblatt Unternehmen', u'http://www.handelsblatt.com/rss/unternehmen/'),
        (u'Handelsblatt Politik', u'http://www.handelsblatt.com/rss/politik/'),
        (u'Handelsblatt Technologie', u'http://www.handelsblatt.com/rss/technologie/'),
        (u'Handelsblatt Meinung', u'http://www.handelsblatt.com/rss/meinung'),
        (u'Handelsblatt Magazin', u'http://www.handelsblatt.com/rss/magazin/'),
        (u'Handelsblatt Weblogs', u'http://www.handelsblatt.com/rss/blogs')
    ]

    extra_css = '''
        h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
        h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
        p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
        body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
    '''

    def print_version(self, url):
        # Take the digits that follow the first ';' in the article URL
        # and build the print-view URL from them
        m = re.search('(?<=;)[0-9]*', url)
        return u'http://www.handelsblatt.com/_b=' + str(m.group(0)) + ',_p=21,_t=ftprint,doc_page=0;printpage'
#2 |
Connoisseur
Posts: 57
Karma: 10
Join Date: Nov 2009
Device: Kindle 3
|
Cool!
A small suggestion: some articles are only available to subscribers. Maybe the corresponding login could be added as well ...
#3 |
Member Retired
Posts: 47
Karma: 10
Join Date: Oct 2010
Device: Kindle 3
|
Thank you very much! I tried it and it is awesome!
If possible, I would also appreciate support for a subscriber's login. Is it also possible to get the picture galleries onto the Kindle?
#4 |
Member Retired
Posts: 47
Karma: 10
Join Date: Oct 2010
Device: Kindle 3
|
Oh no, Handelsblatt has updated their homepage... it is not working anymore
#5 |
Member Retired
Posts: 47
Karma: 10
Join Date: Oct 2010
Device: Kindle 3
|
Okay guys, sorry for the double (technically even triple) post. But I really put some effort into this and hope somebody is willing to help.
I think Handelsblatt basically changed the way they link the print version of an article. Here is an example:
regular: http://www.handelsblatt.com/politik/...t/3862170.html
print: http://www.handelsblatt.com/politik/...t,3862170.html
So I thought all I had to do was change the print_version function at the very bottom:
Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe


class Handelsblatt(BasicNewsRecipe):
    title = u'Handelsblatt2'
    __author__ = 'malfi'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    cover_url = 'http://www.handelsblatt.com/images/logo/logo_handelsblatt.com.png'
    language = 'de'

    keep_only_tags = []
    keep_only_tags.append(dict(name='div', attrs={'class': 'structOneCol'}))
    keep_only_tags.append(dict(name='div', attrs={'id': 'fullText'}))

    remove_tags = [dict(name='img', attrs={'src': 'http://www.handelsblatt.com/images/icon/loading.gif'})]

    feeds = [
        (u'Handelsblatt Exklusiv', u'http://www.handelsblatt.com/rss/exklusiv'),
        (u'Handelsblatt Top-Themen', u'http://www.handelsblatt.com/rss/top-themen'),
        (u'Handelsblatt Schlagzeilen', u'http://www.handelsblatt.com/rss/ticker/'),
        (u'Handelsblatt Finanzen', u'http://www.handelsblatt.com/rss/finanzen/'),
        (u'Handelsblatt Unternehmen', u'http://www.handelsblatt.com/rss/unternehmen/'),
        (u'Handelsblatt Politik', u'http://www.handelsblatt.com/rss/politik/'),
        (u'Handelsblatt Technologie', u'http://www.handelsblatt.com/rss/technologie/'),
        (u'Handelsblatt Meinung', u'http://www.handelsblatt.com/rss/meinung'),
        (u'Handelsblatt Magazin', u'http://www.handelsblatt.com/rss/magazin/'),
        (u'Handelsblatt Weblogs', u'http://www.handelsblatt.com/rss/blogs')
    ]

    extra_css = '''
        h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
        h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
        p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
        body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
    '''

    def print_version(self, url):
        m = re.search(r'[0-9]*(?=\.html)', url)
        n = re.search(r'.(?=[0-9]*\.html)', url)
        return str(n.group(0)) + 'v_detail_tab_print,' + str(m.group(0)) + '.html'

The lines I changed are:
Code:
m = re.search(r'[0-9]*(?=\.html)', url)
n = re.search(r'.(?=[0-9]*\.html)', url)
return str(n.group(0)) + 'v_detail_tab_print,' + str(m.group(0)) + '.html'

Unfortunately it doesn't work, but I have the feeling that I am pretty close to the solution and have just made a small error somewhere. Can anybody see it? Thank you!
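One way to see where this goes wrong is to run the two searches outside calibre. The sketch below is only a standalone check, not part of the recipe: the URL is the article address from the fetch log quoted later in this thread, and the variable names are just for illustration. It suggests that `n` captures a single character rather than the whole part of the URL before the article number, so the function returns only a fragment instead of a full address.

```python
import re

# Article URL taken from the fetch log quoted later in this thread
url = ('http://www.handelsblatt.com/panorama/aus-aller-welt/'
       'lage-in-fukushima-immer-dramatischer/3991192.html')

# The same two patterns as in print_version above
m = re.search(r'[0-9]*(?=\.html)', url)
n = re.search(r'.(?=[0-9]*\.html)', url)

print(repr(m.group(0)))  # the numeric article id
print(repr(n.group(0)))  # '.' matches exactly one character

# What print_version would hand back to calibre
result = str(n.group(0)) + 'v_detail_tab_print,' + str(m.group(0)) + '.html'
print(result)
```

Because the result no longer starts with `http://`, it cannot be fetched as it stands. The pattern for `n` would have to capture everything up to the last slash, not just one character.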
#6 |
creator of calibre
Posts: 45,187
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
a much easier way is
url = url.split('/')
url[-1] = 'v_detail_tab_print,'+url[-1]
return '/'.join(url)
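For anyone who wants to try these three lines outside calibre, here is a minimal sketch that wraps them in a plain function. Inside the recipe they would form the body of `print_version(self, url)`. The example URL is the one from the fetch log quoted later in this thread, and whether the site actually serves a print page at the resulting address is an assumption that this check cannot confirm.

```python
def print_version(url):
    # Split the address into path segments, prefix the last segment
    # (e.g. '3991192.html') with the print marker, and join it back up
    parts = url.split('/')
    parts[-1] = 'v_detail_tab_print,' + parts[-1]
    return '/'.join(parts)


# Article URL taken from the fetch log quoted later in this thread
url = ('http://www.handelsblatt.com/panorama/aus-aller-welt/'
       'lage-in-fukushima-immer-dramatischer/3991192.html')

print(print_version(url))
```

Since only the last path segment is touched, the scheme, host and section path are carried over unchanged.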
#7 |
Connoisseur
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
|
Regrettably, this doesn't work either. It identifies the articles and downloads their descriptions for the contents section, but the body of each article contains only a hyperlink to it.
#8 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Kovid's code produces the print link described above, but it does it more easily. If that link doesn't work, then it isn't the right one or there's something else wrong with the recipe.
#9 |
Member
Posts: 24
Karma: 4472
Join Date: Jan 2011
Device: Kindle
|
I didn't get the print version to work, but found the standard site surprisingly manageable with the keep_only_tags option. The attached version should be OK. The style sheet is rather messy, though, so it would be nice if the print version could be fixed.
Spoiler:
#10 |
Member Retired
Posts: 47
Karma: 10
Join Date: Oct 2010
Device: Kindle 3
|
I tried Kovid's version as well and also found that it only produces hyperlinks. But when I look at the fetch details for an article, the log shows that it fetches the standard version of the article, not the print version.
Code:
Fetching http://www.handelsblatt.com/panorama/aus-aller-welt/lage-in-fukushima-immer-dramatischer/3991192.html
#11 | |
Member Retired
Posts: 47
Karma: 10
Join Date: Oct 2010
Device: Kindle 3
|
Quote:
#12 |
Member Retired
Posts: 47
Karma: 10
Join Date: Oct 2010
Device: Kindle 3
|
Does anybody have any further ideas on how to solve this? With each calibre update I keep hoping to see "Handelsblatt" among the recipe updates, but I never do. And I am afraid my programming skills are exhausted...
#13 | |
Enthusiast
Posts: 43
Karma: 136
Join Date: Mar 2011
Device: Kindle Paperwhite
|
Quote:
I changed the remove_tags to remove_tags_before and remove_tags_after, and removed the non-existent logo. Try this:
Spoiler:
I just ran this recipe successfully. It takes rather long (about 10 minutes) and the resulting mobi for Kindle is about 5 MB, I think mostly because of the many images. Reducing the number of feeds would probably also help. Let me know whether you are happy with it as it is or want something changed.
#14 |
Member Retired
Posts: 47
Karma: 10
Join Date: Oct 2010
Device: Kindle 3
|
Thank you so much, aerodynamik! It finally seems to work again! Thank you! I really do appreciate it. And at first sight it even seems better than ever before!
I don't know what went wrong before...
#15 |
Connoisseur
Posts: 57
Karma: 10
Join Date: Feb 2010
Device: Kindle Paperwhite 1
|
Wow! Thank you very much, aerodynamik, it finally works!