#1 |
Member
Posts: 18
Karma: 10
Join Date: Jan 2012
Device: Nook
|
Recipe for seekingalpha.com
Hi Kovid,
I am building a feed from seekingalpha.com, and I would like to include the reader comments. I have tried various options and cannot get this to work. Here is my situation:

1> The recipe file is below.
2> I run it in debug mode and check the "input" folder: the comments are not part of the HTML file at all.
3> I tried commenting out all the keep_only_tags.append lines below, which leaves keep_only_tags empty. The comments still do not appear in the HTML file.
4> When I open the web page and "view page source", the comments are not part of the source.
5> However, when I examine the page with Firebug, the comments are there.
6> When I save the web page, the comments are there.
7> You may check out this page: http://seekingalpha.com/article/6431...nk?source=feed

What did I do wrong? Am I missing something? I need some direction on this, please!

Regards,

```python
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1335053294(BasicNewsRecipe):
    title = u'SA RSS'
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    auto_cleanup = False

    keep_only_tags = []
    remove_tags = []

    # heading
    keep_only_tags.append(dict(name='div', attrs={'id':'page_header'}))

    # author profile
    keep_only_tags.append(dict(name='div', attrs={'class':'the_pic'}))
    keep_only_tags.append(dict(name='div', attrs={'class':'followup_contributor_info_text'}))
    keep_only_tags.append(dict(name='div', attrs={'class':'author_info_nav'}))
    keep_only_tags.append(dict(name='div', attrs={'class':'user_followers_following'}))

    # article body
    keep_only_tags.append(dict(name='div', attrs={'id':'article_body'}))

    # comments
    # keep_only_tags.append(dict(name='div', attrs={'id':'content_follow_up'}))
    # keep_only_tags.append(dict(name='div', attrs={'class':'comments_with_more'}))
    # keep_only_tags.append(dict(name='div', attrs={'id':'comments_section'}))
    # keep_only_tags.append(dict(name='div', attrs={'id':'comments'}))
    # keep_only_tags.append(dict(name='ul', attrs={'id':'talkback_list'}))
    # keep_only_tags.append(dict(name='div', attrs={'id':'comment_container'}))
    # keep_only_tags.append(dict(name='div', attrs={'class':'base_level'}))
    # keep_only_tags.append(dict(name='div', attrs={'class':'com_cont'}))

    feeds = [(u'Most Popular Articles', u'http://seekingalpha.com/listing/most-popular-articles.xml')]
```
#2 |
creator of calibre
Posts: 45,231
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That is most likely because the comments are being loaded via JavaScript. Turn off JavaScript in Firefox and you likely won't see any comments. calibre's news download system doesn't support JavaScript.
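
One way to confirm this without a browser is to look at the HTML exactly as the server sends it and check whether the comment container is present. The following is only a minimal sketch: `sample_source` is a made-up stand-in for the served page, `static_ids` and `loadComments` are hypothetical names, and the element ids are the ones from the recipe above.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute that appears in the static HTML."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def static_ids(html):
    """Return the set of element ids found in the markup as served."""
    parser = IdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids


# Hypothetical stand-in for the page source before any script has run.
sample_source = """
<html><body>
  <div id="page_header">Title</div>
  <div id="article_body">Text</div>
  <script>loadComments("comments_section");</script>
</body></html>
"""

ids = static_ids(sample_source)
print("article_body" in ids)      # True: the article is in the served HTML
print("comments_section" in ids)  # False: that element only exists after the script runs
```

If an id shows up in Firebug but not in a check like this, no keep_only_tags rule can select it, because the recipe only ever sees the served HTML.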
|
#3 |
Member
Posts: 18
Karma: 10
Join Date: Jan 2012
Device: Nook
|
Kovid,
I confirmed that the comments are not part of the web page once I disable JavaScript. Do you have any plans to support JavaScript in a future release?
#4 |
creator of calibre
Posts: 45,231
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Not in the near future. Adding JavaScript support to web scrapers is not a trivial task.
|
#5 |
Member
Posts: 18
Karma: 10
Join Date: Jan 2012
Device: Nook
|
Hi Kovid
I have a question about this scenario:

1> I am building the article list without RSS:

```python
articles.append(dict(title=title, url=url, description=desc, date=date))
```

2> To control how the articles are listed within the same section, I add another element to the dict, e.g.:

```python
articles.append(dict(sortseq=sortseq, title=title, url=url, description=desc, date=date))
```

3> Right before the feeds.append, I call articles.sort(), but the list is not sorted by "sortseq". I have also tried these forms:

```python
s = sorted(s, key = lambda x: (x[1], x[2]))
s = sorted(s, key = operator.itemgetter(1, 2))
s.sort(key = operator.itemgetter(1, 2))
```

I still have the issue. My questions are:
- Is it OK to add "sortseq" to each article dict?
- If yes, am I doing the sort() right? Do you have an example for me?

Regards,
#6 |
creator of calibre
Posts: 45,231
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
key=lambda x:x['sortseq']
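
In context, that key function could be used roughly as follows. This is a sketch with made-up article entries and example.com URLs. Each article is a dict, so it has to be indexed by key name; a positional index such as `x[1]` looks up a key `1` that does not exist and raises KeyError.

```python
# Hypothetical article entries; 'sortseq' is the extra field from the post above.
articles = []
articles.append(dict(sortseq=3, title='Third', url='http://example.com/3', description='', date=''))
articles.append(dict(sortseq=1, title='First', url='http://example.com/1', description='', date=''))
articles.append(dict(sortseq=2, title='Second', url='http://example.com/2', description='', date=''))

# Sort in place by the value stored under the 'sortseq' key.
articles.sort(key=lambda x: x['sortseq'])

# A section is a (title, list of article dicts) pair; the title here is only an example.
feeds = []
feeds.append(('Most Popular Articles', articles))

titles = [a['title'] for a in articles]
print(titles)  # ['First', 'Second', 'Third']
```

`operator.itemgetter('sortseq')` would work as the key as well, as long as it is given the key name rather than a position.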
|
#7 |
Member
Posts: 18
Karma: 10
Join Date: Jan 2012
Device: Nook
|
That works!
Thank you very much.