MobileRead Forums - View Single Post - Custom recipes (archive, read-only)

TonytheBookworm · 09-24-2010, 01:45 AM

#2835

Starson17,
I need your help on this one if you've got a minute. I have been battling this feed, which I figured would be simple to do, but for some reason it is giving me trouble even with the basics. If I take keep_only_tags out it works, but of course I want to use it to get rid of the ads and all the other junk.
I have tried every dang tag I can think of, using Firebug to find them. This is what I have come up with so far. Basic for sure, but I get no content when I keep only the tag that appears to be the parent. HELP

Here is what I've got so far, and thanks. If you can just help me with keep_only_tags, I think I can figure out the rest, unless there is something screwy going on here that I have never faced before.

Spoiler:
Code:
```python
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'How To Geek'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'Daily Computer Tips and Tricks'
    publisher = 'Howtogeek'
    category = 'PC,tips,tricks'
    oldest_article = 2
    max_articles_per_feed = 100
    linearize_tables = True
    no_stylesheets = True
    remove_javascript = True

    # Keep only the div that appears to be the article's parent container
    keep_only_tags = [
        dict(name='div', attrs={'class': ['yui-u']})
    ]

    feeds = [
        ('Tips', 'http://feeds.howtogeek.com/howtogeek')
    ]
```

edit:
Alright, I got it working, but I'm confused about this. In previous feeds I have done, I enter the feed address, and it gets the link and uses it as the title, and then it parses part of the content listed under it and uses that as a description. In this feed the content is all on the feed page, so it doesn't go to the actual link. In the code above I was assuming that it went to the links inside the feed one by one, and I was trying to strip the content that each link showed.
So my question to you is: what determines whether it uses the feed's main page content (the one that has all the links on it) or navigates to each link? I hope you understand what I'm asking; if not, I will try to explain myself better.
This code here works because, for whatever reason, the links on the feed page are not followed. But in other basic feeds I have done nothing more than add the feed, and it follows the links.

Spoiler:
Code:
```python
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'How To Geek'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'Daily Computer Tips and Tricks'
    publisher = 'Howtogeek'
    category = 'PC,tips,tricks'
    oldest_article = 2
    max_articles_per_feed = 100
    linearize_tables = True
    no_stylesheets = True
    remove_javascript = True

    # Strip the ad links, the article table, and the feed footer widgets
    remove_tags = [
        dict(name='a', attrs={'target': ['_blank']}),
        dict(name='table', attrs={'id': ['articleTable']}),
        dict(name='div', attrs={'class': ['feedflare']}),
    ]

    feeds = [
        ('Tips', 'http://feeds.howtogeek.com/howtogeek')
    ]
```

Last edited by TonytheBookworm; 09-24-2010 at 02:03 AM. Reason: confused see addition
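
On the question of what decides between the feed's own content and the linked pages: calibre's BasicNewsRecipe has a use_embedded_content attribute. Left at None (the default), calibre guesses from the length of the text embedded in the feed whether the feed already carries full articles. True always uses the feed text, and False always downloads each linked page. The sketch below only illustrates that decision. The function name should_follow_links and the 2000-character average threshold are assumptions made for illustration, not calibre's actual code.

```python
def should_follow_links(use_embedded_content, embedded_lengths, avg_threshold=2000):
    """Illustrative only: decide whether each article link gets downloaded.

    use_embedded_content mirrors the recipe attribute: None, True, or False.
    embedded_lengths is a list with the character count of the text embedded
    in the feed for each article.
    avg_threshold is an assumed cut-off for this sketch.
    """
    # An explicit True/False on the recipe overrides any guessing.
    if use_embedded_content is not None:
        return not use_embedded_content

    # With nothing embedded, the only option is to follow the links.
    if not embedded_lengths:
        return True

    # Guess: long embedded text on average suggests full articles in the feed.
    average = sum(embedded_lengths) / len(embedded_lengths)
    return average <= avg_threshold
```

If that guess is what happened here, it would explain the empty result with keep_only_tags: the filter would have run on the HTML embedded in the feed, which presumably has no div with class yui-u. Setting use_embedded_content = False in the recipe should make calibre fetch each linked page, so keep_only_tags would apply to that page instead.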