09-22-2010, 03:05 PM #2821
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Thetasquared
It worked! I downloaded the issue and it looks fine in Calibre (I'm not near my e-reader right now). I know next to nothing about coding, so I don't know what to clean up. The only thing I noticed is that the words "Fire & Ice" are in the code. Does that mean that if the current issue changes, it will still download the "Fire & Ice" issue?

I love Science News too! Thank you for your time and effort in making this recipe work. It will add tons of enjoyment to my life! Umm... and knowledge. It will increase my knowledge of science stuff :-)
You can delete the "Fire & Ice" line that starts with #. It's just a comment to remind me of the structure of the links that need to be recursed; Python ignores it, so it has no effect on which issue gets downloaded. Next I'll look it over closely, then submit a final version to Kovid. I may want to do a little cleaning, and I need to credit the original author(s) of the previous ScienceNews recipe. I expected to have to write it from scratch, but I was able to use most of the earlier recipe.

Last edited by Starson17; 09-22-2010 at 03:20 PM.
09-22-2010, 04:36 PM #2822
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by Starson17
1) It has class="cd_mainarticle", not class="cdmainarticle",
2) It has inline style on your header. Strip that first:
Thanks. I was looking at the page after it had already run through the recipe (duh, haha), which is why I had cdmainarticle. When I went back and looked at the original page, there it was, clear as day, with the underscore.

Thanks again.
09-22-2010, 04:38 PM #2823
krunk
Member
Posts: 19
Karma: 10
Join Date: Feb 2010
Location: Los Angeles, CA
Device: Kindle 3
Quote:
Originally Posted by Starson17
AFAIK, it appears in stylesheet.css.
They're not appearing in stylesheet.css either (the recursive grep would have caught them), but I'll do a deeper inspection there.

Quote:
Originally Posted by Starson17
You need indents in the extra_css, just like elsewhere, or they'll get ignored.
Is this a quirk of the calibre library? It's not a Python syntax rule: text inside a triple-quoted string needs no indentation, and if it were a syntax problem, Python would raise an exception rather than silently ignore the CSS.


Code:
>>> class A(object):
...     foo = """
... This is a docstring. No 
... indents necessary within the block.
... """
... 
>>> a = A()
>>> a.foo
'\nThis is a docstring. No \nindents necessary within the block.\n'
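krunk's point can also be checked at the CSS level. The stand-alone sketch below (plain Python 3; the two rules and the helper name css_rules are illustrative, not from any recipe) compares an indented and a flush-left extra_css-style string. The raw strings differ only in leading whitespace, which CSS treats as insignificant, so even a naive rule parser extracts identical rules. It says nothing about how calibre itself processes extra_css internally.

```python
import re
import textwrap

# The same two rules written both ways (illustrative selectors only).
indented = """
    .headline {font-size: x-large;}
    .fact { padding-top: 10pt }
    """

flush_left = """
.headline {font-size: x-large;}
.fact { padding-top: 10pt }
"""


def css_rules(css):
    """Hypothetical helper: return (selector, declarations) pairs, whitespace stripped.

    Deliberately naive: handles only flat 'selector { ... }' rules,
    with no nesting, comments or at-rules.
    """
    return [(sel.strip(), body.strip())
            for sel, body in re.findall(r'([^{}]+)\{([^{}]*)\}', css)]


print(indented == flush_left)                                   # False
print(textwrap.dedent(indented).strip() == flush_left.strip())  # True
print(css_rules(indented) == css_rules(flush_left))             # True
```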
09-22-2010, 04:46 PM #2824
Starson17
Wizard
Quote:
Originally Posted by krunk
They're not appearing in stylesheet.css either (the recursive grep would have caught them), but I'll do a deeper inspection there.
Yes, the grep should have caught it if it were there. I'd make sure you strip out both the stylesheet and the inline style attributes; those are usually why extra_css doesn't show up. Examine the page of interest, see if it has inline styles, and if so, try:
Code:
    def preprocess_html(self, soup):
        for item in soup.findAll(attrs={'style':True}):
            del item['style']
        return soup
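The preprocess_html above uses calibre's bundled BeautifulSoup. To see what dropping inline styles does to the markup without calibre installed, here is a rough stand-alone equivalent built on Python 3's html.parser instead (a swapped-in technique, not calibre's API). The class and function names are hypothetical, and it discards comments and doctypes, so treat it as a demonstration only.

```python
from html import escape
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Hypothetical helper: re-serialize HTML without any style="..." attributes.

    Demonstration only: comments, doctypes and processing instructions
    are discarded, and attribute quoting is normalized to double quotes.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _attrs(self, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        return ''.join(' %s="%s"' % (name, escape(value or '', quote=True))
                       for name, value in attrs if name != 'style')

    def handle_starttag(self, tag, attrs):
        self.out.append('<%s%s>' % (tag, self._attrs(attrs)))

    def handle_startendtag(self, tag, attrs):
        self.out.append('<%s%s/>' % (tag, self._attrs(attrs)))

    def handle_endtag(self, tag):
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def strip_styles(html_text):
    parser = StyleStripper()
    parser.feed(html_text)
    parser.close()
    return ''.join(parser.out)


print(strip_styles('<h1 style="font-size:8pt" class="headline">Title</h1>'))
# <h1 class="headline">Title</h1>
```

html.parser lowercases attribute names, so STYLE="..." is removed as well.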
09-22-2010, 05:12 PM #2825
TonytheBookworm
Addict
Quote:
Originally Posted by Starson17
Here's a quick and dirty version. Why don't you look it over and spot what needs to get cleaned up better? Post here and I'll address it. I really like Science News.
Cool, I just learned something from you. The match_regexps setting is great. I would have done that with make_links() like you showed me in the past, but when I saw match_regexps I wondered what the heck it does. Then I saw that it looks at the page, finds those links, and follows only those links. Thanks for using that.
09-22-2010, 07:36 PM #2826
Flexicat
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2010
Device: Kobo
Quote:
Originally Posted by Starson17
I'll take a look at it. IIRC, I spotted this error a while back and wrote some code to bypass it, but didn't see anyone complaining and never got around to uploading it. I'll hunt it up and post it. It's not you.

Edit: I checked and apparently I did upload the revised recipe. I tested the current built-in recipe and it works fine. Are you perhaps using an earlier version that I uploaded here? If you are, switch to the built-in recipe that is now supplied with Calibre. The error you are getting looks like the error from the earlier version.
Thank you, Starson17, that was the issue.
09-22-2010, 08:42 PM #2827
bhandarisaurabh
Enthusiast
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
Quote:
Originally Posted by TonytheBookworm
Alright, first, let's not piggyback on the existing recipe but make our own version, since the feeds are different. That said, I had to test the code to get it right, because the index into the split list is 0-based and the very last element is blank (the URL ends with a slash). So if the split list has 8 elements, the id is at index 6, and I just used split1[len(split1) - 2].
Anyway, this code works.
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'Business Standard modified'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'Business Standard modified'
    publisher = 'Business Standard'
    category = ''
    oldest_article = 5
    max_articles_per_feed = 100
    no_stylesheets = True

    feeds = [
             (u'Todays Newspaper'    , u'http://feeds.business-standard.com/rss/paper.xml'),
             (u'Banking & finance'   , u'http://feeds.business-standard.com/rss/1.xml'),
             (u'Companies & Industry', u'http://feeds.business-standard.com/rss/2.xml'),
             (u'Economy & Policy'    , u'http://feeds.business-standard.com/rss/3.xml'),
             (u'Opinion and analysis', u'http://feeds.business-standard.com/rss/5_0.xml'),
             (u'Life & Leisure'      , u'http://feeds.business-standard.com/rss/6_0.xml'),
             (u'Markets & Investing' , u'http://feeds.business-standard.com/rss/12.xml'),
             (u'Management & Mktg'   , u'http://feeds.business-standard.com/rss/7_0.xml'),
             (u'Tech World'          , u'http://feeds.business-standard.com/rss/8_0.xml'),
            ]

    def print_version(self, url):
        # Article URLs end with '/', so the last element of the split is ''
        # and the article id is the element just before it (index len - 2).
        split1 = url.split('/')
        self.log('ORIGINAL URL IS: ' + url)
        idnum = split1[len(split1) - 2]  # the article id
        self.log('THE IDNUM IS: ' + idnum)
        print_url = 'http://www.business-standard.com/india/printpage.php?autono=' + idnum + '&tp='
        self.log('PRINT URL IS: ' + print_url)
        return print_url
Thanks a ton, it's working fine.
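The index arithmetic in print_version can be checked without calibre. In the sketch below the article URL is hypothetical (only the printpage.php template comes from the recipe), chosen so that it splits into 8 elements like the example in the post. The function names are hypothetical too; the second function is an alternative not found in the recipe, which strips any trailing slash first and so still finds the id if a URL ever arrives without one.

```python
# Print-page template taken from the recipe; helper names are hypothetical.
PRINT_TEMPLATE = 'http://www.business-standard.com/india/printpage.php?autono=%s&tp='


def print_url_len_minus_2(url):
    # The recipe's approach: the URL ends with '/', so split1[-1] is ''
    # and the id sits at index len(split1) - 2.
    split1 = url.split('/')
    return PRINT_TEMPLATE % split1[len(split1) - 2]


def print_url_rstrip(url):
    # Alternative: drop any trailing '/' and take the last path segment.
    return PRINT_TEMPLATE % url.rstrip('/').rsplit('/', 1)[-1]


# Hypothetical article URL, for illustration only.
article = 'http://www.business-standard.com/india/news/hypothetical-headline/408123/'
print(len(article.split('/')))  # 8
print(print_url_len_minus_2(article))
print(print_url_rstrip(article))

# Without the trailing slash, len - 2 picks the headline slug instead of the id.
print(print_url_len_minus_2(article.rstrip('/')))
```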
09-22-2010, 08:43 PM #2828
bhandarisaurabh
Enthusiast
There is already a recipe for Foreign Policy, but it covers the RSS feeds. Can anyone make a recipe for the print edition?
http://www.foreignpolicy.com/issues/current
Thanks in advance.
09-22-2010, 10:05 PM #2829
jenden
Junior Member
Posts: 4
Karma: 10
Join Date: Sep 2010
Device: kindle dx
Could you please create a Kindle recipe for the French version of the Jerusalem Post?
http://fr.jpost.com/

Thanks

Last edited by jenden; 09-22-2010 at 11:06 PM.
09-22-2010, 11:36 PM #2830
TonytheBookworm
Addict
Quote:
Originally Posted by bhandarisaurabh
There is already a recipe for Foreign Policy, but it covers the RSS feeds. Can anyone make a recipe for the print edition?
http://www.foreignpolicy.com/issues/current
Thanks in advance.
I know this is going to sound rude, but I'm just curious: why don't you try to do it yourself now?
I have personally done five or more recipes for you and given you detailed tips on how to do it. I have no problem whatsoever helping you, and I think I speak for the rest of us here when I say: give it a try. Post some of your code, ask specific questions, search the built-in recipes, and search this forum starting with my first post and work your way forward. I didn't know anything about this at all, other than that it could be done if there was a will. So may I suggest learning how to do it, so you can join us in making calibre better by contributing your recipes. I hope you understand where I'm coming from; I'd hate to sound off base.
Once again, I am here to help, and I wouldn't know what I know without the help of others. Still, the only way you are going to learn this stuff is by doing it.
09-23-2010, 05:35 AM #2831
rayh
Member
Posts: 24
Karma: 10
Join Date: Mar 2010
Location: Australia
Device: Kindle latest Generation
Is there any chance someone could produce a recipe for the Herald Sun, a newspaper from Melbourne, Australia?

Thanks, Ray
09-23-2010, 06:39 AM #2832
bhandarisaurabh
Enthusiast
Quote:
Originally Posted by TonytheBookworm
I know this is going to sound rude, but I'm just curious: why don't you try to do it yourself now?
I have personally done five or more recipes for you and given you detailed tips on how to do it. I have no problem whatsoever helping you, and I think I speak for the rest of us here when I say: give it a try. Post some of your code, ask specific questions, search the built-in recipes, and search this forum starting with my first post and work your way forward. I didn't know anything about this at all, other than that it could be done if there was a will. So may I suggest learning how to do it, so you can join us in making calibre better by contributing your recipes. I hope you understand where I'm coming from; I'd hate to sound off base.
Once again, I am here to help, and I wouldn't know what I know without the help of others. Still, the only way you are going to learn this stuff is by doing it.
Okay, I will try it. I am not a programmer, but I will try to understand. Can you give me a link where I can learn about parsing the whole page?
09-23-2010, 07:45 AM #2833
Starson17
Wizard
Quote:
Originally Posted by TonytheBookworm
Cool, I just learned something from you. The match_regexps setting is great. I would have done that with make_links() like you showed me in the past, but when I saw match_regexps I wondered what the heck it does. Then I saw that it looks at the page, finds those links, and follows only those links. Thanks for using that.
Note that I first turned on recursion. The match_regexps setting is there to keep the recursion from crawling all over the web to unrelated places.
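In calibre recipes these are two BasicNewsRecipe attributes: recursions sets how many levels of links are followed from each article page, and match_regexps is a list of regular expressions a link must match to be followed (an empty list means no restriction). The stand-alone sketch below imitates that filtering in plain Python to show why the pair matters. The example.org pattern, the URLs and the helper names are hypothetical, and it assumes a link qualifies when any pattern matches via re.search. The ^ anchor is deliberate: without it, a share link that merely contains an article URL would also be followed.

```python
import re

# Hypothetical recipe-style settings (not from the real recipe): one level of
# recursion, restricted to article pages on a made-up site.
RECURSIONS = 1
MATCH_REGEXPS = [r'^https?://www\.example\.org/articles/\d+']


def is_wanted(url, patterns=MATCH_REGEXPS):
    """Hypothetical helper: follow a link only if at least one pattern matches.

    An empty pattern list means "no restriction".
    """
    if not patterns:
        return True
    return any(re.search(p, url) for p in patterns)


def links_to_follow(page_links, depth, max_depth=RECURSIONS):
    """Hypothetical helper: links that would be crawled from a page at this depth."""
    if depth >= max_depth:
        return []
    return [u for u in page_links if is_wanted(u)]


links = [
    'http://www.example.org/articles/1234/part-2',
    'http://ads.example.net/click?id=9',
    'http://twitter.com/share?url=http://www.example.org/articles/1234',
]
print(links_to_follow(links, depth=0))  # only the part-2 link survives
print(links_to_follow(links, depth=1))  # [] because the recursion limit is reached
```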
09-23-2010, 12:38 PM #2834
TonytheBookworm
Addict
Quote:
Originally Posted by rayh
Is there any chance someone could produce a recipe for the Herald Sun, a newspaper from Melbourne, Australia?

Thanks, Ray

Please go to http://www.heraldsun.com.au/help/rss and tell me which feeds you would like, and I will work on it for you.

I will do the breaking news feed for now and await your reply.


Edit: I went ahead and did the whole thing. I commented out the AFL team feeds, so you can uncomment whichever ones you like.
Attached Files
Heraldsun_au.rar (1.7 KB)

Last edited by TonytheBookworm; 09-23-2010 at 01:59 PM. Reason: added code
09-24-2010, 01:45 AM #2835
TonytheBookworm
Addict
Starson17,
I need your help on this one if you've got a minute. I have been battling this feed, which I figured would be simple to do, but for some reason it is giving me trouble even with the basics. If I take the keep_only_tags out it works, but of course I want to use it to get rid of the ads and all the other junk.
I have tried every dang tag I can think of, filtering with Firebug. This is what I have come up with so far. It's basic for sure, but I get no content when I keep only the tag that appears to be the parent. HELP!
If you can just help me with keep_only_tags, I think I can figure out the rest, unless there is something screwy going on here that I have never faced before.
Here is what I have so far, and thanks.
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'How To Geek'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'Daily Computer Tips and Tricks'
    publisher = 'Howtogeek'
    category = 'PC,tips,tricks'
    oldest_article = 2
    max_articles_per_feed = 100
    linearize_tables = True
    no_stylesheets = True
    remove_javascript = True

    # Keep only the div that appears to be the parent of the article text.
    keep_only_tags = [
                      dict(name='div', attrs={'class':['yui-u']})
                     ]

    feeds = [
             ('Tips', 'http://feeds.howtogeek.com/howtogeek')
            ]


Edit:
Alright, I got it working, but I'm confused about this. In previous feeds I've done, I enter the feed address and calibre takes each link, uses it for the article title, and parses part of the content listed under it for the description. In this feed, however, the content is all on the feed page itself, so calibre doesn't go to the actual link. In the code above I assumed it visited the links in the feed one by one, and I was trying to strip the content each link showed.
So my question to you is: what determines whether it uses the content on the feed's main page (the one with all the links on it) or navigates to each link? I hope you understand what I'm asking; if not, I will try to explain myself better.
The code below works because, for whatever reason, the links on the feed page are not followed. But in other basic feeds I have done nothing more than add the feed, and it follows the links.
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'How To Geek'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'Daily Computer Tips and Tricks'
    publisher = 'Howtogeek'
    category = 'PC,tips,tricks'
    oldest_article = 2
    max_articles_per_feed = 100
    linearize_tables = True
    no_stylesheets = True
    remove_javascript = True

    # Strip links that open in a new window, the article table and the feedflare div.
    remove_tags = [dict(name='a', attrs={'target':['_blank']}),
                   dict(name='table', attrs={'id':['articleTable']}),
                   dict(name='div', attrs={'class':['feedflare']}),
                  ]

    feeds = [
             ('Tips', 'http://feeds.howtogeek.com/howtogeek')
            ]

Last edited by TonytheBookworm; 09-24-2010 at 02:03 AM. Reason: confused see addition
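The question above most likely comes down to calibre's BasicNewsRecipe.use_embedded_content attribute. Left at None (the default), calibre guesses from the length of the feed items' content whether the feed already embeds full articles; True always uses the feed's own content, and False always downloads the linked pages. A full-text feed like this one is probably being treated as embedded, in which case keep_only_tags runs against the page calibre builds from the feed item rather than the site's own page, which would explain why the yui-u div was never found. Adding use_embedded_content = False to the first recipe should make it follow the links instead. The sketch below is a stand-alone imitation of that decision, not calibre's code; the helper names and the 2000-character threshold are assumptions for illustration.

```python
# Helper names and the threshold are hypothetical; this is not calibre's code.
MIN_AVG_CHARS = 2000  # assumed per-item length threshold


def looks_embedded(entries, min_avg_chars=MIN_AVG_CHARS):
    """Guess whether feed entries already carry the full article text.

    entries: list of dicts with optional 'content' and 'summary' strings,
    roughly what a feed parser yields for each <item>.
    """
    if not entries:
        return False
    total = sum(max(len(e.get('content') or ''), len(e.get('summary') or ''))
                for e in entries)
    return total > min_avg_chars * len(entries)


def choose_source(entries, use_embedded_content=None):
    """Mirror the three settings: None = guess, True/False = forced."""
    if use_embedded_content is None:
        embedded = looks_embedded(entries)
    else:
        embedded = use_embedded_content
    return 'feed content' if embedded else 'linked pages'


teaser_feed = [{'summary': 'A two-line teaser with a Read More link.'}] * 5
full_text_feed = [{'content': '<p>' + 'x' * 5000 + '</p>'}] * 5

print(choose_source(teaser_feed))                                 # linked pages
print(choose_source(full_text_feed))                              # feed content
print(choose_source(full_text_feed, use_embedded_content=False))  # linked pages
```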
All times are GMT -4.