Old 09-12-2010, 04:06 AM   #2701
lady kay
Junior Member
 
Posts: 3
Karma: 10
Join Date: Aug 2010
Device: sony prs 600
Private Eye

Hi

Has anyone done a recipe for Private Eye (http://www.private-eye.co.uk/)?

Here's hoping.
Old 09-12-2010, 05:55 AM   #2702
marbs
Zealot
 
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
TheMarker recipe and some help needed

I've never programmed in Python before; it's different. Anyway, this is my first recipe: the print version of TheMarker, in Hebrew (a financial newspaper from Israel).

The recipe is in the attached file.

I think it could be added to the built-in news feeds for calibre once I work out the issues here.

And now some things I need help with:
1) Some of the articles in the feed above do not get downloaded. I get a blank page (containing just the article's link, "next page", and so on).

2) I want to add a web page that does not have an RSS feed: the stock quotes for the TA stock exchange. To get to the table, you go to:
http://www.tase.co.il/TASE/MarketDat...&subDataType=0

Find the table with the scroll bar (middle of the page). Under that table are two links: an Excel link, and a link in Hebrew to the left of the Excel link. If you hit that link you get the correct table (http://www.tase.co.il/TASE/Managemen...9C%D7%9C%D7%99).

That is all for the TheMarker recipe.
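For the no-RSS quotes page, the usual approach in a recipe is to fetch the page and walk the parse tree for the links that sit below the table. Here is a minimal, self-contained sketch of the idea using Python 3's stdlib html.parser (a 2010-era recipe would use calibre's index_to_soup() and BeautifulSoup instead); the sample HTML is hypothetical, not the real TASE markup:

```python
from html.parser import HTMLParser

# Sketch: collect the hrefs of every <a> that appears *after* the first
# closed <table>, mimicking "the links under the table" on the quotes page.
class LinkAfterTableFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.table_closed = False
        self.links_after_table = []

    def handle_endtag(self, tag):
        if tag == 'table':
            self.table_closed = True

    def handle_starttag(self, tag, attrs):
        if tag == 'a' and self.table_closed:
            href = dict(attrs).get('href')
            if href:
                self.links_after_table.append(href)

# Hypothetical stand-in for the TASE page: a quotes table followed by the
# HTML-table link and the Excel link.
sample = ('<table><tr><td>quotes</td></tr></table>'
          '<a href="/full-table">HTML</a><a href="/excel">Excel</a>')
finder = LinkAfterTableFinder()
finder.feed(sample)
print(finder.links_after_table)   # ['/full-table', '/excel']
```

Once you have the href, the recipe can fetch that URL directly instead of relying on a feed.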

I am trying to do two more things.

First, a recipe that will download all of yesterday's stock filings.
The link is
http://maya.tase.co.il/bursa/index.asp?view=yesterday
The things that are holding me back:
1) I need to download several pages: if you go to the bottom of the page you will see "1 of X" (1 מתוך X), and I need all of the pages for that day.
2) If you click on any one of the links, you can see that some of the reports are in HTML form, some are in PDF form, and some are in HTML but have a PDF in addition. I have no idea how to do anything with that. (See the example below.)

The second is pretty much the same as the first:
I want all the reports of a specific company from the past.
I go to http://maya.tase.co.il/bursa/indeximptoday.htm
There are two drop-down menus in the box on the left. I choose a company in the second menu (I will choose the last company on the list for the example; it's called "תשואה 10").
I change the start date at the bottom of the box to 1/1/00 and hit search (the left-hand button at the bottom).
That takes me to http://maya.tase.co.il/bursa/index.a...company_press=
and I want to download 23 pages of links.
The first report (from 19:24 31.08.10, link: http://maya.tase.co.il/bursa/report....port_cd=570152) is an HTML report.
The second report (from 18:18 31.08.10, link: http://maya.tase.co.il/bursa/report....port_cd=570085) is a PDF report (notice there are two PDF symbols at the top, marked "1" and "2"; both need to be downloaded).
The third report (from 15:53 19.08.10, link: http://maya.tase.co.il/bursa/report....port_cd=566255) is HTML and PDF; both need to be in the news feed.

I don't know if what I've asked here can be done, and I can't expect people to do it for me (I have no idea whether it is a lot of work or not), but I would love a pointer in the right direction if the rest is too much to ask.
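On the "1 of X" pagination question, the core of a solution is a loop that keeps following the next-page link until it runs out. A minimal sketch, with fetch() standing in for calibre's index_to_soup() and a fake three-page site in place of the real maya.tase.co.il pager (both are assumptions, not the real site structure):

```python
# collect_pages() follows each page's "next" link until the pager runs out,
# which is the shape of loop a parse_index() needs for "1 of X" listings.
def collect_pages(fetch, first_url, max_pages=50):
    """Return page URLs in order, following next links, with a safety cap."""
    urls, url = [], first_url
    while url and len(urls) < max_pages:
        urls.append(url)
        url = fetch(url).get('next')   # None on the last page
    return urls

# Fake three-page site standing in for the real pager:
pages = {
    'page1': {'next': 'page2'},
    'page2': {'next': 'page3'},
    'page3': {'next': None},
}
print(collect_pages(pages.get, 'page1'))   # ['page1', 'page2', 'page3']
```

In a real recipe you would parse each fetched page for both the article links and the next-page link; the max_pages cap keeps a broken pager from looping forever.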

Thanks very much for all the help in advance.
marbs
Attached Files
File Type: txt New Text Document1.txt (1.5 KB, 255 views)

Last edited by marbs; 09-12-2010 at 05:59 AM. Reason: recipe was not indented.
Old 09-12-2010, 05:58 AM   #2703
marbs
Zealot
 
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
The recipe is not indented

The forum stripped the indentation from my post, so here is the recipe as a text file instead. Sorry about that.
Attached Files
File Type: txt New Text Document1.txt (1.5 KB, 256 views)
Old 09-12-2010, 12:19 PM   #2704
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by cynvision View Post
Ah yes. I'm still not comfortable with how the multiple page link following works. You'd have to follow the 'more articles' link at least once to get more than one article from that author.
Okay, I'm sure there might be another way to do this and reduce the redundancy, but I'm not certain how to do that yet. Anyway, this will work. The only issue I see is that the title stays the same for all the articles (but I'll leave that one to you).
Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, re

class AlisonB(BasicNewsRecipe):
    title       = 'Alison Berkley Column'
    __author__  = 'Tonythebookworm'
    description = "Some dude's column"
    language    = 'en'
    publisher           = 'Tonythebookworm'
    category            = 'column'
    use_embedded_content = False
    no_stylesheets      = True
    oldest_article      = 24
    remove_javascript   = True
    remove_empty_feeds  = True
    max_articles_per_feed = 10
    INDEX = 'http://www.aspentimes.com'

    def parse_index(self):
        feeds = []
        for title, url in [
                (u"Alison Berkley", u"http://www.aspentimes.com/SECTION/&Profile=1021&ParentProfile=1061"),
                ]:
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def make_links(self, url):
        current_articles = []
        soup = self.index_to_soup(url)
        for item in soup.findAll('div', attrs={'class':'title'}):
            link = item.find('a')
            # Keep only links whose text contains 'Alison Berkley'
            if link.find(text=re.compile('Alison Berkley')):
                title = self.tag_to_string(link)
                url   = self.INDEX + link['href']
                print 'FOUND TITLE:', title
                current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})

        # Follow the 'More Articles' link up to five times to pull in
        # older columns from the paginated index.
        counter = 0
        while counter <= 5:
            for item in soup.findAll('span', attrs={'class':'links'}):
                link = item.find('a')
                if link.find(text=re.compile('More Articles')):
                    url = self.INDEX + link['href']
                    print 'THE NEXT URL IS:', url
                    soup = self.index_to_soup(url)

            for item in soup.findAll('div', attrs={'class':'title'}):
                link = item.find('a')
                if link.find(text=re.compile('Alison Berkley')):
                    title = self.tag_to_string(link)
                    url   = self.INDEX + link['href']
                    print 'FOUND NEW TITLE:', title
                    current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})
            counter += 1

        return current_articles

    def print_version(self, url):
        # original URL: http://www.aspentimes.com/article/20100909/COLUMN/100909869/1021&parentprofile=1061
        # print URL:    http://www.aspentimes.com/apps/pbcs.dll/article?AID=/20100909/COLUMN/100909869/1021&parentprofile=1061&template=printart
        split1 = url.split("article")
        print_url = 'http://www.aspentimes.com/apps/pbcs.dll/article?AID=' + split1[1] + '&template=printart'
        return print_url

Last edited by TonytheBookworm; 09-12-2010 at 04:45 PM. Reason: updated code to run 5 times
Old 09-12-2010, 02:31 PM   #2705
rcoslow
Junior Member
 
Posts: 1
Karma: 10
Join Date: Sep 2010
Device: Kindle
Custom Recipes

Hi

Can someone create a fetch recipe for Popular Science?

Thanks
Old 09-12-2010, 04:39 PM   #2706
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
How is this done?
I know you can make a counter and then check whether it is less than some number, but an if statement only gets executed once.
How do you write a for loop that uses a counter?
Something like for (int i; i<=5; i++)
I tried
for counter <= 5: with no luck.
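For reference, the Python spelling of a C-style counted loop is iteration over range(); a minimal sketch, with the equivalent while form alongside:

```python
# C-style: for (int i = 1; i <= 5; i++) { ... }
# Python:  iterate over range(start, stop) -- the stop value is exclusive.
squares = []
for i in range(1, 6):          # i = 1, 2, 3, 4, 5
    squares.append(i * i)
print(squares)                 # [1, 4, 9, 16, 25]

# The while form is equivalent when you need manual counter control:
counter, squares2 = 1, []
while counter <= 5:
    squares2.append(counter * counter)
    counter += 1
print(squares2 == squares)     # True
```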

Last edited by TonytheBookworm; 09-12-2010 at 04:44 PM. Reason: figured it out with a while
Old 09-12-2010, 05:22 PM   #2707
kbrand
Enthusiast
 
Posts: 28
Karma: 10
Join Date: Jun 2010
Device: Sony 300
I wish I had some clue how to do this, but I am completely lost. Could someone make a recipe for the Colorado Springs Gazette at http://www.gazette.com? Thank you so much in advance!


Never mind, figured it out. Thank you.

Last edited by kbrand; 09-12-2010 at 05:58 PM.
Old 09-12-2010, 11:03 PM   #2708
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Outlook India Magazine Per Facebook request

Here is a working recipe for outlookindia.com.
RSS feed: http://www.outlookindia.com/rss/main/magazine

I've only done the magazine portion of the RSS feeds.
Attached Files
File Type: rar outlookindia.rar (3.6 KB, 257 views)
Old 09-13-2010, 12:35 AM   #2709
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Slightly puzzled and not sure what is going on here.
When I run this recipe at the console with

ebook-convert test.recipe output_dir --test -vv > myrecipe.txt

I end up getting a nicely formatted article with no junk. But when I import it into calibre to fully test it, I get junk.

So I went a step further and did this:

ebook-convert test.recipe myrecipe.mobi --test

Again I get nice, clean articles. So what could be going on that is different when I actually load it into calibre? I can remove the tags, but that's kind of hard to do when they don't show up in the test.

here is the code i'm working with
Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'Popular Science'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'Popular Science'
    publisher = 'Popular Science'
    category = 'gadgets,science'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True
    #extra_css = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt }'
    #masthead_url = 'http://gawand.org/wp-content/uploads/2010/06/ajc-logo.gif'
    #keep_only_tags = [
    #                   dict(name='div', attrs={'class':['content']})
    #                  ,dict(attrs={'id':['cxArticleText','cxArticleBodyText']})
    #                  ]
    remove_tags = [dict(name='div', attrs={'id':['main_supplements']})]
    feeds = [
              ('Gadgets', 'http://www.popsci.com/full-feed/gadgets'),
              ('Cars', 'http://www.popsci.com/full-feed/cars'),
              ('Science', 'http://www.popsci.com/full-feed/science'),
              ('Technology', 'http://www.popsci.com/full-feed/technology'),
              ('DIY', 'http://www.popsci.com/full-feed/diy'),
            ]

    #def print_version(self, url):
    #    return url.partition('?')[0] + '?printArticle=y'

Old 09-13-2010, 10:43 AM   #2710
chainanim
Junior Member
 
Posts: 6
Karma: 10
Join Date: Sep 2010
Device: kindle 2
Hi, I am new here. Can anyone assist in providing recipes for Indian magazines like Outlook Business (www.outlookbusiness.com) and Open (www.openthemagazine.com), and international magazines like Fortune?
Old 09-13-2010, 10:56 AM   #2711
chainanim
Junior Member
 
Posts: 6
Karma: 10
Join Date: Sep 2010
Device: kindle 2
@Stewie1 - I had the same query. Were you able to get any replies/recipes for this?


Quote:
Originally Posted by stewie1 View Post
The Financial Times recipe that's currently posted isn't the complete print edition. I'm a subscriber and trying to put something together that will allow me to get the day's print edition (http://www.ft.com/us-edition). Unfortunately, there is no RSS feed for this.

Can anyone help, either by putting a recipe together, or directing me to a template I might be able to use to give it a shot myself (note I am a complete novice at this).

Thanks.
Old 09-13-2010, 04:22 PM   #2712
burbank_atl
Junior Member
 
Posts: 3
Karma: 10
Join Date: Sep 2010
Device: Nook
Quote:
Originally Posted by chainanim View Post
@Stewie1 - I had the same query. Were you able to get any replies/recipes for this?
Actually, there are RSS feeds for almost the entire print edition at http://www.ft.com/servicestools/newstracking/rss.

Creating a recipe that resembles the print edition would require deciding which feeds should be included and in what order.

For example, the Comment section contains the editorial parts of the daily edition. There isn't one feed; there are four. That is how the RSS feeds are set up.

I have been playing around with a version of the current recipe, but somehow I managed to completely break it. When time permits, I will try again.
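One way to sketch the ordering problem: build the feeds list section by section, since calibre keeps sections in exactly the order listed. The URLs below are placeholders (assumptions), not the real ft.com feed addresses from the newstracking page:

```python
# Build the `feeds` list section by section; calibre renders sections in
# exactly this order. All URLs here are placeholders, not real ft.com feeds.
front_feeds = [
    ('World', 'http://example.com/ft/world.rss'),
]
comment_feeds = [
    ('Comment: Editorial',  'http://example.com/ft/editorial.rss'),
    ('Comment: Columnists', 'http://example.com/ft/columnists.rss'),
    ('Comment: Analysis',   'http://example.com/ft/analysis.rss'),
    ('Comment: Letters',    'http://example.com/ft/letters.rss'),
]

# In a BasicNewsRecipe subclass this would be the class attribute `feeds`.
feeds = front_feeds + comment_feeds
print([name for name, url in feeds])
```

Deciding the print-edition order then reduces to arranging these sub-lists before concatenating them.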
Old 09-13-2010, 05:47 PM   #2713
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TonytheBookworm View Post
I end up getting a nice formatted article with no junk.
Then when i take and import it into calibre to fully test it. I get junk.
I get the same result both ways.
Old 09-13-2010, 06:02 PM   #2714
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by Starson17 View Post
I get the same result both ways.
Strange. I'll try rebooting and see what that does. Just curious: when you say you're getting the same results, do you get the unjunked version or the junked version? I would assume you are getting the version with the home link and other junk in it, which is what I was expecting; but when I ran the test and it showed up clean, I thought great, less work to do on it. Anyway, thanks for testing, and I will see what I come up with. It just seemed weird. One thing that might be causing me to see a clean version is the fact that I run Ad Block, and I forget to turn it off at times.
TonytheBookworm is offline  
Old 09-13-2010, 08:07 PM   #2715
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TonytheBookworm View Post
Just curious: when you say you're getting the same results, do you get the unjunked version or the junked version? I would assume you are getting the version with the home link and other junk in it, which is what I was expecting; but when I ran the test and it showed up clean, I thought great, less work to do on it. Anyway, thanks for testing, and I will see what I come up with. It just seemed weird. One thing that might be causing me to see a clean version is the fact that I run Ad Block, and I forget to turn it off at times.
I'm getting a pretty clean version. I also run Adblock, but that only affects Firefox, not calibre.