Old 09-12-2010, 04:06 AM   #2701
lady kay
Junior Member
 
Posts: 3
Karma: 10
Join Date: Aug 2010
Device: sony prs 600
Private Eye

Hi

Has anyone done a recipe for Private Eye (http://www.private-eye.co.uk/)?

Here's hoping.
Old 09-12-2010, 05:55 AM   #2702
marbs
Zealot
 
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
TheMarker recipe and some help needed

I've never programmed in Python before; it's different. Anyway, this is my first recipe: the print version of TheMarker, in Hebrew (a financial newspaper from Israel).

The recipe is in the attached file.

I think it could be added to the built-in news feeds for calibre once I work out the issues here.

And now some things I need help with:
1) Some of the articles in the feed above do not get downloaded. I get a blank page (containing just the article's link, "next page", and so on).

2) I want to add a web page that does not have an RSS feed: the stock quotes for the TA stock exchange. To get to the table, you go to:
http://www.tase.co.il/TASE/MarketDat...&subDataType=0

Find the table with the scroll bar (middle of the page). Under that table are two links: an Excel link, and a link in Hebrew to the left of the Excel link. If you hit that link you get the correct table (http://www.tase.co.il/TASE/Managemen...9C%D7%9C%D7%99).

That is all for the TheMarker recipe.
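For the no-RSS quotes page, the usual approach in a recipe is to fetch the page and walk the parse tree for the links that sit below the table. Here is a minimal, self-contained sketch of the idea using Python 3's stdlib html.parser (a 2010-era recipe would use calibre's index_to_soup() and BeautifulSoup instead); the sample HTML is hypothetical, not the real TASE markup:

```python
from html.parser import HTMLParser

# Sketch: collect the hrefs of every <a> that appears *after* the first
# closed <table>, mimicking "the links under the table" on the quotes page.
class LinkAfterTableFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.table_closed = False
        self.links_after_table = []

    def handle_endtag(self, tag):
        if tag == 'table':
            self.table_closed = True

    def handle_starttag(self, tag, attrs):
        if tag == 'a' and self.table_closed:
            href = dict(attrs).get('href')
            if href:
                self.links_after_table.append(href)

# Hypothetical stand-in for the TASE page: a quotes table followed by the
# HTML-table link and the Excel link.
sample = ('<table><tr><td>quotes</td></tr></table>'
          '<a href="/full-table">HTML</a><a href="/excel">Excel</a>')
finder = LinkAfterTableFinder()
finder.feed(sample)
print(finder.links_after_table)   # ['/full-table', '/excel']
```

Once you have the href, the recipe can fetch that URL directly instead of relying on a feed.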

I am trying to do two more things.

First, a recipe that will download all of yesterday's stock filings.
The link is
http://maya.tase.co.il/bursa/index.asp?view=yesterday
The things that are holding me back:
1) I need to download several pages: if you go to the bottom of the page you will see "1 of X" (1 מתוך X), and I need all of the pages for that day.
2) If you click on any one of the links, you can see that some of the reports are in HTML form, some are in PDF form, and some are in HTML but have a PDF in addition. I have no idea how to do anything with that. (See the example below.)

The second is pretty much the same as the first:
I want all the reports of a specific company from the past.
I go to http://maya.tase.co.il/bursa/indeximptoday.htm
There are two drop-down menus in the box on the left. I choose a company in the second menu (I will choose the last company on the list for the example; it's called "תשואה 10").
I change the start date at the bottom of the box to 1/1/00 and hit search (the left-hand button at the bottom).
That takes me to http://maya.tase.co.il/bursa/index.a...company_press=
and I want to download 23 pages of links.
The first report (from 19:24 31.08.10, link: http://maya.tase.co.il/bursa/report....port_cd=570152) is an HTML report.
The second report (from 18:18 31.08.10, link: http://maya.tase.co.il/bursa/report....port_cd=570085) is a PDF report (notice there are two PDF symbols at the top, marked "1" and "2"; both need to be downloaded).
The third report (from 15:53 19.08.10, link: http://maya.tase.co.il/bursa/report....port_cd=566255) is HTML and PDF; both need to be in the news feed.

I don't know if what I've asked here can be done, and I can't expect people to do it for me (I have no idea whether it is a lot of work or not), but I would love a pointer in the right direction if the rest is too much to ask.
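On the "1 of X" pagination question, the core of a solution is a loop that keeps following the next-page link until it runs out. A minimal sketch, with fetch() standing in for calibre's index_to_soup() and a fake three-page site in place of the real maya.tase.co.il pager (both are assumptions, not the real site structure):

```python
# collect_pages() follows each page's "next" link until the pager runs out,
# which is the shape of loop a parse_index() needs for "1 of X" listings.
def collect_pages(fetch, first_url, max_pages=50):
    """Return page URLs in order, following next links, with a safety cap."""
    urls, url = [], first_url
    while url and len(urls) < max_pages:
        urls.append(url)
        url = fetch(url).get('next')   # None on the last page
    return urls

# Fake three-page site standing in for the real pager:
pages = {
    'page1': {'next': 'page2'},
    'page2': {'next': 'page3'},
    'page3': {'next': None},
}
print(collect_pages(pages.get, 'page1'))   # ['page1', 'page2', 'page3']
```

In a real recipe you would parse each fetched page for both the article links and the next-page link; the max_pages cap keeps a broken pager from looping forever.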

Thanks very much for all the help in advance.
marbs
Attached Files
File Type: txt New Text Document1.txt (1.5 KB, 255 views)

Last edited by marbs; 09-12-2010 at 05:59 AM. Reason: recipe was not indented.
Old 09-12-2010, 05:58 AM   #2703
marbs
Zealot
 
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
The recipe is not indented

The forum stripped the indentation from my post, so here is the recipe as a text file instead. Sorry about that.
Attached Files
File Type: txt New Text Document1.txt (1.5 KB, 256 views)
Old 09-12-2010, 12:19 PM   #2704
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by cynvision View Post
Ah yes. I'm still not comfortable with how the multiple page link following works. You'd have to follow the 'more articles' link at least once to get more than one article from that author.
Okay, I'm sure there might be another way to do this and reduce the redundancy, but I'm not certain how to do that yet. Anyway, this will work. The only issue I see is that the title stays the same for all the articles (but I'll leave that one to you).
Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, re

class AlisonB(BasicNewsRecipe):
    title       = 'Alison Berkley Column'
    __author__  = 'Tonythebookworm'
    description = "Some dude's column"
    language    = 'en'
    publisher           = 'Tonythebookworm'
    category            = 'column'
    use_embedded_content = False
    no_stylesheets      = True
    oldest_article      = 24
    remove_javascript   = True
    remove_empty_feeds  = True
    max_articles_per_feed = 10
    INDEX = 'http://www.aspentimes.com'

    def parse_index(self):
        feeds = []
        for title, url in [
                (u"Alison Berkley", u"http://www.aspentimes.com/SECTION/&Profile=1021&ParentProfile=1061"),
                ]:
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def make_links(self, url):
        current_articles = []
        soup = self.index_to_soup(url)
        for item in soup.findAll('div', attrs={'class':'title'}):
            link = item.find('a')
            # Keep only links whose text contains 'Alison Berkley'
            if link.find(text=re.compile('Alison Berkley')):
                title = self.tag_to_string(link)
                url   = self.INDEX + link['href']
                print 'FOUND TITLE:', title
                current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})

        # Follow the 'More Articles' link up to five times to pull in
        # older columns from the paginated index.
        counter = 0
        while counter <= 5:
            for item in soup.findAll('span', attrs={'class':'links'}):
                link = item.find('a')
                if link.find(text=re.compile('More Articles')):
                    url = self.INDEX + link['href']
                    print 'THE NEXT URL IS:', url
                    soup = self.index_to_soup(url)

            for item in soup.findAll('div', attrs={'class':'title'}):
                link = item.find('a')
                if link.find(text=re.compile('Alison Berkley')):
                    title = self.tag_to_string(link)
                    url   = self.INDEX + link['href']
                    print 'FOUND NEW TITLE:', title
                    current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})
            counter += 1

        return current_articles

    def print_version(self, url):
        # original URL: http://www.aspentimes.com/article/20100909/COLUMN/100909869/1021&parentprofile=1061
        # print URL:    http://www.aspentimes.com/apps/pbcs.dll/article?AID=/20100909/COLUMN/100909869/1021&parentprofile=1061&template=printart
        split1 = url.split("article")
        print_url = 'http://www.aspentimes.com/apps/pbcs.dll/article?AID=' + split1[1] + '&template=printart'
        return print_url

Last edited by TonytheBookworm; 09-12-2010 at 04:45 PM. Reason: updated code to run 5 times
Old 09-12-2010, 02:31 PM   #2705
rcoslow
Junior Member
 
Posts: 1
Karma: 10
Join Date: Sep 2010
Device: Kindle
Custom Recipes

Hi

Can someone create a fetch recipe for Popular Science?

Thanks
Old 09-12-2010, 04:39 PM   #2706
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
How is this done?
I know you can make a counter and then check whether it is less than some number, but an if statement only gets executed once.
How do you write a for loop that uses a counter?
Something like for (int i; i<=5; i++)
I tried
for counter <= 5: with no luck.
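For reference, the Python spelling of a C-style counted loop is iteration over range(); a minimal sketch, with the equivalent while form alongside:

```python
# C-style: for (int i = 1; i <= 5; i++) { ... }
# Python:  iterate over range(start, stop) -- the stop value is exclusive.
squares = []
for i in range(1, 6):          # i = 1, 2, 3, 4, 5
    squares.append(i * i)
print(squares)                 # [1, 4, 9, 16, 25]

# The while form is equivalent when you need manual counter control:
counter, squares2 = 1, []
while counter <= 5:
    squares2.append(counter * counter)
    counter += 1
print(squares2 == squares)     # True
```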

Last edited by TonytheBookworm; 09-12-2010 at 04:44 PM. Reason: figured it out with a while
Old 09-12-2010, 05:22 PM   #2707
kbrand
Enthusiast
 
Posts: 28
Karma: 10
Join Date: Jun 2010
Device: Sony 300
I wish I had some clue how to do this, but I am completely lost. Could someone make a recipe for the Colorado Springs Gazette at http://www.gazette.com? Thank you so much in advance!


Never mind, figured it out. Thank you.

Last edited by kbrand; 09-12-2010 at 05:58 PM.
Old 09-12-2010, 11:03 PM   #2708
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Outlook India Magazine Per Facebook request

Here is a working recipe for outlookindia.com.
RSS feed: http://www.outlookindia.com/rss/main/magazine

I've only done the magazine portion of the RSS feeds.
Attached Files
File Type: rar outlookindia.rar (3.6 KB, 257 views)
Old 09-13-2010, 12:35 AM   #2709
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Slightly puzzled and not sure what is going on here.
When I run this recipe at the console with

ebook-convert test.recipe output_dir --test -vv > myrecipe.txt

I end up getting a nicely formatted article with no junk. But when I import it into calibre to fully test it, I get junk.

So I went a step further and did this:

ebook-convert test.recipe myrecipe.mobi --test

Again I get nice, clean articles. So what could be going on that is different when I actually load it into calibre? I can remove the tags, but that's kind of hard to do when they don't show up in the test.

here is the code i'm working with
Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'Popular Science'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'Popular Science'
    publisher = 'Popular Science'
    category = 'gadgets,science'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True
    #extra_css = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt }'
    #masthead_url = 'http://gawand.org/wp-content/uploads/2010/06/ajc-logo.gif'
    #keep_only_tags = [
    #                   dict(name='div', attrs={'class':['content']})
    #                  ,dict(attrs={'id':['cxArticleText','cxArticleBodyText']})
    #                  ]
    remove_tags = [dict(name='div', attrs={'id':['main_supplements']})]
    feeds = [
              ('Gadgets', 'http://www.popsci.com/full-feed/gadgets'),
              ('Cars', 'http://www.popsci.com/full-feed/cars'),
              ('Science', 'http://www.popsci.com/full-feed/science'),
              ('Technology', 'http://www.popsci.com/full-feed/technology'),
              ('DIY', 'http://www.popsci.com/full-feed/diy'),
            ]

    #def print_version(self, url):
    #    return url.partition('?')[0] + '?printArticle=y'

Old 09-13-2010, 10:43 AM   #2710
chainanim
Junior Member
 
Posts: 6
Karma: 10
Join Date: Sep 2010
Device: kindle 2
Hi, I am new here. Can anyone assist in providing recipes for Indian magazines like Outlook Business (www.outlookbusiness.com) and Open (www.openthemagazine.com), and international magazines like Fortune?
Old 09-13-2010, 10:56 AM   #2711
chainanim
Junior Member
 
Posts: 6
Karma: 10
Join Date: Sep 2010
Device: kindle 2
@Stewie1 - I had the same query. Were you able to get any replies/recipes for this?


Quote:
Originally Posted by stewie1 View Post
The Financial Times recipe that's currently posted isn't the complete print edition. I'm a subscriber and trying to put something together that will allow me to get the day's print edition (http://www.ft.com/us-edition). Unfortunately, there is no RSS feed for this.

Can anyone help, either by putting a recipe together, or directing me to a template I might be able to use to give it a shot myself (note I am a complete novice at this).

Thanks.
Old 09-13-2010, 04:22 PM   #2712
burbank_atl
Junior Member
 
Posts: 3
Karma: 10
Join Date: Sep 2010
Device: Nook
Quote:
Originally Posted by chainanim View Post
@Stewie1 - I had the same query. Were you able to get any replies/recipes for this?
Actually, there are RSS feeds for almost the entire print edition at http://www.ft.com/servicestools/newstracking/rss.

Creating a recipe that resembles the print edition would require deciding which feeds should be included and in what order.

For example, the Comment section contains the editorial parts of the daily edition. There isn't one feed; there are four. That is how the RSS feeds are set up.

I have been playing around with a version of the current recipe, but somehow I managed to completely break it. When time permits, I will try again.
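One way to sketch the ordering problem: build the feeds list section by section, since calibre keeps sections in exactly the order listed. The URLs below are placeholders (assumptions), not the real ft.com feed addresses from the newstracking page:

```python
# Build the `feeds` list section by section; calibre renders sections in
# exactly this order. All URLs here are placeholders, not real ft.com feeds.
front_feeds = [
    ('World', 'http://example.com/ft/world.rss'),
]
comment_feeds = [
    ('Comment: Editorial',  'http://example.com/ft/editorial.rss'),
    ('Comment: Columnists', 'http://example.com/ft/columnists.rss'),
    ('Comment: Analysis',   'http://example.com/ft/analysis.rss'),
    ('Comment: Letters',    'http://example.com/ft/letters.rss'),
]

# In a BasicNewsRecipe subclass this would be the class attribute `feeds`.
feeds = front_feeds + comment_feeds
print([name for name, url in feeds])
```

Deciding the print-edition order then reduces to arranging these sub-lists before concatenating them.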
Old 09-13-2010, 05:47 PM   #2713
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TonytheBookworm View Post
I end up getting a nice formatted article with no junk.
Then when i take and import it into calibre to fully test it. I get junk.
I get the same result both ways.
Old 09-13-2010, 06:02 PM   #2714
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by Starson17 View Post
I get the same result both ways.
Strange. I'll try rebooting and see what that does. Just curious: when you say you're getting the same results, do you get the unjunked version or the junked version? I would assume you are getting the version with the home link and other junk in it, which is what I was expecting; but when I ran the test and it showed up clean, I thought great, less work to do on it. Anyway, thanks for testing, and I will see what I come up with. It just seemed weird. One thing that might be causing me to see a clean version is the fact that I run Ad Block, and I forget to turn it off at times.
TonytheBookworm is offline  
Old 09-13-2010, 08:07 PM   #2715
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TonytheBookworm View Post
Just curious: when you say you're getting the same results, do you get the unjunked version or the junked version? I would assume you are getting the version with the home link and other junk in it, which is what I was expecting; but when I ran the test and it showed up clean, I thought great, less work to do on it. Anyway, thanks for testing, and I will see what I come up with. It just seemed weird. One thing that might be causing me to see a clean version is the fact that I run Ad Block, and I forget to turn it off at times.
I'm getting a pretty clean version. I also run Adblock, but that only affects Firefox, not calibre.