Old 09-05-2010, 12:14 PM   #2641
TonytheBookworm
Quote:
Originally Posted by somedayson
Getting even closer.

I can read all the articles now, but there's stuff before and after them that I'm picking up off the web site. I can't figure out how to

1. Get it to the print only page

2. Get the stuff at the beginning (really disruptive for reading) and the end (not as bad but would love to remove it)


Thanks for any assistance anyone can provide. I certainly wouldn't mind a little .rar pack with the answer in it either!

Grateful either way,
Matt
You said you are getting the print-only page, but I don't think you were actually getting the printer-friendly version for some reason. Anyway, what you need is something like this. I haven't fully tested it, but it should work.

Also, in the future please wrap your code in spoiler and code tags. It makes it easier for all of us here.

Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'FW'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'FW'
    publisher = 'Tony'
    category = 'whateveryouwant'
    oldest_article = 1
    max_articles_per_feed = 100
    no_stylesheets = True
    
    # Strip the sidebar div from every article page.
    remove_tags = [dict(name='div', attrs={'id':['sidebar1']})]
    feeds = [(u'Opinion', u'http://journalgazette.net/apps/pbcs.dll/section?Category=EDIT&template=blogrss&mime=xml'),
             (u'Local News', u'http://journalgazette.net/apps/pbcs.dll/section?Category=LOCAL&template=blogrss&mime=xml'),
             (u'Sports', u'http://journalgazette.net/apps/pbcs.dll/section?Category=SPORTS&template=blogrss&mime=xml'),
             (u'Features', u'http://journalgazette.net/apps/pbcs.dll/section?Category=FEAT&template=blogrss&mime=xml'),
             (u'Business', u'http://journalgazette.net/apps/pbcs.dll/section?Category=BIZ&template=blogrss&mime=xml'),
             (u'Ice Chips', u'http://journalgazette.net/apps/pbcs.dll/section?Category=BLOGS11&template=blogrss&mime=xml'),
             (u'Entertainment', u'http://journalgazette.net/apps/pbcs.dll/section?Category=ENT&template=blogrss&mime=xml'),
             (u'Food', u'http://journalgazette.net/apps/pbcs.dll/section?Category=FOOD&template=blogrss&mime=xml')
            ]

    def print_version(self, url):
        # Convert an article URL to its printer-friendly version.
        # Original version: http://www.journalgazette.net/article/20100905/EDIT10/309059959/1021/EDIT
        # Print version:    http://www.journalgazette.net/apps/pbcs.dll/article?AID=/20100905/EDIT10/309059959/-1/EDIT01&template=printart
        # Splitting the original on "/" gives:
        # [u'http:', u'', u'www.journalgazette.net', u'article', u'20100905', u'EDIT10', u'309059959', u'1021', u'EDIT']
        parts = url.split('/')
        self.log('THE SPLIT IS: ', parts)  # debug output
        host = parts[2]        # www.journalgazette.net
        date = parts[4]        # 20100905
        section = parts[5]     # EDIT10
        article_id = parts[6]  # 309059959
        # The trailing '/-1/EDIT01' is copied from the one example above;
        # it has not been verified for the other sections.
        print_url = ('http://' + host + '/apps/pbcs.dll/article?AID=/' + date + '/'
                     + section + '/' + article_id + '/-1/EDIT01&template=printart')
        self.log('THIS URL WILL PRINT: ', print_url)  # debug output: the URL that will be fetched
        return print_url
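
If you want to check the URL rewriting without running the whole recipe, here is a minimal standalone sketch of the same conversion. It is written for Python 3 so it can run outside calibre. The function name to_print_url is made up for this example, and returning the URL unchanged when it does not look like an article link is my assumption, not something the recipe above does. The '/-1/EDIT01' ending is again taken from the single example URL.

```python
from urllib.parse import urlsplit


def to_print_url(url):
    """Hypothetical helper: rebuild an article URL as its print-template URL."""
    parts = urlsplit(url)
    # Path looks like /article/<date>/<section>/<article id>/...
    segments = [s for s in parts.path.split('/') if s]
    if len(segments) < 4 or segments[0] != 'article':
        # Assumption: leave anything that is not an article link untouched.
        return url
    date, section, article_id = segments[1:4]
    # '/-1/EDIT01' is copied from the one known print URL; other sections are unverified.
    return ('http://' + parts.netloc + '/apps/pbcs.dll/article?AID=/'
            + date + '/' + section + '/' + article_id
            + '/-1/EDIT01&template=printart')


if __name__ == '__main__':
    print(to_print_url(
        'http://www.journalgazette.net/article/20100905/EDIT10/309059959/1021/EDIT'))
```

Feeding it a few links from each feed and opening the results in a browser is a quick way to see whether the print template really loads for sections other than Opinion.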