Old 12-02-2010, 07:19 AM   #1
mufc
Connoisseur
mufc doesn't litter
 
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
Is it possible to pull the print version

Is it possible to pull the print version from this? The code would be great, but an explanation would be even greater. That way I might learn something.


http://www.ctv.ca/CTVNews/TopStories...harper-101201/


http://www.ctv.ca/servlet/ArticleNew...hub=PrintStory


Much Thanks in advance

Full url not shown.
Old 12-05-2010, 06:32 PM   #2
jangliss
Junior Member
jangliss began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Dec 2010
Device: Sony PR-650
Relatively Easy...

At least I think it should be... The manual has an example of how to do this for the BBC News website, and it takes just a minor tweak to get it working with this one.

Code:
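from calibre.web.feeds.news import BasicNewsRecipe

class CTVNews(BasicNewsRecipe):
    title = u'CTV News Feed'
    oldest_article = 7  # days
    max_articles_per_feed = 100

    # Each feed is a (title, URL) pair; the feed title here is just a placeholder.
    feeds = [(u'CTV News', u'http://www.ctv.ca/generic/generated/freeheadlines/rdf/allNewsRss.xml')]

    def print_version(self, url):
        # Rewrite the article URL into the print servlet URL.
        return url.replace('http://www.ctv.ca/', 'http://www.ctv.ca/servlet/ArticleNews/print/') + '?hub=TopStories&subhub=PrintStory'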
I did notice that the URL had a date string before the print ?hub bit at the end, but on a random guess, I took it out, and it worked. I've not tested the above, and might be missing other critical parts (this is my first attempt), but give it a shot.
Old 12-05-2010, 09:53 PM   #3
mufc
Connoisseur
mufc doesn't litter
 
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
Did not work

Getting this
GIS.Common.GIException: Trying to get HTMLTemplate filenames
at GIS.Servlets.HTMLTemplate.buildSubValues(HTMLTemplate.java:422)
at GIS.Servlets.HTMLTemplate.doGet(HTMLTemplate.java:131)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:126)
at GIS.Common.Servlet.service(Servlet.java:122)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:103)
at com.caucho.server.http.FilterChainServlet.doFilter(FilterChainServlet.java:95)
at com.caucho.server.http.Invocation.service(Invocation.java:291)
at com.caucho.server.http.CacheInvocation.service(CacheInvocation.java:132)
at com.caucho.server.http.RunnerRequest.handleRequest(RunnerRequest.java:341)
at com.caucho.server.http.RunnerRequest.handleConnection(RunnerRequest.java:271)
at com.caucho.server.TcpConnection.run(TcpConnection.java:136)
at java.lang.Thread.run(Thread.java:595)
Old 12-09-2010, 11:24 AM   #4
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
What I would do is strip the url and re-append it
The original article within say the Health feed is something like this
http://www.ctv.ca/CTVNews/Health/201...umbers-101209/

and the print version is something like this:
http://www.ctv.ca/servlet/ArticleNew...hub=PrintStory

so what you would want to do is split the url after http://www.ctv.ca/CTVNews/Health/ in the original url.
That would leave you with two pieces:
1) http://www.ctv.ca/CTVNews/Health/
2) 20101209/nurses-numbers-101209/

then simply split the second piece again so you get
1) 20101209
2) nurses-numbers-101209

then simply piece it back together
Spoiler:

Code:
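def print_version(self, url):
    # split1[1] is everything after the section, e.g. '20101209/nurses-numbers-101209/'
    split1 = url.split("/CTVNews/Health/")
    # split2[0] is the date, split2[1] is the article name
    split2 = split1[1].strip("/").split("/")

    # Print URL layout as guessed in this post (article path, then the date again); untested
    print_url = 'http://www.ctv.ca/servlet/ArticleNews/print/CTVNews/' + split2[0] + '/' + split2[1] + '/' + split2[0] + '/?hub=Health&subhub=PrintStory'

    return print_url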


Notice that this would only work for the Health section, so you would have to use some for statements, if statements and a regular expression search to check whether Health (or whatever other feed you might have) is contained in the url, then break the url accordingly.

By the way the above code is not tested and will more than likely fail but you should get the general concept of how to go about it... Good luck.
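
To give an idea of the regular expression approach, here is a rough sketch that pulls the section, date and article name out of the url in one go, so it is not tied to Health. It is plain Python and can be tried outside calibre. The names ARTICLE_RE and ctv_print_url are made up for this example, and the print url layout (article path followed by the date again) is only the guess discussed in this thread, not something confirmed against the site.

```python
import re

# Assumed article URL layout, based only on the examples in this thread:
#   http://www.ctv.ca/CTVNews/<Section>/<YYYYMMDD>/<article-name>/
ARTICLE_RE = re.compile(
    r'^http://www\.ctv\.ca/CTVNews/(?P<section>[^/]+)/(?P<date>\d{8})/(?P<slug>[^/]+)/?$'
)

def ctv_print_url(url):
    """Return a guessed print URL, or the url unchanged if it does not match."""
    match = ARTICLE_RE.match(url)
    if match is None:
        # Unknown layout: fall back to the normal article page.
        return url
    section, date, slug = match.group('section', 'date', 'slug')
    # Guessed print layout: article path, then the date again, then the hub parameters.
    return ('http://www.ctv.ca/servlet/ArticleNews/print/CTVNews/'
            + date + '/' + slug + '/' + date
            + '/?hub=' + section + '&subhub=PrintStory')
```

Inside a recipe, print_version(self, url) could simply return ctv_print_url(url). Falling back to the original url means an article with an unexpected url still downloads, just without the print formatting.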
Old 12-09-2010, 06:33 PM   #5
mufc
Connoisseur
mufc doesn't litter
 
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
Thanks for your time

Thanks for your time but I think it is much faster to strip the pages of divs etc that I do not need.
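
For anyone taking the same route, stripping divs in a recipe usually comes down to the keep_only_tags and remove_tags settings. A minimal sketch is below. The id and class values are invented for illustration; the real ones have to be read from the page source.

```python
# These would normally be class attributes on the BasicNewsRecipe subclass.
# The id and class values are hypothetical; check the page source for the real ones.

# Keep only the part of the page that holds the article text.
keep_only_tags = [dict(name='div', attrs={'id': 'articleBody'})]

# Then drop unwanted blocks from what is left.
remove_tags = [
    dict(name='div', attrs={'class': 'relatedLinks'}),
    dict(name='div', attrs={'class': 'shareTools'}),
]
```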
Old 01-31-2011, 12:36 PM   #6
sorcer
Junior Member
sorcer began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Jan 2011
Device: Kindle 3 WIFI
Does anyone have any idea how to use print version in my case? If the original article is www.somesite.com/news/article, then in the print version one needs to add ?preview=print at the end of URL, so it becomes like www.somesite.com/news/article?preview=print. What code should I use in order to do that?
Old 01-31-2011, 01:59 PM   #7
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by sorcer View Post
Does anyone have any idea how to use print version in my case? If the original article is www.somesite.com/news/article, then in the print version one needs to add ?preview=print at the end of URL, so it becomes like www.somesite.com/news/article?preview=print. What code should I use in order to do that?
Use this:
Code:
    def print_version(self, url):
        return url + '?preview=print'
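
One case the one-liner does not cover is a url that already has a query string, where a second ? would be wrong. A small sketch of a safer variant using the standard library follows. The helper name add_print_param is made up for this example, and it assumes a Python 3 environment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_print_param(url, key='preview', value='print'):
    """Append key=value to the url's query string, keeping any existing parameters."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Avoid adding the parameter twice if it is already there.
    if (key, value) not in query:
        query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

In the recipe, print_version(self, url) would then return add_print_param(url). For a url with no query string the result is the same as url + '?preview=print'.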