#1
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2011
Device: none
Print friendly url unrelated to regular url (and javascript)
The URL of the print-friendly version is not contained anywhere in the URL of the article. Instead, the print-friendly URL looks like: http://url.com/article_print.html?id=94148

There is an id tag in the original article, but how do I extract that id and then direct calibre to the new URL that contains it? I tried using br.follow_link and url_regex, but the link to the print-friendly page is actually JavaScript, and apparently that approach does not work with JavaScript. (I am also very new to Python.) Any help would be appreciated.

This is my first post. I tried to find the answer in the forums but could not. If it has been posted many times already, I apologize.

If you are interested, here is the link to my feed: http://feeds.christianitytoday.com/c...itytoday/ctmag
Here is an article: http://www.christianitytoday.com/ct/...isability.html
And here is that article's print-friendly page: http://www.christianitytoday.com/ct/....html?id=94467

Last edited by sleepless; 11-30-2011 at 03:54 PM.
#2
doofus
Posts: 2,543
Karma: 13088847
Join Date: Sep 2010
Device: Kobo Libra 2, Kindle Voyage
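I don't know if there's a better way to do this, but it seems to work:
Code:
    import re

    def print_version(self, url):
        # Fetch the article page and look for the javascript print link
        soup = self.index_to_soup(url)
        regex = re.compile(r'javascript:printPage\((\d+?)\)', re.I)
        atag = soup.find('a', attrs={'href': regex})
        if atag is not None:
            m = regex.search(atag['href'])
            if m:
                # Build the print-friendly URL from the captured article id
                url = 'http://www.christianitytoday.com/ct/article_print.html?id=' + m.group(1)
        # Falls back to the original URL when no print link is found
        return url
Note: the import re line goes at the top of the recipe.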
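If you want to try the id extraction outside calibre, here is a minimal sketch that applies the same regex to a plain HTML string. The helper name print_url_from_html and the sample markup are made up for illustration; I am assuming the article page contains a link of the form javascript:printPage(<id>), as the recipe above does.

```python
import re

# Same pattern as in the recipe above: captures the numeric id inside printPage(...)
PRINT_LINK = re.compile(r'javascript:printPage\((\d+?)\)', re.I)
PRINT_URL = 'http://www.christianitytoday.com/ct/article_print.html?id='


def print_url_from_html(html, fallback_url):
    """Return the print-friendly URL if a printPage(<id>) link is found,
    otherwise return fallback_url unchanged (hypothetical helper)."""
    m = PRINT_LINK.search(html)
    if m:
        return PRINT_URL + m.group(1)
    return fallback_url


# Hypothetical markup, invented for this example
sample = '<a href="javascript:printPage(94467)">Print</a>'
print(print_url_from_html(sample, 'http://example.com/article.html'))
```

Unlike the recipe, this searches the raw HTML instead of going through soup.find, so it only shows the regex step. Inside a recipe you would still use index_to_soup as above.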
#3
creator of calibre
Posts: 45,194
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That is a perfectly correct way to do it.
#4
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2011
Device: none
Wow. That works perfectly. Thanks so much. I definitely could not have thought of that. Really thanks a lot.