Old 07-07-2011, 10:22 AM   #1
newnick
How to get the full article when the nicely formatted page has no print version and keeps the same URL?

Hello!
Sorry for my bad English.
I need to get articles from the ScienceDirect site. We have a subscription to it at work, but there is not always time for reading there. Usually I create an RSS feed by keywords and use this code:
Code:
class ScienceDirectSearch(BasicNewsRecipe):
    title                 = 'ScienceDirect Search: nonviral gene delivery cancer career'
    oldest_article        = 2
    max_articles_per_feed = 100
    language              = 'en'
    no_stylesheets        = True
    remove_javascript     = True
    keep_only_tags        = [dict(name='div', attrs={'id': 'articleContent'})]

    feeds = [
        (u'ScienceDirect Search: nonviral gene delivery cancer career',
         u'http://rss.sciencedirect.com/getMessage?registrationId=JEBCKHJCKGBKREFGLECDJLBJJNEDNEBGPWDKMNFDLE'),
    ]

This code works well, but the pictures are very small (thumbnails), and the page has a "Full-Size images" link. I tried to find a regexp that selects this version, but it is too complicated for me. I will try to explain:

In the RSS feed, the article link is:
Code:
http://www.sciencedirect.com/science?_ob=GatewayURL&_origin=IRSSSEARCH&_method=citationSearch&_piikey=S0142961211001487&_version=1&md5=a9937225219b20142aafab27e5043b87
In the browser, the URL then becomes:
Code:
http://www.sciencedirect.com/science/article/pii/S0142961211001487
The "Full-Size images" link looks like this:
Code:
http://www.sciencedirect.com/science/article/pii/S0142961211001487?_rdoc=1&_fmt=full&_origin=gateway&md5=12247b4a7282dff569e83636d280c9ca&artImgPref=F
But when I click this link, the browser shows the same URL as before:
Code:
http://www.sciencedirect.com/science/article/pii/S0142961211001487
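The URL forms above share one stable piece, the article identifier (`S0142961211001487`): it appears as the `_piikey` query parameter in the RSS link and as the last path segment in the browser URL. A minimal sketch of mapping one form to the other, assuming every feed link follows the pattern of this single example; the helper name `article_url_from_feed_link` is hypothetical and not part of calibre:

```python
from urllib.parse import urlparse, parse_qs

# Browser-style article URL, as seen in the example above.
ARTICLE_URL = 'http://www.sciencedirect.com/science/article/pii/%s'


def article_url_from_feed_link(feed_link):
    """Build the browser-style article URL from an RSS gateway link.

    Returns None when the link carries no _piikey parameter.
    """
    query = parse_qs(urlparse(feed_link).query)
    pii = query.get('_piikey')
    if not pii:
        return None
    return ARTICLE_URL % pii[0]


feed_link = ('http://www.sciencedirect.com/science?_ob=GatewayURL'
             '&_origin=IRSSSEARCH&_method=citationSearch'
             '&_piikey=S0142961211001487&_version=1'
             '&md5=a9937225219b20142aafab27e5043b87')
print(article_url_from_feed_link(feed_link))
```

The full-size variant cannot be built the same way, because its `md5` parameter differs from the one in the feed link, so following the link on the fetched page looks like the more practical route.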
So I tried this code:
Code:
class ScienceDirectSearch(BasicNewsRecipe):
    title                 = 'ScienceDirect Search: nonviral gene delivery cancer career'
    oldest_article        = 2
    max_articles_per_feed = 100
    language              = 'en'
    no_stylesheets        = True
    remove_javascript     = True
    keep_only_tags        = [dict(name='div', attrs={'id': 'articleContent'})]

    feeds = [
        (u'ScienceDirect Search: nonviral gene delivery cancer career',
         u'http://rss.sciencedirect.com/getMessage?registrationId=JEBCKHJCKGBKREFGLECDJLBJJNEDNEBGPWDKMNFDLE'),
    ]

    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        br.open(url)
        response = br.follow_link(url_regex='&artImgPref=F$', nr = 0)
        html = response.read()
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name


But calibre reports:
Code:
Failed to download article: Incorporation of active DNA/cationic polymer polyplexes into hydrogel scaffolds from http://www.sciencedirect.com/science...0ade4cd9d0bc48
Traceback (most recent call last):
  File "site-packages\calibre\utils\threadpool.py", line 95, in run
  File "site-packages\calibre\web\feeds\news.py", line 856, in fetch_obfuscated_article
  File "c:\users\rg\appdata\local\temp\calibre_0.8.7_tmp_c5awao\calibre_0.8.7_gdmxs__recipes\recipe0.py", line 25, in get_obfuscated_article
NameError: global name 'PersistentTemporaryFile' is not defined
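
The `NameError` in the traceback is raised on the line that creates the temporary file, so the earlier `follow_link` call did not fail; `PersistentTemporaryFile` is simply used without being imported. In calibre it lives in `calibre.ptempfile`. A minimal sketch of the recipe with that import added; the `try`/`except` fallback and the helper `save_html` are assumptions added only so the file-writing part can be tried outside calibre, and whether the `&artImgPref=F` link then yields the full-size page is untested here:

```python
import tempfile

try:
    # The import the original recipe was missing.
    from calibre.ptempfile import PersistentTemporaryFile
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Assumption: stand-ins so the sketch can be exercised without calibre.
    BasicNewsRecipe = object

    def PersistentTemporaryFile(suffix):
        return tempfile.NamedTemporaryFile(suffix=suffix, delete=False)


def save_html(html, temp_files):
    """Write html (bytes) to a temp file that survives closing; return its path."""
    tmp = PersistentTemporaryFile('_fa.html')
    tmp.write(html)
    tmp.close()
    temp_files.append(tmp)  # keep a reference, as the original recipe does
    return tmp.name


class ScienceDirectSearch(BasicNewsRecipe):
    title = 'ScienceDirect Search: nonviral gene delivery cancer career'
    articles_are_obfuscated = True
    temp_files = []
    # Other settings (feeds, keep_only_tags, ...) as in the recipe above.

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        br.open(url)
        # Follow the first link whose URL ends with the full-size image flag.
        response = br.follow_link(url_regex='&artImgPref=F$', nr=0)
        return save_html(response.read(), self.temp_files)
```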


Can someone explain to me how to get the full version of this article with pictures?

I have another question:
Is it possible to change only the pictures in the article? That is, if the article contains pictures at the address http://.../small/.../image.jpg, can I replace them with the pictures at http://.../medium/.../image.jpg?
Thank you!
Old 07-07-2011, 01:40 PM   #2
Starson17
Quote:
Originally Posted by newnick View Post
I have another question:
Is it possible to change only the pictures in the article? That is, if the article contains pictures at the address http://.../small/.../image.jpg, can I replace them with the pictures at http://.../medium/.../image.jpg?
Thank you!
You could use preprocess_html, BeautifulSoup and this:
https://www.mobileread.com/forums/sho...9&postcount=12

Or you could use preprocess_regexps to modify the URL. See here:
http://manual.calibre-ebook.com/news...rocess_regexps
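
The second suggestion can be sketched as follows. `preprocess_regexps` is a list of pairs, each a compiled pattern plus a function that receives the match and returns the replacement text; calibre applies them to the downloaded HTML. The `/small/` and `/medium/` path segments are the placeholders from the question, not confirmed ScienceDirect paths, and `apply_regexps` is a hypothetical helper that only imitates that step so the rule can be checked on a sample string:

```python
import re

# Rewrite only the src attribute of <img> tags, leaving other URLs alone.
preprocess_regexps = [
    (re.compile(r'(<img\b[^>]*?\bsrc="[^"]*?)/small/', re.IGNORECASE),
     lambda match: match.group(1) + '/medium/'),
]


def apply_regexps(html, rules):
    """Apply each (pattern, replacement function) pair in order."""
    for pattern, replace in rules:
        html = pattern.sub(replace, html)
    return html


sample = ('<a href="http://example.com/small/page.html">'
          '<img src="http://example.com/small/fig/image.jpg"></a>')
print(apply_regexps(sample, preprocess_regexps))
```

In a recipe, the list would be assigned as a class attribute of the `BasicNewsRecipe` subclass, next to `keep_only_tags`.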
Old 07-08-2011, 03:58 AM   #3
newnick
Quote:
Originally Posted by Starson17 View Post
You could use preprocess_html, BeautifulSoup and this:
https://www.mobileread.com/forums/sho...9&postcount=12

Or you could use preprocess_regexps to modify the URL. See here:
http://manual.calibre-ebook.com/news...rocess_regexps
Thanks!