05-19-2011, 04:56 AM | #1 |
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
|
Grab the big ones (img)
Hi,
I have a small problem fetching the real pictures from the feeds. Fetching only the thumbnails is not the goal, I think. Can I fetch the big ones? Here are the links from inside the HTML. The thumbnail: Code:
http://www.ngz-online.de/polopoly_fs/1.1255103.1305792647!/httpImage/1593972352.jpg_gen/derivatives/rpo54_195/1593972352.jpg
The big picture: Code:
http://www.ngz-online.de/polopoly_fs/1.1255103.1305792647!/httpImage/1593972352.jpg_gen/derivatives/rpoPanorama_786/1593972352.jpg |
05-19-2011, 12:12 PM | #2 |
creator of calibre
Posts: 43,779
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Use preprocess_html() and change the links.
|
05-19-2011, 12:24 PM | #3 |
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
|
I have been working on it the whole day and can't get it to work.
Could you please help me a bit with the right syntax? This is the HTML that needs changing: Code:
<div class="goodiebox l box_right">
<div class="boxframe">
<div class="headline_empty"/>
<div class="content">
<center>
<div class="imgtop">
<a href="/polopoly_fs/1.248951.1297710265!/httpImage/16361081.jpg_gen/derivatives/rpoPanorama_786/16361081.jpg" class="lightbox" rel="lightbox" title="Erster Gast auf dem Blauen Sofa in Grevenbroich: Bürgermeisterin Ursula Kwasny. Foto: Michael Reuter">
<img title="Erster Gast auf dem Blauen Sofa in Grevenbroich: Bürgermeisterin Ursula Kwasny. Foto: Michael Reuter" height="156" style="" alt="" width="195" class="" src="/polopoly_fs/1.248951.1297710265!/httpImage/16361081.jpg_gen/derivatives/rpo54_195/16361081.jpg"/>
</a>
<a title="Erster Gast auf dem Blauen Sofa in Grevenbroich: Bürgermeisterin Ursula Kwasny. Foto: Michael Reuter" class="iconpic_zoom lightbox" rel="lightbox" href="/polopoly_fs/1.248951.1297710265!/httpImage/16361081.jpg_gen/derivatives/rpoPanorama_786/16361081.jpg"/>
<script type="text/javascript" src="http://www.ngz-online.de:80/js/lightbox.js" language="JavaScript"/>
In the img src I want to replace rpo54_195 with rpoPanorama_786. |
05-19-2011, 12:28 PM | #4 |
creator of calibre
Posts: 43,779
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Code:
for img in soup.findAll('img', src=True):
    img['src'] = img['src'].replace('rpo54_195', 'rpoPanorama_786') |
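For readers following along, here is a rough sketch of where that loop could live. It assumes the recipe subclasses calibre's `BasicNewsRecipe` and overrides `preprocess_html(self, soup)`, which should return the soup. The helper name `use_large_images` and the recipe class name `NGZOnline` are made up for illustration and do not come from the thread.

```python
SMALL = 'rpo54_195'        # thumbnail derivative named in the thread
LARGE = 'rpoPanorama_786'  # large derivative named in the thread


def use_large_images(soup):
    """Point every <img> that has a src at the large derivative."""
    for img in soup.findAll('img', src=True):
        img['src'] = img['src'].replace(SMALL, LARGE)
    return soup


# Hypothetical wiring inside a recipe (class name is an assumption):
#
# class NGZOnline(BasicNewsRecipe):
#     def preprocess_html(self, soup):
#         return use_large_images(soup)
```

Images whose src does not contain `rpo54_195` are left untouched, so the helper is safe to run on every page.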
05-19-2011, 12:52 PM | #5 |
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
|
OK, that was the next step: the strings are changed.
But now I get this: Code:
Referenced file '/polopoly_fs/1.1255129.1305794277%21/httpImage/2533432514.jpg_gen/derivatives/rpoPanorama_786/2533432514.jpg' not found
Referenced file '/polopoly_fs/1.1255125.1305793701%21/httpImage/4171936915.jpg_gen/derivatives/rpoPanorama_786/4171936915.jpg' not found
Referenced file '/polopoly_fs/1.1264249.1305799303%21/image/1299281949.jpg_gen/derivatives/rpoPanorama_786/1299281949.jpg' not found |
05-19-2011, 12:55 PM | #6 |
creator of calibre
Posts: 43,779
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Make the URL absolute.
|
05-19-2011, 01:08 PM | #7 |
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
|
OK, that is too difficult for me as a beginner. I want to read books to learn it. Is this the right way to make it absolute?
soup = BeautifulSoup()
tag1 = Tag(soup, "mytag")
tag2 = Tag(soup, "myOtherTag") |
05-19-2011, 01:10 PM | #8 |
creator of calibre
Posts: 43,779
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I just meant add the http://servername part to the links.
img['src'] = 'http://servername-whatever/' + img['src'].replace('xx', 'yy') |
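A minimal sketch of the same idea using `urljoin` from the Python 3 standard library (on the Python 2 builds of that era the import would be `from urlparse import urljoin`). The src values quoted in this thread start with `/`, so joining them against a base that ends in `/` by plain concatenation would give a double slash; `urljoin` avoids that and leaves already-absolute URLs alone. The base URL is assumed from the links in the first post, and the function name `absolute_large_src` is made up for illustration.

```python
from urllib.parse import urljoin

# Assumed: host taken from the image URLs quoted in the first post.
BASE_URL = 'http://www.ngz-online.de/'


def absolute_large_src(src, base_url=BASE_URL):
    """Swap the thumbnail derivative for the large one, then make the URL absolute."""
    src = src.replace('rpo54_195', 'rpoPanorama_786')
    return urljoin(base_url, src)


if __name__ == '__main__':
    relative = ('/polopoly_fs/1.248951.1297710265!/httpImage/16361081.jpg_gen'
                '/derivatives/rpo54_195/16361081.jpg')
    print(absolute_large_src(relative))
```

Inside the loop this would be used as `img['src'] = absolute_large_src(img['src'])`.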
05-19-2011, 02:09 PM | #9 |
Zealot
Posts: 119
Karma: 100
Join Date: Jan 2011
Location: Germany / NRW /Köln
Device: prs-650 / prs-350 /kindle 3
|
Ah, OK,
I understand now! But it looks like the target picture is also inside a script container. I believe this is much more difficult. I'll leave it and work on some simpler stuff. Thank you for the assistance. Greetings |