#1 |
Guru
Posts: 615
Karma: 85520
Join Date: May 2021
Device: kindle
|
how to fetch images in soup
This is the HTML code in the page.
Code:
/html/body/section/section/section/article/a/figure/picture
<picture>
<source media="__" srcset="Actual link for image">
<img style="__" data-src="Image link">
</picture>
Code:
for img in soup.findAll('img', attrs={'data-src'}):
    img['src'] = img['data-src']
return soup
How do I make it work?
#2 |
creator of calibre
Posts: 45,311
Karma: 27111242
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That will work fine provided you do it in preprocess_html() and data-src has the correct url.
#3 |
Guru
|
Code:
<div class="itgimage"> <img class="lazyload" data-src="https://akm-img-a-in.tosshub.com/indiatoday/images/bodyeditor/202112/4-PK-b-x721.jpg?X3hMZ0Z5Dz9GQjbHLA1rUxxmdMmXNu8v" src="https://akm-img-a-in.tosshub.com/indiatoday/../sites/all/themes/itg/images/itg_image370x208.jpg" alt=""></div>
The data-src URL has a .jpg extension followed by a query string (?______). If calibre can open this link, it can save the image while fetching. I think it fails to open the link because it is not seen as an image link.
EDIT: Okay, looks like data-src isn't being fetched.
Last edited by unkn0wn; 12-14-2021 at 05:22 AM.
#4 |
creator of calibre
|
img['src'] = img['data-src'].split('?')[0]
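Putting this line together with the loop from the first post, a minimal sketch of the whole fix might look like the following. The helper name `fix_lazy_images` is an assumption made for illustration and is not part of calibre. The attribute filter is written as a dict (`{'data-src': True}`) rather than the set used in the first post, because as far as I can tell BeautifulSoup treats a non-dict `attrs` value as a class match, which would find no images here.

```python
def fix_lazy_images(soup):
    """Point each lazy-loaded <img> at its real image URL.

    `soup` is expected to be the BeautifulSoup object that calibre passes
    to preprocess_html(). The function name is hypothetical.
    """
    # Match only <img> tags that actually carry a data-src attribute.
    for img in soup.findAll('img', attrs={'data-src': True}):
        # Drop everything from '?' onwards so the URL ends in the image extension.
        img['src'] = img['data-src'].split('?')[0]
    return soup


# Inside a recipe class this would be called from the hook named in the thread:
#     def preprocess_html(self, soup):
#         return fix_lazy_images(soup)
```

With the `<img>` tag quoted in post #3, `src` would end up as the `data-src` URL truncated at `.jpg`, replacing the placeholder image.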
#5 |
Guru
|
def preprocess_html(self, soup):
    for img in soup.findAll ...
Does the above not work when 'auto_cleanup = True'? I think auto_cleanup ignores all images, even when preprocess_html is used to find them!
#6 |
creator of calibre
|
Yes, auto_cleanup does its own thing; you can't use any of the regular facilities with auto_cleanup.
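In recipe terms, that suggests leaving auto_cleanup off whenever the image fix is needed. The sketch below is only an illustration under stated assumptions: the class name and title are hypothetical, and the fallback base class exists solely so the snippet can be imported outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Assumption: stand-in base class so the sketch can be imported outside calibre.
    BasicNewsRecipe = object


class LazyImageRecipe(BasicNewsRecipe):  # hypothetical recipe name
    title = 'Lazy image example'  # hypothetical title

    # Per the reply above, the regular facilities are not usable together
    # with auto_cleanup, so it is left off here.
    auto_cleanup = False

    def preprocess_html(self, soup):
        # Copy the real image URL into src, without its query string.
        for img in soup.findAll('img', attrs={'data-src': True}):
            img['src'] = img['data-src'].split('?')[0]
        return soup
```

With auto_cleanup off, any trimming of the page would have to be done with the recipe's own cleanup settings instead.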