Old 03-10-2023, 09:57 AM   #3
alvarob
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2023
Device: Kindle Paperwhite 11th
You're right. I was looking at the CSS, but it seems the sizing comes from the HTML itself.

Any idea on how to parse that from the source?
The page's HTML looks like this:

Code:
<img height="0" width="414" src="">
I would like to remove both the height and width attributes from that tag, whatever values they specify (they can vary), leaving only:

Code:
<img src="">
I guess I could try with soup, something like:

Code:
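def postprocess_html(self, soup, first_fetch):
    # width and height are attributes of <img>, not tags of their own,
    # so find_all('width') would match nothing; delete them per image.
    for img in soup.find_all('img'):
        for attr in ('width', 'height'):
            if attr in img.attrs:
                del img[attr]
    return soup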
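For comparison, the same cleanup could be done on the raw HTML with a regex instead of soup. This is only a rough sketch: `strip_img_size` is a hypothetical helper name, not something calibre provides, and it assumes no attribute value contains a literal `>`.

```python
import re

# Hypothetical helper, not part of calibre: removes width/height
# attributes from <img> tags in raw HTML, whatever values they carry.
IMG_TAG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)
SIZE_ATTR = re.compile(
    r'\s+(?:width|height)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)',
    re.IGNORECASE,
)

def strip_img_size(html):
    """Return html with width/height removed from every <img> tag."""
    # Only the text of each <img ...> tag is rewritten; other tags
    # that happen to carry a width or height are left alone.
    return IMG_TAG.sub(lambda m: SIZE_ATTR.sub('', m.group(0)), html)

print(strip_img_size('<img height="0" width="414" src="">'))
```

In a recipe, a similar pattern could probably be listed in `preprocess_regexps`, which takes (compiled regex, replacement callable) pairs, but I have not tested that here.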