Some sites don't include the figures/images into articles and instead the reader needs to click on an href link to see the image/figure. This wouldn't be possible on many ebook readers. To embed the images into output ebook, the tag type needs to be changed from <a> to <img>. Also the "href" property needs to be changed to "src". The following code does the job by looking for all the links to jpg files, then changed them to <img> tags.The code should be included into preprocess_html
Spoiler:
PHP Code:
def preprocess_html(self, soup):
# Includes all the figures inside the final ebook
# Finds all the jpg links
for figure in soup.findAll('a', attrs = {'href' : lambda x: x and 'jpg' in x}):
# makes sure that the link points to the absolute web address
if figure['href'].startswith('/'):
figure['href'] = self.site + figure['href']
figure.name = 'img' # converts the links to img
figure['src'] = figure['href'] # with the same address as href
figure['style'] = 'display:block' # adds /n before and after the image
del figure['href']
del figure['target']
return soup