Old 09-02-2020, 02:49 PM   #9
Doitsu
Quote:
Originally Posted by playful
Didn't realize that a simple curl command could iterate through the branches of a website and download all the leaves, I'd never used it like that. Thank you for the tip.
You could easily create a custom Sigil edit plugin that doesn't require curl. The following proof-of-concept edit plugin will download all external images to the plugin folder.


Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os
from sigil_bs4 import BeautifulSoup
from urllib.request import urlretrieve

# main routine
def run(bk):

    # get plugin folder path
    plugin_folder = os.path.join(bk._w.plugin_dir, bk._w.plugin_name)

    # iterate over all html files
    for html_id, href in bk.text_iter():
        print('Checking', href)
        # read original html code from file
        html = bk.readfile(html_id)

        # get all image urls
        soup = BeautifulSoup(html, 'html.parser')
        for image in soup.find_all('img'):
            if 'src' in image.attrs:
                src = image['src']
                if src.startswith('http'):
                    # use the last path segment as file name, ignoring any query string
                    image_name = os.path.basename(src.split('?', 1)[0])
                    image_path = os.path.join(plugin_folder, image_name)

                    try:
                        # download file to the plugin folder
                        urlretrieve(src, image_path)
                        print(image_name, 'downloaded')
                    except Exception as err:
                        # report the failure and carry on with the next image
                        print(image_name, 'NOT downloaded:', err)

    return 0

def main():
    print('I reached main when I should not have\n')
    return -1

if __name__ == "__main__":
    sys.exit(main())


To finish the job, you still need to change the src attribute paths so they point to the local copies, and add the downloaded images to the epub.

For more information on how to do this see the Sigil Framework Guide.
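
As a rough illustration of the src rewriting step, here is a minimal sketch that only uses the standard library, so it can be tried outside Sigil. The helper names (local_image_name, rewrite_image_srcs) and the '../Images' target folder are my own assumptions, not part of the plugin above. Inside a real plugin you would probably do the same thing with the BeautifulSoup object and then save the result with the plugin API (I believe bk.writefile() for the changed html and bk.addfile() for the image data, but please check the exact signatures in the Sigil Framework Guide).

```python
import os
import re

# Matches an <img> tag whose src attribute holds an absolute http(s) URL.
# Group 1: everything up to the opening quote, group 2: the quote character,
# group 3: the URL itself.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(https?://[^"\']+)\2',
    re.IGNORECASE,
)

def local_image_name(url):
    """Return the file name part of an image URL, without query string or fragment."""
    path = re.split(r'[?#]', url, maxsplit=1)[0]
    return os.path.basename(path)

def rewrite_image_srcs(html, image_dir='../Images'):
    """Point every external <img> src at image_dir.

    image_dir is an assumption: it should be the path from the html file
    to wherever the images end up inside the epub.
    Returns the new html and the list of URLs that were replaced.
    """
    replaced = []

    def _swap(match):
        prefix, quote, url = match.groups()
        name = local_image_name(url)
        if not name:
            # URL without a usable file name: leave the tag untouched
            return match.group(0)
        replaced.append(url)
        return f'{prefix}{quote}{image_dir}/{name}{quote}'

    return IMG_SRC.sub(_swap, html), replaced

sample = '<p><img alt="x" src="https://example.com/pics/cat.png?w=200"/></p>'
new_html, urls = rewrite_image_srcs(sample)
print(new_html)  # <p><img alt="x" src="../Images/cat.png"/></p>
```

The returned list of URLs tells you which images still have to be added to the epub. A regex is only good enough for a quick test like this; for real books the BeautifulSoup approach from the plugin is the safer choice.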