#1 |
Mammal
Posts: 126
Karma: 21380
Join Date: Oct 2010
Location: Right Here
Device: Onyx Note Pro, Kindle DXG
|
Embed remote images ("save complete?")
Hi everyone,
I have a book assembled from many HTML files saved from the web. The pages include images on remote servers, e.g.
Code:
    <img src="http://www.example.com/01.jpg">
Does Sigil have an "embed remote images" or "save as complete" function? I could not find it. If not, is there a workaround using Calibre?
#2 |
Sigil Developer
Posts: 8,750
Karma: 5706256
Join Date: Nov 2009
Device: many
|
Only epub3 supports the use of remote resources, and those are typically video and audio that are simply too large for the book. Note that the remote property must be properly added to the manifest for those opf manifest entries.
To get the actual images, simply use curl or a browser to download and save them, then add them to the epub as normal.
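For a large batch of saved pages, the list of images to fetch can be built by scanning each file for `<img>` tags whose `src` is an absolute http(s) URL. Below is a minimal Python sketch, assuming the standard-library `html.parser` is tolerant enough for the saved markup. The names `RemoteImageFinder` and `find_remote_images` are hypothetical and not part of Sigil or Calibre.

```python
from html.parser import HTMLParser


class RemoteImageFinder(HTMLParser):
    """Collects the src of every <img> that points at a remote server."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img .../> tags are also routed here by HTMLParser
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # Keep only absolute http(s) URLs, skipping duplicates
        if src.startswith(("http://", "https://")) and src not in self.urls:
            self.urls.append(src)


def find_remote_images(html_text):
    """Return the remote image URLs found in html_text, in document order."""
    finder = RemoteImageFinder()
    finder.feed(html_text)
    return finder.urls
```

Calling `find_remote_images` on the text of each saved file gives the URLs that still need to be downloaded and added to the epub; local paths such as `../Images/cover.png` are ignored.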
#3 |
Mammal
Posts: 126
Karma: 21380
Join Date: Oct 2010
Location: Right Here
Device: Onyx Note Pro, Kindle DXG
|
Thank you very much for sharing your thoughts, Kevin.
With 600 files, I really need an automated way to do this. With Calibre I found one workaround:
- open the Epub generated by Sigil
- Edit
- Tools / External Links / Download external resources
- Save a copy
Seems to work. Obviously, if there's a way to do it without leaving Sigil, I would rather go that route.
#4 |
Sigil Developer
Posts: 8,750
Karma: 5706256
Join Date: Nov 2009
Device: many
|
curl can be used from the command line to recursively pull down any website including all resources, so pointing curl at your toc should be enough to batch download all images. curl is installed on macOS and Linux boxes. I am sure there are versions you can get for Windows.
600 images, unless most are really small, are going to take up a lot of space. Is this a comic book or graphic novel? It almost sounds like you are slurping up someone else's website, not your own. Please be careful of licensing issues for all the images and copyright on the text if this is for public release and not your own personal use.
#5 | |
Mammal
Posts: 126
Karma: 21380
Join Date: Oct 2010
Location: Right Here
Device: Onyx Note Pro, Kindle DXG
|
Quote:
Didn't realize that a simple curl command could iterate through the branches of a website and download all the leaves; I'd never used it like that. Thank you for the tip. When you say to point it at the toc, let me see if I understand the process. Unzip the Epub, navigate to the toc.ncx folder. In that folder, run... curl toc.ncx? I guess I'll have to study the curl manual. Thanks to both for the pointers.
#6 |
Grand Sorcerer
Posts: 28,559
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Sounds like a good candidate for a Sigil plugin. You've already admitted to knowing some Python.
#7 |
Sigil Developer
Posts: 8,750
Karma: 5706256
Join Date: Nov 2009
Device: many
|
No, point curl at the toc.html or index file of the website.
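To illustrate what starting from the site's index page provides, here is a hedged Python sketch that reads a toc/index page and resolves its links into absolute page URLs on the same host. It uses only the standard library and does no downloading; `LinkCollector`, `chapter_links`, and the example URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Gathers same-host links from an index page as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the index URL and drop any #fragment
        absolute = urljoin(self.base_url, href).split("#")[0]
        same_host = urlsplit(absolute).netloc == urlsplit(self.base_url).netloc
        if same_host and absolute not in self.links:
            self.links.append(absolute)


def chapter_links(index_html, base_url):
    """Return the unique same-host page URLs linked from index_html."""
    collector = LinkCollector(base_url)
    collector.feed(index_html)
    return collector.links
```

Each returned URL is a page that can then be fetched and scanned for its images.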
#8 |
Mammal
Posts: 126
Karma: 21380
Join Date: Oct 2010
Location: Right Here
Device: Onyx Note Pro, Kindle DXG
|
In the meantime I'll modify my Python script, but will also experiment with these options. Big thanks for your sound and helpful advice, you all!
#9 | |
Grand Sorcerer
Posts: 5,725
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Spoiler:
All you need to do is change the src attribute paths and add the downloaded images to the epub. For more information on how to do this, see the Sigil Framework Guide.
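The path-changing step can be sketched as a plain Python function, independent of any plugin API. This is a minimal sketch, not the plugin code referred to above: it assumes the images will sit in an `Images` folder next to the text folder (hence the `../Images` default), and `localize_img_srcs` is a hypothetical name.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Matches the src attribute of an <img> tag when it holds an http(s) URL
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(https?://[^"\']+)\2',
    re.IGNORECASE,
)


def localize_img_srcs(xhtml, image_dir="../Images"):
    """Point remote <img> src values at local files.

    Returns the rewritten markup and a {remote_url: local_file_name} dict
    listing what still has to be downloaded and added to the epub.
    Caveat: two URLs with the same file name would map to the same local file.
    """
    downloads = {}

    def swap(match):
        prefix, quote, url = match.groups()
        # Use the last path segment of the URL as the local file name
        name = posixpath.basename(urlsplit(url).path) or "image"
        downloads[url] = name
        return f"{prefix}{quote}{image_dir}/{name}{quote}"

    return IMG_SRC.sub(swap, xhtml), downloads
```

The returned dict pairs each remote URL with the file name it was rewritten to, so the same names can be used when the downloaded files are added to the book.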
#10 |
Mammal
Posts: 126
Karma: 21380
Join Date: Oct 2010
Location: Right Here
Device: Onyx Note Pro, Kindle DXG
|
Wow. @Doitsu, you are making it really pain-free for me to slide into the world of Sigil plug-ins.
At the moment I'm swimming in thousands of lines of Python code for a Spanish conjugation flashcard project I just released, and next week I have a big exam. But I'm definitely saving this for later. The first step is always the hardest, and it's tremendously helpful to be shown a full working piece to expand from. Thank you, and hope the rest of your week goes well.
#11 |
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
For the record, the calibre editor can retrieve external resources such as images from a URL. Beyond a couple of quick tests, I haven't used it.
Also, @playful, if this is something you do frequently, you might want to look at FanFicFare. It can download stories from web sites and build epubs. While its primary purpose is fan fiction, it does support other sites, and adding an adapter for a new site is not that hard. It is available as a calibre plugin and as a command-line tool.
#12 | ||
Mammal
Posts: 126
Karma: 21380
Join Date: Oct 2010
Location: Right Here
Device: Onyx Note Pro, Kindle DXG
|
Quote:
Quote:
In order to avoid having a second step, I ended up downloading the images in the Python script and setting the dpi using exiftool at the same time. Had to mess around with the options a bit. (There's an `exif` package for Python, but as of writing its `x_resolution` and `y_resolution` do not properly set the dpi on Windows.) In case it can help anyone on the same track, here's my download function (just fix the path to exiftool on the `comm =` line).
Usage:
- url → "http://example.com/my.jpg"
- folder → "path/to/output/folder"
- target → "file_name.jpg"
- [optional] dpi → 72
Code:
    import os
    import shutil
    import subprocess

    import requests
    from requests.exceptions import RequestException, Timeout


    def download_image(url, folder, target_name, dpi=False, timeout=20):
        """
        Usage:
        url → "http://example.com/my.jpg"
        folder → "path/to/output/folder"
        target → "file_name.jpg"
        [optional] dpi → 72
        """
        try:
            # Stream the response so large images are not held in memory
            with requests.get(url, timeout=timeout, stream=True) as resp:
                if resp.status_code != 200:
                    return False
                # Join folder and file name with the platform's path separator
                path = os.path.join(folder, target_name)
                with open(path, 'wb') as f:
                    # Let urllib3 undo gzip/deflate before copying the raw stream
                    resp.raw.decode_content = True
                    shutil.copyfileobj(resp.raw, f)
            # Try to change the dpi
            if dpi and type(dpi) is int:
                # https://exiftool.org/exiftool_pod.html#OPTIONS
                comm = ["PATH/TO/exiftool.exe", '-charset', 'filename=',
                        '-q', '-overwrite_original_in_place',
                        f'-Xresolution={dpi}', f'-Yresolution={dpi}',
                        path]
                subprocess.run(comm)
            return True
        except (Timeout, RequestException, OSError):
            # The caller just needs the False value
            return False
Last edited by playful; 09-04-2020 at 05:53 PM. Reason: Add Timeout import