Old 09-01-2020, 07:30 PM   #1
playful
Mammal
Posts: 126
Karma: 21380
Join Date: Oct 2010
Location: Right Here
Device: Onyx Note Pro, Kindle DXG
Embed remote images ("save complete?")

Hi everyone,
I have a book assembled from many html files saved from the web. The pages include images on remote servers, e.g.
Code:
<img src="http://www.example.com/01.jpg">
These images show up when I am browsing in Sigil. They also show up in the exported Epub on the computer. They do not show up in the e-reader because the images are not embedded in the Epub itself and the e-reader has no connection.

Does Sigil have an "embed remote images" or "save as complete" function? I could not find it. If not, is there a workaround using Calibre?
Old 09-01-2020, 08:12 PM   #2
KevinH
Sigil Developer
Posts: 8,750
Karma: 5706256
Join Date: Nov 2009
Device: many
Only epub3 supports the use of remote resources, and those are typically video and audio files that are simply too large for the book. Note that the remote property must be properly added to those OPF manifest entries.

To get the actual images, simply use curl or a browser to download and save them, then add them to the epub as normal.
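
Before anything can be downloaded, the remote image URLs have to be collected from the pages. A minimal sketch of that step, using only the Python standard library; the names `RemoteImageFinder` and `find_remote_images` are made up for illustration and are not part of Sigil or calibre:

```python
from html.parser import HTMLParser


class RemoteImageFinder(HTMLParser):
    """Collects the src of every <img> that points at an http(s) URL."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; self-closing <img/> also lands here
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if src.lower().startswith(("http://", "https://")):
            self.urls.append(src)


def find_remote_images(html):
    """Return the remote image URLs found in an HTML string, in document order."""
    finder = RemoteImageFinder()
    finder.feed(html)
    finder.close()
    return finder.urls
```

Images that are already local (relative paths such as `../Images/02.jpg`) are skipped, so the returned list is exactly what still needs to be fetched.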
Old 09-01-2020, 10:32 PM   #3
playful
Thank you very much for sharing your thoughts, Kevin.
With 600 files, I really need an automated way to do this.

With Calibre I found one workaround:
- open the Epub generated by Sigil
- Edit
- Tools / External Links / Download external resources
- Save a copy

Seems to work. Obviously if there's a way to do it without leaving Sigil, would rather go that route.
Old 09-02-2020, 08:52 AM   #4
KevinH
curl can be used from the command line to recursively pull down any website, including all resources, so pointing curl at your TOC should be enough to batch-download all the images. curl is installed on macOS and Linux boxes. I am sure there are versions you can get for Windows.

600 images, unless most are really small, are going to take up a lot of space. Is this a comic book or graphic novel?

It almost sounds like you are slurping up someone else's website, not your own. Please be careful of licensing issues for all the images and copyright on the text if this is for public release and not your own personal use.
Old 09-02-2020, 12:05 PM   #5
playful
Quote:
Originally Posted by KevinH View Post
It almost sounds like you are slurping up someone else's website, not your own.
Exactly. There's someone's writing that I like but I can't stand reading it on a computer screen. I wrote a bit of Python to download and clean the pages. It's for my own use, wouldn't think of distributing it.

Didn't realize that a simple curl command could iterate through the branches of a website and download all the leaves, I'd never used it like that. Thank you for the tip.

When you say to point it at the toc, let me see if I understand the process. Unzip the Epub, navigate to the toc.ncx folder. In that folder, run...
curl toc.ncx?
I guess I'll have to study the curl manual.

Thanks to both for the pointers.
Old 09-02-2020, 12:22 PM   #6
DiapDealer
Grand Sorcerer
Posts: 28,559
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Sounds like a good candidate for a Sigil plugin. You've already admitted to knowing some Python.
Old 09-02-2020, 02:21 PM   #7
KevinH
No, point curl at the toc.html or index file of the website.
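
Starting from the index page means turning its links into a list of page URLs to fetch. As far as I know, curl on its own fetches only the URLs it is handed (recursive mirroring is a wget feature), so a list like this is useful whichever downloader is used. A rough sketch with the standard library; `LinkCollector` and `same_site_links` are hypothetical names, and the example URL is a placeholder:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(index_html, index_url):
    """Return absolute, de-duplicated links that stay on the index page's host."""
    collector = LinkCollector()
    collector.feed(index_html)
    collector.close()
    host = urlparse(index_url).netloc
    links = []
    for href in collector.hrefs:
        # resolve relative links and drop any #fragment
        absolute = urljoin(index_url, href).split("#", 1)[0]
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

Each returned URL can then be handed to a downloader one at a time; links to other hosts are left out so the crawl does not wander off the site.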
Old 09-02-2020, 02:48 PM   #8
playful
In the meantime I'll modify my Python script, but will also experiment with these options. Big thanks for your sound and helpful advice, you all!
Old 09-02-2020, 02:49 PM   #9
Doitsu
Grand Sorcerer
Posts: 5,725
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by playful View Post
Didn't realize that a simple curl command could iterate through the branches of a website and download all the leaves, I'd never used it like that. Thank you for the tip.
You could easily create a custom Sigil edit plugin that doesn't require curl. The following proof-of-concept edit plugin will download all external images to the plugin folder.


Spoiler:
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, os
from sigil_bs4 import BeautifulSoup
from urllib.request import urlretrieve

# main routine
def run(bk):

    # get plugin folder path
    plugin_folder = os.path.join(bk._w.plugin_dir, bk._w.plugin_name)

    # iterate over all html files
    for html_id, href in bk.text_iter():
        print('Checking', href)
        # read original html code from file
        html = bk.readfile(html_id)

        # get all image urls
        soup = BeautifulSoup(html, 'html.parser')
        for image in soup.find_all('img'):
            if 'src' in image.attrs:
                src = image['src']
                if src.startswith('http'):
                    image_name = os.path.basename(src)
                    image_path = os.path.join(plugin_folder, image_name)

                    try:
                        # download file to the plugin folder
                        urlretrieve(src, image_path)
                        print(image_name, 'downloaded')
                    except Exception:
                        # urlretrieve failed (network error, bad URL, unwritable path)
                        print(image_name, 'NOT downloaded')

    return 0

def main():
    print('I reached main when I should not have\n')
    return -1

if __name__ == "__main__":
    sys.exit(main())


All you need to do is change the src attribute paths and add the downloaded images to the epub.
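
The src rewriting step could look roughly like the sketch below. It is not part of the plugin above: `localize_image_srcs` is a made-up helper, the regular expression is a simplification that only handles quoted `src` values, and the `../Images` target assumes the usual Sigil layout where pages live in `Text/` and pictures in `Images/`.

```python
import posixpath
import re
from urllib.parse import unquote, urlparse

# <img ... src="http(s)://..."> with either quote style; group 3 is the URL
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(https?://[^"\']+)\2',
                     re.IGNORECASE)


def localize_image_srcs(html, images_dir="../Images"):
    """Point remote <img> src attributes at local files.

    Returns (new_html, mapping) where mapping is {remote_url: local_file_name}.
    """
    mapping = {}

    def swap(match):
        url = match.group(3)
        # file name from the URL path only, so "?size=big" etc. is dropped
        name = posixpath.basename(unquote(urlparse(url).path)) or "image"
        mapping[url] = name
        quote = match.group(2)
        return f"{match.group(1)}{quote}{images_dir}/{name}{quote}"

    return IMG_SRC.sub(swap, html), mapping
```

Inside a plugin, the input would come from `bk.readfile(html_id)` and the result would presumably be stored with `bk.writefile(html_id, new_html)`, with the downloaded files added through `bk.addfile`; check the Sigil Framework Guide for the exact signatures. Two different URLs that end in the same file name would collide here, so a real plugin would need to rename one of them.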

For more information on how to do this see the Sigil Framework Guide.
Old 09-02-2020, 05:29 PM   #10
playful
Wow. @Doitsu, you are making it really pain-free for me to slide into the world of Sigil plug-ins.

At the moment I'm swimming in thousands of lines of Python code for a Spanish conjugation flashcard project I just released, and next week I have a big exam. But definitely saving this for later.

The first step is always the hardest, and it's tremendously helpful being shown a full working piece from where to expand.

Thank you, and hope the rest of your week goes well.
Old 09-03-2020, 08:57 AM   #11
davidfor
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
For the record, the calibre editor can retrieve external resources such as images from a URL. Beyond a couple of quick tests, I haven't used it.

Also, @playful, if this is something you do frequently, you might want to look at FanFicFare. It can download stories from web sites and build epubs. While its primary purpose is fan fiction, it does support other sites, and adding an adapter for a new site is not that hard. It is available as a calibre plugin and as a command-line tool.
Old 09-04-2020, 12:34 AM   #12
playful
Quote:
Originally Posted by davidfor View Post
For the record, the calibre editor can retrieve external resources such as images from a URL. Beyond a couple of quick tests, I haven't used it.
Haha, I think you missed this. : )

Quote:
Originally Posted by playful View Post
With Calibre I found one workaround:
- open the Epub generated by Sigil
- Edit
- Tools / External Links / Download external resources
- Save a copy
Anyhow… Realized that the dpi needs to be changed in order for the images to display properly on my Onyx. When I download the pictures via Calibre as above, the images keep their original nominal dpi (meaning, according to metadata) of 300 and look like postage stamps.

In order to avoid having a second step, I ended up downloading the images in the Python script and setting the dpi using exiftool at the same time. Had to mess around with the options a bit. (There's an `exif` package for Python, but as of writing its `x_resolution` and `y_resolution` attributes do not properly set the dpi on Windows.)
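
To check what nominal dpi a downloaded JPEG actually declares, the JFIF header can be read directly. This is only a rough sketch: `jfif_density` is a hypothetical helper, it looks solely at a JFIF APP0 segment placed right after the start-of-image marker, and it will return None for files that keep their resolution in EXIF instead (exiftool reads both).

```python
import struct


def jfif_density(jpeg_bytes):
    """Return (units, x_density, y_density) from a JFIF APP0 header, or None.

    units: 0 = aspect ratio only, 1 = dots per inch, 2 = dots per cm.
    """
    # SOI marker (FF D8) followed directly by an APP0 marker (FF E0)
    if len(jpeg_bytes) < 18 or jpeg_bytes[:4] != b"\xff\xd8\xff\xe0":
        return None
    # bytes 4-5 hold the segment length, bytes 6-10 the "JFIF\0" identifier
    if jpeg_bytes[6:11] != b"JFIF\x00":
        return None
    # bytes 11-12 are the version; units and densities follow, big-endian
    units, x_density, y_density = struct.unpack(">BHH", jpeg_bytes[13:18])
    return units, x_density, y_density
```

Calling it on the first few bytes of a file (for example `jfif_density(open(path, "rb").read(32))`) before and after running exiftool is a quick way to see whether the change took effect.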

In case it can help anyone on the same track, here's my download function (just fix the path to exiftool on the `comm =` line).

Usage:
url → "http://example.com/my.jpg"
folder → "path/to/output/folder"
target_name → "file_name.jpg"
[optional] dpi → 72

Code:
import os
import shutil
import subprocess

import requests
from requests.exceptions import Timeout


def download_image(url, folder, target_name, dpi=False, timeout=20):
    """
    Download url to folder/target_name and optionally set its dpi metadata.
    Returns True on success, False on any failure.

    Usage:
    url → "http://example.com/my.jpg"
    folder → "path/to/output/folder"
    target_name → "file_name.jpg"
    [optional] dpi → 72
    """

    try:
        resp = requests.get(url, timeout=timeout, stream=True)
        if resp.status_code != 200:
            return False
        # os.path.join adds the separator that plain concatenation would miss
        path = os.path.join(folder, target_name)
        with open(path, 'wb') as f:
            # let urllib3 undo gzip/deflate before the raw bytes are copied
            resp.raw.decode_content = True
            shutil.copyfileobj(resp.raw, f)

        # Try to change the dpi
        if dpi and type(dpi) is int:
            # https://exiftool.org/exiftool_pod.html#OPTIONS
            comm = ["PATH/TO/exiftool.exe", '-charset', 'filename='
                           , '-q', '-overwrite_original_in_place'
                           , f'-Xresolution={dpi}', f'-Yresolution={dpi}'
                           , path]
            subprocess.run(comm)
        return True

    except Timeout:
        # the server did not answer within `timeout` seconds
        return False
    except Exception:  # broad on purpose: the caller just needs the False value
        return False

Last edited by playful; 09-04-2020 at 05:53 PM. Reason: Add Timeout import