I currently have a problem that's a bit baffling.
Basically, if you have any svg images in your epub, the svg markup in your xhtml files should look something like this:
Code:
<div>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" height="100%" preserveAspectRatio="none" version="1.1" viewBox="0 0 571 910" width="100%">
<image height="910" width="571" xlink:href="../Images/00005.jpeg"/>
</svg>
</div>
Please note the attribute order within the svg tag. The above svg code will also pass Epubcheck.
However, if your plugin calls a bs4 method -- any bs4 method -- then bs4 always changes the svg code in your epub to this:
Code:
<div>
<svg height="100%" preserveaspectratio="none" version="1.1" viewbox="0 0 571 910" width="100%" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<image height="910" width="571" xlink:href="../Images/00005.jpeg"></image>
</svg>
</div>
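Note that the attribute order of this svg code is different from the first version. Looking closely, preserveAspectRatio and viewBox have also become preserveaspectratio and viewbox, and the self-closing <image/> tag has become <image></image>. This svg code always causes Epubcheck errors.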
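To see exactly what differs between the two svg tags besides the order, here is a small sketch in plain Python (no bs4 involved). The helper name attr_names and the regex are my own, and the regex only handles double-quoted attributes like the ones in the snippets above.

```python
import re

# The two <svg> start tags, copied from the snippets above.
original = ('<svg xmlns="http://www.w3.org/2000/svg" '
            'xmlns:xlink="http://www.w3.org/1999/xlink" height="100%" '
            'preserveAspectRatio="none" version="1.1" '
            'viewBox="0 0 571 910" width="100%">')
after_bs4 = ('<svg height="100%" preserveaspectratio="none" version="1.1" '
             'viewbox="0 0 571 910" width="100%" '
             'xmlns="http://www.w3.org/2000/svg" '
             'xmlns:xlink="http://www.w3.org/1999/xlink">')


def attr_names(tag):
    """Hypothetical helper: attribute names of one start tag, in order."""
    return re.findall(r'([\w:.-]+)\s*=\s*"[^"]*"', tag)


before = attr_names(original)
after = attr_names(after_bs4)

# Names present in one tag but not the other (order ignored).
lost = sorted(set(before) - set(after))
gained = sorted(set(after) - set(before))

# Do the tags carry the same names once letter case is ignored?
same_ignoring_case = sorted(n.lower() for n in before) == sorted(after)

print(lost)                # -> ['preserveAspectRatio', 'viewBox']
print(gained)              # -> ['preserveaspectratio', 'viewbox']
print(same_ignoring_case)  # -> True
```

So apart from the reordering, the only attribute-level difference is the letter case of those two names. I can't say from this alone which of the changes Epubcheck objects to.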
I eventually found bs4 to be the culprit causing this problem by writing a test plugin with this code:
Code:
from __future__ import unicode_literals, division, absolute_import, print_function
import os, os.path, sys
from tempfile import mkdtemp

try:
    from sigil_bs4 import BeautifulSoup, Comment
except ImportError:
    from bs4 import BeautifulSoup, Comment


def run(bk):
    print('Python version: ', sys.version, '\n')
    print('Running Test SVG Plugin...Please wait\n')
    WDIR = mkdtemp()
    files = []

    # copies all epub xhtml files to the work dir,
    # remembering the manifest id that belongs to each copy
    for (id, href) in bk.text_iter():
        file = bk.href_to_basename(href)
        file = os.path.join(WDIR, file)
        with open(file, 'wt', encoding='utf-8') as outfp:
            data = bk.readfile(id)
            outfp.write(data)
        files.append((id, file))

    print('\n>>> Files copied to WORK_DIR...')
    for _, f in files:
        print(f)

    # a bs4 routine that just removes <br> tags
    # so we can test bs4's disruptive effects on svg image
    # formatting in xhtml files.
    for id, fname in files:
        # read the current work file (not the last one copied)
        with open(fname, 'rt', encoding='utf-8') as infp:
            html = infp.read()
        soup = BeautifulSoup(html, 'html.parser')
        orig_soup = str(soup)
        for tag in soup.find_all('br'):
            tag.decompose()

        # write the modified work file back to the epub
        # under its own manifest id.
        if str(soup) != orig_soup:
            bk.writefile(id, str(soup))

    print('\n\n>>> Completed successfully...')
    return 0


def main():
    print('I reached main when I should not have\n')
    return -1


if __name__ == "__main__":
    sys.exit(main())
To reproduce, run this test plugin in Sigil on any epub that contains svg images, then run Epubcheck and you will see the problem for yourself.
If you stop the plugin right after it copies the files to the work directory, the svg code in the xhtml files is untouched and passes Epubcheck. But if you let the plugin run through the trivial bs4 routine, bs4 changes the svg attribute order and this always causes errors when you run Epubcheck.
Is there anything I can do to prevent bs4 altering the svg code like this?
I'm using 'html.parser' whenever I initialize BeautifulSoup. Could using the wrong parser be causing this svg problem?
Does this problem have anything to do with the xmlns declarations on the svg tag?
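On the parser question, here is a rough check that can be run outside Sigil with only the standard library. As far as I know, bs4's 'html.parser' option is built on Python's html.parser.HTMLParser, so the sketch feeds the svg snippet to that class directly and compares it with xml.etree.ElementTree, which is an XML parser. The class name AttrCollector is my own. This is only a sketch of how the two kinds of parser treat attribute names, not a test of bs4 or sigil_bs4 themselves.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The svg markup from the first snippet above.
SVG = ('<svg xmlns="http://www.w3.org/2000/svg" '
       'xmlns:xlink="http://www.w3.org/1999/xlink" height="100%" '
       'preserveAspectRatio="none" version="1.1" '
       'viewBox="0 0 571 910" width="100%">'
       '<image height="910" width="571" xlink:href="../Images/00005.jpeg"/>'
       '</svg>')


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the <svg> attribute names as reported."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == 'svg':
            self.names = [name for name, _ in attrs]


collector = AttrCollector()
collector.feed(SVG)
collector.close()
html_names = collector.names

# ElementTree treats the xmlns declarations as namespaces, so only the
# ordinary attributes show up in .attrib.
xml_names = list(ET.fromstring(SVG).attrib)

print('html.parser :', html_names)
print('ElementTree :', xml_names)
```

In this check the HTML parser reports viewbox and preserveaspectratio in lower case, while the XML parser keeps viewBox and preserveAspectRatio as written. That suggests the parser choice is at least worth looking at. The bs4 documentation also lists an 'xml' parser that needs lxml installed; I have not verified whether that is available to Sigil plugins.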