08-22-2017, 07:21 AM   #214
Doitsu
I haven't looked at your code in detail, but it looks like you're using both bs4 and regular expressions, which, IMHO, isn't a good idea.

The invalid code that Sigil is reporting is caused by the following lines in LSH.xhtml:

before:

Code:
<div class="illus">
<img alt="" class="img065" src="../Images/lsh-23-065.png"/></div>
after:

Code:
<div class="illus">

<img alt="" class="mobionly" height="441px" src="../Images/lsh-23-065.png" width="630px"/>

<img alt="" class="kf8only" src="../Images/lsh-23-065.png" style="width: 100%;height: auto;"/>
For some reason your code removed the closing </div> tag.
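
A quick way to confirm this outside Sigil is to parse both snippets as XML. This is only a rough sketch using Python's standard xml.etree.ElementTree module rather than Sigil's own checker, and the names BEFORE, AFTER and is_well_formed are made up for illustration:

```python
import xml.etree.ElementTree as ET

# The two snippets quoted above, copied verbatim.
BEFORE = '''<div class="illus">
<img alt="" class="img065" src="../Images/lsh-23-065.png"/></div>'''

AFTER = '''<div class="illus">

<img alt="" class="mobionly" height="441px" src="../Images/lsh-23-065.png" width="630px"/>

<img alt="" class="kf8only" src="../Images/lsh-23-065.png" style="width: 100%;height: auto;"/>'''


def is_well_formed(fragment):
    """Hypothetical helper: True if the fragment parses as one XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # An unclosed <div> ends up here.
        return False


print(is_well_formed(BEFORE))  # True
print(is_well_formed(AFTER))   # False
```

It only tells you that the markup is broken, not which part of your code broke it.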

I don't know what the exact cause is, but since you seem to be using regular expressions, it might help to normalize the HTML input first with the prettyprint_xhtml function from Sigil's bs4 (sigil_bs4), so that your expressions always see the same layout. When I ran the following plugin prior to running yours, I didn't get any error messages:

Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
from sigil_bs4 import BeautifulSoup

def run(bk):
    # manifest id of the file to normalize
    html_id = 'LSH.xhtml'
    html = bk.readfile(html_id)
    soup = BeautifulSoup(html, 'html.parser')
    # re-serialize the markup with Sigil's bs4 pretty printer
    normalized_html = str(soup.prettyprint_xhtml(indent_level=0, eventual_encoding="utf-8", formatter="minimal", indent_chars="  "))
    bk.writefile(html_id, normalized_html)

    return 0

def main():
    # plugins are started via run(bk), so main() should never be reached
    print('I reached main when I should not have\n')
    return -1

if __name__ == "__main__":
    sys.exit(main())