08-22-2017, 10:58 AM   #217
slowsmile
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
I think I may have found the cause of the missing div tag problem that appears when the plugin is run on GG's epub. To be clear about my conclusion: I now believe that Python's HTML parser module ('html.parser'), which my plugin code always uses with bs4, is the main culprit for the missing div tag. My reasons are below.

My bs4 code always uses 'html.parser', and it's the only parser that I use for epub html. I use it for two reasons. First, I consider it the correct parser to use because, as I understand it, it targets the same version of html (html 4.01) as epub 2. Second, in my experience 'html.parser' is the only bs4 parser that actually fixes html errors.

Here's what I did in the test. I changed all the BeautifulSoup parser declarations in the reformatSmallImages() and formatImages() functions in my code to use, in turn, 'lxml', 'html5' and no parser at all, and ran the plugin on GG's epub each time. With every one of these other parsers I got a whole bunch of errors, but the missing </div> tag error was not among them. When I changed the parser back to 'html.parser' and ran the plugin again, that single missing </div> tag problem returned. I ran all of these tests with the Bundled Python and again with my own Python installation (Python 3.4.4), with exactly the same results.
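For anyone who wants to repeat the test above, here is a minimal sketch of one way to check a file for an unbalanced div after each run. It is not part of the plugin: the DivCounter class and the div_balance() function are hypothetical names made up for this illustration, and it uses only the standard library so that it behaves the same whichever bs4 parser the plugin was run with.

```python
from html.parser import HTMLParser


class DivCounter(HTMLParser):
    """Counts opening and closing div tags in a chunk of (X)HTML."""

    def __init__(self):
        super().__init__()
        self.opened = 0
        self.closed = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.opened += 1

    def handle_startendtag(self, tag, attrs):
        # A self-closing <div/> opens and closes in one go.
        if tag == "div":
            self.opened += 1
            self.closed += 1

    def handle_endtag(self, tag):
        if tag == "div":
            self.closed += 1


def div_balance(markup):
    """Return (opened, closed) div counts for the given markup string."""
    counter = DivCounter()
    counter.feed(markup)
    counter.close()
    return counter.opened, counter.closed


# Hypothetical sample: the outer div is never closed.
sample = '<div><div><img src="a.png"/></div>'
print(div_balance(sample))  # (2, 1)
```

Running div_balance() on the text of each html file before and after the plugin run, once per parser setting, should show whether the two counts only diverge when 'html.parser' is in use.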

Therefore it would seem that the HTML Parser module that bs4 uses is the cause of this missing div tag problem.

If someone can confirm the above test result, then obviously this problem can't be fixed, and I should perhaps add another caveat to my plugin release advising users to run Sigil's Tools > Mend All HTML Files option, which will fix any missing div tags if the problem occurs when running the plugin.

Last edited by slowsmile; 08-22-2017 at 11:01 AM.