07-03-2017, 11:47 PM | #211 |
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
@DiapDealer...Your suggestion worked a treat. The plugin is now working properly. Thanks again.
|
08-22-2017, 05:51 AM | #212 |
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
Re: AddKindleMediaQueries problem
(Please see release notes for a description of the above plugin)
Hi everyone...I'm currently working with Doitsu to fix various problems with the above plugin that occur when div tags are used in various layouts around the image tag lines. The current problem, and the last one, is that when I run Granny Grump's Washington Irving epub through the plugin I still get one error: a missing end </div> tag for just one image in dual format. All the other images in the ebook, which are formatted in exactly the same way, are OK. And when I test other ebooks, with and without the div tag formatting, I never get this problem. So the problem is consistent, because it always occurs with the same single image in this one epub, and also quirky, because I can't reproduce it in any other epub on test. I also cannot understand why the missing div tag problem is occurring.
The plugin itself is quite small. All it does is add the relevant media queries to the selected stylesheet and then dual format all images in the epub HTML files for KF7 (in pixels) and KF8 (as % values). Here is the main driver code for the program: Spoiler:
And here is the main function that does all the dual formatting for the HTML image lines in GG's ebook (please also note that there is no div tag formatting at all in this function). This function appears to be working without any problems: Spoiler:
Also, both the reformatDivLayout() and fixBrokenTags() functions were added after we got the missing div tag problem, so they cannot be the cause of it. Similarly, the prettifyXHTMLFiles() function has nothing to do with this problem, because I've already used it in all my other plugins without any problems whatsoever. Below you can also download the new version of this plugin (v0.1.5, which is for test purposes only and not released yet) as well as Doitsu's test epub, so that you can see the missing div tag problem in Epubcheck for yourself. The test epub by Granny Grump is called Irving,Washington-LegendOfSleepyHollow-illus-Rackham.epub. Last edited by slowsmile; 08-22-2017 at 06:36 AM. |
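For readers who cannot see the spoilered code, the dual-formatting step described in this post can be sketched roughly as follows. This is only an illustration and not the plugin's actual code: the function name dual_format_img() and its parameters are hypothetical, the class names mobionly and kf8only and the attribute layout are taken from the sample output quoted in this thread, and how the plugin chooses the percentage width is not stated, so 100% is an assumed default.

```python
from html import escape


def dual_format_img(src, width_px, height_px, alt="", kf8_width_pct=100):
    """Return two tags: a KF7 <img> sized in pixels and a KF8 <img> sized in percent.

    Hypothetical helper; the real plugin's function is not shown in the thread.
    """
    src_attr = escape(src, quote=True)
    alt_attr = escape(alt, quote=True)
    # KF7 (mobi) readers get fixed pixel dimensions.
    kf7 = ('<img alt="{0}" class="mobionly" height="{1}px" src="{2}" width="{3}px"/>'
           .format(alt_attr, height_px, src_attr, width_px))
    # KF8 readers get a percentage width and automatic height.
    kf8 = ('<img alt="{0}" class="kf8only" src="{1}" style="width: {2}%;height: auto;"/>'
           .format(alt_attr, src_attr, kf8_width_pct))
    return kf7, kf8


kf7_tag, kf8_tag = dual_format_img("../Images/lsh-23-065.png", 630, 441)
print(kf7_tag)
print(kf8_tag)
```

Note that the helper only builds the two img tags; it never touches a surrounding div, which matches the statement above that the formatting function does no div handling of its own.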
08-22-2017, 06:33 AM | #213 |
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
The above missing div tag problem can be fixed simply by running Sigil's Tools > Mend All HTML Files. But it would still be nice to know why this problem is occurring.
I'm also pretty sure that I could cure the problem by using Tidy with the wrap option set to off. My suspicion is that it might be caused by wrapping problems, but that's just a wild guess. I'm trying to avoid Tidy, though, because libtidy and the associated dll are large files and Tidy has its own set of peculiar quirks. Using Tidy to cure this would be like using a sledgehammer to crack a walnut. Any other ideas or thoughts concerning this problem would be welcome. Last edited by slowsmile; 08-22-2017 at 07:24 AM. |
08-22-2017, 07:21 AM | #214 |
Grand Sorcerer
Posts: 5,583
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
I haven't looked at your code in detail, but it looks like you're using both bs4 and regular expressions, which, IMHO, isn't a good idea.
The invalid code that Sigil is reporting is caused by the following lines in LSH.xhtml.
Before: Code:
<div class="illus">
<img alt="" class="img065" src="../Images/lsh-23-065.png"/></div>
After: Code:
<div class="illus">
<img alt="" class="mobionly" height="441px" src="../Images/lsh-23-065.png" width="630px"/>
<img alt="" class="kf8only" src="../Images/lsh-23-065.png" style="width: 100%;height: auto;"/>
I don't know what the exact cause is, but since you seem to be using regular expressions, it might help to normalize the HTML input using the Sigil bs4 prettyprint_xhtml function. When I ran the following plugin prior to running yours, I didn't get any error messages: Spoiler:
|
08-22-2017, 08:08 AM | #215 |
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
@Doitsu...Yes, I agree with you on all counts. You really shouldn't use regex alongside bs4 in code, couldn't agree more. But this is not a standard fault that can be spotted or cured easily, so I thought that I would just try the regex and see if it would work (it didn't). Remember, there are a lot of other image tags in GG's LSH.xhtml file that are wrapped in divs in the same way. So, if it was my code (and I'm not saying it's not), then why are all those other image tags dual formatted correctly in GG's epub after running the plugin? And why can't I reproduce this fault in any other ebook using image tags within div tags? That's what makes this problem so weird.
I've also had strange problems like this in the past where the cause wasn't the code but wrap errors. I really don't want to use Tidy again to cure this, because that's overkill. The wrapping thing is a wild guess, by the way. But the only way I can prove that the problem is caused by wrapping is by using the Tidy module with the wrap option set to OFF and seeing if that cures it. I might do that later (bit busy now), and I'll let you know the outcome of this testing with Tidy when I can. Last edited by slowsmile; 08-22-2017 at 08:23 AM. |
08-22-2017, 08:33 AM | #216 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I'll try to take a peek if I get some time. The gist is that the original (x)html is well-formed, and that the plugin is dropping a closing div tag under certain conditions, right?
|
08-22-2017, 10:58 AM | #217 |
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
I think that I might have found the cause of the missing div tag problem when the plugin is run on GG's epub. To be clear about my conclusions, I now believe that the HTML parser module, which my plugin code always uses with bs4, is the culprit for the missing div tag. See below for reasons.
My bs4 code always uses 'html.parser', and it's the only parser that I use for epub html. I use this parser for two reasons. First, I believe it is the correct parser to use because it uses the same version of html (html 4.01) as epub 2. Second, as far as I know 'html.parser' is the only bs4 parser that actually fixes html errors.
Here's what I did in the test. In turn, I changed all my BeautifulSoup parser declarations in the reformatSmallImages() and formatImages() functions to 'lxml', 'html5' and no parser, and ran the plugin on GG's epub each time. With every one of these other parsers I got a whole bunch of errors, but the missing </div> tag error was not among them. When I changed the parser back to 'html.parser' and ran the test again, that single missing </div> tag problem returned. I ran all these same tests with the bundled Python and again with my own Python version (Python 3.4.4), with exactly the same results. Therefore it would seem that the HTML parser module that bs4 uses is the cause of this missing div tag problem.
If someone can confirm the above test result, then obviously this problem can't be fixed in the plugin, and I should perhaps add another caveat to my plugin release notes advising the user to run Sigil's Tools > Mend All HTML Files option, which will fix any missing div tag problem if it occurs when running the plugin. Last edited by slowsmile; 08-22-2017 at 11:01 AM. |
08-22-2017, 11:13 AM | #218 | |
Grand Sorcerer
Posts: 5,583
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Run the following preprocessing code before running your plugin and you'll see that the missing </div> error no longer occurs: Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
from sigil_bs4 import BeautifulSoup

def run(bk):
    # Normalize one file with Sigil's bs4 pretty-printer before the plugin runs
    html_id = 'LSH.xhtml'
    html = bk.readfile(html_id)
    soup = BeautifulSoup(html, 'html.parser')
    normalized_html = str(soup.prettyprint_xhtml(indent_level=0, eventual_encoding="utf-8", formatter="minimal", indent_chars=" "))
    bk.writefile(html_id, normalized_html)
    return 0

def main():
    print('I reached main when I should not have\n')
    return -1

if __name__ == "__main__":
    sys.exit(main())
|
|
08-22-2017, 01:49 PM | #219 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I've not investigated extensively yet, but I'm able to change the source code: Code:
<div class="illus">
<img alt="" class="img065" src="../Images/lsh-23-065.png"/></div>
to: Code:
<div class="illus">
<img alt="" class="img065" src="../Images/lsh-23-065.png"/>
</div>
It handles all conditions where the opening and closing divs around images are on their own lines just fine. |
08-22-2017, 02:14 PM | #220 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
In formatImages() in cutils.py, you're souping a single line of xhtml that has an invalid closing div tag at the end of it. What did you think the parser would do with the extraneous closing tag?
Code:
soup = BeautifulSoup(line, 'html.parser')
Code:
<img alt="" class="img065" src="../Images/lsh-23-065.png"/></div>
This code: Code:
line = str(soup)
You need to find a better way of isolating the img tag and writing the modified one(s) back without affecting the surrounding code. The attached python script should make what's happening clear. Last edited by DiapDealer; 08-22-2017 at 02:17 PM. |
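One possible way to isolate the img tag, as suggested above, is sketched below. It is not the attached script and not the plugin's code: the names IMG_TAG, rewrite_img_tags() and add_marker_class() are hypothetical, and the pattern assumes self-closing img tags with no '>' character inside attribute values. The point is only that the tag itself is handed to the transform while everything else on the line, including a trailing </div>, is copied through untouched.

```python
import re

# Matches one self-closing <img .../> tag.
# Assumption: attribute values never contain a '>' character.
IMG_TAG = re.compile(r'<img\b[^>]*?/>')


def rewrite_img_tags(line, transform):
    """Apply transform() to each <img/> tag only; all other text is kept verbatim."""
    return IMG_TAG.sub(lambda match: transform(match.group(0)), line)


def add_marker_class(img_tag):
    # Hypothetical stand-in for the real dual-format step.
    return img_tag.replace('class="img065"', 'class="kf8only"')


line = '<img alt="" class="img065" src="../Images/lsh-23-065.png"/></div>'
result = rewrite_img_tags(line, add_marker_class)
print(result)
```

If a parser is still wanted for the attribute handling, the transform could parse just the matched tag, which is a well-formed fragment on its own, rather than the whole line.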
08-22-2017, 09:11 PM | #221 |
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
@DiapDealer and @Doitsu...Thanks for testing and for your opinions.
You are both right: it isn't the BS html parser causing the problem. I did some further checks to confirm this. After I ran the plugin I checked the plugin working directory containing all the epub files. When I opened LSH.xhtml (in the working directory) in Chrome and looked at the source code, it showed the same missing div error as before. So this does seem to confirm that the problem is somewhere in my plugin code.
To help narrow down the problem area, I think it would be helpful to show the recently added functions that are not involved with the missing div problem: reformatImageLayout() and fixBrokenTags(). I added these two functions after getting the missing div tag error. I decided to reformat the div tags first, as a possible way of curing the problem and standardizing the input. Similarly, I wrote a regex function that repairs broken tags. These two functions made no difference (i.e. they have neither caused nor cured the problem). The only function in the plugin that actually formats div tags is reformatImageLayout() and, as I've already said, it was added as a possible fix after I started getting the missing div problem in GG's epub. I've also thoroughly checked my plugin code, and there is nowhere else in it that changes or removes div tags.
So I am still in the dark as to why the plugin gives only one missing div tag error, on line 322 of the LSH.xhtml file, while the other images in GG's epub, which are all formatted in exactly the same way, have no such error. This would seem to contradict the belief that the plugin code is causing the problem, because if it was my code then wouldn't all those images have the same missing div tag problem as well?
After all, couldn't this missing div problem also be due to a problem in a module I use, or a wrapping problem that has nothing to do with my actual plugin code? I might try using libtidy to confirm whether the missing div tag error is a wrapping problem. As I've already said, this isn't going to be an easy error to locate or fix. So later I will try adding Tidy to the plugin with the wrap option set to OFF to see if this cures the problem. Can't do it now, I have to dash off again. Last edited by slowsmile; 08-22-2017 at 09:46 PM. |
08-22-2017, 09:34 PM | #222 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I told you exactly where the problem lies. It's in formatImages(). Was I unclear? You're using bs4 to parse single lines of an xhtml file on the assumption that they will always be self-contained, well-formed snippets of xhtml. When a line is not, parsing it causes data loss. The problematic line of code in LSH.xhtml will not parse without data loss. In this case, the closing div tag in question gets clobbered.
Last edited by DiapDealer; 08-22-2017 at 09:40 PM. |
08-22-2017, 11:13 PM | #223 |
Sigil Developer
Posts: 7,637
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Yes, just as DiapDealer said, you need to extract just the image tag, not the entire line, before passing it to soup to parse/fix. If the line has a closing div tag on the end, passing the whole line to soup will remove that tag, since there is no matching starting div in the piece of code you passed to it.
Last edited by KevinH; 08-23-2017 at 09:43 AM. |
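The point made in the last two posts, that a single line is not always a self-contained fragment, can be checked with the standard library alone. The sketch below is an illustration, not part of the plugin or of bs4: the class UnmatchedEndTagFinder and the function unmatched_end_tags() are hypothetical names, and the code only reports closing tags that have no matching opening tag within the fragment it is given.

```python
from html.parser import HTMLParser


class UnmatchedEndTagFinder(HTMLParser):
    """Collect closing tags that have no matching opening tag in the fragment."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # start tags seen and not yet closed
        self.unmatched = []   # end tags with no start tag in this fragment

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <img .../> are never left open

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to the nearest matching start tag.
            while self.open_tags.pop() != tag:
                pass
        else:
            self.unmatched.append(tag)


def unmatched_end_tags(fragment):
    finder = UnmatchedEndTagFinder()
    finder.feed(fragment)
    finder.close()
    return finder.unmatched


line = '<img alt="" class="img065" src="../Images/lsh-23-065.png"/></div>'
print(unmatched_end_tags(line))  # ['div']
```

A line for which this returns a non-empty list depends on markup from an earlier line, so parsing it in isolation is exactly the situation in which the closing tag is lost.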
08-23-2017, 04:56 AM | #224 |
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
@DiapDealer and Doitsu...Yes, of course DiapDealer was right. In fact I had already implemented his suggestion because reformatImageLayout() was specifically written to provide a standard input image line format for the main formatImages() function.
But when I went back and looked at the offending image line (the one with the missing div tag) in a fresh copy of GG's epub in Sigil, I saw this: Code:
<div class="illus">
<img alt="" class="img065" src="../Images/lsh-23-065.png"/></div>
My grateful thanks to you both for helping me out with this last problem.
@Doitsu...I will release the new version, AddKindleMediaQueries (v0.1.5), as soon as I can after some more testing. I'll let you know when the new version is released so you can test it in your own way. Thanks again. |
09-02-2017, 11:31 AM | #225 |
Grand Sorcerer
Posts: 5,583
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
How to add python3lib to sys.path
My simple epub2 output plugin uses ncxgenerator.py from the python3lib folder to optionally generate a new toc.ncx file from nav.xhtml.
(I know that the Python files in the python3lib folder are not intended for use with plugins; however, some of the functions might actually be helpful in special cases, such as my epub2 output plugin.)
Since I couldn't figure out a good cross-platform method for adding python3lib to sys.path, I simply bundled ncxgenerator.py with my plugin. Even though I don't think that ncxgenerator.py will be significantly updated in future Sigil versions, I think that it'd be better to add python3lib to sys.path instead of bundling ncxgenerator.py with the plugin.
1. Would it be possible to add python3lib to sys.path in one of the next Sigil versions? or
2. Is there a robust cross-platform method for adding python3lib to sys.path that I overlooked? |
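For context, the sys.path half of the question is straightforward once the folder is known; the open part is locating python3lib on each platform, which this thread does not answer. The sketch below is a generic helper under that assumption: add_to_sys_path() is a hypothetical name, and the directory is supplied by the caller rather than discovered from Sigil.

```python
import os
import sys


def add_to_sys_path(directory):
    """Append an existing directory to sys.path once; return True on success.

    Hypothetical helper: where python3lib lives on each platform is not
    stated in this thread, so the caller must supply the directory.
    """
    directory = os.path.abspath(directory)
    if not os.path.isdir(directory):
        return False
    if directory not in sys.path:
        # Appending (rather than inserting at the front) avoids shadowing
        # the plugin's own modules.
        sys.path.append(directory)
    return True
```

Returning False for a missing directory lets a plugin fall back to a bundled copy of the module, which is what the post describes doing today.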
|