MobileRead Forums - View Single Post

Rev. Bob · 11-01-2019, 06:44 PM

Quote:

Originally Posted by Brett Merkey

On the issue of the fail on deleting images referenced from the styles: Yes, kind of awkward since an image can be deleted but the actual style rule referencing it remains in the CSS. Kind of a smoking gun...

You have the issue somewhat backwards. The main hurdle is in reading CSS files and tracing any relative paths found there to check for existence, which the existing Modify code simply is not designed to do. As I recall, Modify’s parser doesn’t even try to deal with links in that respect; features like XPGT or pagemap removal depend entirely on either file extension or MIME type.

If I recall the image-removal code correctly, it makes a list of image elements by going through the OPF manifest (which means those paths are all relative to the same location) and checking the HTML files (ditto) to hunt for links to them. The spanner in the works, so to speak, is that an HTML file in one folder can reference two or more CSS files from anywhere else, each of which can include style rules that bring in images from elsewhere. That’s at least one new level of complexity, and I’m not sure how to deal with it correctly and completely.
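To make the path problem concrete, here is a minimal sketch (not Modify's actual code) of resolving a url() reference relative to the stylesheet that contains it, rather than relative to the OPF or the HTML file. The function name `resolve_css_url` and the example paths are hypothetical, and it assumes all paths use forward slashes relative to the container root.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def resolve_css_url(css_path, url):
    """Hypothetical helper: map a url() value found in the stylesheet at
    `css_path` to a path relative to the container root.

    Returns None for external, data: or fragment-only references, since
    those do not point at a file inside the book.
    """
    parts = urlsplit(url)
    if parts.scheme or parts.netloc or not parts.path:
        return None
    # CSS resolves relative URLs against the stylesheet's own location,
    # not against the HTML file that links to the stylesheet.
    base_dir = posixpath.dirname(css_path)
    return posixpath.normpath(posixpath.join(base_dir, unquote(parts.path)))


# Example paths are made up for illustration.
print(resolve_css_url("OEBPS/styles/main.css", "../images/bg.png"))
```

The same `../images/bg.png` string would resolve to a different file if it appeared in a stylesheet stored in another folder, which is why the manifest-relative comparison cannot be reused as is.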

All of that said, two approaches suggest themselves:

1. Make it an official “known issue” and build a wall to block it. At the outset of the routine, scan all CSS files for rules containing url() references which end in image extensions. If any are detected, toss an advisory message and quit without changing anything. (The unpretty routine does something similar in that it scans for PRE elements, as removing whitespace within those can be catastrophic.)

2. As I recall, the existing routine works by building a list of HTML files, scraping those to find links to images, and then checking each image in the manifest against that set to see if any manifest images aren’t referenced. It may be feasible to follow that step by, if any unreferenced images are found, tweaking the parameters for that same routine so that it will similarly scrape stylesheets and compare any references found there to the “unreferenced” list.
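Both approaches could be sketched roughly as follows. This is an illustration under stated assumptions, not the existing routine: the names `css_image_refs`, `safe_to_remove_images`, and `prune_unreferenced` are hypothetical, the extension list is a guess, stylesheets are assumed to arrive as a dict of container-relative path to CSS text, and the regex does not strip CSS comments or handle escaped characters.

```python
import posixpath
import re
from urllib.parse import unquote

# Assumed list of image extensions; adjust to whatever the manifest uses.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")

# Matches url(x), url('x') and url("x"); the reference lands in group 2.
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def _strip_suffix(ref):
    """Drop any #fragment or ?query from a reference."""
    return ref.split("#", 1)[0].split("?", 1)[0]


def css_image_refs(css_text):
    """Return url() references in css_text that end in an image extension."""
    refs = []
    for match in URL_RE.finditer(css_text):
        ref = match.group(2).strip()
        if _strip_suffix(ref).lower().endswith(IMAGE_EXTENSIONS):
            refs.append(ref)
    return refs


def safe_to_remove_images(stylesheets):
    """Option 1: refuse to proceed if any stylesheet references an image."""
    return not any(css_image_refs(text) for text in stylesheets.values())


def prune_unreferenced(unreferenced, stylesheets):
    """Option 2: drop images that a stylesheet points at from the
    'unreferenced' set, resolving each reference against its stylesheet."""
    referenced = set()
    for css_path, text in stylesheets.items():
        base_dir = posixpath.dirname(css_path)
        for ref in css_image_refs(text):
            path = unquote(_strip_suffix(ref))
            referenced.add(posixpath.normpath(posixpath.join(base_dir, path)))
    return set(unreferenced) - referenced
```

With option 1, the caller would show the advisory and quit whenever `safe_to_remove_images` returns False. With option 2, only the paths returned by `prune_unreferenced` would remain candidates for deletion. External references such as http URLs simply fail to match any manifest path, so they are harmless here.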

Both sound pretty decent at first blush, but there’s a hidden complication in the form of styles specified in the HTML files, either as local stylesheets or inline style declarations… and those get really nasty in a hurry. I would hope nobody would stoop so low as to build a url() reference into an inline style, but it sadly wouldn’t surprise me.
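To give a sense of what covering that case would involve, here is a rough sketch that pulls CSS out of an HTML file from both style elements and inline style attributes, using only the standard library. The class and function names are hypothetical, and the sketch assumes the markup is well formed enough for Python's `html.parser` to cope with. Any url() references in the collected text would still need resolving against the HTML file's own folder, not a stylesheet's.

```python
from html.parser import HTMLParser


class EmbeddedStyleCollector(HTMLParser):
    """Hypothetical collector for CSS text embedded in an HTML document."""

    def __init__(self):
        super().__init__()
        self.css_chunks = []
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
        # Inline declarations: <p style="background: url(...)">
        for name, value in attrs:
            if name == "style" and value:
                self.css_chunks.append(value)

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # Text inside a <style> element is a local stylesheet.
        if self._in_style and data.strip():
            self.css_chunks.append(data)


def embedded_css(html_text):
    """Return the CSS fragments found in style elements and attributes."""
    collector = EmbeddedStyleCollector()
    collector.feed(html_text)
    collector.close()
    return collector.css_chunks
```

Each fragment returned by `embedded_css` could then be scanned for url() references in the same way as an external stylesheet.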

There is, of course, a third option: the status quo. Leave the code as is, a proverbial open manhole with warning tape posted around it, and place the onus on the end user to think before checking the box and enabling the option in the first place.

So, in sum: I’m not averse to the notion of patching, but I don’t want to hand out a false sense of security. I would rather leave the code untouched and keep noting the behavior with a warning than risk tinkering blind and breaking what does work.

Right now, I don’t trust that I have sufficient headspace available to tackle option two. I know that’s not what anyone wants to hear, but it’s far better for me to admit my limitations than to break something in the name of foolish pride.