MobileRead Forums - View Single Post

DiapDealer · 07-19-2017, 05:04 PM

Quote:

Originally Posted by Rev. Bob

In theory, ignoring HTML comments should be simple. If "<!--" is found, scan ahead until "-->" or EOF is found, and ignore everything in between: HTML code, equations, scripts, whatever. Of course, depending on how the algorithm in this instance is designed, "scan ahead" may not be feasible.

Maybe a preprocessing pass that removes comments and stores them in a list somewhere, then a post-processing pass that puts them back? (I'm really just spitballing here, having not looked at the code.)
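The scan-ahead and stash-and-restore ideas above can be sketched in Python. This is a minimal illustration, not the plugin's code: the function names, the placeholder format, and the toy "smarten" step are all hypothetical. It assumes the input never contains NUL characters and that the real punctuation pass leaves the placeholder text unchanged.

```python
import re

# Hypothetical placeholder format; assumes the input never contains NUL characters.
_PLACEHOLDER = "\x00COMMENT{}\x00"
_PLACEHOLDER_RE = re.compile(r"\x00COMMENT(\d+)\x00")


def stash_comments(html):
    """Swap every HTML comment for a numbered placeholder.

    From each "<!--", scan ahead to the next "-->"; an unterminated
    comment runs to the end of the input (the "or EOF" case).
    Returns the rewritten text and the list of removed comments.
    """
    out, comments, pos = [], [], 0
    while True:
        start = html.find("<!--", pos)
        if start == -1:
            out.append(html[pos:])
            break
        out.append(html[pos:start])
        end = html.find("-->", start + 4)
        end = len(html) if end == -1 else end + 3
        comments.append(html[start:end])
        out.append(_PLACEHOLDER.format(len(comments) - 1))
        pos = end
    return "".join(out), comments


def restore_comments(html, comments):
    """Put each stashed comment back where its placeholder sits."""
    return _PLACEHOLDER_RE.sub(lambda m: comments[int(m.group(1))], html)


def toy_smarten(text):
    # Stand-in for the real punctuation pass: "--" becomes an em dash.
    return text.replace("--", "\u2014")


src = 'Wait -- <!-- keep "this" -- as is --> done'
body, saved = stash_comments(src)
result = restore_comments(toy_smarten(body), saved)
print(result)  # the dash outside the comment changes; the comment is untouched
```

Because the comments are out of the text before the punctuation pass runs, this approach never needs to touch the pass's own tokenizer.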

Yes, in theory, it's quite simple. I'm somewhat limited by the fact that I'm using the SmartyPants algorithm, which has its own HTML tokenizer routine that doesn't lend itself well to the "scan ahead" technique.

Luckily, the fork of Python SmartyPants on PyPI has a robust solution for handling HTML comments that I'm going to incorporate. I should have an updated version of the plugin very soon.
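The tokenizer constraint can be illustrated with a small, hypothetical Python sketch. It is not the tokenizer shipped with SmartyPants or with the PyPI fork; it only shows one general way a tag/text tokenizer can recognize a whole comment as a single token, so later passes skip it without a separate scan-ahead step. The names `tokenize` and `smarten_tokens` and the regex are assumptions for illustration.

```python
import re

# Hypothetical token pattern. The comment alternative is tried first, so a
# "<" that opens "<!--" consumes everything up to "-->" (or end of input)
# even if the comment body contains ">" characters.
_TOKEN_RE = re.compile(r"<!--.*?(?:-->|\Z)|<[^>]*>", re.S)


def tokenize(html):
    """Split html into ("text" | "tag" | "comment", value) tuples."""
    tokens, pos = [], 0
    for m in _TOKEN_RE.finditer(html):
        if m.start() > pos:
            tokens.append(("text", html[pos:m.start()]))
        kind = "comment" if m.group().startswith("<!--") else "tag"
        tokens.append((kind, m.group()))
        pos = m.end()
    if pos < len(html):
        tokens.append(("text", html[pos:]))
    return tokens


def smarten_tokens(tokens, transform):
    """Apply transform to text tokens only; tags and comments pass through."""
    return "".join(transform(v) if kind == "text" else v for kind, v in tokens)


html = '<p>"Hi" -- <!-- a > b -- "c" --></p>'
out = smarten_tokens(tokenize(html), lambda s: s.replace("--", "\u2014"))
print(out)
```

With only a plain `<[^>]*>` pattern, the `>` inside the comment would end the "tag" early and leave ` b -- "c" -->` to be treated as ordinary text and smartened.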