07-19-2017, 04:04 PM #162
DiapDealer
Grand Sorcerer
DiapDealer ought to be getting tired of karma fortunes by now.
 
Posts: 28,695
Karma: 205039118
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Rev. Bob
In theory, ignoring HTML comments should be simple. If "<!--" is found, scan ahead until "-->" or EOF is found, and ignore everything in between: HTML code, equations, scripts, whatever. Of course, depending on how the algorithm in this instance is designed, "scan ahead" may not be feasible.

Maybe a preprocessing pass that removes comments to store them in a list somewhere, then a post pass that puts them back? (I'm really just spitballing here, having not looked at the code.)
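For illustration, the quoted idea (scan ahead from "<!--" to "-->" or EOF, stash each comment, then put it back after smartening) might look like this in Python. `stash_comments`, `restore_comments` and the NUL-delimited placeholder are hypothetical names chosen for this sketch, not anything from the plugin; it assumes the placeholder never appears in real input and passes through the smartening step unchanged.

```python
import re

# Hypothetical placeholder; assumed never to occur in real input and to contain
# nothing a quote-smartening pass would change (no quotes, dashes or dots).
PLACEHOLDER = "\x00COMMENT{}\x00"
PLACEHOLDER_RE = re.compile(r"\x00COMMENT(\d+)\x00")


def stash_comments(html):
    """Cut every HTML comment out of html; return (stripped_text, saved_comments)."""
    saved = []
    out = []
    pos = 0
    while True:
        start = html.find("<!--", pos)
        if start == -1:
            out.append(html[pos:])
            break
        out.append(html[pos:start])
        # Scan ahead for the closing "-->"; an unterminated comment runs to EOF.
        end = html.find("-->", start + 4)
        end = len(html) if end == -1 else end + 3
        saved.append(html[start:end])
        out.append(PLACEHOLDER.format(len(saved) - 1))
        pos = end
    return "".join(out), saved


def restore_comments(text, saved):
    """Swap each placeholder back for the comment it stands in for."""
    return PLACEHOLDER_RE.sub(lambda m: saved[int(m.group(1))], text)


if __name__ == "__main__":
    stripped, saved = stash_comments('He said "hi" <!-- "leave me" --> and left.')
    # Naive stand-in for a real smartening pass: curl every straight double quote.
    smartened = stripped.replace('"', "\u201d")
    print(restore_comments(smartened, saved))
```

Because the comments are gone before the smartening pass runs, that pass needs no knowledge of comments at all; the cost is trusting that the placeholder text is never altered.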
Yes, in theory, it's quite simple. I'm somewhat limited, though, by the fact that I'm using the SmartyPants algorithm, which has its own HTML tokenizer routine that doesn't lend itself well to the "scan ahead" technique.

Luckily, the fork of Python SmartyPants on PyPI has a robust solution for handling HTML comments that I'm going to incorporate. Should have an updated version of the plugin very soon.
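This is not the PyPI fork's actual code. It is a generic sketch, assuming a regex-based tokenizer that splits input into "text" and "tag" tokens, of how such a tokenizer can sidestep the problem: match a whole comment (up to the first "-->", or to EOF if unterminated) as a single tag token, trying that alternative before the plain `<...>` one. `TOKEN_RE`, `tokenize` and `smarten` are hypothetical names.

```python
import re

# Comments are tried first, so "<!-- a > b -->" is one token instead of ending
# at the first ">" and leaking " b -->" into the text stream.
TOKEN_RE = re.compile(r"<!--.*?(?:-->|\Z)|<[^>]*>", re.S)


def tokenize(html):
    """Split html into ("text", s) and ("tag", s) tuples; comments count as tags."""
    tokens = []
    pos = 0
    for m in TOKEN_RE.finditer(html):
        if m.start() > pos:
            tokens.append(("text", html[pos:m.start()]))
        tokens.append(("tag", m.group(0)))
        pos = m.end()
    if pos < len(html):
        tokens.append(("text", html[pos:]))
    return tokens


def smarten(html, transform):
    """Apply transform to text tokens only; tags and comments pass through as-is."""
    return "".join(tok if kind == "tag" else transform(tok) for kind, tok in tokenize(html))
```

Here `transform` stands in for whatever educates the quotes and dashes; since comment contents never reach it as text, nothing inside a comment is touched.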

Last edited by DiapDealer; 07-19-2017 at 04:06 PM.