The DEBUG log shows exactly why MD was picked over HTML: the 'normalizing factor' was applied backwards. The divisor I used should have been the dividend, and vice versa. This is fixed in the new beta version uploaded to the prior post.
Code:
current_column: #ris_abstract
MD : regex match: (\*|\_)+(\S+)(\*|\_)+
MD : regex match: (^(\W{1})(\s)(.*)(?:$)?)+
MD : regex match: (\'{1})(.*)(\'{1})
HTML : regex match: href=
HTML : regex match: <a|</a|<a href=
HTML : regex match: <div|</div
HTML : regex match: <li|</li
HTML : regex match: <p|</p
HTML : regex match: <span|</span
HTML : regex match: <ul|</ul
HTML : regex match: <u|</u
guessing:
MD: normalizing factor: 0.9
MD: scores & ratios: 3 0.1875 0.16875
HTML: scores & ratios: 8 0.25806451612903225
--->>> best guess: html
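
For anyone trying to follow the numbers in the log, here is a rough sketch of the arithmetic they appear to reflect. It is not the plugin's actual code. The function name `pick_format` and the pattern counts (16 for MD, 31 for HTML) are assumptions, inferred only because 3/16 = 0.1875 and 8/31 ≈ 0.258 match the logged ratios. The sketch also assumes the normalizing factor is multiplied into the MD ratio, since 0.1875 × 0.9 = 0.16875. It does not try to reproduce the earlier swapped-operand bug.

```python
def pick_format(md_hits, md_patterns, html_hits, html_patterns, md_factor=0.9):
    """Hypothetical reconstruction of the guessing step shown in the log.

    md_patterns and html_patterns are assumed totals of regexes tried per
    format; they are inferred from the logged ratios, not taken from the
    real source code.
    """
    # Raw ratio: how many of the format's regexes matched.
    md_ratio = md_hits / md_patterns
    html_ratio = html_hits / html_patterns

    # Assumed: MD is scaled by the normalizing factor before comparing.
    md_score = md_ratio * md_factor

    guess = "md" if md_score > html_ratio else "html"
    return guess, md_ratio, md_score, html_ratio


# Values taken from the log above; the denominators are assumptions.
guess, md_ratio, md_score, html_ratio = pick_format(3, 16, 8, 31)
print("MD:", 3, md_ratio, md_score)
print("HTML:", 8, html_ratio)
print("best guess:", guess)
```

With those inputs the MD score ends up below the HTML ratio, so the sketch also lands on html as the best guess.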