Old 03-13-2023, 04:57 PM   #821
DaltonST
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
The DEBUG log shows exactly why MD was picked over HTML: the 'normalizing factor' was being computed backwards. The value I used as the divisor should have been the dividend, and vice versa. This is fixed in the new beta version uploaded to the prior post.

Code:
current_column:  #ris_abstract
MD : regex match:  (\*|\_)+(\S+)(\*|\_)+
MD : regex match:  (^(\W{1})(\s)(.*)(?:$)?)+
MD : regex match:  (\'{1})(.*)(\'{1})
HTML : regex match:   href=
HTML : regex match:  <a|</a|<a href=
HTML : regex match:  <div|</div
HTML : regex match:  <li|</li
HTML : regex match:  <p|</p
HTML : regex match:  <span|</span
HTML : regex match:  <ul|</ul
HTML : regex match:  <u|</u
guessing:
  MD: normalizing factor:  0.9
  MD: scores & ratios:  3 0.1875 0.16875
  HTML: scores & ratios:  8 0.25806451612903225
  --->>> best guess:  html
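
For anyone trying to follow the arithmetic in that log, here is a rough sketch of how the numbers appear to fit together. It is not the plugin's actual code. The function names are made up for illustration, and the pattern counts (16 for MD, 31 for HTML) are an assumption inferred from the logged ratios (3/16 = 0.1875 and 8/31 = 0.258...), not something the log states.

```python
# Hypothetical reconstruction of the scoring shown in the DEBUG log.
# Assumption: ratio = regex matches / number of patterns tested for that format.
# Assumption: only the MD ratio is multiplied by the normalizing factor.

def ratio(matches, patterns_tested):
    """Fraction of the tested patterns that matched (0.0 if none were tested)."""
    return matches / patterns_tested if patterns_tested else 0.0


def best_guess(md_matches, md_patterns, html_matches, html_patterns, md_factor):
    """Return (guess, md_score, html_score) using the normalized MD score."""
    md_score = ratio(md_matches, md_patterns) * md_factor
    html_score = ratio(html_matches, html_patterns)
    guess = "md" if md_score > html_score else "html"
    return guess, md_score, html_score


# Values from the log: 3 MD matches, 8 HTML matches, factor 0.9.
# The denominators 16 and 31 are inferred, not logged.
guess, md_score, html_score = best_guess(3, 16, 8, 31, 0.9)
print(guess, md_score, html_score)
```

With the factor applied in the right direction, the MD score is scaled down slightly (0.1875 to 0.16875) and HTML wins the comparison, which matches the "best guess" line in the log.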

Last edited by DaltonST; 03-13-2023 at 05:00 PM.