Old 10-01-2021, 05:57 AM   #8
lomkiri
Quote:
Originally Posted by Karellen View Post
Ok, I figured it out. My Search Regex was wrong.
And now it works correctly.
Nice! As you figured out, data is a dict that Kovid provides in the regex-function system to hold values that persist from one call to the next. In case it is of some use to you: among the other parameters, number gives the count of the current match when in "replace all" mode, and file_name, well... ;-)
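
To make those parameters a bit more concrete, here is a minimal sketch of a function that uses both data and number. The run_replace_all driver is only a hypothetical stand-in so that the example can run outside the editor. It assumes that number starts at 1 and that the same data dict is handed to every call. It is not calibre's actual code.

```python
import re


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # data is the same dict on every call, so a running total survives between matches
    data['seen'] = data.get('seen', 0) + 1
    # number is the position of this match within the current "replace all" run
    return '{}[{}]'.format(match.group(0), number)


def run_replace_all(pattern, text, file_name='chapter1.html'):
    # Hypothetical stand-in for the editor's "replace all" (an assumption, not calibre's code)
    data = {}
    counter = [0]

    def call(match):
        counter[0] += 1
        return replace(match, counter[0], file_name, None, None, data, None)

    return re.sub(pattern, call, text), data


new_text, data = run_replace_all(r'\d+', 'a 10 b 20 c 30')
print(new_text)
print(data)
```

Each number in the sample string gets its match count appended, and data['seen'] ends up holding the total number of calls.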

Quote:
Originally Posted by Karellen View Post
Do you know of any way to convert Number to Word eg 17 to Seventeen?
I don't, but some others do ;-) A ddg search gives me this or this (same answer, in fact), so we just have to adapt it to your case:

Assuming your search string is
(^\s*<[^>]+>)([^<\n]*)(<[^>]+>\n)
to target a line with a number inside a tag, e.g. <h2>seventeen</h2> or <p>twenty one</p> (refine it if necessary), the function will be:
Code:
def words2int(textnum, numwords={}):
    # create our default word-lists
    if not numwords:

      # singles
      units = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen",
      ]

      # tens
      tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

      # larger scales
      scales = ["hundred", "thousand", "million", "billion", "trillion"]

      # filler word: "and" leaves the running value unchanged
      numwords["and"] = (1, 0)

      # map every word to a (scale, increment) pair
      for idx, word in enumerate(units):    numwords[word] = (1, idx)
      for idx, word in enumerate(tens):     numwords[word] = (1, idx * 10)
      for idx, word in enumerate(scales):   numwords[word] = (10 ** (idx * 3 or 2), 0)

    # primary loop
    current = result = 0
    # loop while splitting to break into individual words
    for word in textnum.lower().replace("-"," ").split():
        # unknown word: fail loudly (lower() above lets "Seventeen" through)
        if word not in numwords:
          raise ValueError("Illegal word: " + word)

        # use the index by the multiplier
        scale, increment = numwords[word]
        current = current * scale + increment
        
        # a scale above 100 (thousand, million, ...) closes the current group
        if scale > 100:
            result += current
            current = 0

    # return the result plus the current
    return result + current


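def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    wrd = match.group(2).strip()
    if wrd:
        try:
            return match.group(1) + str(words2int(wrd)) + match.group(3)
            # or, same thing : return "{}{}{}".format(match.group(1), words2int(wrd), match.group(3))
        except Exception:
            # words2int raises on any word it does not know, and the search string
            # also matches ordinary text lines: leave those untouched
            pass
    return match.group(0)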
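
A quick way to check the search string and the function together, before running them on a book, is to drive them with Python's re module. The sketch below rests on a few assumptions: words2int_compact is a shortened variant of the function above (it only goes up to thousands), the defaults on replace exist only so that re.sub can call it with a single argument, and re.sub with re.MULTILINE only approximates how the editor applies the search.

```python
import re

SEARCH = r'(^\s*<[^>]+>)([^<\n]*)(<[^>]+>\n)'

UNITS = ("zero one two three four five six seven eight nine ten eleven twelve "
         "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()

# word -> (scale, increment), same idea as numwords in the post
NUMWORDS = {"and": (1, 0), "hundred": (100, 0), "thousand": (1000, 0)}
NUMWORDS.update({w: (1, i) for i, w in enumerate(UNITS)})
NUMWORDS.update({w: (1, (i + 2) * 10) for i, w in enumerate(TENS)})


def words2int_compact(textnum):
    # Shortened variant for testing only (an assumption, not the posted function)
    current = result = 0
    for word in textnum.lower().replace("-", " ").split():
        if word not in NUMWORDS:
            raise ValueError("Illegal word: " + word)
        scale, increment = NUMWORDS[word]
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0
    return result + current


def replace(match, number=0, file_name='', metadata=None, dictionaries=None,
            data=None, functions=None, *args, **kwargs):
    text = match.group(2).strip()
    if not text:
        return match.group(0)
    try:
        value = words2int_compact(text)
    except ValueError:
        # ordinary text, not a number in words: leave the line as it is
        return match.group(0)
    return match.group(1) + str(value) + match.group(3)


sample = ("<h2>Seventeen</h2>\n"
          "<p>twenty one</p>\n"
          "<p>Some prose</p>\n"
          "<h3>one hundred and five</h3>\n")
converted = re.sub(SEARCH, replace, sample, flags=re.MULTILINE)
print(converted)
```

In this sample the three number lines come out as 17, 21 and 105, while the prose line is left alone.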

Last edited by lomkiri; 10-01-2021 at 06:01 AM.