Quote:
Originally Posted by JSWolf
Maybe counting syllables is that difficult. Or maybe the routine used is inefficient. You could give a look and see if you can improve it.
It turns out that counting syllables isn't hard on its own, but counting the syllables in each word separately to determine the complex word count (words with >= 3 syllables) is much slower:
Code:
count all syllables
.... count all syllables = 270010 done --- 1.28500008583 seconds ---
count syllables in all words for complex words
.... count syllables done --- 43.6440000534 seconds ---
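For what it's worth, the raw counting can be done with a crude vowel-group heuristic, so the counting itself should be cheap. This is just my own sketch of that idea, not the routine the code actually uses ('words' is assumed to be the word list from the main routine):
Code:
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")

def count_syllables(word):
    # Crude heuristic: count runs of consecutive vowels,
    # treating a trailing silent 'e' as non-syllabic.
    word = word.lower()
    count = len(VOWEL_GROUPS.findall(word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

# Complex words are those with three or more syllables.
complex_words = sum(1 for w in words if count_syllables(w) >= 3)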
Turns out that hunch was also incorrect. I dug a bit deeper, and this appears to be the culprit:
Code:
for sentence in sentences:
    if str(sentence).startswith(word):
        found = True
        break
If I understand that correctly, for every word we loop over around 13,000 sentences (for Endymion, which has only 200,000-ish words and is faster to work with) to check if that word appears at the start of a sentence. That's one compare per (word, sentence) pair, so we're potentially doing approx 3.5 billion compares?! Give or take a few for early matches of a word at the beginning of a sentence. For Oscar we're potentially doing around 79 billion compares. No wonder this isn't fast.
I'm very new to Python. If this were Perl I'd think about storing the first word of each sentence in a hash (an associative array) and, instead of looping over all sentences for each word, just checking whether the key exists. Is this sort of thing possible in Python?
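Something like this is what I have in mind, sketched in what I think is Python (sentence_starts is a name I made up; sentences and words are assumed to be the same as in the code above):
Code:
# Build the lookup once: a single pass over the ~13,000 sentences.
sentence_starts = set()
for sentence in sentences:
    parts = str(sentence).split(None, 1)  # first whitespace-separated token
    if parts:
        sentence_starts.add(parts[0])

# Each check is now a constant-time hash lookup instead of
# a scan over every sentence.
for word in words:
    found = word in sentence_starts
One caveat: startswith(word) would also match a word that is merely a prefix of a sentence's first word (e.g. "over" matching a sentence starting with "overly"), so the exact behaviour differs slightly from the original loop.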