Old 12-08-2019, 08:07 PM   #1331
snarkophilus
Quote:
Originally Posted by JSWolf
Maybe counting syllables is that difficult. Or maybe the routine used is inefficient. You could give a look and see if you can improve it.
It turns out that counting syllables isn't hard on its own, but it looks as though counting the syllables in each word separately, to determine the complex word count (words with >= 3 syllables), is much slower:

Code:
count all syllables
 .... count all syllables = 270010 done --- 1.28500008583 seconds ---
count syllables in all words for complex words
 .... count syllables done --- 43.6440000534 seconds ---
Turns out that hunch was also incorrect. I dug a bit deeper, and this appears to be the culprit:

Code:
                    for sentence in sentences:
                        if str(sentence).startswith(word):
                            found = True
                            break
If I understand that correctly, for every word we loop over around 13,000 sentences to check whether that word appears at the start of a sentence (this is for Endymion, which has only 200,000ish words and is faster to work with), so we're potentially doing approx 3.5 billion compares?! Give or take a few for early matches of a word at the beginning of a sentence. For Oscar we're potentially doing around 79 billion compares. No wonder this isn't fast.

I'm very new to Python. If this were in Perl I'd think about storing the first word of each sentence in a hash (an associative array) and, instead of looping over all sentences for each word, just checking whether the hash key exists. Is this type of thing possible in Python?
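
From what I can tell from the Python docs, the equivalent would be a set (or a dict if a value needs to be stored), with "in" as the existence test. Here is a rough sketch of the idea, not the plugin's actual code: build_first_words and starts_a_sentence are names I made up for illustration, and I'm assuming each sentence can be turned into a plain string with str(). One caveat is that this matches whole first words, whereas the startswith() check above also matches when the word is only a prefix of the sentence, so the results could differ slightly.

```python
def build_first_words(sentences):
    """Collect the first word of every sentence into a set, in one pass."""
    first_words = set()
    for sentence in sentences:
        parts = str(sentence).split()  # split on whitespace
        if parts:  # skip empty sentences
            first_words.add(parts[0])
    return first_words


def starts_a_sentence(word, first_words):
    """Hash lookup: does not loop over the sentences again."""
    return word in first_words


# Hypothetical sample data, only to show the calls.
sentences = ["The cat sat.", "A dog barked.", "The end."]
first_words = build_first_words(sentences)
found = starts_a_sentence("The", first_words)
```

The set is built once, before the loop over words, so each word then costs a single lookup instead of a pass over roughly 13,000 sentences.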