10-01-2020, 09:31 AM   #69
geek1011
Wizard
geek1011 ought to be getting tired of karma fortunes by now.
 
Posts: 2,736
Karma: 6990705
Join Date: May 2016
Location: Ontario, Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
Quote:
Originally Posted by jackie_w
I probably shouldn't do this, but I feel strangely compelled ...

By my reckoning there are 8000+ more headwords in this new Kobo Oxford English than there are in the Kindle's equivalent Oxford English.
There are 275150 headwords in the new English dictionary, along with 131119 v3 prefix exceptions. There's also a subtle change to the format of the words trie, involving two additional numbers, which I didn't notice earlier. I'll have to look into that more later this week.

There also seems to be another change to v3 dictionaries in 15676 compared to 15672, which I missed during my brief look at the changes. Prefix exceptions now appear to support referencing a prefix instead of an entire word. In other words, prefix exceptions really are prefix exceptions now, not word redirects. This would also explain why some of my test cases started working even though I didn't expect support for multiple prefix exception entries per word to fix them. This is even better than I originally expected of the changes in 15676 (thanks Kobo!). I think I'll be able to easily work around any bugs that turn up, and it even leaves room for a bunch of crazy workarounds. It also means v3 support in dictutil will take me a bit longer, since I'll have to go through everything carefully again.
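To make the difference between a word redirect and a real prefix exception concrete, here is a rough Python sketch of the idea as described above. It is a toy model only: the function names, the `targets_are_prefixes` flag, the dict-of-lists exception table, and the "first two characters" prefix rule are all assumptions made up for illustration. It is not the actual on-disk format, nor the firmware's or dictutil's implementation.

```python
from typing import Dict, List


def default_prefix(word: str) -> str:
    # Placeholder rule (an assumption for this sketch): first two
    # characters, lowercased. The real prefix computation may differ.
    return word.lower()[:2]


def candidate_prefixes(
    word: str,
    exceptions: Dict[str, List[str]],
    targets_are_prefixes: bool,
) -> List[str]:
    """Return the prefixes that would be searched for `word`.

    exceptions maps a word to one or more exception targets.
    If targets_are_prefixes is False, each target is treated as another
    word (a redirect), and that word's default prefix is used.
    If True, each target is used directly as a prefix.
    """
    targets = exceptions.get(word)
    if not targets:
        # No exception entry: fall back to the word's own prefix.
        return [default_prefix(word)]
    if targets_are_prefixes:
        return list(targets)
    return [default_prefix(target) for target in targets]


# Redirect-style entry: "ran" points at the word "run".
print(candidate_prefixes("ran", {"ran": ["run"]}, targets_are_prefixes=False))

# Prefix-style entries: "ran" points directly at two prefixes.
print(candidate_prefixes("ran", {"ran": ["ru", "zz"]}, targets_are_prefixes=True))
```

In this model a redirect can only ever reach the prefix that some other word happens to have, while a prefix-style entry can point anywhere, which is what makes the extra workarounds possible.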

Edit: It also appears the <a name="... matching is more robust.

Edit 2: Unlike the old one, the HTML in the new English dictionary is actually well-formed. Also, unlike in the old dictionaries, there don't appear to be any word-matching bugs in it, based on the checks I've implemented in dictutil so far.

Edit 3: If I'm reading this correctly, it means words with this new style of prefix exceptions won't work on 15672. This would be consistent with the file modification times in the zip being September 28.

Last edited by geek1011; 10-01-2020 at 10:06 AM.