07-02-2016, 09:10 PM   #6
Fish-Face
Device: Kindle PW3
Quote:
Originally Posted by HarryT
Doing a linear search of the dictionary is of course the slowest possible way to do it. Given that you have programming skills, you could extract all the index entries to a separate file, storing the file offset of each one. You could then do a binary search of this index file, which would be enormously faster (e.g. it will find any word in a 60,000-word dictionary with at most 16 comparisons, rather than the roughly 30,000 comparisons that a linear search needs on average).
I'd even considered doing a binary search within the dictionary file itself, since it's laid out in alphabetical order. I'm now resigned to being unable to parse the file "properly" if I keep it as the Mobipocket file. By that I mean I will need an index into the file (as byte positions), so the HTML hierarchy will be lost, or rather will have to be assumed, since from a given byte position we can't tell how far into the file the relevant structure may be.
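
For what it's worth, here is a rough sketch of the index-plus-binary-search idea in Python. It assumes each entry's headword sits in an `<idx:orth>` tag as plain text, which may not hold for every dictionary, and the names `build_index` and `lookup` are made up for illustration. It records the byte offset of each headword and then uses the standard `bisect` module to search the sorted list.

```python
import bisect
import re

# Assumption: each headword appears as plain text right after an <idx:orth> tag.
ORTH_RE = re.compile(rb'<idx:orth[^>]*>([^<]+)')

def build_index(data: bytes):
    """Return a sorted list of (headword, byte_offset) pairs."""
    index = [
        (m.group(1).decode("utf-8", "replace").strip().lower(), m.start())
        for m in ORTH_RE.finditer(data)
    ]
    index.sort()  # sort explicitly rather than trusting the file order
    return index

def lookup(index, word):
    """Binary-search the index; return the entry's byte offset, or None."""
    key = word.lower()
    # (key,) sorts before any (key, offset), so this finds the first match.
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None
```

Once you have the offset you can seek straight to it and read forward until the entry ends, which is exactly where the "assumed hierarchy" problem comes in: nothing tells you which enclosing tags were opened before that point.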

In fact, I think this is how Mobipocket does it natively: the anchors in the file have hrefs/ids of the form "fileposxyz", which seem to be byte offsets or something similar.
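
A quick way to test that guess, sketched in Python. It assumes the anchors contain the literal text `filepos` followed by digits, and that those digits are byte offsets into the same data, which is only my reading of the file. The helper names `filepos_targets` and `peek` are made up for illustration.

```python
import re

# Assumption: anchors look like href="#filepos12345" or id="filepos12345".
FILEPOS_RE = re.compile(rb'filepos(\d+)')

def filepos_targets(data: bytes):
    """Return the sorted, de-duplicated numbers found after 'filepos'."""
    return sorted({int(m.group(1)) for m in FILEPOS_RE.finditer(data)})

def peek(data: bytes, offset: int, length: int = 40) -> bytes:
    """Return a short slice at `offset`, to see whether it lands on an entry."""
    return data[offset:offset + length]
```

If `peek` consistently returns the start of an entry or tag for the numbers that `filepos_targets` finds, that would support the byte-offset interpretation; if it lands mid-word, the numbers mean something else.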