09-02-2009, 03:09 PM   #514
Ankh
Quote:
Originally Posted by ahi
How would you go about building such a database, Ankh?

Just processing oodles and oodles of PG eTexts, and manually hyphenating the words therefrom?

Start with the source of nrapallo's Webster 1913 dictionary.

Then yes, expect users to help grow the database. The database-assisted hyphenation engine can ask for intervention whenever a word is not in the database. When the job is done, process the database, extract the words that were added into a basic text file, one hyphenated word per line, and submit that file back to the maintainer. The maintainer reviews the submissions (using dictionaries and any other tools available), merges the changes, and releases a new version of the database.
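
To make that workflow concrete, here is a rough Python sketch of how it could fit together. Everything in it is an assumption for illustration only: the names HyphenationDB, hyphenate, export_added and merge are made up, and storing entries as lowercase words with "-" at the break points is just one possible format. It also ignores words that contain a real hyphen.

```python
# Hypothetical sketch only: class, function, and format choices are assumptions.

class HyphenationDB:
    def __init__(self, entries=None):
        # Maps a plain lowercase word to its hyphenated form,
        # e.g. "database" -> "da-ta-base".
        self.entries = dict(entries or {})
        # Words learned during the current job, kept apart for export.
        self.added = {}

    def hyphenate(self, word, ask):
        """Return the hyphenated form, calling ask(word) if it is unknown."""
        key = word.lower()
        if key not in self.entries:
            # The word is not in the database: ask for intervention.
            answer = ask(word).strip().lower()
            if answer.replace("-", "") != key:
                raise ValueError(f"{answer!r} is not a hyphenation of {word!r}")
            self.entries[key] = answer
            self.added[key] = answer
        return self.entries[key]

    def export_added(self):
        """Return the newly added words as text, one hyphenated word per line."""
        return "".join(self.added[k] + "\n" for k in sorted(self.added))


def merge(master, submission_text):
    """Merge a submitted word list into a copy of the master entries.

    Returns (merged, conflicts). A conflict is a word the master already
    hyphenates differently; it is left unchanged for the maintainer to review.
    """
    merged = dict(master)
    conflicts = []
    for line in submission_text.splitlines():
        hyphenated = line.strip().lower()
        if not hyphenated:
            continue
        key = hyphenated.replace("-", "")
        if key in merged and merged[key] != hyphenated:
            conflicts.append((key, merged[key], hyphenated))
        else:
            merged[key] = hyphenated
    return merged, conflicts


if __name__ == "__main__":
    db = HyphenationDB({"database": "da-ta-base"})
    # Stand-in for an interactive prompt to the user.
    answers = {"engine": "en-gine"}
    for w in ["database", "engine"]:
        print(db.hyphenate(w, ask=lambda unknown: answers[unknown]))
    # This text is what would be sent back to the maintainer.
    print(db.export_added(), end="")
```

The maintainer would run something like merge() on each submitted file, check the reported conflicts by hand against dictionaries, and then publish the merged result as the next version.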

Open source.