Old 09-02-2009, 10:53 AM   #474
ahi
Quote:
Originally Posted by frabjous View Post
Wow, such overreaction.
Me?

Quote:
Originally Posted by frabjous View Post
LaTeX already knows the hyphenation of most words, as has already been not only stated but demonstrated.
If I understand correctly, LaTeX hyphenation patterns are not word lists but pattern lists. That means there is no automatic way of identifying words whose correct hyphenation is not known (meaning both words for which the given pattern set yields no hyphenation at all, and words for which LaTeX's known hyphenation is actually incorrect).
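To make the pattern-versus-word-list point concrete, here is a rough sketch of Liang-style pattern matching, the general approach TeX uses. The single toy pattern `s1sz` and the function names are my own inventions for illustration; real pattern files are far larger and this is not how any actual LaTeX package is implemented.

```python
import re

def parse_pattern(pat):
    """Split a pattern like 's1sz' into letters 'ssz' and gap scores [0, 1, 0, 0]."""
    letters = re.sub(r"\d", "", pat)
    scores = [0] * (len(letters) + 1)
    pos = 0
    for ch in pat:
        if ch.isdigit():
            scores[pos] = int(ch)   # score for the gap before letters[pos]
        else:
            pos += 1
    return letters, scores

def hyphenate(word, patterns, left_min=2, right_min=2):
    """Apply every matching pattern; an odd final score at a gap allows a break."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."          # dots mark the word boundaries
    levels = [0] * (len(padded) + 1)           # levels[g]: gap before padded[g]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            scores = table.get(padded[start:end])
            if scores:
                for k, s in enumerate(scores):
                    levels[start + k] = max(levels[start + k], s)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if levels[i + 1] % 2 == 1:             # gap before word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)

# Hypothetical one-pattern "language": always break between s and sz.
TOY_PATTERNS = ["s1sz"]

print(hyphenate("vasszarv", TOY_PATTERNS))  # vas-szarv (right)
print(hyphenate("massza", TOY_PATTERNS))    # mas-sza   (wrong, should be masz-sza)
print(hyphenate("alma", TOY_PATTERNS))      # alma      (no pattern matched)
```

Note that all three calls return a plain string. Nothing tells you that the second answer is wrong or that the third word simply was not covered, which is exactly the problem with patterns as opposed to a word list.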

In Hungarian, the hyphenation of certain words (not an enumerably small list) depends on semantic context: there is literally no way to know the correct hyphenation without understanding the word or sentence.

In addition, the Hungarian double digraphs "ssz" (a long "sz"), "ccs" (a long "cs"), "zzs", "ggy" and "nny" are treated unorthodoxly. "massza" is hyphenated as "masz-sza"; however, "ssz" could also be "s+sz", as in "vasszarv", which is correctly hyphenated as "vas-szarv". The LaTeX solution is to manually mark double digraphs, so that if hyphenation needs to occur there, the word is not mistakenly separated the wrong way. Oh... and, of course, this is also an issue with single digraphs: is a "cs" sequence a digraph, or merely "c+s"? Is an "sz" or a "zs" sequence a digraph, or "s+z" or "z+s"?
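A small sketch of why the manual marking is needed. The `is_long_digraph` flag stands in for whatever mark the author puts in the source; it and the function name are made up for this example and are not actual LaTeX markup.

```python
# Long (doubled) digraphs from the list above, mapped to the digraph they double.
LONG_DIGRAPHS = {"ssz": "sz", "ccs": "cs", "zzs": "zs", "ggy": "gy", "nny": "ny"}

def split_at_trigraph(word, is_long_digraph):
    """Break `word` at its first ambiguous three-letter sequence.

    is_long_digraph=True  -> the sequence is one long consonant, so the
                             digraph is written out on both sides of the break.
    is_long_digraph=False -> the sequence is letter + digraph (for example a
                             compound boundary), so break after the first letter.
    The spelling alone cannot tell the two cases apart; the caller must know.
    """
    for i in range(len(word) - 2):
        tri = word[i:i + 3]
        if tri in LONG_DIGRAPHS:
            if is_long_digraph:
                single = LONG_DIGRAPHS[tri]
                return word[:i] + single, single + word[i + 3:]
            return word[:i + 1], word[i + 1:]
    raise ValueError("no ambiguous sequence in " + word)

print(split_at_trigraph("massza", True))     # ('masz', 'sza')
print(split_at_trigraph("vasszarv", False))  # ('vas', 'szarv')
```

Both words contain the same three letters "ssz", and the only thing that changes the result is the flag, i.e. knowledge a human (or a dictionary) has to supply.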

Tolerable hyphenation that is right most of the time will not forever be impossible to do at display time. Professional hyphenation, correct to the standards of books published by reputable publishers, I believe will remain impossible at display time perpetually, because of the myriad complications (most of which you and I do not even know about, since they are language-specific issues) on top of the already formidable challenges.

- Ahi