MobileRead Forums - View Single Post

Elfwreck · 09-02-2009, 04:39 PM

Quote:

Originally Posted by ahi

Do you really consider it feasible to build a hyphenation database that contains every actual word it covers, along with all their compound, conjugated/declined (and apostrophe-laden?) forms? Even for languages unlike English, where prefixes, suffixes, conjugations, and declensions can create more than 100 valid and sensible words from a three-letter root word?

Given how utterly technically simple this is, you'd think all word processors would be using it by now. (Though "present" in English and many words in other languages would still be incorrectly auto-hyphenated.)

It's not possible to build a hyphenation database for every word, especially not for every possible word in agglutinative languages, or in patchwork languages like English that create new words by smashing other words together.

However, a good, if not perfect, hyphenation algorithm could be built on linguistic analysis of the language. It could also be combined with a dictionary, so it would automatically flag words that could be either compound words or identically spelled words with different meanings. (I'd say "homonyms," but they might not have the same pronunciation.) It wouldn't fix all mishy-phens, but it would let the person doing the formatting (author or editor) quickly identify the possibilities, rather than doing a line-by-line proof every time the formatting shifts a bit.
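To make the "algorithm plus dictionary" idea concrete, here is a minimal sketch in Python. It assumes Frank Liang's pattern method (the approach TeX uses) for the algorithmic half and a plain word set for the dictionary half. `TOY_PATTERNS`, `REVIEW_WORDS`, `hyphenate`, and `hyphenate_with_flags` are all hypothetical names, and the two patterns are toy data, not real English patterns; a usable tool would load thousands of language-specific ones.

```python
import re

# Hypothetical toy patterns in Liang's notation. Digits are values between
# letters: odd allows a break, even forbids it, and the highest value at a
# position wins. "." marks the edge of the word.
TOY_PATTERNS = [".pre1", "e2ss"]

# Hypothetical review list: spellings that may be compounds or homographs,
# so any automatic break in them should be checked by a person.
REVIEW_WORDS = {"present", "record", "project"}


def parse_pattern(pattern):
    """Split 'e2ss' into its letters ('ess') and values ([0, 2, 0, 0])."""
    letters = re.sub(r"[0-9]", "", pattern)
    values = [int(d or 0) for d in re.split(r"[.a-z]", pattern)]
    return letters, values


PATTERNS = dict(parse_pattern(p) for p in TOY_PATTERNS)


def hyphenate(word, left_min=2, right_min=2):
    """Split a word at every position the patterns allow."""
    work = "." + word.lower() + "."
    points = [0] * (len(work) + 1)
    # Apply every pattern that matches any substring of the padded word.
    for i in range(len(work)):
        for j in range(i + 1, len(work) + 1):
            values = PATTERNS.get(work[i:j])
            if values is not None:
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)
    pieces = [""]
    for m, ch in enumerate(word):
        pieces[-1] += ch
        # points[m + 2] is the value between word[m] and word[m + 1].
        if points[m + 2] % 2 and left_min <= m + 1 <= len(word) - right_min:
            pieces.append("")
    return pieces


def hyphenate_with_flags(word):
    """Hyphenate, and flag the word if it is on the review list."""
    pieces = hyphenate(word)
    needs_review = word.lower() in REVIEW_WORDS and len(pieces) > 1
    return "-".join(pieces), needs_review
```

With these toy patterns, `hyphenate_with_flags("present")` gives `("pre-sent", True)` and `hyphenate_with_flags("preview")` gives `("pre-view", False)`. The even value in `"e2ss"` overrides the odd one from `".pre1"`, so "press" stays whole. The flag only fires when the algorithm actually proposes a break, which keeps the formatter's review list short.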

As far as I know, "present" is always split "pre-sent," whichever of its three pronunciations is meant. However, unless I were using it in a sentence like
"we knew the authorization would arrive later that week, so we pre-
sent the package," I'd avoid hyphenating it, because a line ending in "pre-" suggests the long-e pronunciation.

There's no reason hyphenation software couldn't be as good as current spellcheck software: not perfect, but good enough to remove a lot of the gruntwork of proofreading, and good enough to reflow a book so it avoids almost all troublesome hyphenations.

As amusing (or sometimes annoying) as bad hyphenations are, I'd rather publishers spent more time on actual typos, apostrophe use, and a good table of contents. Oh, and an index for nonfiction books.

I need good content before I need great formatting.