Quote:
Originally Posted by ahi
2. (ERROR) Those that LaTeX thinks it knows how to hyphenate but that, being exceptions to whichever hyphenation pattern matches them, in fact hyphenate incorrectly.
|
After using LaTeX for a couple of years, I have to say I've never noticed an example of this. I'll admit I may not have looked hard enough, though. Still, if such errors exist, the fact that I haven't noticed them suggests they are rare.
Quote:
The "traditional" approach is to not worry about all this nonsense, and proofread the book to catch #2's, and manually fix badboxes to catch #4's.
|
Indeed, but that's because, "traditionally", .tex documents have been used to produce a single fixed-format document. If TeX were introduced on e-readers as a way of handling reflow, new conventions would presumably be necessary.
I don't know whether a tool currently exists that parses a LaTeX document and returns a list of the words LaTeX doesn't know how to hyphenate, but if not, I can't imagine such a tool would be at all difficult to create, even if it meant digging into (La)TeX's source code a bit. My thought is that at book-creation time this would be run once to generate the list, and then whoever writes the TeX code would deal with all of those words using the \hyphenation command.
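For what it's worth, the "run once, list the words, then write \hyphenation" workflow could look roughly like the Python sketch below. It assumes you already have some function that returns a word's break points; the toy stand-in here only knows one word, and in practice you'd want TeX's own patterns (for example by checking words with \showhyphens, which is not wired up here). The word extraction is a crude regex filter, not a real TeX parser, and all function names are made up for illustration.

```python
import re
from typing import Callable

# Crude prose extraction: drop % comments (but not \%), then control
# sequences like \emph or \%. Math, verbatim, and non-prose macro
# arguments are NOT handled; this is not a real TeX parser.
_COMMENT = re.compile(r"(?<!\\)%.*")
_CONTROL = re.compile(r"\\[A-Za-z]+|\\.")
_WORD = re.compile(r"[A-Za-z]+")


def prose_words(tex_source: str) -> set[str]:
    """Return the distinct lowercase words in a .tex source."""
    text = _COMMENT.sub("", tex_source)
    text = _CONTROL.sub(" ", text)
    return {w.lower() for w in _WORD.findall(text)}


def unhyphenated_words(tex_source: str,
                       hyphenate: Callable[[str], list[str]],
                       min_len: int = 6) -> list[str]:
    """Words of at least `min_len` letters for which `hyphenate` finds no break."""
    return sorted(w for w in prose_words(tex_source)
                  if len(w) >= min_len and len(hyphenate(w)) == 1)


def hyphenation_command(decisions: dict[str, str]) -> str:
    """Build a \\hyphenation{...} line from hand-checked forms such as
    'type-set-ting'; a form with no hyphens means 'never break this word'."""
    return "\\hyphenation{" + " ".join(decisions[w] for w in sorted(decisions)) + "}"


if __name__ == "__main__":
    # Hypothetical stand-in hyphenator that only knows one word.
    def toy_hyphenate(word: str) -> list[str]:
        return {"hyphenation": ["hy", "phen", "ation"]}.get(word, [word])

    src = r"\section{Intro} Hyphenation of strengths is tricky. % a comment"
    print(unhyphenated_words(src, toy_hyphenate))  # ['strengths', 'tricky']
    print(hyphenation_command({"strengths": "strengths",
                               "typesetting": "type-set-ting"}))
    # \hyphenation{strengths type-set-ting}
```

The resulting \hyphenation line would go in the preamble. An entry with no hyphens in it (like "strengths" above) is how you tell TeX never to break that word.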
But you raise a good point about how easy it would be to get that algorithm to distinguish between your cases #3 and #4. I admit I don't know enough about LaTeX's hyphenation algorithm to say how easy this would be, but even if it does pattern matching rather than word matching (actually, my own experience makes me think LaTeX stores its hyphenation rules at the word level rather than the pattern level, but I'm not sure), I don't think it would be that hard. Most unhyphenatable words would be common one-syllable words, and a list of such words to check against doesn't seem like it would be difficult to generate. (And if a few slipped through, it wouldn't be a problem; the book creator would just specify that they can't be hyphenated.)
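On the pattern-vs-word question: TeX actually does both. Its built-in hyphenation uses Frank Liang's pattern algorithm, and \hyphenation{...} adds a word-level exception list that overrides the patterns. Here's a toy Python version to show how the two interact. The pattern list is a hypothetical handful, not TeX's real English patterns, and left_min/right_min are assumed stand-ins for TeX's \lefthyphenmin and \righthyphenmin settings.

```python
def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a Liang-style pattern like 'hy3ph' into its letters ('hyph')
    and the values between them ([0, 0, 3, 0, 0]); values[i] sits before letter i."""
    letters: list[str] = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


# Hypothetical toy pattern set, NOT TeX's real English patterns.
PATTERNS = dict(
    parse_pattern(p)
    for p in "hy3ph he2n hena4 hen5at 1na n2at 1tio 2io o2n".split()
)


def hyphenate(word: str, patterns: dict[str, list[int]],
              exceptions: dict[str, str],
              left_min: int = 2, right_min: int = 3) -> list[str]:
    """Return the pieces of `word` between allowed break points."""
    w = word.lower()
    if w in exceptions:
        # Word-level override, like \hyphenation{...}: it wins outright.
        pieces, start = [], 0
        for part in exceptions[w].split("-"):
            pieces.append(word[start:start + len(part)])
            start += len(part)
        return pieces

    # Pattern pass: '.' marks word boundaries; points[p] is the value
    # before dotted[p], and each position keeps the highest value seen.
    dotted = "." + w + "."
    points = [0] * (len(dotted) + 1)
    for i in range(len(dotted)):
        for j in range(i + 1, len(dotted) + 1):
            values = patterns.get(dotted[i:j])
            if values:
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)

    # An odd value allows a break; the break before w[m] is points[m + 1].
    # left_min/right_min keep short fragments off either end of the word.
    pieces, start = [], 0
    for m in range(left_min, len(w) - right_min + 1):
        if points[m + 1] % 2 == 1:
            pieces.append(word[start:m])
            start = m
    pieces.append(word[start:])
    return pieces


if __name__ == "__main__":
    print(hyphenate("hyphenation", PATTERNS, {}))  # ['hy', 'phen', 'ation']
    print(hyphenate("strength", PATTERNS, {"strength": "strength"}))  # ['strength']
```

An exception entry like "strength": "strength" is the "just specify that they can't be hyphenated" step: the word-level entry wins no matter what the patterns say.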
(Again, I'm restricting my comments to English and similar languages. The market for English is big enough to make this worthwhile...)
And LaTeX isn't the only software out there that does hyphenation; there's also Scribus, InDesign (though I think its algorithm is based on TeX's), etc. Surely this is not such an unreachable goal. To be honest, I'd be very surprised if, on average, more than a few paragraphs per book are "hand-hyphenated" now, even at good presses, though I don't have any first-hand knowledge of such things.
Quote:
As for the unsolved problem... it's unsolved, but not a problem. People who are fine with reflow formats do not complain about poor hyphenation.
|
I'm not sure what you mean by "people who are fine with reflow"; you make it sound like a bad thing. If you want an example of someone who wants reflow and decent (if not perfect) hyphenation, you can count me as one. At the very least, I want the matter fully explored.
Quote:
Such a renderer, unless you only care about English-language books, would have to be several orders of magnitude more complex than the most sophisticated typesetting systems that exist today.
|
A renderer that handled English (and languages similar enough to English) this way, and handled other languages the way current ePub renderers do, seems like a decent compromise, at least for products sold in English-speaking markets. Again, I'm delighted with the idea of shipping the source along with one or a few human-made fixed-format versions, which would be used unless the reader wants custom reflow.
Geez, now you have me wondering whether Tengwar and Klingon, etc. have hyphenation rules...
Anyway, I admit I'm making some assumptions here that may be wrong. I just haven't seen what I'd consider compelling evidence against the possibility of such things.