09-05-2009, 03:21 PM   #545
frabjous
Wizard
frabjous can solve quadratic equations while standing on his or her head reciting poetry in iambic pentameter
 
 
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Quote:
Originally Posted by ahi
If I understand correctly, LaTeX hyphenation patterns are not wordlists, but pattern lists. Which means there is no automatic way of identifying words for which correct hyphenation patterns are not known.
I don't understand why it would make a difference. Whether the rules are stored as complete words or as patterns, they're applied to words at processing time, and there's no reason the software couldn't detect when no pattern applies to a given word.
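To make the point concrete, here's a toy sketch of Liang-style pattern hyphenation (the scheme TeX uses) — my own illustration, not code from any TeX distribution, and the patterns are made up. The matcher trivially knows whether any pattern touched the word at all, which is exactly the "no known rule for this word" signal:

```python
def hyphenation_points(word, patterns):
    """Toy Liang-style hyphenator. Digits in a pattern sit between
    letters; the highest digit at each inter-letter position wins, and
    an odd value permits a break there. Also reports whether ANY
    pattern matched -- i.e., whether the word is one the pattern set
    knows nothing about."""
    padded = '.' + word.lower() + '.'       # '.' marks the word edges
    scores = [0] * (len(padded) + 1)
    matched = False
    for pat in patterns:
        # Split a pattern like 'hy3ph' into letters 'hyph' and the
        # weights [0, 0, 3, 0, 0] (one slot around each letter).
        letters, weights = '', [0]
        for ch in pat:
            if ch.isdigit():
                weights[-1] = int(ch)
            else:
                letters += ch
                weights.append(0)
        for start in range(len(padded) - len(letters) + 1):
            if padded[start:start + len(letters)] == letters:
                matched = True
                for k, w in enumerate(weights):
                    scores[start + k] = max(scores[start + k], w)
    # scores[j + 1] sits just before word[j]; odd means break allowed.
    # (Real TeX also enforces minimum fragment lengths; omitted here.)
    breaks = [j for j in range(1, len(word)) if scores[j + 1] % 2 == 1]
    return breaks, matched

print(hyphenation_points('hyphenation', ['hy3ph']))  # ([2], True)
print(hyphenation_points('syzygy', ['hy3ph']))       # ([], False)
```

The second call is the interesting one: no pattern matched, so the program knows it has no information about that word and could flag it for review.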

All the other examples you've given about other languages just add a level of complexity. If humans can hyphenate correctly, so can a computer; I can't for the life of me imagine what argument there could be for thinking otherwise. Maybe the software isn't quite there yet, but it's getting closer, and we might as well use the best that's currently available.

Quote:
Frabjous - A LaTeX install is also, at a minimum, hundreds of megabytes in size. This is one of the things I'm on about - it's not suitable as a typographic processor in a low-resource environment. Typography is demonstrably mostly solvable by brute force, but that solution is not applicable to low-power devices.
MiKTeX Portable is about 90 MB in size, and it's a fully functioning and respectable LaTeX distribution -- yes, LaTeX, not TeX -- widely used even by those who could have dedicated more space. I'm not saying that's tiny compared to what's on current readers, but I'd be more than happy to dedicate that much memory even on my 505 to get improved results, and with the next generation of readers, that amount of space will not be too much to ask. What's possible on a PRS-500 hardly matters looking forward; none of those will be sold anymore. RAM is probably a bigger issue, but I really don't think LaTeX requires very much at all -- I've admitted all along that the hardware might not be there yet, but I haven't seen anything to convince me it won't be soon.

And if not, the conclusion is obvious: ebook-sized PDFs (or something that amounts to the same thing) are the only remaining viable option. But I don't think that has to be true long-term.

Quote:
The Knuth & Plass paper which I cited has a formal proof and discussion of the impossibility of finding the perfect set of breaks for a paragraph.
I haven't seen this proof, but I think you've misunderstood the practical importance of such proofs in general (which is almost nonexistent). At worst it shows that there will be some *theoretical* cases, unlikely ever to be actualized, in which an algorithm will give less-than-perfect results. And we are not getting perfect results now even with human-designed books.

It reminds me of a conversation I recently had with a colleague. I teach logic at a large university -- a colleague and I alternate teaching Intro to Logic every other semester -- and I was discussing with him the possibility of writing some software to check our students' answers. One thing we do is assign translations into first-order predicate logic, and we accept all logically equivalent answers as correct. My colleague tried to convince me that software would be worthless here: by Church's theorem, first-order logic is recursively undecidable, so an algorithm that tells you whether one formula is logically equivalent to another isn't technically possible (it's not a Turing-computable function). But as I pointed out to him at the time, it's absolutely ridiculous to think we couldn't write software that checked for equivalence over all models of finite cardinality up to some n. The chances that an introductory student would turn in an answer that was equivalent over all such finite cardinalities but not equivalent in some infinite domain, with the problems we give, are astronomically small -- and if by some miracle they managed it, it would be cruel not to give them full credit.
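The finite-cardinality check really is a few dozen lines of code. Here's a toy sketch for unary predicates -- my own illustration, with made-up names (`equivalent_up_to` and the formula lambdas are not from any real grading tool) -- that brute-forces every model on domains of size 1 through n:

```python
from itertools import combinations, product

def subsets(domain):
    """Every subset of the domain (possible extensions of a predicate)."""
    return [frozenset(c) for r in range(len(domain) + 1)
            for c in combinations(domain, r)]

def equivalent_up_to(n, preds, f1, f2):
    """Brute-force check that formulas f1 and f2 agree on every model
    whose domain is {0..k-1}, for k = 1..n, over all interpretations
    of the given unary predicate letters."""
    for k in range(1, n + 1):
        domain = list(range(k))
        for interp in product(subsets(domain), repeat=len(preds)):
            model = dict(zip(preds, interp))
            if f1(model, domain) != f2(model, domain):
                return False
    return True

# 'All P are Q' vs. 'No P is a non-Q' -- classically equivalent
all_PQ   = lambda m, D: all(x not in m['P'] or x in m['Q'] for x in D)
no_PnotQ = lambda m, D: not any(x in m['P'] and x not in m['Q'] for x in D)
# 'Some P is Q' -- NOT equivalent to the above (fails when P is empty)
some_PQ  = lambda m, D: any(x in m['P'] and x in m['Q'] for x in D)

print(equivalent_up_to(3, ['P', 'Q'], all_PQ, no_PnotQ))  # True
print(equivalent_up_to(3, ['P', 'Q'], all_PQ, some_PQ))   # False
```

Church's theorem says this can't work over *all* models; it says nothing against checking the finitely many models of bounded size, which is all an intro-course grader would ever need.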

If an algorithm tries five million different layouts for a paragraph and selects the best of those, the fact that a better layout might "theoretically" turn up after five million more passes (or exist "out there" where it could never be found) is not interesting or relevant at all, practically speaking. An algorithm that gives results as good as your typical human typographer's is certainly still possible -- and you'd likely want it to deliberately stop short of absolute perfection anyway, just to save processing power.
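And "selects the best layout it considered" doesn't even require millions of passes: the dynamic-programming idea behind Knuth-Plass finds the globally best sequence of breaks efficiently. Here's a stripped-down sketch of that idea (my own toy, assuming monospaced character widths, no hyphenation or stretchable glue, cost = squared trailing space on each line but the last):

```python
def break_lines(words, width):
    """Toy global line breaker in the Knuth-Plass spirit: minimize the
    sum of squared leftover space per line (last line free).
    Assumes every word fits on a line by itself."""
    n = len(words)
    INF = float('inf')
    best = [0.0] + [INF] * n   # best[i]: min cost to lay out words[:i]
    prev = [0] * (n + 1)       # prev[i]: where the last line starts
    for i in range(1, n + 1):
        for j in range(i):
            # Length of a line holding words[j:i] with single spaces
            line_len = sum(len(w) for w in words[j:i]) + (i - j - 1)
            if line_len > width:
                continue
            cost = 0 if i == n else (width - line_len) ** 2
            if best[j] + cost < best[i]:
                best[i] = best[j] + cost
                prev[i] = j
    lines, i = [], n           # walk the prev[] chain back to the start
    while i > 0:
        lines.append(' '.join(words[prev[i]:i]))
        i = prev[i]
    return lines[::-1]

print(break_lines('aaa bb cc ddddd'.split(), 6))
# A greedy breaker gives ['aaa bb', 'cc', 'ddddd'] (cost 16);
# the global optimum is ['aaa', 'bb cc', 'ddddd'] (cost 10).
```

The point of the example: the "impossibility" result doesn't stop a simple O(n^2) search from finding the best break sequence under its cost model, which is already better than what greedy per-line breaking (what most readers do today) produces.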

Last edited by frabjous; 09-08-2009 at 11:41 AM.