Quote:
Originally Posted by baskerville
I'm therefore in the process of writing an HTML renderer!
I've already written the first, and most important, piece of the puzzle: the line breaking algorithm.
|
Knuth-Plass is not suitable for use with HTML in general; floats can cause the line width to vary based on the line height of preceding lines, which can in turn vary if you use dynamic line breaks the way K-P does.
Basically, you don't know the allowed width of a given line without having already fixed the layout of the lines above it.
Jonathan Kew discusses this here (in the item explaining why Firefox doesn't use K-P):
Quote:
Lines of different widths are not a problem in themselves. They become a problem when those widths are not known in advance, as when (for example) a float takes a "chunk" out of the side of the paragraph for text to wrap around it. The length required for any given line may depend on its exact vertical position; but that in turn might depend on which breaks end up getting chosen on earlier lines - and that choice may not be determined until _later_ in the paragraph, if there are several "active" possibilities under consideration.
This can be difficult even with a fixed-position float, if line heights vary (due to font changes, inline images, or all sorts of other factors); it gets worse if the float itself is anchored to a position within the text of the paragraph, and so the position of the float is not known in advance of line-breaking the text that contains it.
|
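To make the dependency concrete, here is a minimal sketch of how one taller line changes the width available to the lines below it. The `Float` class, `available_width`, and `line_widths` are made-up names for illustration, not part of any real renderer, and the model is simplified: rectangular floats at a fixed vertical position, each simply subtracting its width from the column.

```python
from dataclasses import dataclass


@dataclass
class Float:
    """Hypothetical float box: occupies [top, bottom) vertically."""
    top: float
    bottom: float
    width: float


def available_width(y, line_height, column_width, floats):
    """Width left for a line box spanning [y, y + line_height)."""
    width = column_width
    for f in floats:
        # The line is narrowed only if it vertically overlaps the float.
        if y < f.bottom and y + line_height > f.top:
            width -= f.width
    return width


def line_widths(line_heights, column_width, floats):
    """Stack lines top to bottom and report the width each one gets."""
    widths, y = [], 0.0
    for h in line_heights:
        widths.append(available_width(y, h, column_width, floats))
        y += h
    return widths


floats = [Float(top=0.0, bottom=40.0, width=100.0)]

# All lines 20 units tall: the first two lines sit beside the float.
uniform = line_widths([20.0, 20.0, 20.0], 300.0, floats)

# Same float, but the first line is 45 tall (say, an inline image):
# the second line now starts below the float and gets the full width.
taller_first = line_widths([45.0, 20.0, 20.0], 300.0, floats)

print(uniform)       # [200.0, 200.0, 300.0]
print(taller_first)  # [200.0, 300.0, 300.0]
```

The width of line 2 is different in the two runs even though the float never moved, which is exactly why the widths cannot be handed to K-P up front when line heights depend on where the breaks fall.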
The good news is that the simple case (fixed- or at least known-width lines) should be sufficient for many ebooks, and K-P is ideal there, but you're going to have to think about how to determine ahead of time whether you can precalculate the line widths, and how to fall back to a different line-breaking algorithm if you cannot. “I don't support floats” is one possible answer, but many ebooks do use inline images so it's probably not acceptable.
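One way the "precalculate or fall back" decision could look, as a rough sketch only. The three boolean inputs and the function names `choose_breaker` and `greedy_break` are assumptions of mine, and the fallback shown is plain first-fit, which is just one possible choice. First-fit works here because each line is finalized before the next one starts, so the width for the next line can always be computed from the layout above it.

```python
def choose_breaker(has_floats, uniform_line_height, float_anchored_in_text):
    """Pick a line breaker for one paragraph (hypothetical policy)."""
    if not has_floats:
        return "knuth-plass"      # one known width for every line
    if uniform_line_height and not float_anchored_in_text:
        return "knuth-plass"      # widths vary, but are known per line number
    return "greedy"               # widths depend on the breaks themselves


def greedy_break(word_widths, space_width, width_for_line):
    """First-fit breaking; returns a list of lines of word indices.

    width_for_line(n) gives the width of line n and is only called once
    lines 0..n-1 are fixed, so it may depend on the layout above.
    """
    lines, current, used = [], [], 0.0
    for i, w in enumerate(word_widths):
        limit = width_for_line(len(lines))
        needed = w if not current else used + space_width + w
        if current and needed > limit:
            lines.append(current)       # close the line, start a new one
            current, used = [i], w
        else:
            current.append(i)           # a lone oversized word still goes in
            used = needed
    if current:
        lines.append(current)
    return lines


words = [50.0, 50.0, 50.0, 50.0]
fixed = greedy_break(words, 10.0, lambda n: 120.0)
narrow_first = greedy_break(words, 10.0, lambda n: 60.0 if n == 0 else 120.0)
print(fixed)         # [[0, 1], [2, 3]]
print(narrow_first)  # [[0], [1, 2], [3]]
```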
You may also want to detect long paragraphs, since K-P's running time can grow quadratically with paragraph length in the worst case. If you can find it, P.I. Cooper's article in the July 1966 Advances in Computer Typesetting discusses a way to break paragraphs into overlapping chunks to avoid degenerate behavior in long paragraphs.
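I don't have Cooper's article to hand, so the following is not his method, just a generic sketch of what "overlapping chunks" could mean: split the paragraph's items into fixed-size windows that share a few items with their neighbours. The function name and both parameters are made up for illustration.

```python
def overlapping_chunks(n_items, chunk_size, overlap):
    """Return (start, end) index ranges covering range(n_items).

    Consecutive ranges share `overlap` items, so breaks chosen near the
    end of one chunk can be reconsidered at the start of the next.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    ranges, start = [], 0
    while True:
        end = min(start + chunk_size, n_items)
        ranges.append((start, end))
        if end >= n_items:
            break
        start += step
    return ranges


print(overlapping_chunks(10, 4, 1))  # [(0, 4), (3, 7), (6, 10)]
print(overlapping_chunks(3, 4, 1))   # [(0, 3)]
```

The breaker would then run on each window in turn, keeping only the breaks that fall before the overlap and restarting the next window from the last kept break. That bounds the work per window regardless of how long the paragraph is, at the cost of no longer being optimal over the whole paragraph.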