Quote:
Originally Posted by Jellby
I won't claim I have any authority in that field, because I haven't. I have no real experience in programming (other than some scientific samples in fortran). But two thoughts occur to me:
1. Isn't that roughly what the recent patent conflict with MS-Word was about?
2. Wouldn't you need a too large "byte" size for the format string? It's simple for just italic and bold, but how do you store bold-italic? How do you store bold, italic, underlined, red and large size? If your goal is supporting only basic stuff (like just bold and italic) then it's probably fine, but I suspect almost any other alternative would be equally fine...
|
Can you tell me more about (1)? I'm oblivious.
Regarding 2, the second "string" could be a list instead, if need be, with the number of the list item corresponding to the byte-position in the plaintext. But a single byte, used as a bitfield, is sufficient for 8 distinct on-or-off states.
My primary aim at this time is to convert RTF into HTML or LaTeX. Given that some of those RTFs have a lot of extraneous formatting information (usually relating to minimally [and needlessly] varying font-size, and similar things) that would be literally harmful to include in the output in most cases, I would probably focus only on bold, italics, small caps, and colour. With such a combination, the output would be reasonably clean, contain no excess/disruptive (mis)formatting, and yield itself well to trying to figure out what is regular text, and what is something other than.
I should probably include font-size in the formatting list as well... but I'm almost certain I don't need exact font sizes, but rather a more fuzzy determination as to whether the font size is small, regular, or large.
- Ahi