09-01-2009, 11:18 AM   #20
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
Originally Posted by Jellby
I won't claim any authority in that field, because I have none. I have no real experience in programming (other than some scientific samples in Fortran). But two thoughts occur to me:

1. Isn't that roughly what the recent patent conflict with MS-Word was about?

2. Wouldn't you need too large a "byte" size for the format string? It's simple for just italic and bold, but how do you store bold-italic? How do you store bold, italic, underlined, red and large size? If your goal is supporting only basic stuff (like just bold and italic) then it's probably fine, but I suspect almost any other alternative would be equally fine...

Can you tell me more about (1)? I'm oblivious.

Regarding 2, the second "string" could be a list instead, if need be, with the index of each list item corresponding to the byte position in the plaintext. But a single byte, used as a bitfield, is sufficient for 8 independent on-or-off flags (256 combinations), so bold-italic is simply two bits set at once.
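
To make the bitfield idea concrete, here is a minimal sketch in Python. The flag names, the bit assignments, and the `mark` helper are all made up for illustration, and it assumes one format byte per character of the plaintext rather than per encoded byte.

```python
from enum import IntFlag


class Fmt(IntFlag):
    # Hypothetical bit assignments; any 8 on/off properties would fit.
    BOLD = 1
    ITALIC = 2
    UNDERLINE = 4
    SMALL_CAPS = 8


text = "Hello world"
# Parallel "format string": one byte per character, all flags off at first.
fmt = bytearray(len(text))


def mark(fmt, start, end, flags):
    """Switch the given flags on for positions start..end-1."""
    for i in range(start, end):
        fmt[i] |= int(flags)


mark(fmt, 0, 5, Fmt.BOLD)     # "Hello" is bold
mark(fmt, 3, 11, Fmt.ITALIC)  # "lo world" is italic, so "lo" is bold-italic
```

Testing a property is then a single mask, e.g. `fmt[3] & Fmt.BOLD`. Something like colour does not fit an on/off bit and would need its own parallel list.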

My primary aim at this time is to convert RTF into HTML or LaTeX. Some of those RTFs carry a lot of extraneous formatting information (usually minimally [and needlessly] varying font size, and similar things) that would be actively harmful to include in the output in most cases, so I would probably focus only on bold, italics, small caps, and colour. With such a combination, the output would be reasonably clean, contain no excess or disruptive (mis)formatting, and lend itself well to figuring out what is regular text and what is something else.
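
As a rough sketch of the output side, assuming the plaintext and a parallel sequence of format bytes already exist: group consecutive characters that share the same format byte and wrap each run in tags. The function name `to_html` and the two flag values are assumptions for illustration; small caps and colour are left out to keep it short.

```python
from html import escape
from itertools import groupby

# Hypothetical flag values for the format byte.
BOLD, ITALIC = 1, 2


def to_html(text, fmt):
    """Emit HTML for text, given one format byte per character."""
    out = []
    pos = 0
    for flags, run in groupby(fmt):
        n = len(list(run))                  # length of this same-format run
        chunk = escape(text[pos:pos + n])   # escape <, >, & in the plaintext
        if flags & ITALIC:
            chunk = f"<i>{chunk}</i>"
        if flags & BOLD:
            chunk = f"<b>{chunk}</b>"
        out.append(chunk)
        pos += n
    return "".join(out)


print(to_html("abcdef", bytes([1, 1, 3, 3, 2, 0])))
# prints <b>ab</b><b><i>cd</i></b><i>e</i>f
```

Adjacent runs each get their own tags here, so the markup is valid but not minimal. Merging neighbouring `<b>` elements would be a separate clean-up pass.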

I should probably include font size in the formatting list as well... but I'm almost certain I don't need exact font sizes, only a fuzzier determination of whether the font size is small, regular, or large.
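
One possible way to get that fuzzy determination, sketched in Python: take the most common size in the document as "regular" and bucket everything else relative to it. The function name `classify_sizes` and the 15% tolerance are assumptions, not something I have tested against real files. The input values would be whatever the RTF `\fs` control words give, which are in half-points.

```python
from collections import Counter


def classify_sizes(sizes, tolerance=0.15):
    """Label each size as small, regular, or large relative to the
    most common size. The tolerance is an arbitrary assumption."""
    base = Counter(sizes).most_common(1)[0][0]
    labels = []
    for s in sizes:
        if s < base * (1 - tolerance):
            labels.append("small")
        elif s > base * (1 + tolerance):
            labels.append("large")
        else:
            labels.append("regular")
    return labels


# Half-point sizes: 24 is 12pt; 23 and 25 are the needless minimal variations.
print(classify_sizes([24, 24, 25, 23, 24, 36, 16]))
# prints ['regular', 'regular', 'regular', 'regular', 'regular', 'large', 'small']
```

Three buckets need only two bits, so they would still fit in the same format byte as the on/off flags.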

- Ahi