09-02-2009, 03:29 PM #39
ahi
Grand Plan!

OK, ekaser... so here's my reformulated grand plan:

The text is parsed into a wrapping class, pTome...

Each pTome contains an arbitrary number of pPar objects, each of which is assumed to be either a paragraph or a line whose line break must be preserved (as in poems or quotations).

Each pPar object has 1) a classification [e.g. paragraph, quotation, chapter or section title, etc.] and 2) a pString object.

Each pString object has 1) a text string and 2) a formatting string.

The pTome class would have accessor methods that make it easy to pose, at a high level, the sort of questions I identified in my earlier post. Instead of being preparsed, however, words would be parsed on the fly, only when an accessor method needs them. I don't foresee a need to perform word-level operations, only to make word-level or higher-level queries.
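A minimal Python sketch of that layout, for illustration only. pTome, pPar and pString are the class names from above; the accessor names (`words`, `word_count`, `paragraphs_containing`) and the word regex are hypothetical. Words are never stored: they are pulled out of each pString's text only when an accessor asks for them.

```python
import re
from dataclasses import dataclass, field

# Hypothetical word pattern for on-the-fly tokenizing; real text would
# need smarter rules for dashes, hyphenation, etc.
WORD_RE = re.compile(r"[\w']+")


@dataclass
class pString:
    text: str
    formatting: str = ""  # one formatting code per character of text

    def __post_init__(self):
        if not self.formatting:
            self.formatting = "0" * len(self.text)  # "0" = plain
        if len(self.formatting) != len(self.text):
            raise ValueError("formatting must be exactly as long as text")


@dataclass
class pPar:
    classification: str  # e.g. "paragraph", "quotation", "title"
    string: pString


@dataclass
class pTome:
    pars: list = field(default_factory=list)  # pPar objects in order

    # Accessors tokenize on demand; no word objects are ever stored.
    def words(self):
        for par in self.pars:
            yield from WORD_RE.findall(par.string.text)

    def word_count(self):
        return sum(1 for _ in self.words())

    def paragraphs_containing(self, word):
        word = word.lower()
        return [
            i
            for i, par in enumerate(self.pars)
            if word in (w.lower() for w in WORD_RE.findall(par.string.text))
        ]


tome = pTome([
    pPar("title", pString("The Beginning")),
    pPar("paragraph", pString("It was rather a new sort of experience.")),
])
print(tome.word_count(), tome.paragraphs_containing("beginning"))
```

The trade-off is the one described above: queries re-tokenize each time, which keeps the stored structure small at the cost of repeated parsing.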

Some outstanding decisions on my mind...

1) Color... I should probably include it in the formatting string. So I'll probably make the formatting "string" something a bit more complex than a set of bitfields, so that if I later find a reason to make the conversion from RTF or HTML more fine-grained, I can do so without much internal rewriting.
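One way to get "something a bit more complex" than bitfields, sketched in Python under the assumption that each character maps to an immutable style record. `CharFormat`, `restyle` and `runs` are hypothetical names. Adding a field later (underline, font size) with a default value leaves existing code untouched, and `runs` compresses the per-character list into runs so storage stays modest.

```python
from dataclasses import dataclass, replace
from itertools import groupby
from typing import Optional, Tuple


@dataclass(frozen=True)
class CharFormat:
    # Hypothetical style record; new fields (underline, font size, ...)
    # can be added later with defaults without touching existing callers.
    bold: bool = False
    italic: bool = False
    color: Optional[Tuple[int, int, int]] = None  # RGB; None = default color


PLAIN = CharFormat()


def restyle(formats, start, end, **changes):
    """Return a copy of formats with changes applied to formats[start:end]."""
    out = list(formats)
    for i in range(start, end):
        out[i] = replace(out[i], **changes)
    return out


def runs(formats):
    """Compress per-character formats into (start, length, format) runs."""
    result, pos = [], 0
    for fmt, group in groupby(formats):
        n = sum(1 for _ in group)
        result.append((pos, n, fmt))
        pos += n
    return result


text = "it did not"
fmts = restyle([PLAIN] * len(text), 0, 2, bold=True)  # bold "it"
for start, length, fmt in runs(fmts):
    print(start, length, fmt)
```

Because the records are frozen, identical styles compare equal, which is what lets `groupby` merge neighbouring characters into a single run.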

2) Links, footnotes, annotations... I'm thinking these might have to be their own parallel "strings" (containing not Unicode characters but arbitrarily long sub-pStrings, or destinations in the case of links). After all, a given character could be both part of a link and sit right in front of a footnote mark. I'm not sure how annotations work in RTF, but in certain complex cases they might also coexist with the other two.
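A possible alternative to full-length parallel strings, assuming sparse layers are acceptable: each layer stores entries only for the characters that actually carry something, so one character can belong to a link and carry a footnote at once. This is a hypothetical variant of pString; the field and method names (`footnotes`, `annotations`, `links`, `attach_footnote`, `add_link`, `features_at`) are illustrative, not part of the plan above.

```python
from dataclasses import dataclass, field


@dataclass
class pString:
    text: str
    formatting: str = ""
    # Sparse parallel layers (hypothetical): only characters that carry
    # something get an entry, instead of a full-length string of "0"s.
    footnotes: dict = field(default_factory=dict)    # char index -> pString
    annotations: dict = field(default_factory=dict)  # char index -> pString
    links: list = field(default_factory=list)        # (start, end, destination)

    def __post_init__(self):
        if not self.formatting:
            self.formatting = "0" * len(self.text)

    def attach_footnote(self, index, note):
        """Hang a note off the character right in front of its mark."""
        if not 0 <= index < len(self.text):
            raise IndexError(index)
        self.footnotes[index] = note

    def add_link(self, start, end, destination):
        self.links.append((start, end, destination))

    def features_at(self, index):
        """Everything attached to one character, across all layers."""
        found = {}
        if index in self.footnotes:
            found["footnote"] = self.footnotes[index]
        if index in self.annotations:
            found["annotation"] = self.annotations[index]
        dests = [d for s, e, d in self.links if s <= index < e]
        if dests:
            found["links"] = dests
        return found


s = pString("a new sort of experience")
s.add_link(14, 24, "glossary")  # the word "experience" is a link
s.attach_footnote(23, pString("though admittedly"))
print(sorted(s.features_at(23)))
```

Links are kept as spans rather than per-character entries, since a link usually covers a whole word or phrase while a footnote mark attaches to a single position.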

Can you think of a better way that doesn't introduce too much complexity?

With regard to links, I think the link parallel string would only be a destination for location information "deposited" from a higher level... almost certainly by the owning pTome.
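The "deposit from a higher level" step could be a two-pass resolve in the owning pTome: first record where every anchor lives, then write the resolved (paragraph index, character offset) locations into each pString's link layer. A sketch, assuming the parser records named anchors per pPar and unresolved link spans per pString; `anchors`, `pending_links` and `resolve_links` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class pString:
    text: str
    # (start, end, anchor_name) spans recorded while parsing (hypothetical).
    pending_links: list = field(default_factory=list)
    # (start, end, (par_index, offset)) spans filled in by the owning pTome.
    links: list = field(default_factory=list)


@dataclass
class pPar:
    classification: str
    string: pString
    anchors: dict = field(default_factory=dict)  # anchor name -> char offset


@dataclass
class pTome:
    pars: list = field(default_factory=list)

    def resolve_links(self):
        """Deposit link destinations; return names that could not be found."""
        # Pass 1: where does every anchor live in the whole tome?
        targets = {}
        for p_index, par in enumerate(self.pars):
            for name, offset in par.anchors.items():
                targets[name] = (p_index, offset)
        # Pass 2: write the resolved locations into each pString.
        unresolved = []
        for par in self.pars:
            s = par.string
            s.links = []
            for start, end, name in s.pending_links:
                if name in targets:
                    s.links.append((start, end, targets[name]))
                else:
                    unresolved.append(name)
        return unresolved


title = pPar("title", pString("The Beginning"), anchors={"start": 0})
body = pPar("paragraph", pString("It was rather new",
                                 pending_links=[(0, 2, "start"), (3, 6, "missing")]))
tome = pTome([title, body])
print(tome.resolve_links(), body.string.links)
```

Doing the resolve in pTome means a pString never needs to know about its siblings; it only receives locations.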

Basically... the following:

Code:
<h1>The Beginning</h1>

<p>
It was rather a new sort of experience<footnote>though
admittedly she's been on the run from the law before, but that
was a <i>long</i> time ago</footnote>, and she did not deal
well with it.  Or, rather, <b>it</b> did not deal kindly with her.
</p>

would turn into the following (simplified, for the sake of readability):

Code:

pTome
|
|
|--- pPar[0]
|    |
|    |___ pClassification = "title"
|    |___ pString
|         |
|         |__ "The Beginning" # text
|         |__ "0000000000000" # formatting
|         |__ "0000000000000" # links
|         |__ "0000000000000" # annotations
|         |__ "0000000000000" # footnotes
|
|--- pPar[1]
     |
     |___ pClassification = "paragraph"
     |___ pString
          |
          |__ "It was rather a new sort of experience, and she did not deal well with it.  Or, rather, it did not deal kindly with her."
          |__ "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000bb000000000000000000000000000000" # formatting
          |__ "000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000" # links
          |__ "000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000" # annotations
          |__ "0000000000000000000000000000000000000*0000000000000000000000000000000000000000000000000000000000000000000000000000000000" # footnotes
                                                    |
                                                    |_ pString
                                                       |
                                                       |__ "though admittedly she's been on the run from the law before, but that was a long time ago"
                                                       |__ "0000000000000000000000000000000000000000000000000000000000000000000000000000iiii000000000"
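For concreteness, a sketch of the conversion from the HTML example to that tree, using Python's standard `html.parser`. Assumptions: only `h1`, `p`, `b`, `i` and the made-up `footnote` tag are handled; newlines become spaces but runs of spaces are kept (matching the double space after "it." in the diagram); formatting uses the diagram's one-code-per-character scheme; `Builder` and `TomeParser` are hypothetical helper names. Each footnote is keyed by the index of the character right in front of its mark.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"h1": "title", "p": "paragraph"}  # tag -> pPar classification
STYLE_TAGS = {"b", "i"}


class Builder:
    """Accumulates one pString: text, per-character formatting, footnotes."""

    def __init__(self):
        self.text = ""
        self.formatting = ""
        self.footnotes = {}  # char index -> nested Builder
        self.active = []     # style tags currently open

    def add(self, data):
        data = data.replace("\n", " ")
        if not self.text:
            data = data.lstrip()  # drop whitespace at the start of a block
        # One code per character, as in the diagram; bold+italic collapses
        # to "b" in this simplified scheme.
        code = "b" if "b" in self.active else "i" if "i" in self.active else "0"
        self.text += data
        self.formatting += code * len(data)

    def finish(self):
        self.text = self.text.rstrip()
        self.formatting = self.formatting[: len(self.text)]


class TomeParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.pars = []   # (classification, Builder) pairs, in document order
        self.stack = []  # open Builders: paragraph first, then any footnotes

    def _close_block(self):
        if self.stack:
            self.stack[0].finish()
            self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._close_block()
            self.stack = [Builder()]
            self.pars.append((BLOCK_TAGS[tag], self.stack[0]))
        elif tag == "footnote" and self.stack:
            parent = self.stack[-1]
            note = Builder()
            # The note hangs off the character right in front of the mark.
            parent.footnotes[max(len(parent.text) - 1, 0)] = note
            self.stack.append(note)
        elif tag in STYLE_TAGS and self.stack:
            self.stack[-1].active.append(tag)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._close_block()
        elif tag == "footnote" and len(self.stack) > 1:
            self.stack.pop().finish()
        elif tag in STYLE_TAGS and self.stack and tag in self.stack[-1].active:
            self.stack[-1].active.remove(tag)

    def handle_data(self, data):
        if self.stack:
            self.stack[-1].add(data)


SAMPLE = """<h1>The Beginning</h1>

<p>
It was rather a new sort of experience<footnote>though
admittedly she's been on the run from the law before, but that
was a <i>long</i> time ago</footnote>, and she did not deal
well with it.  Or, rather, <b>it</b> did not deal kindly with her.
</p>
"""

parser = TomeParser()
parser.feed(SAMPLE)
parser.close()
for kind, b in parser.pars:
    print(kind, repr(b.text), sorted(b.footnotes))
```

Run on the example, this yields the title and paragraph text from the diagram, with the bold "it" at indices 88 and 89 and the footnote hanging off index 37 (the final "e" of "experience").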
Does this seem a reasonable way to go about it all?

- Ahi