MobileRead Forums - pacify.py (Text reformatter / RTF extractor)

ahi · 09-24-2009, 03:57 PM

Quote:

Originally Posted by ekaser

That certainly generates the 'smallest' and most efficient output code. Doing that, of course, as someone else mentioned, means keeping a "format stack" so that you know which formats were turned on, and in which order, so that you can properly "back them off" in the right order.

Of course, as you're aware, there WILL be "improperly formatted" HTML/RTF input files where the italic/bold formatting overlaps and the tags aren't turned off and on cleanly. In that case, you SHOULD convert it to clean formatting. For example, using ()s instead of []s for the tags:

text(I)text(B)text(/I)text(/B) ==> plain, italic, bold italic, bold

should be "cleaned up" to:

text(I)text(B)text(/B)(/I)(B)text(/B) ==> plain, italic, bold italic, bold

At least, IMHO.

Remember, ekaser, some of this will happen automagically thanks to (1) the way I keep track of formatting [i.e., the parallel stream simplifies stuff to begin with] and (2) the formatting normalization plugin [which simplifies stuff a bit further... mostly by blanking the formatting of newline characters and spaces that stand between differently formatted characters].
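To make the normalization step a bit more concrete, here is a minimal sketch. It assumes the parallel stream is simply one set of active tags per character (e.g. `frozenset({"I", "B"})` for bold italic); the real pacify.py representation isn't shown in this thread. The function name `normalize` is hypothetical. The sketch also picks one reading of the rule: a whitespace run is blanked if it contains a newline, or if the nearest non-whitespace characters on either side are formatted differently.

```python
from typing import FrozenSet, List

# Hypothetical "parallel stream": one set of active tags per character of text,
# e.g. frozenset({"I", "B"}) for bold italic, frozenset() for plain.
Fmt = FrozenSet[str]


def normalize(text: str, fmts: List[Fmt]) -> List[Fmt]:
    """Return a copy of fmts with formatting blanked on some whitespace runs.

    A run of whitespace is blanked if it contains a newline, or if the nearest
    non-whitespace characters on either side carry different formatting
    (a text boundary counts as "no neighbour", so it never matches).
    """
    if len(text) != len(fmts):
        raise ValueError("text and fmts must be the same length")
    out = list(fmts)
    n = len(text)
    i = 0
    while i < n:
        if not text[i].isspace():
            i += 1
            continue
        # text[i:j] is a maximal run of whitespace.
        j = i
        while j < n and text[j].isspace():
            j += 1
        before = fmts[i - 1] if i > 0 else None
        after = fmts[j] if j < n else None
        if "\n" in text[i:j] or before != after:
            for k in range(i, j):
                out[k] = frozenset()
        i = j
    return out
```

For example, in `"foo bar"` with an italic "foo" and a bold "bar", the italic space between them becomes plain, so the emitter can close the italic tag before the space instead of carrying it across. A space inside a uniformly italic run is left alone.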

But yeah... I'll give the format stack solution a shot and see what I manage.
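A minimal sketch of the format stack idea ekaser describes, again assuming a hypothetical parallel stream of one tag set per character. The function name `emit`, the `()` tag templates, and the alphabetical order for opening new tags are all illustrative choices, not pacify.py's actual code. When a tag has to close, every tag opened after it is closed first (innermost first) and then reopened if still wanted, which is what keeps the output properly nested.

```python
from typing import FrozenSet, List

# Hypothetical "parallel stream": one set of active tags per character.
Fmt = FrozenSet[str]


def emit(text: str, fmts: List[Fmt],
         open_t: str = "({})", close_t: str = "(/{})") -> str:
    """Render text plus per-character formats as properly nested tags."""
    if len(text) != len(fmts):
        raise ValueError("text and fmts must be the same length")
    out: List[str] = []
    stack: List[str] = []  # currently open tags, innermost last
    for ch, want in zip(text, fmts):
        # A tag that must close may have tags opened after it; close those too
        # (innermost first), then reopen the ones that are still wanted.
        bad = [i for i, t in enumerate(stack) if t not in want]
        if bad:
            popped = stack[bad[0]:]
            del stack[bad[0]:]
            for t in reversed(popped):
                out.append(close_t.format(t))
            for t in popped:
                if t in want:
                    out.append(open_t.format(t))
                    stack.append(t)
        # Open newly wanted tags (alphabetical order: arbitrary but deterministic).
        for t in sorted(want - set(stack)):
            out.append(open_t.format(t))
            stack.append(t)
        out.append(ch)
    # Close whatever is still open at the end, innermost first.
    for t in reversed(stack):
        out.append(close_t.format(t))
    return "".join(out)
```

Feeding it the overlapping example from the quote (four "text" segments formatted plain, italic, bold italic, bold) yields `text(I)text(B)text(/B)(/I)(B)text(/B)`, the cleaned-up form ekaser suggests. This greedy approach guarantees well-nested output, but it makes no attempt to choose the tag order that minimizes reopenings.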

- Ahi