MobileRead Forums - View Single Post - pacify.py (Text reformatter / RTF extractor)

ekaser · 09-01-2009, 02:13 PM

Quote:

Originally Posted by ahi

5. Try to detect poems, letters, quotations and mark them somehow. (A third layer, with a single setting per line, as opposed to per unicode character?)
6. Try to detect part, chapter, section headers... possibly interactively with help from user to make more accurate.

See my other reply to your previous note, regarding using a "start of block" character (not "end of block", as I was calling it before) as a place to hold formatting for the FOLLOWING block of text. I assume you'll have to have such a character anyway, whether it's NUL, CR, LF or something else. You could easily use the one "formatting byte" associated with that "start of block" character to hold a bunch of things: justification, indent, part, chapter, header, etc.

In fact, I'd recommend having a number of "start of block" characters. You've got at least 32 candidates, even if characters are only 1 byte wide, as 0-31 are control codes that will never appear as printable characters in 'normal' text. If everything you need fits in the formatting byte of the single "start of block" character (using CR, let's say), fine. If you need more, use alternate "start of block" characters that never emit any text output, but are simply placeholders that provide more information about the following block's formatting. That way, you maintain the ease of text processing/searching, while enabling the 'hiding' of a lot of block-oriented information in your formatting area.
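To make the idea concrete, here is a rough Python sketch of what I have in mind. It is only an illustration, not how pacify.py actually works: the bit layout, the names (`BLOCK_START`, `BLOCK_EXTRA`, `parse_blocks`) and the choice of 0x1C as the alternate marker are all made up for the example. It assumes the text layer and the formatting layer are the same length, with one formatting byte per character.

```python
# Hypothetical bit layout for the formatting byte of a "start of block" marker.
ALIGN_MASK = 0b00000011  # 0=left, 1=center, 2=right, 3=justified
INDENT = 0b00000100
HEADER = 0b00001000
POEM = 0b00010000
QUOTE = 0b00100000

BLOCK_START = "\r"    # primary marker (CR, as suggested above)
BLOCK_EXTRA = "\x1c"  # assumed alternate marker: emits no text, only extra info


def parse_blocks(text, fmt):
    """Split parallel text/format layers into (flags, extra, block_text) tuples."""
    if len(text) != len(fmt):
        raise ValueError("text and format layers must be the same length")
    blocks = []
    for ch, f in zip(text, fmt):
        if ch == BLOCK_START:
            # The marker's formatting byte describes the FOLLOWING block.
            blocks.append({"flags": f, "extra": [], "chars": []})
            continue
        if not blocks:
            # Text before any marker goes into an implicit default block.
            blocks.append({"flags": 0, "extra": [], "chars": []})
        if ch == BLOCK_EXTRA:
            # Placeholder only: keep its formatting byte, emit no text.
            blocks[-1]["extra"].append(f)
        else:
            blocks[-1]["chars"].append(ch)
    return [(b["flags"], b["extra"], "".join(b["chars"])) for b in blocks]


# Two blocks: a centered header, then a poem line with one extra info byte.
text = "\rChapter 1\r\x1cRoses are red"
fmt = bytes([HEADER | 1] + [0] * 9 + [POEM] + [7] + [0] * 13)

blocks = parse_blocks(text, fmt)
for flags, extra, body in blocks:
    print(bin(flags), extra, repr(body))

# Plain searching still works on the text layer, markers and all.
found = "Roses" in text
```

The per-character formatting bytes inside each block are ignored here to keep things short; a real version would carry them along with the block text.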

Everett