MobileRead Forums - View Single Post - pacify.py (Text reformatter / RTF extractor)

ekaser · 09-01-2009, 01:29 PM

Quote:

Originally Posted by ahi

It just occurred to me that this sort of approach to formatting might also effectively simplify formatting code.

Very true, a built-in form of TIDY. It might be worthwhile to add HTML input (LaTeX, I don't care about personally, and I'm not sure how many folks really use it or would use it, but that's neither here nor there). Right now you have .TXT and .RTF input and .HTML and LaTeX output. Why not make it 'orthogonal', accepting any of the four formats as input and producing any of the four as output? That would, in essence, make it a 'tidy' and conversion program all in one. Just a thought.
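To make the 'orthogonal' idea a bit more concrete, here's a rough sketch of what I mean. This is not how pacify.py is actually structured; the names (read_txt, write_html, READERS, WRITERS, convert) and the intermediate form (paragraphs made of (text, styles) runs) are all made up for illustration. The point is just that every reader produces the same in-between form and every writer consumes it, so adding one reader gets you all the outputs for free.

```python
import html

# Hypothetical intermediate form: a list of paragraphs, each a list of
# (text, styles) runs, where styles is a set such as {"bold", "italic"}.

def read_txt(data):
    # Plain text carries no styling: one unstyled run per paragraph.
    return [[(para, frozenset())] for para in data.split("\n\n") if para.strip()]

def write_txt(paragraphs):
    # Drop all styling and join paragraphs with a blank line.
    return "\n\n".join("".join(text for text, _ in runs) for runs in paragraphs)

def write_html(paragraphs):
    out = []
    for runs in paragraphs:
        parts = []
        for text, styles in runs:
            piece = html.escape(text)
            if "italic" in styles:
                piece = "<i>" + piece + "</i>"
            if "bold" in styles:
                piece = "<b>" + piece + "</b>"
            parts.append(piece)
        out.append("<p>" + "".join(parts) + "</p>")
    return "\n".join(out)

# Readers and writers are registered independently; an RTF, HTML, or LaTeX
# reader (not shown) would plug in here the same way.
READERS = {"txt": read_txt}
WRITERS = {"txt": write_txt, "html": write_html}

def convert(data, src, dst):
    return WRITERS[dst](READERS[src](data))
```

With four readers and four writers registered you'd get all sixteen combinations, and something like convert(text, "html", "html") would be the 'tidy' case.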

EDIT: Note, I'm not suggesting you include a 'general' HTML or LaTeX parser, any more than you're going to have a general RTF parser, just the "basic stuff" that you want to keep, with everything else thrown away. Sure, it would make a mess of some files, but those files probably wouldn't be appropriate for this style of conversion either. I'm assuming this is aimed at "simple novel" types of books that don't have a lot of fancy formatting to start with. One thought: since you're taking RTF files as input, some of those will have images (covers, maps, etc.), so I'm hoping that those image tags would be maintained along with the bold, italic, etc., right? That would imply the need to be able to include a "numbered mark" in the formatting string. Perhaps if the most significant bit of the formatting 'character' were set, the lower bits could be the 'number' of the image (on the "image stack") that should be inserted at that point. Of course, that then also brings up the question of image positioning: left, center, right.
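Here's roughly what I have in mind for the "numbered mark", as a quick sketch. I'm assuming each formatting 'character' is an 8-bit code, and the BOLD and ITALIC bit values below are placeholders I made up, not whatever pacify.py really uses. Positioning is kept on the image stack entry here simply because that leaves all seven low bits free for the image number; that's one possible answer to the left/center/right question, not the only one.

```python
IMAGE_FLAG = 0x80  # most significant bit of an 8-bit formatting code
INDEX_MASK = 0x7F  # lower seven bits: position on the image stack

# Placeholder style bits, assumed for this sketch only.
BOLD = 0x01
ITALIC = 0x02

def encode_image(index):
    """Return a formatting code meaning 'insert image number index here'."""
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("image index must fit in 7 bits (0-127)")
    return IMAGE_FLAG | index

def decode(code):
    """Classify a formatting code as an image mark or ordinary styling."""
    if code & IMAGE_FLAG:
        return ("image", code & INDEX_MASK)
    return ("style", {"bold": bool(code & BOLD), "italic": bool(code & ITALIC)})

# Hypothetical image stack: (filename, alignment) in order of appearance.
image_stack = [("cover.jpg", "center"), ("map.png", "left")]

kind, value = decode(encode_image(1))
if kind == "image":
    filename, alignment = image_stack[value]
```

The obvious limit is 128 images per file, which should be plenty for the "simple novel" case.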