Updated pacify.py ... see attached or first post.
---
Now pacify.py by default produces HTML... of sorts. Don't worry, I'll add back in the functionality to output UTF-8 plaintext when I get around to it.
The caveat being: no <p> tags are produced (although if you look at the source, the paragraphs are separated by linebreaks, so you can add in the <p> tags easily enough with some clever search and replace).
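For anyone who wants the <p> tags right away, here is a minimal sketch of that search and replace in Python. It assumes each paragraph of the generated body sits on its own line, with blank lines in between carrying no content; `add_paragraph_tags` is a made-up helper name, not something in pacify.py.

```python
def add_paragraph_tags(html_body: str) -> str:
    """Wrap every non-empty line in <p>...</p>.

    Assumption: one paragraph per line, as described in the post.
    Blank lines are dropped rather than turned into empty paragraphs.
    """
    lines = (line.strip() for line in html_body.splitlines())
    return "\n".join(f"<p>{line}</p>" for line in lines if line)
```

Run it only on the body text; if the file already contains block-level markup (headings, lists), those lines would need to be excluded first.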
Also, footnotes extracted from RTF files are enclosed in <footnote>some text here</footnote> for the sake of simplicity--this will be fixed.
Oh, and at present only formatting from RTF is picked up... so _emphasized phrase_ style formatting in plain text is not yet recognized.
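Until that is supported, underscore emphasis can be converted in a separate pass. The sketch below is one possible approach, not what pacify.py does: it assumes emphasis never spans a line break and that an underscore inside a word (as in some_file_name) is not markup. `mark_emphasis` is a hypothetical name.

```python
import re

# An opening underscore must not follow a word character and must be
# followed by non-space; the closing underscore mirrors that rule.
_EMPHASIS = re.compile(
    r"(?<![A-Za-z0-9_])_(?=\S)(.+?)(?<=\S)_(?![A-Za-z0-9_])"
)

def mark_emphasis(text: str) -> str:
    """Turn _emphasized phrase_ into <em>emphasized phrase</em>."""
    return _EMPHASIS.sub(r"<em>\1</em>", text)
```

Swapping the replacement string for `r"\\emph{\1}"` would give the LaTeX equivalent.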
---
The input is autodetected as either .txt or .rtf based on the file extension. Many RTFs work well... but I regularly encounter ones that prove problematic. Since RTF seems to be a rather large and unwieldy specification, I am not sure how likely I am to be able to guarantee the accuracy of conversion.
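Extension-based detection can be sketched as below. The optional content check is an addition of mine, not something the script is stated to do: RTF files begin with `{\rtf`, so peeking at the first bytes catches an RTF file saved with a .txt extension. Both function names are hypothetical.

```python
import os

def looks_like_rtf(head: bytes) -> bool:
    """True if the first bytes of a file carry the RTF signature."""
    return head.lstrip().startswith(b"{\\rtf")

def detect_input_format(path: str, head: bytes = b"") -> str:
    """Return 'rtf' or 'txt' for an input file.

    `head` is optional: pass the first few bytes of the file to let
    the content override a misleading extension.
    """
    if looks_like_rtf(head):
        return "rtf"
    ext = os.path.splitext(path)[1].lower()
    if ext in (".txt", ".rtf"):
        return ext[1:]
    raise ValueError(f"unsupported input file: {path!r}")
```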
If anybody has advice on how I can make my RTF parser cleverly ignore stuff that it doesn't care about, I'd be grateful. It does alright so far... but since I do not yet understand how I could opt to only process text that shows visibly (as opposed to metadata) I am actively filtering out metadata one rtf command at a time... doubtless the wrong way to do it, I know.
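One common way to ignore metadata wholesale is to skip entire groups rather than individual commands. In RTF a group that opens with `\*` is marked as ignorable, and a few well-known destinations (font table, colour table, stylesheet, info block) never hold body text. The sketch below is a rough illustration of that idea, not the pacify.py parser: the skip list is partial, `\uN` Unicode escapes are not handled, and `\'hh` bytes are assumed to be Windows-1252. Footnote groups are deliberately not skipped.

```python
import re

# Destinations whose contents are not visible body text (partial list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word, optional number, optional space
    r"|\\'([0-9a-fA-F]{2})"      # hex-escaped byte, e.g. \'e9
    r"|\\(.)"                    # control symbol, e.g. \* or \{
    r"|([{}])"                   # group open / close
    r"|[\r\n]+"                  # raw newlines carry no meaning in RTF
    r"|([^\\{}\r\n]+)",          # plain text run
    re.DOTALL,
)

def visible_text(rtf: str) -> str:
    """Extract body text, skipping ignorable and metadata groups."""
    out = []
    stack = []        # the parent's skip state for every open group
    skipping = False
    for match in TOKEN.finditer(rtf):
        word, _param, hexcode, symbol, brace, text = match.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            if stack:
                skipping = stack.pop()
        elif skipping:
            continue
        elif symbol == "*" or word in SKIP_DESTINATIONS:
            skipping = True   # ignore everything up to the group's end
        elif word in ("par", "line"):
            out.append("\n")
        elif word == "tab":
            out.append("\t")
        elif hexcode:
            # Assumption: the document uses the Windows-1252 code page.
            out.append(bytes.fromhex(hexcode).decode("cp1252", errors="replace"))
        elif symbol in ("\\", "{", "}"):
            out.append(symbol)
        elif text:
            out.append(text)
    return "".join(out)
```

Because the skip state is saved and restored per group, nested groups inside a skipped destination are discarded along with it, and unknown control words outside skipped groups are simply dropped while their text is kept.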
The output defaults to HTML unless the -l switch (LaTeX) is used. The LaTeX switch now requires an argument... though currently it only supports -l gppro.
Also, you should provide the title (-T "..."), author (-A "lastname, firstname"), and optionally subtitle (-S "...") if you want the generated LaTeX document to have a nice title page. Optionally you can also specify your name (-I "...") for an "Ex Libris ..." inscription at the bottom of the title page.
Some parts of the program are a bit more robust now... so you are less likely to encounter errors, but they will almost certainly still happen if the file is very messy (or, I suppose, just very different from the ones I have tested with).
Comments, reports, suggestions are appreciated.
---
Suggested use:
Produce HTML from text:
pacify.py -i input.txt -pcq
Produce LaTeX from text:
pacify.py -i input.txt -pcql gppro -T "Title of Book" -A "Lastname, Firstname" -S "a jolly good tale" -I "Ahi"
Produce HTML from RTF (preserving italic/bold formatting + footnotes):
pacify.py -i input.rtf -pcq
Produce LaTeX from RTF (preserving italic/bold formatting + footnotes):
pacify.py -i input.rtf -pcql gppro -T "Title of Book" -A "Lastname, Firstname" -S "a jolly good tale" -I "Ahi"
- Ahi