Posted 08-29-2009, 11:34 PM (post #6)
ahi
Updated pacify.py: see the attachment or the first post.

---

Now pacify.py by default produces HTML... of sorts. Don't worry, I'll add back in the functionality to output UTF-8 plaintext when I get around to it.

The caveat: no <p> tags are produced. If you look at the source, though, paragraphs are separated by line breaks, so you can add the <p> tags easily enough with some clever search and replace.
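
A minimal sketch of that search and replace in Python, assuming each paragraph sits on a line of its own in the output (add_paragraph_tags is a hypothetical helper, not part of pacify.py):

```python
import re

# A line holding at least one non-space character. Leading and trailing
# spaces/tabs are kept outside the captured paragraph text.
NONBLANK_LINE = re.compile(r"^[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def add_paragraph_tags(text):
    """Wrap each non-blank line of the HTML output in <p>...</p>."""
    return NONBLANK_LINE.sub(r"<p>\1</p>", text)
```

Blank lines are left as they are, so the result stays readable. Any line that is not really a paragraph (a heading, say) would have to be excluded before running this.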

Also, footnotes extracted from RTF files are enclosed in <footnote>some text here</footnote> for the sake of simplicity; this will be fixed.

Oh, and at present only formatting from RTF is picked up, so _emphasized phrase_ style formatting in plain-text input is not recognized.
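
For whoever wants to experiment before that lands, here is a rough regex sketch of recognizing _emphasized phrase_ in plain text. convert_underscore_emphasis is a hypothetical name, and using <i> as the output tag is an assumption:

```python
import re

# _phrase_ -> <i>phrase</i>. The underscores must not touch word characters
# on the outside (so snake_case_names are left alone) and must hug
# non-space text on the inside. The match never crosses a line break.
UNDERSCORE_EMPHASIS = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)")


def convert_underscore_emphasis(text):
    """Turn _underscored phrases_ into <i>italic</i> markup."""
    return UNDERSCORE_EMPHASIS.sub(r"<i>\1</i>", text)
```

Because the lookarounds require a non-word character (or the start/end of the text) outside each underscore, identifiers like snake_case_name pass through unchanged.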

---

The input is autodetected as either .txt or .rtf based on the file extension. Many RTFs convert well, but I regularly encounter ones that prove problematic. Since RTF is a rather large and unwieldy specification, I am not sure how likely I am to be able to guarantee accurate conversion.
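
Extension-based detection amounts to something like this minimal sketch (detect_input_type is a hypothetical name, not necessarily what pacify.py calls it):

```python
import os


def detect_input_type(path):
    """Map an input filename to "txt" or "rtf" by its extension."""
    ext = os.path.splitext(path)[1].lower()   # ".RTF" -> ".rtf"
    if ext == ".rtf":
        return "rtf"
    if ext == ".txt":
        return "txt"
    raise ValueError(f"unsupported input type {ext!r} for {path!r}")
```

If misnamed files ever become a problem, checking whether the content starts with {\rtf would be a more reliable test than the extension, since every RTF file begins that way.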

If anybody has advice on how I can make my RTF parser cleverly ignore stuff it doesn't care about, I'd be grateful. It does all right so far, but since I do not yet understand how to process only the text that is actually displayed (as opposed to metadata), I am filtering out metadata one RTF command at a time. Doubtless the wrong way to do it, I know.
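
One approach the RTF specification supports is to skip whole groups instead of individual commands. A group whose first token is \* is an "ignorable destination", and groups that open with metadata destinations such as \fonttbl, \colortbl, \stylesheet or \info can be dropped along with everything nested inside them. Here is a rough sketch of that idea. It is not pacify.py's parser, the SKIP_DESTINATIONS list is an assumption to be extended as problem files turn up, and \uN Unicode escapes and \bin data are not handled:

```python
import re

# Destinations whose contents are metadata rather than visible text.
# This list is an assumption; extend it as problem files turn up.
SKIP_DESTINATIONS = {
    "fonttbl", "colortbl", "stylesheet", "info", "pict",
    "header", "footer", "listtable", "listoverridetable",
}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"    # control word, optional numeric parameter
    r"|\\'([0-9a-fA-F]{2})"       # hex-escaped byte, e.g. \'e9
    r"|\\(.)"                     # control symbol, e.g. \* \\ \{ \} \~
    r"|([{}])"                    # group open / close
    r"|[\r\n]+"                   # raw line breaks carry no meaning in RTF
    r"|([^\\{}\r\n]+)",           # run of plain text
    re.DOTALL,
)


def rtf_visible_text(rtf):
    """Return the body text of an RTF string, dropping metadata groups."""
    out = []
    skip = False            # are we inside a group we want to ignore?
    saved = []              # skip state to restore when each group closes
    at_group_start = False
    for m in TOKEN.finditer(rtf):
        if m.lastindex is None:       # raw CR/LF: ignore it
            continue
        word, _param, hexcode, symbol, brace, text = m.groups()
        if brace == "{":
            saved.append(skip)        # nested groups inherit the skip state
            at_group_start = True
            continue
        if brace == "}":
            skip = saved.pop() if saved else False
            at_group_start = False
            continue
        # The first token of a group decides whether it is a destination to drop.
        if at_group_start and (symbol == "*" or word in SKIP_DESTINATIONS):
            skip = True
        at_group_start = False
        if skip:
            continue
        if word in ("par", "line"):
            out.append("\n")
        elif word == "tab":
            out.append("\t")
        elif hexcode:
            out.append(bytes([int(hexcode, 16)]).decode("cp1252", errors="replace"))
        elif symbol in ("\\", "{", "}"):
            out.append(symbol)
        elif symbol == "~":
            out.append("\u00a0")      # non-breaking space
        elif symbol in ("\n", "\r"):
            out.append("\n")          # backslash + newline is equivalent to \par
        elif text:
            out.append(text)
        # every other control word (formatting, \rtf1, ...) is dropped
    return "".join(out)
```

The point of the design is that nothing needs to be known about the commands inside a skipped group: once its opening token marks it as metadata, every nested group inherits the skip state until the matching closing brace. The formatting words you do want (\b, \i, \footnote) would be handled in the same dispatch instead of falling through to the "dropped" case. RTF is 7-bit text, so reading the file with encoding="latin-1" before passing it in is safe.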

The output defaults to HTML unless the -l (LaTeX) switch is used. The LaTeX switch now requires an argument, though currently it only supports -l gppro.

Also, you should provide the title (-T "..."), the author (-A "lastname, firstname"), and optionally a subtitle (-S "...") so that the generated LaTeX document gets a nice title page. You can also optionally specify your name (-I "...") for an "Ex Libris ..." inscription at the bottom of the title page.
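
To make the switches above concrete, here is a sketch of an argparse parser that accepts them. It is an illustration, not pacify.py's actual option handling. The dest names are made up, and -p, -c and -q appear only in the usage examples below without explanation, so they are modelled as plain on/off flags:

```python
import argparse


def build_parser():
    """Sketch of an option parser for the switches described in this post."""
    parser = argparse.ArgumentParser(prog="pacify.py")
    parser.add_argument("-i", dest="input", required=True,
                        help="input file; .txt or .rtf, detected by extension")
    # -p, -c and -q are used in the examples but not explained in the post;
    # here they are assumed to be simple on/off flags.
    for flag in ("-p", "-c", "-q"):
        parser.add_argument(flag, action="store_true")
    parser.add_argument("-l", dest="latex", choices=["gppro"],
                        help="emit LaTeX with the given style (default: HTML)")
    parser.add_argument("-T", dest="title", help="book title")
    parser.add_argument("-A", dest="author", help='"Lastname, Firstname"')
    parser.add_argument("-S", dest="subtitle", help="optional subtitle")
    parser.add_argument("-I", dest="ex_libris",
                        help='name for the "Ex Libris ..." inscription')
    return parser
```

argparse accepts clustered short flags where only the last one takes a value, so -pcql gppro parses as -p -c -q -l gppro, and leaving out -l leaves latex as None, which corresponds to the HTML default.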

Some parts of the program are a bit more robust now... so you are less likely to encounter errors, but they will almost certainly still happen if the file is very messy (or, I suppose, just very different from the ones I have tested with).

Comments, reports, suggestions are appreciated.

---

Suggested use:

Produce HTML from text:
pacify.py -i input.txt -pcq

Produce LaTeX from text:
pacify.py -i input.txt -pcql gppro -T "Title of Book" -A "Lastname, Firstname" -S "a jolly good tale" -I "Ahi"

Produce HTML from RTF (preserving italic/bold formatting + footnotes):
pacify.py -i input.rtf -pcq

Produce LaTeX from RTF (preserving italic/bold formatting + footnotes):
pacify.py -i input.rtf -pcql gppro -T "Title of Book" -A "Lastname, Firstname" -S "a jolly good tale" -I "Ahi"

- Ahi
Attached file: pacify.zip (7.0 KB)