I will figure out the cause of the error... but I should perhaps note, nrapallo (in case it is unclear to you or to others), that the -p option corrects erroneous paragraph breaks, not systematic paragraph breaking.
The program, regardless of whether the -p option is used, fixes systematic paragraph breaking like:
Code:
Here I am! I travelled yesterday for four hours in a train. It's a
funny sensation, isn't it? I never rode in one before.

College is the biggest, most bewildering place--I get lost whenever I
leave my room. I will write you a description later when I'm feeling
less muddled; also I will tell you about my lessons. Classes don't
begin until Monday morning, and this is Saturday night. But I wanted
to write a letter first just to get acquainted.
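For anyone curious what handling systematic breaking amounts to, here is a rough sketch in Python. It is not the actual code in pacify.py; the function name unwrap_paragraphs is made up for illustration, and it assumes that paragraphs are separated by blank lines and that every other line break is just hard wrapping.

```python
def unwrap_paragraphs(text):
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line separates paragraphs, and every other
    line break is only there because the text was wrapped.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            # Non-blank line: it belongs to the paragraph being built.
            current.append(line.strip())
        elif current:
            # Blank line: close the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Each item in the returned list is one paragraph on a single line, ready to be re-wrapped or written out however the reader device prefers.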
What -p would fix is the case where the same lines looked like this:
Code:
Here I am! I travelled yesterday for four hours in a train. It's a
funny sensation, isn't it? I never rode in one before.

College is the biggest, most bewildering place--I get lost whenever I
leave my room. I will write you a description later when I'm feeling

less muddled; also I will tell you about my lessons. Classes don't
begin until Monday morning, and this is Saturday night. But I wanted
to write a letter first just to get acquainted.
The -p option would detect that the "paragraph" that ends with "... I'm feeling" and the paragraph that follows it, which starts with "less muddled; also ...", are almost certainly supposed to be a single paragraph.
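To make the idea concrete, here is a minimal sketch of that kind of check. It is only an illustration of the heuristic described above, not the code pacify.py really uses; the function names and the exact rule (previous paragraph does not end in sentence punctuation, next one starts with a lowercase letter) are my assumptions.

```python
# Assumed heuristic: a paragraph is "unfinished" if, ignoring trailing
# closing quotes/brackets, it does not end in sentence punctuation.
TERMINAL = ".!?:"
CLOSERS = "\"')]"
OPENERS = "\"'(["


def looks_unfinished(paragraph):
    stripped = paragraph.rstrip().rstrip(CLOSERS)
    return bool(stripped) and stripped[-1] not in TERMINAL


def starts_lowercase(paragraph):
    stripped = paragraph.lstrip().lstrip(OPENERS)
    return bool(stripped) and stripped[0].islower()


def merge_erroneous_breaks(paragraphs):
    """Merge a paragraph into the previous one when the break looks wrong.

    Returns the merged list plus a list of (tail, head) snippets so the
    user can be told what was changed.
    """
    merged = []
    changes = []
    for para in paragraphs:
        if merged and looks_unfinished(merged[-1]) and starts_lowercase(para):
            changes.append((merged[-1][-20:], para[:20]))
            merged[-1] = merged[-1].rstrip() + " " + para.lstrip()
        else:
            merged.append(para)
    return merged, changes
```

Returning the list of changes alongside the text is what lets the user review every merge and spot one made in error.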
While the -p option is good to use (once it works reliably) on all files "just in case" (and since it reports to the user what it changes, you'll know if it corrects something in error)... a file that has no such paragraph errors could be nicely processed with:
pacify.py -i input.txt -cq
Doing so with 157.txt yields the attached. At first look, it seems to work rather nicely, smartening up all single quotes without interfering with or being confused by apostrophes... though if and when you find it messed up somewhere in this file, nrapallo, do let me know. It would almost certainly get it wrong if there were a word like 'tis that began with a single quote--though since there are not many such words, it's not unreasonable for my program to keep a list of them so it knows to treat them correctly.
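Roughly what I have in mind for that list is sketched below. This is an illustration only, not what the program does today: the name smarten_single_quotes and the contents of APOSTROPHE_WORDS are placeholders, and the rule is simply "a single quote at the start of a word opens a quotation unless the word is on the list".

```python
import re

# Placeholder starter list of words that begin with an apostrophe.
APOSTROPHE_WORDS = {"tis", "twas", "twere", "twill", "em"}

# One or more letters (Unicode-aware): word characters minus digits and "_".
WORD = re.compile(r"[^\W\d_]+")


def smarten_single_quotes(text):
    out = []
    for i, ch in enumerate(text):
        if ch != "'":
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        nxt = text[i + 1] if i + 1 < len(text) else " "
        if not prev.isalnum() and nxt.isalpha():
            # Quote at the start of a word: opening quote, unless the
            # word is a known apostrophe-initial word like 'tis.
            word = WORD.match(text, i + 1).group(0).lower()
            out.append("\u2019" if word in APOSTROPHE_WORDS else "\u2018")
        else:
            # Inside or after a word: apostrophe or closing quote,
            # which are the same character.
            out.append("\u2019")
    return "".join(out)
```

A trailing apostrophe (as in a plural possessive) cannot be told apart from a closing quote this way, but since both become the same character it does no harm.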
- Ahi