One of the worst problems I have with many of my .txt files is rejoining lines that were artificially broken to stay under some maximum length.
This is the first fix I make to Project Gutenberg files, for example, before converting them to RTF in Word and then to LRF.
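I'm not sure whether somebody has already posted a better way to do it, but I have always been happy with the following script (it should work with any reasonably recent Perl version). It joins every line that does not end in a punctuation mark with the line that follows it.
Just save it as takeaway_breaklines.pl and run it as: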
takeaway_breaklines.pl infile.txt outfile.txt
Hope it helps!
Alessandro
Code:
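#!/usr/bin/perl
# needs Perl 5.6 or later (three-argument open, lexical filehandles)
use strict;
use warnings;

die "USAGE\n$0 filein fileout\n\n" if @ARGV != 2;
my ($infile, $outfile) = @ARGV;
open(my $in,  '<', $infile)  or die "Cannot read $infile: $!\n";
open(my $out, '>', $outfile) or die "Cannot write $outfile: $!\n";
while (my $line = <$in>)
{
    chomp $line;           # also copes with a last line that has no newline
    $line =~ s/\r//g;      # if the file was in DOS mode
    if ($line !~ /[.:,;"!?')-]$/)
    { print $out "$line " }     # no closing punctuation: join with the next line
    else
    { print $out "$line\n" }    # closing punctuation: keep the line break
}
close($in);
close($out) or die "Cannot finish writing $outfile: $!\n";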
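
To see what the rule does without touching any files, here is a minimal sketch that applies the same test to a string. The sub name join_broken_lines and the sample text are only assumptions made for illustration; they are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# join_broken_lines is a hypothetical helper name, not part of the script above.
# It applies the same rule: a line that does not end in closing punctuation
# is joined to the next one with a single space.
sub join_broken_lines {
    my ($text) = @_;
    my $out = '';
    foreach my $line (split /\n/, $text) {
        $line =~ s/\r//g;                  # drop DOS carriage returns
        if ($line !~ /[.:,;"!?')-]$/) {
            $out .= "$line ";              # artificially broken: join with next line
        } else {
            $out .= "$line\n";             # ends in punctuation: keep the break
        }
    }
    return $out;
}

# Made-up sample: four short lines that should become two.
my $sample = "It was the best of times, it was the\n"
           . "worst of times.\n"
           . "A new line\n"
           . "starts here.\n";
print join_broken_lines($sample);
```

With this sample the four input lines come out as two. Note that a line which ends without punctuation but really is the end of a paragraph (a title, for example) will also be joined to what follows, so it is worth checking the result by eye.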