Old 01-03-2008, 10:34 AM   #1
alexxxm
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
Collating broken lines

One of the worst problems I have with many .txt files is rejoining lines that have been artificially broken to stay under some maximum length.
This is the first fix I make to Project Gutenberg files, for example, before converting them to RTF in Word and then to LRF.

I'm not sure whether somebody has already posted a better way to do it, but I have always been happy with the following script (it should work with any Perl version). It joins every line that does not end in punctuation to the line that follows it.
Just save it as takeaway_breaklines.pl and run it as:
takeaway_breaklines.pl infile.txt outfile.txt

Hope it helps!

Alessandro

Code:
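#!/usr/bin/perl
use strict;
use warnings;

die "USAGE\n$0 filein fileout\n\n" if @ARGV != 2;
open(my $in,  '<', $ARGV[0]) or die "Cannot read $ARGV[0]: $!\n";
open(my $out, '>', $ARGV[1]) or die "Cannot write $ARGV[1]: $!\n";

while (my $l = <$in>)
{
        chomp $l;       # a last line without a trailing newline is handled too
        $l =~ s/\r//g;  # if the file was in DOS mode

        # no closing punctuation: the line was probably broken artificially,
        # so join it to the next one with a space
        if ($l !~ /[.:,;"!?')-]$/)
        { print {$out} "$l " }
        else
        { print {$out} "$l\n" }
}

close($in);
close($out) or die "Cannot close $ARGV[1]: $!\n";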
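
One thing to be aware of: the script treats any line that does not end in punctuation as artificially broken, so empty lines and unpunctuated headings get joined to the text that follows. Below is a minimal sketch of a variant that keeps empty lines as paragraph breaks. The helper name join_broken_lines and the sample text are hypothetical, made up for illustration; this is just one possible way to handle it, not a replacement for the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): takes the raw lines
# of a text and returns the rejoined text as a single string.
sub join_broken_lines {
    my @lines = @_;
    my $out = '';
    for my $l (@lines) {
        $l =~ s/[\r\n]+\z//;            # drop Unix or DOS line endings
        if ($l =~ /^\s*$/) {
            # assumption: an empty line marks a paragraph break, so keep it
            $out =~ s/ \z//;            # remove the joining space left by the previous line
            $out .= "\n" unless $out eq '' || $out =~ /\n\z/;
            $out .= "\n";
        }
        elsif ($l !~ /[.:,;"!?')-]$/) {
            $out .= "$l ";              # no closing punctuation: join with the next line
        }
        else {
            $out .= "$l\n";             # closing punctuation: keep the line break
        }
    }
    return $out;
}

# Made-up sample input, broken the way many plain-text e-books are
my @sample = (
    "It was a dark and stormy\n",
    "night; the rain fell in\n",
    "torrents.\n",
    "\n",
    "Chapter 2\n",
    "\n",
    "The next morning was\n",
    "bright.\n",
);
print join_broken_lines(@sample);
```

With this sample, the first three lines come out as one line, "Chapter 2" stays on a line of its own between two empty lines, and the last two lines are joined into "The next morning was bright."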