In the same spirit as my previous post
https://www.mobileread.com/forums/sho...479#post134479 ,
i.e. writing small utilities that each do one well-defined job, here is the code I use to make sure my books do not contain bad characters (bad = non-printable).
Code:
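#!/usr/bin/perl
use strict;
use warnings;

# Usage: correct_nonascii.pl [-l] filenamein filenameout
my $list = (@ARGV && $ARGV[0] eq '-l') ? 1 : 0;
shift @ARGV if $list;
my ($fin, $fout) = @ARGV;
die "Usage: $0 [-l] filenamein filenameout\n"
    unless defined $fin && ($list || defined $fout);

# Read the whole input file into memory.
open(my $in, '<', $fin) or die "Cannot open $fin: $!\n";
my @lines = <$in>;
close($in);

if ($list) {
    # Count every character outside printable ASCII, CR and LF.
    my %count;
    foreach my $l (@lines) {
        while ($l =~ /([^\x20-\x7e\n\r])/g) {
            $count{ sprintf "%x", ord($1) }++;
        }
    }
    print "\n\nNon-printable characters, and their number of occurrences\n",
          "-" x 70, "\n";
    # Sort by numeric code rather than alphabetically by hex string.
    foreach my $k (sort { hex($a) <=> hex($b) } keys %count) {
        print "0x$k\t$count{$k}\n";
    }
}
else {
    open(my $out, '>', $fout) or die "Cannot open $fout: $!\n";
    foreach my $l (@lines) {
        # Substitution table; extend at will.
        $l =~ s/\x97/-/g;    # long dash
        $l =~ s/\x91/'/g;    # opening single quote
        $l =~ s/\x92/'/g;    # closing single quote
        $l =~ s/\x93/"/g;    # opening double quote
        $l =~ s/\x94/"/g;    # closing double quote
        print {$out} $l;
    }
    close($out) or die "Cannot write $fout: $!\n";
}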
Save it under some name (e.g. correct_nonascii.pl) and run it as:
correct_nonascii.pl [-l] filenamein filenameout
When run with the -l switch, it lists how many times each non-printable character occurs in the input file (filenameout is not used in this mode).
When run without it, it applies the substitution table, which you can extend at will.
For example, the line:
$l=~s/\x97/-/g;
replaces the character with hex code 0x97 (a long "-" sign, which turns up often) with the usual "-" character.
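If the table grows, one line per character gets tedious. Here is a minimal sketch of the same idea driven by a hash, so that adding a character means adding one entry. The name fix_line and the two extra entries (0x85 and 0x96) are my own additions for illustration, and the comments assume the file is in Windows-1252; check your own -l output before trusting them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative table: byte found in the file => plain ASCII replacement.
# The meanings in the comments assume Windows-1252 input.
my %subst = (
    "\x85" => '...',   # ellipsis (added for illustration)
    "\x91" => "'",     # opening single quote
    "\x92" => "'",     # closing single quote
    "\x93" => '"',     # opening double quote
    "\x94" => '"',     # closing double quote
    "\x96" => '-',     # short dash (added for illustration)
    "\x97" => '-',     # long dash
);

# Build a single character class such as [\x85\x91...] from the keys.
my $class = join '', map { sprintf('\\x%02x', ord($_)) } keys %subst;
my $re    = qr/[$class]/;

# Hypothetical helper: return a copy of the line with all table
# characters replaced.
sub fix_line {
    my ($line) = @_;
    $line =~ s/($re)/$subst{$1}/g;
    return $line;
}

print fix_line("\x93Hello\x94 \x97 it\x92s fine\x85\n");
# prints: "Hello" - it's fine...
```

Characters that are not in the table pass through unchanged, exactly as in the script above.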
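Use the -l switch first to scan for problems, then look the codes up in a good character table (plain ASCII stops at 0x7f, so for codes like the ones above you need an extended table such as Windows-1252).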
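If you do not have a table at hand, Perl can do the lookup. This is only a sketch: it assumes the book was saved as Windows-1252 (cp1252), which the -l listing itself cannot tell you, and describe_byte is a name I made up. It uses the core Encode and charnames modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use charnames ();

# Hypothetical helper: take a hex code as printed by -l (without the
# leading 0x) and describe the character, ASSUMING cp1252 input.
sub describe_byte {
    my ($hex)  = @_;
    my $byte   = chr(hex $hex);
    my $char   = decode('cp1252', $byte);
    my $cp     = ord $char;
    my $name   = charnames::viacode($cp) // 'unknown';
    return sprintf "0x%s -> U+%04X %s", $hex, $cp, $name;
}

print describe_byte($_), "\n" for qw(91 92 93 94 97);
# The last line printed is: 0x97 -> U+2014 EM DASH
```

If the names that come out make no sense for your text, the file is probably in some other encoding and the substitution table needs different codes.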
Alessandro