MobileRead Forums - View Single Post

SBT · 04-25-2012, 05:06 AM

I had a look at diff3, but it doesn't seem to be able to select the 'best out of three' automagically, so I ended up writing the following bash script. Given versions a.txt, b.txt, c.txt, ..., it basically does this:

find the lines that differ between a and b
poll the two versions of each such line among all file versions
select the one with the most hits.

This can be iterated: repeat the process with c and d, and so on, then diff the refined versions against each other, and so on.

Code:

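cp a.txt ab.txt
# side-by-side diff; changed line pairs are marked with '|' in the gutter
diff -y a.txt b.txt | grep '|' |
while read -r l
do
    a=${l%%$'\t'*}   # a's version: text before the first tab
    b=${l##*$'\t'}   # b's version: text after the last tab
    # poll: count whole-line occurrences of each version in all ?.txt files
    na=$(cat ?.txt | grep -c "^$a\$")
    nb=$(cat ?.txt | grep -c "^$b\$")
    sd=$(( ${#b} - ${#a} ))
    sd=${sd#-}       # absolute value of the length difference
    # assume lines are not similar if the lengths differ by 3 or more
    if (( na < nb && sd < 3 )); then
        # b's version wins: put it into ab.txt, prefixed with '# '
        sed -i "/^$a\$/s/.*/# $b/" ab.txt
    fi
done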
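
One caveat with the poll above: each line is pasted into a regular expression, so lines containing characters like . * [ or / may be miscounted, or may break the sed command. Here is a minimal sketch of the polling step on its own, using literal whole-line matching instead. The function name poll_line is made up for illustration and is not part of the script above; it assumes GNU or POSIX grep with the -c, -x and -F options.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script): print whichever of two
# candidate lines occurs more often, as a whole line, across the given files.
# Usage: poll_line "version from a" "version from b" file...
poll_line() {
    local a=$1 b=$2
    shift 2
    local na nb
    # -c: count matching lines; -x: whole-line matches only;
    # -F: treat the candidate as a literal string, not a regex
    na=$(cat -- "$@" | grep -cxF -- "$a")
    nb=$(cat -- "$@" | grep -cxF -- "$b")
    if (( na < nb )); then
        printf '%s\n' "$b"
    else
        # ties go to the first candidate, as in the script above
        printf '%s\n' "$a"
    fi
}
```

Called as poll_line "$a" "$b" ?.txt inside the loop, it would print the winning version. The length check and the edit of ab.txt are left out here, so this only covers the counting.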
