#1
Fanatic
Posts: 580
Karma: 810184
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
diff & automerge multiple text files
To speed up proofreading texts from the Internet Archive, I often diff two different scans/uploads of the same book; vimdiff is my tool of choice. However, some books have more than two uploads, and then it should be possible to auto-merge them based on how many files agree on the contents of a particular line.
Does anybody have an idea how to do this?
#2
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
You might try diff3. It has some options that will help with the task.
Dale
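
For reference, a minimal sketch of a three-way merge with GNU diff3. The file names and contents are made up for illustration, and which options were meant above is not stated; `-m` is simply the one that writes a merged file instead of a list of differences.

```shell
# Hypothetical demo files in a scratch directory (names and contents are assumptions).
tmp=$(mktemp -d)
printf '%s\n' 'one' 'two' 'three' 'four' 'five' > "$tmp/older.txt"
printf '%s\n' 'ONE' 'two' 'three' 'four' 'five' > "$tmp/mine.txt"    # changes line 1
printf '%s\n' 'one' 'two' 'three' 'four' 'FIVE' > "$tmp/yours.txt"   # changes line 5

# diff3 -m MINE OLDER YOURS prints MINE with the changes from OLDER to YOURS applied.
merged=$(diff3 -m "$tmp/mine.txt" "$tmp/older.txt" "$tmp/yours.txt")
printf '%s\n' "$merged"
rm -r "$tmp"
```

Because the two edits touch different lines, both end up in the merged text. Note that the middle argument plays the role of a common ancestor, which is not quite the same situation as three independent scans.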
#3
Fanatic
Posts: 580
Karma: 810184
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
I had a look at diff3, but it doesn't seem able to select 'best out of three' automagically, so I ended up writing the following bash script. Given versions a.txt, b.txt, c.txt, ..., it copies a.txt to ab.txt and, for each line where a.txt and b.txt differ, takes b's version when more of the files agree with it:
Code:
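cp a.txt ab.txt
# -W 400 keeps diff -y from truncating long lines; changed lines have "|" plus a tab in the gutter
diff -y -W 400 a.txt b.txt | grep -F $'|\t' |
while IFS= read -r l
do
    # GNU diff pads the left column with tabs, so (assuming the text itself contains no tabs)
    # the left line ends at the first tab and the right line starts after the last one
    a=${l%%$'\t'*}
    b=${l##*$'\t'}
    # count the lines in all versions that match each variant exactly (-x) and literally (-F)
    na=$(cat ?.txt | grep -cxF -e "$a")
    nb=$(cat ?.txt | grep -cxF -e "$b")
    sd=$(( ${#b} - ${#a} ))
    sd=${sd#-}    # absolute value of the length difference
    # assume lines are not similar if the lengths differ by 3 or more
    if (( na < nb && sd < 3 )); then
        # replace the line in ab.txt; the "# " prefix flags it as changed
        a="$a" b="$b" awk '($0 "") == ENVIRON["a"] { $0 = "# " ENVIRON["b"] } { print }' ab.txt > ab.tmp && mv ab.tmp ab.txt
    fi
done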
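
The voting step on its own can be sketched as follows. This is not the script above: it assumes the versions are already line-aligned (same number of lines, no inserted or dropped lines) and contain no tab characters, and the sample files and the "# " flag for lines without a majority are made up for illustration.

```shell
# Hypothetical sample scans in a scratch directory (contents are assumptions).
tmp=$(mktemp -d)
printf '%s\n' 'It was a dark' 'and stormy night' 'the rain fell' > "$tmp/a.txt"
printf '%s\n' 'It was a dark' 'and stormy nigbt' 'the rain fell' > "$tmp/b.txt"
printf '%s\n' 'lt was a dark' 'and stormy night' 'the rain fell' > "$tmp/c.txt"

# paste joins line N of every file with tabs; awk then picks the most common reading.
# On a tie, the reading that reached the top count first wins, and any line
# without a strict majority is prefixed with "# " for manual review.
voted=$(paste "$tmp"/?.txt | awk -F '\t' '{
    split("", count)                      # reset the tally for this line
    best = $1
    for (i = 1; i <= NF; i++)
        if (++count[$i] > count[best]) best = $i
    flag = (2 * count[best] > NF) ? "" : "# "
    print flag best
}')
printf '%s\n' "$voted"
rm -r "$tmp"
```

With the sample files, each typo is outvoted by the two other scans. Real scans rarely stay aligned line by line, which is why the script above leans on diff to pair up lines first.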
#4
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Interesting. diff3 does try to pick the best, but only based on the order in which the files are specified. It tries to prefer the newest change rather than voting. I guess it depends on what you think is most important.
Glad you got something you like that works for you. Thanks for posting. Others may find it useful. Dale
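
A small sketch of that order dependence, assuming GNU diff3 and three made-up one-line files. `-m -E` writes the merged text and brackets only overlapping changes.

```shell
# Hypothetical one-line scans (names and contents are assumptions).
tmp=$(mktemp -d)
printf '%s\n' 'teh cat sat' > "$tmp/a.txt"   # the one scan with a typo
printf '%s\n' 'the cat sat' > "$tmp/b.txt"
printf '%s\n' 'the cat sat' > "$tmp/c.txt"

# b.txt as the "older" file: only a.txt differs from it, so a's reading is kept,
# even though two of the three files agree on the other one.
b_as_older=$(diff3 -m -E "$tmp/a.txt" "$tmp/b.txt" "$tmp/c.txt")

# a.txt as the "older" file: b.txt and c.txt agree with each other, and their reading is used.
a_as_older=$(diff3 -m -E "$tmp/b.txt" "$tmp/a.txt" "$tmp/c.txt")

printf 'b as older: %s\na as older: %s\n' "$b_as_older" "$a_as_older"
rm -r "$tmp"
```

The same three files give different results depending on which one sits in the middle, so diff3 is merging relative to an ancestor rather than counting votes.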