04-24-2012, 08:01 AM | #1 |
Fanatic
Posts: 580
Karma: 810184
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
diff & automerge multiple text files
To speed up proofreading texts from the Internet Archive, I often diff two different scans/uploads of the same book; vimdiff is my tool of choice. However, some books have more than two uploads, and then it should be possible to auto-merge them, based on how many files agree on the contents of a particular line.
Does anybody have an idea how to do this?
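One way to do such a vote, as a minimal sketch: it assumes the uploads are already line-aligned (line N of every file is the same line of the book), which different scans often are not, so in practice the lines would first have to be paired up, e.g. with diff. `vote_merge` is a hypothetical name, and on a tie the variant from the file listed first wins.

```bash
# vote_merge FILE...   (hypothetical helper, not from the thread)
# Prints, for each line number, the text that most of the files agree on.
# ASSUMPTION: the files are line-aligned, i.e. line N of every file is the
# same line of the book. On a tie, the variant from the earliest file wins.
# Empty files are skipped by awk and so do not vote.
vote_merge() {
  awk '
    FNR == 1 { nfile++ }                  # a new input file has started
    {
      text[nfile, FNR] = $0 ""            # "" forces string comparison later
      if (FNR > nline) nline = FNR
    }
    END {
      for (i = 1; i <= nline; i++) {
        best = ""; bestc = 0
        for (f = 1; f <= nfile; f++) {
          if (!((f, i) in text)) continue # file f has fewer than i lines
          c = 0                           # how many files agree with file f
          for (g = 1; g <= nfile; g++)
            if (((g, i) in text) && text[g, i] == text[f, i]) c++
          if (c > bestc) { bestc = c; best = text[f, i] }
        }
        print best
      }
    }' "$@"
}
```

Usage would be along the lines of `vote_merge a.txt b.txt c.txt > merged.txt`. Comparing every file against every other file per line is quadratic in the number of uploads, which is harmless for the handful of scans a book typically has.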
04-24-2012, 05:12 PM | #2 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
You might try diff3. It has some options that will help with the task.
Dale
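For the three-file case, diff3's normal output already says which file is the odd one out: each hunk header is `====` when all three files differ, or `====1`, `====2`, `====3` when only that file differs from the other two (so the other two form a 2-of-3 majority). A rough sketch, with `odd_one_out` as a made-up helper name, that tallies those headers to show which scan disagrees most often:

```bash
# odd_one_out A B C   (hypothetical helper, not from the thread)
# Tallies diff3's hunk headers: "====N" means file N is outvoted 2 to 1,
# a plain "====" means no two of the three files agree on that hunk.
odd_one_out() {
  diff3 "$1" "$2" "$3" | awk '
    /^====[123]$/ { odd[substr($0, 5)]++ }   # file number follows "===="
    /^====$/      { none++ }                 # all three files differ
    END {
      for (i = 1; i <= 3; i++)
        printf "file %d odd one out: %d\n", i, odd[i] + 0
      printf "no majority: %d\n", none + 0
    }'
}
```

This only reports where a majority exists; the hunk bodies that diff3 prints below each header would still have to be applied to produce a merged file.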
04-25-2012, 04:06 AM | #3 |
Fanatic
Posts: 580
Karma: 810184
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
I had a look at diff3, but it doesn't seem to be able to pick the 'best out of three' automagically. So I ended up writing the following bash script. Given versions a.txt, b.txt, c.txt, ..., it basically does this:
Code:
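cp a.txt ab.txt   # the merged result starts out as a copy of a.txt
# -y: side-by-side output, -t: expand tabs so the columns are separated
# by spaces; changed lines carry a '|' in the gutter
diff -y -t a.txt b.txt | grep '|' |
while read -r l
do
  a=${l%% *}   # up to the first space: the a.txt side
  b=${l##* }   # after the last space: the b.txt side
  # (this split only recovers whole lines that contain no spaces)
  # count how many lines in all versions match each candidate exactly
  na=$(cat ?.txt | grep -cxF -- "$a")
  nb=$(cat ?.txt | grep -cxF -- "$b")
  sd=$(( ${#b} - ${#a} ))
  sd=${sd#-}   # absolute length difference
  # assume lines are not similar if the lengths differ by 3 or more
  if (( na < nb && sd < 3 )); then
    # replace the line with the majority version, prefixed with '# '
    # so it stands out when proofreading ($a and $b go into sed
    # unescaped, so lines containing '/' or regex characters can break it)
    sed -i "/^$a\$/s/.*/# $b/" ab.txt
  fi
done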
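To make the parsing step concrete, here are the script's two parameter expansions and its length check wrapped in a small hypothetical function, `split_sdiff`, and applied to hand-written lines of roughly the shape a changed line has in `diff -y` output once tabs are expanded (the sample lines are assumptions, not real diff output):

```bash
# split_sdiff LINE   (hypothetical helper mirroring the script's parsing)
# Prints "left|right|absolute length difference".
split_sdiff() {
  local l a b sd
  l=$1
  a=${l%% *}                # strip from the first space on: left text
  b=${l##* }                # strip up to the last space: right text
  sd=$(( ${#b} - ${#a} ))   # signed difference in length
  sd=${sd#-}                # drop a leading minus sign: absolute value
  printf '%s|%s|%s\n' "$a" "$b" "$sd"
}

split_sdiff 'teh                   | the'
split_sdiff 'colour                | color'
split_sdiff 'the cat               | tho cat'
```

The last call shows the limitation: for a line containing spaces, `a` ends up as only the first word and `b` as only the last word, so the script effectively votes on single-word lines.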
04-25-2012, 10:47 AM | #4 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Interesting. diff3 does try to pick the best version, but only based on the order in which the files are specified: it tries to prefer the newest change rather than voting. I guess it depends on what you think is most important.
Glad you got something you like that works for you. Thanks for posting; others may find it useful.
Dale