Quote:
Originally Posted by Hitch
Yes, you could stuff Word into Git, but then getting it back out and getting the XML back into a viable Word document, not so much. Honestly, to be boring, using plain old Track Changes would be better.
Assuming you could get the XML into Git, reconstructing the Word document at any given point isn't a problem. You are simply applying changes in the XML to the original commit, and when you reach the desired point, stuff the result into the Zip file that the DOCX document is under the hood. The trick is doing that in an automated manner in both directions. (I have no idea how one might script it.)
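For what it's worth, the unpack/repack half of that looks scriptable. Here is a rough sketch in Python using only the standard zipfile module. The function names unpack_docx and pack_docx are my own invention, and I'm assuming the unpacked directory is what you would commit to Git. I haven't tried the result against every version of Word, so treat it as a starting point rather than a finished tool.

```python
import zipfile
from pathlib import Path


def unpack_docx(docx_path, out_dir):
    """Extract every part of a DOCX (a Zip archive) into out_dir.

    Returns the sorted list of extracted part names. The directory is
    what you would put under version control (an assumption of this sketch).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(docx_path) as zf:
        zf.extractall(out)
    return sorted(
        p.relative_to(out).as_posix() for p in out.rglob("*") if p.is_file()
    )


def pack_docx(src_dir, docx_path):
    """Zip the contents of src_dir back into a DOCX file at docx_path."""
    src = Path(src_dir)
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            rel = path.relative_to(src)
            # Skip Git's own metadata if src_dir is itself the working tree.
            if ".git" in rel.parts:
                continue
            if path.is_file():
                # Zip entry names always use forward slashes.
                zf.write(path, rel.as_posix())
```

Usage would be along the lines of unpack_docx("draft.docx", "draft_tree"), commit draft_tree, and later check out whatever commit you want and run pack_docx("draft_tree", "restored.docx"). One caveat: as far as I know Word writes each XML part as one very long line, so the diffs Git shows may be coarse unless the XML is pretty-printed before committing.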
And Track Changes tends to fall down as the number of parties working on the document increases. If it's back and forth between author and editor it's one thing. Beyond that it becomes quite another. Git stores the ID of the committer, a date and time stamp, and a comment explaining the commit with each update.
An old friend is noted open source advocate Eric S. Raymond. He wrote an open source tool called Reposurgeon, intended to automate as much as possible of lifting code out of one repository and putting it into another. His goal was to eliminate CVS, the VCS most folks had used in his lifetime, and migrate code in CVS to something else, with Git as the default. Getting the code migrated wasn't that hard. Properly migrating the commit comments and tying them to the commits they commented on was another thing entirely. He was mostly successful. Some manual work would be involved in fixing the stuff Reposurgeon couldn't handle, but that was expected going in.
And he participated in the migration of some repositories that had been in use for 20 or more years, with gigabytes of content. He had a machine he called the Great Beast of Malvern built to do the conversion, with 64GB of ECC RAM (later upgraded to 128GB), so the conversion could all happen in RAM and you didn't grow old and grey waiting for it to finish.
There used to be an outfit called Component Software offering a version of the older RCS VCS software which they claimed could properly version MS Word and Excel files back when MS was still using binary file formats. (RCS was not a distributed VCS, and this assumed you were storing to a local repository on your machine.) I played with it a bit, mostly to see just what they had done to make it possible.
______
Dennis