tsunami

tsunami suggestions - document change history

Ned Imming
2009-01-05 05:49 UTC

Suggestion
Implement a method of maintaining document histories, either through individual changes or more simply as a submission history.
Solutions
Simple submission history
This is by far the simpler method to implement. The idea is to keep older versions of the document as new ones are submitted. A hard cap on the number of retained versions can be used to limit the total space a given document occupies. Some form of this first solution is needed in any case, since it supplies the documents to compare in the second method, 'Individual change tracking'.
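A capped submission history could be sketched roughly as follows. This is only a minimal in-memory illustration: the class name `DocumentHistory`, its methods, and the default cap of 10 are assumptions made for the example, and a real implementation would persist the versions somewhere.

```python
from collections import deque
from datetime import datetime, timezone


class DocumentHistory:
    """Keeps the most recent submissions of one document (hypothetical class)."""

    def __init__(self, max_versions=10):
        # Hard cap: a deque with maxlen silently drops the oldest entry
        # once the cap is reached. The default of 10 is arbitrary.
        self._versions = deque(maxlen=max_versions)

    def submit(self, text):
        """Store a new version together with its submission time (UTC)."""
        self._versions.append((datetime.now(timezone.utc), text))

    def current(self):
        """Return the text of the latest submission."""
        return self._versions[-1][1]

    def previous(self, steps_back=1):
        """Return the text from `steps_back` submissions ago."""
        return self._versions[-1 - steps_back][1]

    def __len__(self):
        return len(self._versions)
```

With a cap of two, submitting three versions leaves only the last two available, which is the pair a diff would need to compare.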
Individual change tracking
This method is more complicated. The basic idea is to compare the edited document to the previous version and record the changed sections in a meaningful way.

This is usually accomplished using a diff algorithm. Essentially, you look for the longest common subsequence and then move left and right looking for more common subsequences until none are found; the remainder is the difference.
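The "find the longest match, then move left and right" idea can be sketched as below. Note the assumptions: this version works on lists of lines and looks for the longest common contiguous run rather than a true longest common subsequence, so it is a simple greedy approximation and will not always produce the smallest possible diff. The function names are made up for the example.

```python
def longest_common_run(a, b):
    """Return (i, j, n) for the longest block with a[i:i+n] == b[j:j+n]."""
    best_i = best_j = best_n = 0
    runs = {}  # runs[j]: length of the common run ending at the previous a item and b[j]
    for i, item in enumerate(a):
        new_runs = {}
        for j, other in enumerate(b):
            if item == other:
                n = runs.get(j - 1, 0) + 1
                new_runs[j] = n
                if n > best_n:
                    best_i, best_j, best_n = i - n + 1, j - n + 1, n
        runs = new_runs
    return best_i, best_j, best_n


def diff(a, b):
    """Yield (tag, lines) pairs, with tag one of 'same', 'removed', 'added'."""
    i, j, n = longest_common_run(a, b)
    if n == 0:
        # Nothing in common: whatever is left is the difference.
        if a:
            yield ("removed", a)
        if b:
            yield ("added", b)
        return
    yield from diff(a[:i], b[:j])            # move left of the match
    yield ("same", a[i:i + n])
    yield from diff(a[i + n:], b[j + n:])    # move right of the match
```

Called as `list(diff(old_lines, new_lines))`, it returns the unchanged, removed, and added runs in document order.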

Rather than implement your own, you may consider interfacing with an existing diff solution and just interpreting its results. A rather complete method for comparing HTML is available in a program called DaisyDiff.

Or, more simply, you could use an existing command-line version of diff, which is common on most *nix servers and is also available for Windows through GNUWin32.
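Calling the command-line tool from application code might look something like the sketch below. It assumes a `diff` executable is on the PATH and that it accepts the `-u` (unified format) option, as GNU diff does; the function name `unified_diff` is made up for the example.

```python
import os
import shutil
import subprocess
import tempfile


def unified_diff(old_text, new_text):
    """Return the output of `diff -u` for two strings ('' if they match)."""
    if shutil.which("diff") is None:
        raise FileNotFoundError("no diff executable found on PATH")
    paths = []
    try:
        # diff compares files, so write both versions to temporary files.
        for text in (old_text, new_text):
            with tempfile.NamedTemporaryFile(
                "w", suffix=".txt", delete=False, encoding="utf-8"
            ) as f:
                f.write(text)
                paths.append(f.name)
        result = subprocess.run(
            ["diff", "-u", paths[0], paths[1]],
            capture_output=True,
            text=True,
        )
    finally:
        for path in paths:
            os.remove(path)
    # diff exits with 0 when the inputs match, 1 when they differ,
    # and 2 when something went wrong.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```

The returned text is the usual unified diff, with removed lines prefixed by `-` and added lines by `+`, which the application would then have to interpret or display.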

There is also an article that talks a bit about the problems with markup diffs. There are also some user scripts in JavaScript for doing diffs on wiki pages, such as wikEdDiff. Both are linked under Resources & Examples.
Resources & Examples
http://en.wikipedia.org/wiki/User:Cacycle/wikEdDiff
http://www.jnolen.com/blog/2005/02/wiki_diff.html
http://gnuwin32.sourceforge.net/packages/diffutils.htm
http://code.google.com/p/daisydiff/
http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
http://en.wikipedia.org/wiki/Diff
http://www.codeproject.com/KB/recipes/diffengine.aspx
Original Conversation
Document history, or some kind of version control. It wouldn't have to be too crazy, it could just be one or two submits or something like that.

Of course it could get out of hand with the ability to see changes by date, going back to the first submission. I suppose you would have to run a diff check with every edit submission and then keep some kind of track of it on the documents themselves. Still, it could be handy to be able to look at a document and see when changes were made.

Care to find any diff algorithms? :-p I'm guessing that I would need to derive patches that "reverse-patch" an article to get to a prior version. This is something I need to work on fairly soon. A nice overview would be nice. :-)
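
The "reverse-patch" idea from the conversation above could be sketched as follows, assuming the latest version is stored in full and each older version is stored only as a patch against the one that replaced it. This uses Python's standard `difflib.SequenceMatcher` in place of a hand-written diff; the patch format and the function names are invented for the example.

```python
from difflib import SequenceMatcher


def make_reverse_patch(new_lines, old_lines):
    """Describe how to rebuild old_lines from new_lines."""
    patch = []
    matcher = SequenceMatcher(a=new_lines, b=old_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged region: only remember where it sits in new_lines.
            patch.append(("copy", i1, i2))
        else:
            # 'replace', 'delete' or 'insert': store the old text itself.
            patch.append(("put", old_lines[j1:j2]))
    return patch


def apply_patch(lines, patch):
    """Rebuild the older version from the newer lines plus a reverse patch."""
    result = []
    for op in patch:
        if op[0] == "copy":
            result.extend(lines[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result
```

To go back several versions, the patches would be applied one after another, starting from the current document.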