Problem definition
It must be possible to pull a record from a given table, as it was at any point in time.
SQL
--drop table History
--go
create table History (
table_name varchar(64),
column_name varchar(64),
record_id int,
new_value varchar(100),
update_date datetime
)
go
;with V as (
select c = 'first_name', v = 'Larry', d = '1/1/06' union
select c = 'last_name', v = 'Smith', d = '1/1/06' union
select c = 'job_title', v = 'Manager', d = '1/1/06' union
select c = 'job_title', v = 'Senior Manager', d = '1/1/07' union
select c = 'job_title', v = 'CEO', d = '1/1/08'
)
insert History (table_name, record_id, column_name, new_value, update_date)
select 'Person', 1, c, v, d
from V
union
select 'Person', 2, 'first_name', 'Victor', '1/1/06'
;with Ranked as (
select *,
rank = rank() over (partition by record_id, column_name order by update_date desc)
from History
where table_name = 'Person'
and update_date <= '1/1/07'
), Filtered as (
select column_name,
record_id,
new_value
from Ranked
where rank = 1
)
select record_id,
first_name,
last_name,
job_title
from Filtered
pivot (max(new_value) for column_name in (first_name, last_name, job_title)) pvt
Results
record_id first_name last_name job_title
--------- ---------- --------- --------------
1 Larry Smith Senior Manager
2 Victor NULL NULL
Caveats
The above is effectively
Entity-Attribute-Value (EAV), which is an extremely flexible way to use relational databases, but also a very inefficient one. The query above will
not scale well. The data returned are all the same data type as
History.new_value —
varchar(100) in this case. This does not bode well for, example, calculating sums over nontrivial sets of data. Moreover, the above model assumes all tables have the same type of [synthetic] primary key. (Although, this could be addressed by having a separate history for each table.)
Caching
There are many variations on the above that are really just the above, plus caching. For example: consider the use of
table_name in the example — it actually works without there being a
Person table in the database. However, it makes sense to have the latest version of data available in tables, for quick access. The latest version is simply the
select query above, with no date criterion. Consider such tables the "most recent snapshot".
If this "most recent snapshot" approach is taken, the
History table above could be thought of as a sort of audit log — it could be populated by triggers. (Deletion would need special treatment, unless all entries in the audit log are also deleted.)
Another way to do versioning is to never update records in tables — just create new ones and mark the old ones somehow. This has the benefit of faster data access and avoidance of EAV issues. However, this is not very space-efficient. A possible compromise between this approach and the EAV approach would be to create occasional snapshots, with the gaps filled by the EAV approach.