SQL Server Row Level Versioning

Richard Hooper is a MCPD living and working in the UK

SQL Server Row Level Versioning

11 Jun 2012

I have had to do this in previous projects but have never had the opportunity to write it down so when I was asked to produce a document outlining the process I was proposing…I thought I might as well put it up here too!

The row versioning that I am talking about is not to be confused with “Row Versioning-Based Transaction Isolation” in SQL Server, if that is what you are looking for then head over to here for a description of that.

What I am referring to is something that allows you to keep versions of individual rows in tables over time so you can review all the changes made. Basically this is very similar to a source control system like SubVersion or TFS but for the data in the database tables. This is a simple solution and by no means a complete version control system so if you have any suggestions on improvements, I would love to hear them!

The Problem Domain

The scenario is that I am writing a basic CRM system that need to be able to import contact information from another source and be able to report improvements made over time against the original list. For example, imagine I imported a list of contacts that contained 20 entries and of that list 10 had email addresses where the other 10 didn’t. During the process of dealing with the contacts I manage to get the other 10 email addresses, I need to know how many where improved and specifically, which ones. To achieve this I decided that we should implement a type of row versioning which I know sounds like a bit of overkill but there are other things that we need to be able to report from the data too. We also need to be able to specify a date and time and be able to reconstruct the database tables that are versioned as they were at a given date and time.

The Implementation

The implementation that I am going to show you works on a table by table basis so you need to add it to each table individually but to be fair, its a pretty simple solution so it doesn’t take much to do.

I have included the SQL for you so you can try it out yourself. You should find it attached to this post as a sql file.

Firstly, let us use a very simple example of a Contact table that looks something like the following table:

If I want to be able to reproduce the data in the table as it was at a specified time, I need to record the values in all of the columns after every change. One way that I could do this is to have a version number column in the Contact table and add a new row to the same table with the updated values incrementing the version number as it goes. This would give me everything in one table which is less to maintain once the table, triggers and the queries for the data are complete. However in our system we need to read back the latest data very quickly and at the same time have a simple way for the user to query the data on a given date. To achieve this, I opted for creating a separate table that contains the history for the rows so that anyone trying to access the latest data (which is most of our users) they just go straight to the Contact table and access the data as they expected. This also means that the main Contact table does not grow faster than the original dataset.

The following is the structure of my history table:

As you can see, all of the columns from the Contact table have been copied across to the History_Contact table with the addition of the new separate primary key. The primary key on this history table has to be separate from the primary key copied across from the main Contact table, [ContactId], so that we can maintain the primary key value across the versions of the original row.

Now I need to automate the process so that the versions are added to the database without having to extend any of the code used to persist the data which in our case is the Entity Framework. To achieve this I have opted for the use of a table trigger as it is fast and means that if we decided to attach another system to the database, we don’t have to think about ensuring the versioning system is in place. The trigger is simple and the SQL for it is shown below:

CREATE TRIGGER [dbo].[ContactUpdated]
   ON  [dbo].[Contact]
   AFTER UPDATE
AS
BEGIN
  SET NOCOUNT ON;

  INSERT INTO [dbo].[History_Contact]
         ([ContactId]
         ,[Firstname]
         ,[Surname]
         ,[TelephoneNumber]
         ,[MobilePhoneNumber]
         ,[DateCreated]
         ,[DateModified])
   SELECT
          [ContactId]
         ,[Firstname]
         ,[Surname]
         ,[TelephoneNumber]
         ,[MobilePhoneNumber]
         ,[DateCreated]
         ,[DateModified]
   FROM deleted
END

Just a quick run through that for clarity…this trigger is running on the Contact table and happens every time an UPDATE statement is executed against that table but most importantly this happens AFTER the udpate has completed. The reason this is important is because we are going to use one of the temporary tables created by SQL Server in triggers called deleted. This table, in the case of an UPDATE statement, will contain the values that were in the rows that were updated before the UPDATE statement was completed. Keep in mind that this is a table so it can contain multiple rows of data, not just a single row so don’t go writing a trigger that always expects one row! The trigger simply takes the contents of the deleted table and inserts it into the History_Contact table thus keeping a record of a version of the row.

I told you it was simple….

So lets see what happens when I add some data into this table. Lets start by inserting a new row of data into our Contact table, here is my sample row.

Now if you were to look in the History_Contact table, you should find that there are no rows. This is expected as the row in the Contact table represents the latest version of the row at all times and seeing as this is currently the only version, there is no need for a version in the History_Contact table.

So lets try updating our row by changing the Firstname from “J” to “Joe” and by adding the telephone number “01202 123456”. Here is the updated row…

Now that I have executed an update statement against my Contact table, I would have expected the trigger to have been executed and saved the old version of the data row into the History_Contact table. Lets have a look at the History_Contact table…

Now if you compare this version to the originally inserted row you can see that it is the same which means that between the Contact table and the History_Contact table, we can see all of the versions of this row.

Conclusion

As you can see from the above example, this is a very simple and very lightweight method of implementing a version control system for your SQL database data. I am sure that there are a lot of improvements that could be made to this to create a system that works in all scenarios but this works for what we need.

Category:

SQL Server

2 Comments

Tagged with:

Comments on: "SQL Server Row Level Versioning" (2)

Arif Majid said:

28 Jan 2013 at 6:32 pm

hello richard,

is the trigger shooting after every update query ??? if yes … then do i have to write the same trigger for a delete query aswell ??

thanks
Arif
- richardshooper said:
  
  30 Jan 2013 at 9:10 pm
  
  Hi Arif,
  
  Yes, the trigger fires after every update statement. I don’t have it firing after delete statements because nothing has changed in the data row…the data row has just gone!
  
  The key to it is, can you trace what happened to the row? If its not in the master table but exists in the history table, then I know it must have been deleted.
  
  Hope that helps! Rich.

Richard Hooper's Blog

Richard Hooper is a MCPD living and working in the UK