Monday, 13 July 2009

Painting a Consistent Picture from Moving Data

Applies to: SQL Server 2005 and SQL Server 2008.

Introducing the Two types of Row Versioning

Row versioning was introduced in SQL Server 2005 which provides a way to implement a non-locking, non-blocking solution to getting a consistent picture of a moving database. This is useful to many applications which require an entirely consistent read of data such as an ETL or a reporting application.

Firstly to make this work, data needs to be committed in consistent blocks. So for example if you have an Order table and an OrderItem table, Order has a column called OrderItemCount – if the OrderItemCount is updated in one transaction and the OrderItem row is added in another transaction this is bad. You need to at least ensure that logical and consistent things are written to the database atomically within transactions.

There are two types of consistent picture which can be drawn using Row Versioning under SQL 2005 and above. Type 1 gives a consistent picture within a single query statement; Type 2 can give a consistent picture across multiple query statements.

To explain the difference between Type 1 and Type 2 a little more, look at this query:

SELECT OrderItemCount FROM Order WHERE OrderID = 100
SELECT COUNT (*) FROM OrderItem WHERE OrderID = 100

The above query is actually two statements which execute separately, but they should both always return the same value. Type 1 only guarantees a single statement is consistent. So it's possible for the data to change between statements, and you could be looking at a newer version of the data by the time the second statement runs (because an edit has been made to Order/OrderItem).

Type 2 makes sure that you are working off the same version of the data across all your statements. So even if an edit is committed to the database between statements, the statements will still be working off the old, but the same and consistent data.

Type 1 Row Versioning is called Read Committed Snapshot (READ_COMMITTED_SNAPSHOT) – This offers single statement level consistency
Type 2 Row Versioning is called Snapshot Isolation (SNAPSHOT_ISOLATION) – This offers multi-statement consistency

Implementation

Type 1 Row Versioning is very easy to implement. Simply run this statement once and no further implementation is required:

ALTER DATABASE <database> SET READ_COMMITTED_SNAPSHOT ON WITH <termination>

Type 2 Row Versioning needs a little more work. Firstly the following statement needs to be run once:

ALTER DATABASE <database> SET ALLOW_SNAPSHOT_ISOLATION ON;
It can take a while for the setting to come into effect, to know when it has taken effect check the snapshot_isolation_state column in sys.databases – when the value is '1' then the setting has taken effect.

Next, when querying the database and multi-statement (Type 2 row versioning) is required the following statement needs to be run on the connection:

SET TRANSACTION ISOLATION LEVEL SNAPSHOT

Then, SQL Server needs to know which groups of statements need to all read a consistent version. This is done using a transaction (yes, this is correct, a transaction used when reading from the database). So our previous query would become:

SET TRANSACTION ISOLATION LEVEL SNAPSHOT
BEGIN TRAN -- Everything in this transaction will read the same version of the data.
SELECT OrderItemCount FROM Order WHERE OrderID = 100
SELECT COUNT (*) FROM OrderItem WHERE OrderID = 100
COMMIT TRAN

Finally

Row versioning can be used to resolve performance problems relating to locking and blocking, because data can be queried consistently (albeit perhaps slightly old data) whilst still allowing writes to the database. Having said this, a slight performance degrade could be experienced when querying a database which does not already suffer much from locking/blocking contention.

The old versioned rows are stored in tempdb, so this database will potentially grow in size. Also it should be placed on a disk partition with good performance.

See also: Micrsosoft White Paper

0 comments: