SQL Server 2008 and Testing – Sparse Columns

SQL Server 2008 was just released a couple of weeks ago. I’ve been looking at it, trying to determine the implications SQL Server 2008 has for testing. The first step in such an analysis is to understand exactly what the new features of SQL Server 2008 are. One of the interesting new features is a new "sparse columns". In frequent situations, you have a table which has a column where the majority of values in that column are NULL. For example, consider a table of employee information which contains columns lastName, firstName, middleInitial, and suffix. The suffix column holds information like "Jr." (junior) or "III" (the third). In almost all cases, an employee’s name does not contain a suffix. SQL Server 2008 allows you to identify a column with a sparse keyword, which in theory, will save disk space because null values will not be stored. Consider this demo script:
— Sparse columns demo
use master
if exists(select name from sys.sysdatabases where name=’dbSparseDemo’)
 drop database dbSparseDemo
create database dbSparseDemo
use dbSparseDemo
— "consider using a sparse column on very large tables where the
— column is NULL at least 2/3 of the time
— cannot apply to types geometry, geography, text, ntext, timestamp, image,
— or user-defined types"
create table tblEmployees1 — "regular" table
emp_id char(3) primary key,
emp_lastName varchar(35) not null,
emp_firstName varchar(20) null,
emp_suffix varchar(5) null — like ‘Jr.’ or ‘III’; usually null
create table tblEmployees2 — using a sparse column
emp_id char(3) primary key,
emp_lastName varchar(35) not null,
emp_firstName varchar(20) null,
emp_suffix varchar(5) sparse null — like ‘Jr.’ or ‘III’; usually null
declare @i int
set @i = 1
while @i <= 1000 — 1,000 * 10 = 10,000 rows per table
 insert into tblEmployees1 values(‘001′,’Anderson’,’Adam’,’Jr.’)
 insert into tblEmployees1 values(‘002′,’Baker’,’Betty’,null)
 insert into tblEmployees1 values(‘003′,’Collins’,’Chris’,null)
 insert into tblEmployees1 values(‘004′,’Denevan’,’Doug’,null)
 insert into tblEmployees1 values(‘005′,’Eagen’,’Edward’,null)
 insert into tblEmployees1 values(‘006′,’Flynn’,’Fred’,null)
 insert into tblEmployees1 values(‘007′,’Graham’,’Greg’,null)
 insert into tblEmployees1 values(‘008′,’Humphrey’,’Harold’,null)
 insert into tblEmployees1 values(‘009′,’Issacson’,’Ian’,null)
 insert into tblEmployees1 values(‘010′,’Johnson’,’James’,null)
 insert into tblEmployees2 values(‘001′,’Anderson’,’Adam’,’Jr.’)
 insert into tblEmployees2 values(‘002′,’Baker’,’Betty’,null)
 insert into tblEmployees2 values(‘003′,’Collins’,’Chris’,null)
 insert into tblEmployees2 values(‘004′,’Denevan’,’Doug’,null)
 insert into tblEmployees2 values(‘005′,’Eagen’,’Edward’,null)
 insert into tblEmployees2 values(‘006′,’Flynn’,’Fred’,null)
 insert into tblEmployees2 values(‘007′,’Graham’,’Greg’,null)
 insert into tblEmployees2 values(‘008′,’Humphrey’,’Harold’,null)
 insert into tblEmployees2 values(‘009′,’Issacson’,’Ian’,null)
 insert into tblEmployees2 values(‘010′,’Johnson’,’James’,null)
 set @i = @i + 1
exec sp_spaceused tblEmployees1
exec sp_spaceused tblEmployees2
— end script
In theory at least, the tblEmployees2 table which has a sparse column should be significantly smaller than the tblEmployees1 table. But in fact tblEmployees2 is larger than tblEmployees1, for reasons which are not at all clear to me. My point is, if you use the new sparse column feature in SQL Server 2008, you need to test to verify that you are in fact saving disk space.
This entry was posted in Software Test Automation. Bookmark the permalink.

One Response to SQL Server 2008 and Testing – Sparse Columns

  1. Kalen says:

    Hi James
    Varchar columns are already stored in a very space efficient format, and if a varchar is NULL, it needs no space.
    In addition, sparse columns have their own overhead. Nothing comes for free.
    So this is not a fair test. A varchar(5) is going to be very similar in size whether it is SPARSE or not, the the overhead for keeping track of the sparse column may outweigh the benefit, as you\’ve seen. 
    Try a test where the column you are making sparse is a datetime (fixed 8 bytes), or better still, a char(200).

Comments are closed.