What does the SCD transform in SSIS do, and should you use it?
Part four of a six-part series of blogs

Integration Services include a Slowly Changing Dimension transform, which is designed to make loading dimension data easier. This blog explains how it works, and discusses whether it achieves its objectives.

  1. Using the Slowly Changing Dimension Transform in SSIS
  2. Creating an SCD transform - type 0 (fixed attributes)
  3. Creating an SCD transform - type 1 (changing attributes)
  4. Creating an SCD transform - type 2 (historical attributes) (this blog)
  5. Inferred Members in Slowly Changing Dimensions
  6. The disadvantages of SCD transforms

Posted by Andy Brown on 05 July 2014

You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.

Creating an SCD transform - type 2 (historical attributes)

To me, this is the most useful type of SCD: one where you capture the history of changes made to the grade.  Here's how it works.

Changes needed to our dimension table

The first part of this blog got you to set up the data we needed.  You now need to add 3 additional columns to the dimension table to allow us to capture historical data.  To do this, run this script:

-- get rid of old copy of dimension table

BEGIN TRY

DROP TABLE DimPerson

END TRY

BEGIN CATCH

END CATCH

-- recreate the dimension table, with 3 new columns

CREATE TABLE dbo.DimPerson(

SurrogateKeyId int IDENTITY(1,1) NOT NULL,

PersonId int NOT NULL,

FirstName nvarchar(255) NULL,

Surname nvarchar(255) NULL,

Grade varchar(50) NULL,

GradeStatus varchar(50) NULL,

FromDate date NULL,

ToDate date NULL

)

-- add in 3 people

SET IDENTITY_INSERT dbo.DimPerson ON

INSERT dbo.DimPerson (

SurrogateKeyId,

PersonId,

FirstName,

Surname,

Grade,

GradeStatus,

FromDate,

ToDate

) VALUES (

1, 1, N'Rita', N'Smith', N'Office junior',

'Current','01/01/2000',Null)

INSERT dbo.DimPerson (

SurrogateKeyId,

PersonId,

FirstName,

Surname,

Grade,

GradeStatus,

FromDate,

ToDate

) VALUES (

2, 2, N'Bob', N'Brown', N'Office junior',

'Current','01/01/2000',Null

)

INSERT dbo.DimPerson (

SurrogateKeyId,

PersonId,

FirstName,

Surname,

Grade,

GradeStatus,

FromDate,

ToDate

) VALUES (

3, 3, N'Sue', N'Jones', N'Office junior',

'Current','01/01/2000',Null

)

SET IDENTITY_INSERT dbo.DimPerson OFF

-- show what we've created

SELECT * FROM DimPerson

Running this should give a squeaky clean new table:

The recreated dimension table

We can track changes either using the selected GradeStatus column, or the FromDate and ToDate columns.

It's vital that existing dimension table records contain sensible values for the historical status columns, otherwise SSIS has nothing to pick up on when adding new rows.

Creating the SCD type 2 transform

Those who have read this far will know what comes next!

  1. Create a new package.
  2. Create a new data flow task within it.
  3. Add an OLEDB source to take data from the vwPromotions view.
  4. Add a Slowly Changing Dimension transform.

What you should now be looking at is something like this:

Start of package

The start of the package - time to configure our SCD transform!

 

Double-click on the SCD transform and configure it as described below.  After missing out the first stage (if it even appears), you can specify which columns contain changes to the dimension table that you want to track:

Specifying columns

We'll track changes to the Grade column only, and use the PersonId to match rows between the transactional database source table and dimension table target.

 

If you're not sure what to do above, I've described it in more details in the previous parts of this blog.

In the next stage of the wizard, say that you want to keep a historical record of changes:

Dimension columns

We want to track changes to the Grade column, and keep a history of previous ones.

 

You can now configure how you want to track historical data, either by using a single column or start and end dates (described under separate headings below).

Using a single column to track changes

If you just want to track whether a row is current or has expired, choose these options:

Historical attribute options - single column

We'll use the GradeStatus column to track whether a row is current or not.

Note that although the Expiration value dropdown only gives two values, you can type in your own (here we've typed in No longer relevant).  You can now go through the rest of the wizard, choosing not to use inferred members (these are described in the next part of this blog), to give:

The final package

The resulting package.

When you run this, this is what the DimPerson dimension table should look like: 

The DimPerson table

The GradeStatus has been updated for old records, but hasn't been correctly set for interim new ones.

The problem shown above - that interim rows show up as Current - shouldn't be a problem in a real-world example, since they're called "Slowly Changing Dimensions" for a reason, and if you run regular updates you'll never by making two changes in the same package.

Using a start and end date to track changes

This option seems better, simply because it captures more data (often it's not enough to know whether a row is obsolete or not - you need to know when the change occurred).  Here's how to track changes using a start/end date.

First (optionally) create a variable:

Variable to hold date

We'll assume updates are occurring on 1st January 2015.

Now configure the historical attribute as follows:

Configuring start and end dates

By this point it's normally fairly obvious what to do. Rather than assuming changes occurred today, we'll use 1st January 2015 instead.

Step through the rest of the wizard (choosing not to implement inferred member support, which is covered in the next part of this blog) to get:

Our final package - start/end dates

The final package looks the same as for a single column.

Run this package to get:

The final list of people

The start dates and end dates have been set correctly.

Note that as for the single-column example above we have a problem if more than one update is made to the same row (Rita Smith's promotion to Associate incorrectly doesn't have an end date).  In a real-life scenario this is unlikely to be a problem, as updates will be scheduled overnight and changes will happen only occasionally.

If you're going to use SCDs (and the final part of this blog considers reasons why you might not) a type 2 SCD tracking historical changes with start and end dates seems the best way to do it.

Before we look at some disadvantages of Slowly Changing Dimension transforms, let's take a brief digression to look at inferred members. 

This blog has 0 threads Add post