What does the SCD transform in SSIS do, and should you use it?
Part six of a six-part series of blogs

Integration Services include a Slowly Changing Dimension transform, which is designed to make loading dimension data easier. This blog explains how it works, and discusses whether it achieves its objectives.

  1. Using the Slowly Changing Dimension Transform in SSIS
  2. Creating an SCD transform - type 0 (fixed attributes)
  3. Creating an SCD transform - type 1 (changing attributes)
  4. Creating an SCD transform - type 2 (historical attributes)
  5. Inferred Members in Slowly Changing Dimensions
  6. The disadvantages of SCD transforms (this blog)

Posted by Andy Brown on 05 July 2014

You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.

The disadvantages of SCD transforms

I'm not a fan of these transforms, but you may disagree.  However, this being my blog allow me to put my side of the case first ...

Objection 1: lack of transparency

I love SSIS.  I think it's excellent software, and what it does best of all is make complicated processes transparent.  You could give a typical SSIS package like this to someone non-technical, and they'd be able to pick up what it's doing:

A typical SSIS package

Documented packages like this are much easier to read than SQL.

Try giving the equivalent SQL statements to a non-technical person, and they won't have a clue what it's doing.

However, SCD transforms do the opposite to the rest of SSIS: they make a transparent process opaque:

SSIS SCD transform

Unlike nearly every other SSIS transform, you can't edit an SCD transform directly to see what it does.

I don't mind using wizards to save me time; I don't, however, like wizards which get me from A to B but don't reveal the path taken.

The other way in which SCD transforms add opacity to what's going on is by using jargon to complicate what is actually a simple underlying process, but this is more the fault of Analysis Services than SSIS.

Objection 2: speed

Here are some of the reasons that an SCD transform isn't going to be the quickest way to load dimension table data:

Reason More details
Row by row lookups For every row in the input table, a new lookup is done against the dimension table by the business key.  This means that the dimension table is not cached in memory - this is not a good idea!
Row by row updates Whenever a change is found (for changing attributes), another update statement is run against the dimension table.  Set-based update statements in SQL will run much more quickly.
Slow inserts When you're using historical attributes, the insert statements generated add rows one at a time, to avoid locks between inserts and updates.  Again, this will not run quickly.

You can overcome some of these problems by indexing the business key in the dimension table and by changing the insert and update tasks generated by the SCD transform wizard, but SQL is always going to load slowly changing dimension data more quickly than an SCD transform.

One sign that the SCD transform isn't the greatest thing since sliced bread: the fact that many websites recommend either third party add-ons or alternative ways to accomplish the same results.

My recommendation

If you know SQL, I think you'd be much better of writing a stored procedure to upload your SCD data, and then call this using an Execute SQL task within a package.  For our historical attribute example, this could be something like this:

-- set the date we should record things changed on

DECLARE @ChangeDate date

SET @ChangeDate = GetDate()

-- update the existing people to say that

-- their grade changed on this given date

UPDATE

DimPerson

SET

ToDate = @ChangeDate

WHERE

(

SELECT COUNT(*)

FROM vwPromotions AS v

WHERE v.PersonId = DimPerson.PersonId

) > 0

-- add the new grades in

INSERT INTODimPerson(

PersonId,

FirstName,

Surname,

Grade,

FromDate,

ToDate

)

SELECT

v.PersonId,

v.FirstName,

v.Surname,

v.GradeName,

Null,

@ChangeDate

FROM

vwPromotions AS v

INNER JOIN DimPerson AS dp

ON v.PersonId = dp.PersonId

-- add all of the new people from the fact

-- "table" into the dimension table

INSERT INTODimPerson(

PersonId,

FirstName,

Surname,

Grade,

FromDate,

ToDate

)

SELECT

v.PersonId,

v.FirstName,

v.Surname,

v.GradeName,

Null,

v.PromotionDate

FROM

vwPromotions AS v

LEFT OUTER JOIN DimPerson AS dp

ON v.PersonId = dp.PersonId

WHERE

dp.PersonId is null

Note that I haven't tested this, nor have I attempted to explain it; I've included it just to show that I don't think an SCD transform does anything too scary.

Another good way to proceed would be to forget the SCD transform in SSIS, and instead do things the "hard" way by using a combination of lookup and other transforms.

This blog has 0 threads Add post