Read our blogs, tips and tutorials
Try our exercises or test your skills
Watch our tutorial videos or shorts
Take a self-paced course
Read our recent newsletters
License our courseware
Book expert consultancy
Buy our publications
Get help in using our site
560 attributed reviews in the last 3 years
Refreshingly small course sizes
Outstandingly good courseware
Whizzy online classrooms
Wise Owl trainers only (no freelancers)
Almost no cancellations
We have genuine integrity
We invoice after training
Review 30+ years of Wise Owl
View our top 100 clients
Search our website
We also send out useful tips in a monthly email newsletter ...
What does the SCD transform in SSIS do, and should you use it? Part six of a six-part series of blogs |
---|
Integration Services include a Slowly Changing Dimension transform, which is designed to make loading dimension data easier. This blog explains how it works, and discusses whether it achieves its objectives.
|
I'm not a fan of these transforms, but you may disagree. However, this being my blog allow me to put my side of the case first ...
I love SSIS. I think it's excellent software, and what it does best of all is make complicated processes transparent. You could give a typical SSIS package like this to someone non-technical, and they'd be able to pick up what it's doing:
Documented packages like this are much easier to read than SQL.
Try giving the equivalent SQL statements to a non-technical person, and they won't have a clue what it's doing.
However, SCD transforms do the opposite to the rest of SSIS: they make a transparent process opaque:
Unlike nearly every other SSIS transform, you can't edit an SCD transform directly to see what it does.
I don't mind using wizards to save me time; I don't, however, like wizards which get me from A to B but don't reveal the path taken.
The other way in which SCD transforms add opacity to what's going on is by using jargon to complicate what is actually a simple underlying process, but this is more the fault of Analysis Services than SSIS.
Here are some of the reasons that an SCD transform isn't going to be the quickest way to load dimension table data:
Reason | More details |
---|---|
Row by row lookups | For every row in the input table, a new lookup is done against the dimension table by the business key. This means that the dimension table is not cached in memory - this is not a good idea! |
Row by row updates | Whenever a change is found (for changing attributes), another update statement is run against the dimension table. Set-based update statements in SQL will run much more quickly. |
Slow inserts | When you're using historical attributes, the insert statements generated add rows one at a time, to avoid locks between inserts and updates. Again, this will not run quickly. |
You can overcome some of these problems by indexing the business key in the dimension table and by changing the insert and update tasks generated by the SCD transform wizard, but SQL is always going to load slowly changing dimension data more quickly than an SCD transform.
One sign that the SCD transform isn't the greatest thing since sliced bread: the fact that many websites recommend either third party add-ons or alternative ways to accomplish the same results.
If you know SQL, I think you'd be much better of writing a stored procedure to upload your SCD data, and then call this using an Execute SQL task within a package. For our historical attribute example, this could be something like this:
-- set the date we should record things changed on
DECLARE @ChangeDate date
SET @ChangeDate = GetDate()
-- update the existing people to say that
-- their grade changed on this given date
UPDATE
DimPerson
SET
ToDate = @ChangeDate
WHERE
(
SELECT COUNT(*)
FROM vwPromotions AS v
WHERE v.PersonId = DimPerson.PersonId
) > 0
-- add the new grades in
INSERT INTODimPerson(
PersonId,
FirstName,
Surname,
Grade,
FromDate,
ToDate
)
SELECT
v.PersonId,
v.FirstName,
v.Surname,
v.GradeName,
Null,
@ChangeDate
FROM
vwPromotions AS v
INNER JOIN DimPerson AS dp
ON v.PersonId = dp.PersonId
-- add all of the new people from the fact
-- "table" into the dimension table
INSERT INTODimPerson(
PersonId,
FirstName,
Surname,
Grade,
FromDate,
ToDate
)
SELECT
v.PersonId,
v.FirstName,
v.Surname,
v.GradeName,
Null,
v.PromotionDate
FROM
vwPromotions AS v
LEFT OUTER JOIN DimPerson AS dp
ON v.PersonId = dp.PersonId
WHERE
dp.PersonId is null
Note that I haven't tested this, nor have I attempted to explain it; I've included it just to show that I don't think an SCD transform does anything too scary.
Another good way to proceed would be to forget the SCD transform in SSIS, and instead do things the "hard" way by using a combination of lookup and other transforms.
Parts of this blog |
---|
|
Some other pages relevant to the above blogs include:
Kingsmoor House
Railway Street
GLOSSOP
SK13 2AA
Landmark Offices
99 Bishopsgate
LONDON
EC2M 3XD
Holiday Inn
25 Aytoun Street
MANCHESTER
M1 3AE
© Wise Owl Business Solutions Ltd 2024. All Rights Reserved.