Read our blogs, tips and tutorials
Try our exercises or test your skills
Watch our tutorial videos or shorts
Take a self-paced course
Read our recent newsletters
License our courseware
Book expert consultancy
Buy our publications
Get help in using our site
559 attributed reviews in the last 3 years
Refreshingly small course sizes
Outstandingly good courseware
Whizzy online classrooms
Wise Owl trainers only (no freelancers)
Almost no cancellations
We have genuine integrity
We invoice after training
Review 30+ years of Wise Owl
View our top 100 clients
Search our website
We also send out useful tips in a monthly email newsletter ...
What does the SCD transform in SSIS do, and should you use it? Part four of a six-part series of blogs |
---|
Integration Services include a Slowly Changing Dimension transform, which is designed to make loading dimension data easier. This blog explains how it works, and discusses whether it achieves its objectives.
|
To me, this is the most useful type of SCD: one where you capture the history of changes made to the grade. Here's how it works.
The first part of this blog got you to set up the data we needed. You now need to add 3 additional columns to the dimension table to allow us to capture historical data. To do this, run this script:
-- get rid of old copy of dimension table
BEGIN TRY
DROP TABLE DimPerson
END TRY
BEGIN CATCH
END CATCH
-- recreate the dimension table, with 3 new columns
CREATE TABLE dbo.DimPerson(
SurrogateKeyId int IDENTITY(1,1) NOT NULL,
PersonId int NOT NULL,
FirstName nvarchar(255) NULL,
Surname nvarchar(255) NULL,
Grade varchar(50) NULL,
GradeStatus varchar(50) NULL,
FromDate date NULL,
ToDate date NULL
)
-- add in 3 people
SET IDENTITY_INSERT dbo.DimPerson ON
INSERT dbo.DimPerson (
SurrogateKeyId,
PersonId,
FirstName,
Surname,
Grade,
GradeStatus,
FromDate,
ToDate
) VALUES (
1, 1, N'Rita', N'Smith', N'Office junior',
'Current','01/01/2000',Null)
INSERT dbo.DimPerson (
SurrogateKeyId,
PersonId,
FirstName,
Surname,
Grade,
GradeStatus,
FromDate,
ToDate
) VALUES (
2, 2, N'Bob', N'Brown', N'Office junior',
'Current','01/01/2000',Null
)
INSERT dbo.DimPerson (
SurrogateKeyId,
PersonId,
FirstName,
Surname,
Grade,
GradeStatus,
FromDate,
ToDate
) VALUES (
3, 3, N'Sue', N'Jones', N'Office junior',
'Current','01/01/2000',Null
)
SET IDENTITY_INSERT dbo.DimPerson OFF
-- show what we've created
SELECT * FROM DimPerson
Running this should give a squeaky clean new table:
We can track changes either using the selected GradeStatus column, or the FromDate and ToDate columns.
It's vital that existing dimension table records contain sensible values for the historical status columns, otherwise SSIS has nothing to pick up on when adding new rows.
Those who have read this far will know what comes next!
What you should now be looking at is something like this:
The start of the package - time to configure our SCD transform!
Double-click on the SCD transform and configure it as described below. After missing out the first stage (if it even appears), you can specify which columns contain changes to the dimension table that you want to track:
We'll track changes to the Grade column only, and use the PersonId to match rows between the transactional database source table and dimension table target.
If you're not sure what to do above, I've described it in more details in the previous parts of this blog.
In the next stage of the wizard, say that you want to keep a historical record of changes:
We want to track changes to the Grade column, and keep a history of previous ones.
You can now configure how you want to track historical data, either by using a single column or start and end dates (described under separate headings below).
If you just want to track whether a row is current or has expired, choose these options:
We'll use the GradeStatus column to track whether a row is current or not.
Note that although the Expiration value dropdown only gives two values, you can type in your own (here we've typed in No longer relevant). You can now go through the rest of the wizard, choosing not to use inferred members (these are described in the next part of this blog), to give:
The resulting package.
When you run this, this is what the DimPerson dimension table should look like:
The GradeStatus has been updated for old records, but hasn't been correctly set for interim new ones.
The problem shown above - that interim rows show up as Current - shouldn't be a problem in a real-world example, since they're called "Slowly Changing Dimensions" for a reason, and if you run regular updates you'll never by making two changes in the same package.
This option seems better, simply because it captures more data (often it's not enough to know whether a row is obsolete or not - you need to know when the change occurred). Here's how to track changes using a start/end date.
First (optionally) create a variable:
We'll assume updates are occurring on 1st January 2015.
Now configure the historical attribute as follows:
By this point it's normally fairly obvious what to do. Rather than assuming changes occurred today, we'll use 1st January 2015 instead.
Step through the rest of the wizard (choosing not to implement inferred member support, which is covered in the next part of this blog) to get:
The final package looks the same as for a single column.
Run this package to get:
The start dates and end dates have been set correctly.
Note that as for the single-column example above we have a problem if more than one update is made to the same row (Rita Smith's promotion to Associate incorrectly doesn't have an end date). In a real-life scenario this is unlikely to be a problem, as updates will be scheduled overnight and changes will happen only occasionally.
If you're going to use SCDs (and the final part of this blog considers reasons why you might not) a type 2 SCD tracking historical changes with start and end dates seems the best way to do it.
Before we look at some disadvantages of Slowly Changing Dimension transforms, let's take a brief digression to look at inferred members.
Parts of this blog |
---|
|
Some other pages relevant to the above blogs include:
Kingsmoor House
Railway Street
GLOSSOP
SK13 2AA
Landmark Offices
99 Bishopsgate
LONDON
EC2M 3XD
Holiday Inn
25 Aytoun Street
MANCHESTER
M1 3AE
© Wise Owl Business Solutions Ltd 2024. All Rights Reserved.