WISE OWL EXERCISES
SSIS INTEGRATION SERVICES EXERCISES
SSIS Integration Services | Previous versions exercise | Review existing lookup transform, and possibly extend
This exercise is provided to allow potential course delegates to choose the correct Wise Owl Microsoft training course, and may not be reproduced in whole or in part in any format without the prior written consent of Wise Owl.
The answer to the exercise will be included and explained if you attend the relevant Wise Owl course (sadly, only in the UK for now).
You need a minimum screen resolution of about 700 pixels width to see our exercises. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.
If you have spare time on your hands, get this pre-written package working, analyse it to see if you understand what it's doing ... and even try extending its scope!
If you're ambitious, you could even try reading down the description of what the package does, and then try to recreate it from scratch yourself ...
The above folder contains a text file containing the first 5 hours of visits to the Wise Owl website on a particular date (all genuine data!):
The first few rows of the text file.
The exercise aims to create a table of IP addresses referenced:
Each IP address appears once and once only.
The package should then create a final table of web visits:
The package should store the number of the IP address record. Only HTML files are counted.
To start, redirect the flat file connection manager to point to the text file in the above folder:
Double-click on this connection manager to edit it, and point it to the right text file.
There are a lot of other settings changed for this which might be worth reviewing, particularly on the Advanced tab.
You also need to run the SQL script in the above folder to generate the skeleton structures for the tables needed.
Here's what the first data flow task in this package does:
- Deletes all the records from all tables to reset ground zero.
- Imports the text file.
- Combines the date and time given into a single date/time field type.
- Strips out URLs whose names don't end with .HTM or / (ie images, style sheets, etc.).
- Looks up the IP addresses in the table tblIpAddress. Any IP addresses not found are added to this table, via a Sort transform (which removes any duplicates).
- Reunites all rows (those for which IP addresses were found, and those for which IP addresses weren't).
- Stores these rows in a temporary staging table.
The second data flow task just exports the rows from the staging table to the final table, looking up the IP addresses in the (now complete) IP address table as it does so.
If you have time on your hands, you could always try extending the package to store the unique URLs visited in a separate table, and then store the unique URL record number for each web visit, rather than the URL text.
I'm really confused and I'm having a couple of issues right off the bat. 1) In the zip file there is "Webby.dtsx". What is that? I have no idea what to do with that. 2) I want to bring in the text file source but when I create a new connection to it, I can't seperate the columns. Maybe I just don't know how to delimiter it? I don't even know if I'm using that term right. If anyone can please help, it would be greatly appreciated!
The idea behind this exercise is that you should add the Webby package to an existing project, resolve the connection problems in it and see how it works - so it's more looking at an existing package than creating a new one. To start with, right-click on your project in Solution Explorer, choose to add an existing package and select Webby.dtsx. Thereafter you should be able to follow through the exercise.