Exercise: Review existing lookup transform, and possibly extend

This exercise is provided to allow potential course delegates to choose the correct Wise Owl Microsoft training course, and may not be reproduced in whole or in part in any format without the prior written consent of Wise Owl.

The answer to the exercise will be included and explained if you attend the Wise Owl course listed below!

Category ==> SSIS Integration Services  (18 exercises)
Topic ==> Lookup transforms  (2 exercises)
Level ==> Harder than average
Course ==> Integration Services
Before you can do this exercise, you'll need to download and unzip this file (see the help page if you have any problems doing this).


If you have spare time on your hands, get this pre-written package working, analyse it to see if you understand what it's doing ... and even try extending its scope!

If you're ambitious, you could even try reading through the description of what the package does, then recreating it from scratch yourself ...

The above folder contains a text file listing the first 5 hours of visits to the Wise Owl website on a particular date (all genuine data!):

Website visits in text form: the first few rows of the text file.

The exercise aims to create a table of IP addresses referenced:

Table of IP addresses: each IP address appears once and once only.

The package should then create a final table of web visits:

The first few website visits: the package stores the record number of each IP address. Only HTML files are counted.

To start, redirect the flat file connection manager to point to the text file in the above folder:

Redirect flat file connection manager: double-click this connection manager to edit it, and point it to the right text file.

A number of other settings have been changed for this connection manager which might be worth reviewing, particularly on the Advanced tab.

You also need to run the SQL script in the above folder to generate the skeleton structures for the tables needed.

Here's what the first data flow task in this package does:

  1. Deletes all the records from all tables, to start from a clean slate.
  2. Imports the text file.
  3. Combines the date and time given into a single date/time field type.
  4. Strips out URLs whose names don't end with .HTM or / (i.e. images, style sheets, etc.).
  5. Looks up the IP addresses in the table tblIpAddress.  Any IP addresses not found are added to this table, via a Sort transform (which removes any duplicates).
  6. Reunites all rows (those for which IP addresses were found, and those for which IP addresses weren't).
  7. Stores these rows in a temporary staging table.
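In SSIS these steps are built from transforms in the designer, but the underlying logic of steps 3-7 can be sketched in plain Python. The sample rows and column layout below are invented for illustration; the real text file's format will differ:

```python
from datetime import datetime

# Hypothetical sample rows: date, time, IP address, URL requested.
raw_rows = [
    ("2016-01-01", "00:04:12", "81.174.134.21", "/default.htm"),
    ("2016-01-01", "00:05:03", "81.174.134.21", "/owl.css"),
    ("2016-01-01", "00:07:44", "207.46.13.68", "/training/"),
]

ip_table = {}       # stands in for tblIpAddress: IP address -> record number
staging_rows = []   # stands in for the temporary staging table

for date_text, time_text, ip, url in raw_rows:
    # Step 3: combine the date and time into a single date/time value.
    visit_time = datetime.strptime(f"{date_text} {time_text}", "%Y-%m-%d %H:%M:%S")
    # Step 4: keep only HTML pages (URLs ending in .htm or /).
    if not (url.lower().endswith(".htm") or url.endswith("/")):
        continue
    # Step 5: look up the IP address; if it isn't found, add it once
    # (the dict plays the role of the Sort transform's duplicate removal).
    if ip not in ip_table:
        ip_table[ip] = len(ip_table) + 1
    # Steps 6-7: all surviving rows, found or not, reach the staging table.
    staging_rows.append((visit_time, ip, url))
```

Here the `.css` row is filtered out at step 4, and the repeated IP address produces only one entry in `ip_table`.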

The second data flow task just exports the rows from the staging table to the final table, looking up the IP addresses in the (now complete) IP address table as it does so.
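Continuing the Python sketch (values re-declared so the snippet stands alone), the second data flow is a straightforward lookup, since by now every IP address is guaranteed to be in the table:

```python
# The completed IP address table and staging rows (illustrative values).
ip_table = {"81.174.134.21": 1, "207.46.13.68": 2}
staging_rows = [
    ("2016-01-01 00:04:12", "81.174.134.21", "/default.htm"),
    ("2016-01-01 00:07:44", "207.46.13.68", "/training/"),
]

# Replace each IP address with its record number; every lookup succeeds.
final_rows = [
    (visit_time, ip_table[ip], url)
    for visit_time, ip, url in staging_rows
]
```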

If you have time on your hands, you could always try extending the package to store the unique URLs visited in a separate table, and then store the unique URL record number for each web visit, rather than the URL text. 
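The suggested extension follows the same pattern as the IP address lookup. A hypothetical sketch, with a made-up `url_table` standing in for a new URL table:

```python
# Rows from the final table: date/time, IP record number, URL text.
final_rows = [
    ("2016-01-01 00:04:12", 1, "/default.htm"),
    ("2016-01-01 00:07:44", 2, "/training/"),
    ("2016-01-01 00:09:10", 1, "/default.htm"),
]

url_table = {}   # stands in for a new table of unique URLs: URL -> record number
visits = []
for visit_time, ip_id, url in final_rows:
    # Add each URL once, then store its record number instead of its text.
    if url not in url_table:
        url_table[url] = len(url_table) + 1
    visits.append((visit_time, ip_id, url_table[url]))
```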
