BLOGS BY TOPIC
BLOGS BY AUTHOR
BLOGS BY YEAR
Posted by Andy Brown on 07 April 2016
You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.
Looping over Multiple Worksheets in an Excel File using SSIS
Although the concept behind how to do this is straightforward, the implementation - as you'll see from the length of this blog - is a bit messy.
You may also have an issue with drivers. For this example, I've deliberately used an old-format Excel file, because I don't have the 64-bit driver for Excel installed on my computer. For more information, see Grasshopper's excellent comments in this article.
We're not sure where to go on holiday this summer, so I've created a workbook with some ideas. You can download this here, and use it for the rest of this example:
My holiday ideas for countries in Africa - similar ideas exist for 4 other continents (I've ruled out Antarctica this year, as it's just too cold - plus I don't like penguins).
What I want to do is to import the data for all 5 worksheets into a single table in SQL Server:
The table will contain holiday ideas for all continents.
If you want to work through this example using this table, first run this script in Management Studio to create the table:
-- creates a table which we'll populate with holiday ideas
CREATE TABLE tblCountry(
CountryId int PRIMARY KEY IDENTITY(1,1),
Continent nvarchar(50) NULL,
CountryName nvarchar(50) NULL,
Thoughts nvarchar(4000) NULL
So much for what we're trying to do. How to go about it?
The underlying principle
What we're going to do is to run a query against the Excel workbook to get a list of the worksheet names. Although we'll never actually see the results, here's what they'd look like:
We'll choose to loop over the worksheet names in the Excel workbook. You can think of this as looping over this table of names.
We'll accomplish the above using a Foreach ADO.NET Schema Rowset Enumerator loop. You'd normally use this to loop over tables, views or stored procedures in a SQL Server database, but it can also be used - as we'll see - to loop over worksheet names.
Creating a variable to hold each worksheet name
First, create a variable to hold the name of each worksheet:
Set the initial value for the variable to be the name of one of the existing worksheets, followed by a $ sign.
This variable will hold the name of each worksheet, but you must set the initial value to be an existing worksheet name, because otherwise you won't later be able to configure the data flow task to import from this workbook.
Adding the loop over worksheets
To loop over worksheets, first add a Foreach Loop:
Add this loop to a new package.
Now double-click on it to edit it, and specify what you're looping over:
Choose the Collection tab, and specify the loop type shown above.
Now choose to create a connection to the thing (normally a database, but here a workbook) containing the schema rowset to enumerate over:
Choose to create a new connection.
In the dialog box which appears, choose to create a new connection manager:
Click at the bottom right of the dialog box which appears to create a new connection manager.
Click on the drop arrow at the top of the dialog box which appears, and choose to use the Jet engine:
Choose this provider from the list, allowing you to link to an Excel workbook.
Now choose to browse to the workbook containing the worksheets you want to loop over:
Click on the button shown to find your workbook.
At the bottom right corner of the dialog box which appears, make sure you're not just looking at Access databases:
Choose to show all files.
Find and double-click on your workbook, to select it:
You can ignore the user name and password - these are only relevant for some Access databases.
Test your connection:
If you get an error message when you click on this button, either you've done something wrong or you haven't got the right driver installed (see the hint at the top of this blog).
Now go to the All tab on the left, scroll up to the Advanced properties and type the Extended Properties in as Excel 8.0:
This identifies what type of file it is that you're looping over.
Select OK until you're back in your Foreach loop, and finally (!) set the variable mapping:
What this means is that you'll take the 3rd bit of information returned for each worksheet - its name - and store it in the SheetName variable (in SSIS collections are always numbered from 0, not 1).
I have no idea what the first and second bit of information returned for a worksheet are, nor can I find anything about this on the Internet.
Importing the data for each worksheet
Add a data flow task within your loop - this is what you should now have:
For each worksheet, we'll import the data into our SQL Server table.
Double-click on the data flow task to edit it, and create this source:
This should connect to the Holiday Ideas workbook. The red cross is because we haven't said which worksheet we'll use - that's the next step!
Double-click on the Excel source, and assign a worksheet to it:
Because all of the worksheets have the same format, it doesn't matter which one you choose.
Add an OLEDB destination:
This should connect to your SQL Server database, and specifically to the tblCountry table. I haven't shown how to do this here, but if you're reading this article you should know the basics, I'm hoping.
Here's what the mappings should look like:
We'll map the two Excel columns into two columns in the SQL Server table. The country id is an identity column whose value will be set automatically.
Now go back into your Excel data source to edit it, and change the source to the variable whose value is being set each time round the Foreach loop:
Choose the data access mode as Table name or view name variable, then choose your SheetName variable.
Running the package
You should now be able to run your package, to get this table in SQL Server:
The Continent column is, as yet, null.
Tidying up the package - emptying the table to start
To make the package perfect, it would be good to delete old rows from the table before importing new ones:
Add an Execute SQL task to delete all the old holidays before importing new ones.
The SQL statement to use for this task is shown below:
Use this statement to delete all of the rows from the tblCountry table.
Adding the continent name
Finally, we need to add the continent name. To do this, add a derived column transform as follows:
This transform adds a new column, set to equal the name of the current worksheet (as stored in the SheetName variable).
Here's what the derived column transform should do:
The transform should add a new column (which we may as well call Continent) - the expression for this is shown below.
Here's the expression to use (only the first bit of this is shown above):
LEN(@[User::SheetName]) - 1
The SheetName variable stores each worksheet name with a $ at the end (Africa$, rather than Africa). This expression takes a substring of the sheet name, beginning at character 1 and ending 1 character before the end.
The (DT_WSTR,50) cast is needed to turn the results into a 50 character string, to ensure America doesn't overflow the length of the column.
Here's what the final mappings in the SQL Server destination should look like:
You'll need to add the Continent mapping.
And here's what the final results should be when you run the package:
Like I said - conceptually not too bad, but messy.