Using SSIS to loop over the worksheets in an Excel workbook, importing each
Looping over Excel workbooks is one thing, but how can you loop over all of the worksheets in a single workbook, importing the contents of each? This blog explains the steps you need to follow.

Posted by Andy Brown on 07 April 2016

You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.

Looping over Multiple Worksheets in an Excel File using SSIS

Although the concept behind how to do this is straightforward, the implementation - as you'll see from the length of this blog - is a bit messy.

You may also have an issue with drivers.  For this example, I've deliberately used an old-format Excel file, because I don't have the 64-bit driver for Excel installed on my computer.  For more information, see Grasshopper's excellent comments in this article.

Our example

We're not sure where to go on holiday this summer, so I've created a workbook with some ideas.  You can download this here, and use it for the rest of this example:

Holiday ideas workbook

My holiday ideas for countries in Africa - similar ideas exist for 4 other continents (I've ruled out Antarctica this year, as it's just too cold - plus I don't like penguins).

What I want to do is to import the data for all 5 worksheets into a single table in SQL Server:

The first holiday

The table will contain holiday ideas for all continents.

If you want to work through this example using this table, first run this script in Management Studio to create the table:

-- creates a table which we'll populate with holiday ideas

CREATE TABLE tblCountry(

CountryId int PRIMARY KEY IDENTITY(1,1),

Continent nvarchar(50) NULL,

CountryName nvarchar(50) NULL,

Thoughts nvarchar(4000) NULL

)

So much for what we're trying to do.  How to go about it?

The underlying principle

What we're going to do is to run a query against the Excel workbook to get a list of the worksheet names.  Although we'll never actually see the results, here's what they'd look like:

Imaginary worksheets table

We'll choose to loop over the worksheet names in the Excel workbook. You can think of this as looping over this table of names.

 

We'll accomplish the above using a Foreach ADO.NET Schema Rowset Enumerator loop.  You'd normally use this to loop over tables, views or stored procedures in a SQL Server database, but it can also be used - as we'll see - to loop over worksheet names.

Creating a variable to hold each worksheet name

First, create a variable to hold the name of each worksheet:

The variable name

Set the initial value for the variable to be the name of one of the existing worksheets, followed by a $ sign.

This variable will hold the name of each worksheet, but you must set the initial value to be an existing worksheet name, because otherwise you won't later be able to configure the data flow task to import from this workbook.

Adding the loop over worksheets

To loop over worksheets, first add a Foreach Loop:

The foreach loop

Add this loop to a new package.

 

Now double-click on it to edit it, and specify what you're looping over:

Setting the loop type

Choose the Collection tab, and specify the loop type shown above.

Now choose to create a connection to the thing (normally a database, but here a workbook) containing the schema rowset to enumerate over:

Creating connection

Choose to create a new connection.

In the dialog box which appears, choose to create a new connection manager:

New connection manager

Click at the bottom right of the dialog box which appears to create a new connection manager.

 

Click on the drop arrow at the top of the dialog box which appears, and choose to use the Jet engine:

Jet provider

Choose this provider from the list, allowing you to link to an Excel workbook.

Now choose to browse to the workbook containing the worksheets you want to loop over:

Browse button

Click on the button shown to find your workbook.

At the bottom right corner of the dialog box which appears, make sure you're not just looking at Access databases:

Types of files

Choose to show all files.

Find and double-click on your workbook, to select it:

Choosing the workbook

You can ignore the user name and password - these are only relevant for some Access databases.

Test your connection:

Testing connection

If you get an error message when you click on this button, either you've done something wrong or you haven't got the right driver installed (see the hint at the top of this blog).

Now go to the All tab on the left, scroll up to the Advanced properties and type the Extended Properties in as Excel 8.0:

Identity type of file

This identifies what type of file it is that you're looping over.

Select OK until you're back in your Foreach loop, and finally (!) set the variable mapping:

Variable mapping

What this means is that you'll take the 3rd bit of information returned for each worksheet - its name - and store it in the SheetName variable (in SSIS collections are always numbered from 0, not 1).

I have no idea what the first and second bit of information returned for a worksheet are, nor can I find anything about this on the Internet.

Importing the data for each worksheet

Add a data flow task within your loop - this is what you should now have:

Data flow task

For each worksheet, we'll import the data into our SQL Server table.

 

Double-click on the data flow task to edit it, and create this source:

The Excel data source

This should connect to the Holiday Ideas workbook.  The red cross is because we haven't said which worksheet we'll use - that's the next step!

Double-click on the Excel source, and assign a worksheet to it:

Choose a worksheet

Because all of the worksheets have the same format, it doesn't matter which one you choose.

Add an OLEDB destination:

SQL Server destination

This should connect to your SQL Server database, and specifically to the tblCountry table. I haven't shown how to do this here, but if you're reading this article you should know the basics, I'm hoping.

 

Here's what the mappings should look like:

Table mappings

We'll map the two Excel columns into two columns in the SQL Server table. The country id is an identity column whose value will be set automatically.

 

Now go back into your Excel data source to edit it, and change the source to the variable whose value is being set each time round the Foreach loop:

Setting the sheet variable name

Choose the data access mode as Table name or view name variable, then choose your SheetName variable.

Running the package

You should now be able to run your package, to get this table in SQL Server:

Table of results

The Continent column is, as yet, null.

Tidying up the package - emptying the table to start

To make the package perfect, it would be good to delete old rows from the table before importing new ones:

Execute SQL task

Add an Execute SQL task to delete all the old holidays before importing new ones.

 

The SQL statement to use for this task is shown below:

Truncating table

Use this statement to delete all of the rows from the tblCountry table.

Adding the continent name

Finally, we need to add the continent name.  To do this, add a derived column transform as follows:

Derived column transform

This transform adds a new column, set to equal the name of the current worksheet (as stored in the SheetName variable).

 

Here's what the derived column transform should do:

Derived column expression

The transform should add a new column (which we may as well call Continent) - the expression for this is shown below.

Here's the expression to use (only the first bit of this is shown above):

(DT_WSTR,50) (

SUBSTRING(

@[User::SheetName],

1,

LEN(@[User::SheetName]) - 1

)

)

The SheetName variable stores each worksheet name with a $ at the end (Africa$, rather than Africa).  This expression takes a substring of the sheet name, beginning at character 1 and ending 1 character before the end.

The (DT_WSTR,50) cast is needed to turn the results into a 50 character string, to ensure America doesn't overflow the length of the column.

Here's what the final mappings in the SQL Server destination should look like:

Final mappings for columns

You'll need to add the Continent mapping.

And here's what the final results should be when you run the package:

The final results

Phew!

Like I said - conceptually not too bad, but messy. 

This blog has 0 threads Add post