BLOGS BY TOPIC
BLOGS BY AUTHOR
BLOGS BY YEAR
Posted by Andy Brown on 21 April 2017
You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.
Use SSIS to import all of the worksheets from an Excel file
This blog shows how to loop over the worksheets in an Excel workbook, importing the contents of each. The technique will only work if the worksheets are all in the same format.
We want to consolidate three separate months of shopping data into a single SQL Server table:
Each month contains a separate list of shopping trips, but (crucially) they share the same format.
Here's what the end result should be:
What the final answer should look like (although the rows won't necessarily be in this order).
If you want to work through the case study in this blog, first run this script in Management Studio to create the table you'll need:
-- create table for shopping data
CREATE TABLE AdoShopping(
SELECT * FROM AdoShopping
That's enough about what we need to do. Time now to do it!
The underlying principle
We need to get a list of all of the names of the worksheets in our workbook:
We'll never see this, but this is the collection of names that we need to loop over.
We'll accomplish the above using a Foreach ADO.NET Schema Rowset Enumerator loop. You'd normally use this to loop over tables, views or stored procedures in a SQL Server database, but it can also be used - as we'll see - to loop over worksheet names.
Step 1 - Create a variable to hold each worksheet's name
The first thing to do is to create a variable to hold each worksheet's name in turn:
It's crucial that you give a value to this variable equal to the name of one of the worksheets that you want to import, otherwise you won't be able to get the data flow task working later.
Step 2 - Add the basic package tasks
Create a package which deletes any old rows from the AdoShopping table, then has an empty loop and an empty data flow task:
The Execute SQL task will run the command TRUNCATE TABLE AdoShopping (if you don't know how to set this up, you're probably reading the wrong blog). See below for how to configure the other tasks.
Step 3 - Creating a connection for the Foreach Loop
Double-click on the Foreach Loop task to edit it, and say what sort of loop it is:
On the Collection tab, specify that this loop should be an ADO.NET Schema Rowset Enumerator (!).
Now choose to create a new connection:
Choose to create a new connection
Ask to create a new connection:
Click to create a new connection.
Choose the provider that you want to use:
It may seem strange that you use an Access provider to connect to an Excel worksheet, but ...
Now type in the Excel workbook to which you want to link:
Make sure that you include the full file path and file name (obviously your path will be different to the one shown here).
On the All tab, set the Extended Properties to Excel 12.0;HDR-YES;:
Type this in carefully!
HDR=YES tells SSIS that the first row of your Excel worksheet contains column headers, not data, but the more important information is the internal Excel version number.
Test your connection:
Click on this button at the bottom left to test your connection.
You should get this message:
Yeah! Or if you get an error, check you haven't left the workbook open in Excel.
Select OK until you exit the dialog box for configuring your connection, then optionally rename the connection:
Rename the connection that you've created for your package.
Step 4 - Configuring the rest of the Foreach Loop
Double-click on your Foreach Loop to edit it, and specify that you want to return a list of the Excel tables:
Tables may seem an odd choice, but this is internally how SQL will get at the worksheet names.
On the Variable Mappings tab, choose to store the second bit of information returned in your string variable:
Type in number 2 as the index number (for why, see hint below).
When you get a list of tables as above, you get the table catalog name, table schema name, table name and table type. All but one of these bits of information is useless (and the first two don't even seem to be set). You want to store the third bit of information, the table name, which is index number 2 (since nearly everything in SSIS is numbered from 0).
Step 5 - Configuring the data flow task
Here's what the data flow task will look like:
The details of how to export to SQL Server aren't shown here. The Excel source should point to the Shopping.xlsx workbook.
For the Excel source, choose as the data access mode that the worksheet name is contained in a table variable, and specify which:
Each time round the loop, the Excel source will point to the worksheet whose name will be contained in the SheetName variable.
Here are the mappings for the SQL Server destination:
The mappings from Excel source to SQL Server destination table AdoShopping.
Step 6 - running the package
The final thing to do is to run the package:
What you should see in control flow
The data flow task will flash once for each worksheet:
At the end of the package, this is what the data flow should show (since there are 6 rows in the March worksheet, the last one to be imported).
And that's it! Told you it was a messy procedure ...