This page has 0 threads | Add post

Blogs

Finding our blogs useful? Please consider helping us to share more!

How to process tables in your model, and ideas for speeding processing up Part three of a three-part series of blogs
Processing tables involves loading data into your model, and can take a long time. This blog explains how processing works in SSAS Tabular and the different options available to you, shows how to process tables or models, and gives some ideas on how to make processing run more quickly. The theory of processing tables in SSAS (Tabular) The practice of processing in SSAS Tabular Some ideas for speeding up processing (this blog) This blog is part of our online SSAS Tabular tutorial; we also offer lots of other Analysis Services training resources.

How to process tables in your model, and ideas for speeding processing up

Part three of a three-part series of blogs

Processing tables involves loading data into your model, and can take a long time. This blog explains how processing works in SSAS Tabular and the different options available to you, shows how to process tables or models, and gives some ideas on how to make processing run more quickly.

The theory of processing tables in SSAS (Tabular)
The practice of processing in SSAS Tabular
Some ideas for speeding up processing (this blog)

This blog is part of our online SSAS Tabular tutorial; we also offer lots of other Analysis Services training resources.

Posted by Andy Brown on 19 February 2016

Some ideas for speeding up processing

Processing is usually the bottleneck in any company, since it involves loading masses of data from a company's transactional database into its data warehouse, for use the next day as a basis for management reports.

One way to minimise the disruption that this process (forgive the pun) causes is to use SQL Server Agent to schedule it to run at 3 o'clock in the morning, although this is tough on anyone working nights!

This page looks at some ways in which you can make processing take less time.

Step 1 - Partition your tables

I've looked at this in a separate blog, but I've listed it here for completeness. Partitioning allows you to break a table up into different parts, and choose which to process:

For the tblPos table, for example, you could choose just to import the 2016 data, on the grounds that other years have been closed off and so can't have changed.

Step 2 - Omit high-cardinality columns where possible

The temptation when importing tables is to include all of the columns, then just hide the ones you want from client view:

The TransactionId column is hidden from client view, so how much harm can it be doing?

For large datasets, this is a mistake! The TransactionId column shown above is the most expensive one in the entire MAM database, since it contains 13,904 different values:

Because it's the primary key for the transactions table, each transaction id value is necessarily unique.

Remember that SSAS uses column and not row storage, so the column dictionary for the transaction id number will be (at least in MAM terms) huge. All this for a column which no user of your model will ever want to see or use!

The solution is to avoid importing expensive columns with a high level of granularity:

Unticking the transaction id will mean that this expensive column won't be imported.

Step 3 - Pre-sort your data

Take the animal table as an example. Sorting by the animal id or name won't gain much:

The first two columns have unique values for every row, so it doesn't matter what order these appear in.

Sorting by the Legs column, by contrast, will have an effect (since this has fewest unique values):

The Legs column has the fewest distinct values, so it makes sense to sort by this column first.

I have to be honest, I don't know exactly how much difference this will make, but the logic seems good!

Step 4 - avoid calculated columns

Consider the following simple calculation, which works out the value for transactions as the price of the item multiplied by the quantity of items bought:

The value of any item is the price times the quantity.

I've added 3 measures at the bottom of the table:

Measure	Formula	Results
Cardinality price	=DISTINCTCOUNT([Price])	384
Cardinality quantity	=DISTINCTCOUNT([Quantity])	4
Cardinality amount	=DISTINCTCOUNT([Amount])	1045

The maximum possible combination of amounts would be 1,536 (that is, 384 prices multiplied by 4 quantities), but because of repeats there are only 1,045 unique amounts. What this means is that the calculated column takes up nearly three times as much storage as the other two columns combined.

Be aware, however, that by removing the calculated column any query which aggregates transaction amounts will run that much more slowly, since it will have to recalculate them. There is often a trade-off between processing time and query run time!

Step 5 - avoid expensive queries

Because of the way that the xVelocity engine stores data (by columns, not rows) some queries will be more costly to run than others. Here's an example of an expensive query:

This query will be relatively expensive to run, since it has to bring in data from four different columns.

Here's a less expensive one:

This query will be relatively cheap to run, since aggregation over a column runs quickly, and there are only two columns to bring in.

In practise the volume of data is so small for the MAM database that relative timings will be meaningless!

Step 6 - don't store date/time columns

For the make-a-mammal tblPos (point-of-sale) table, we capture the point-of-sale date/time. Typical entries might look like this:

Five typical date/time entries from the Pos table.

If you have DateTime columns like this, you should split them into two parts (date and time). For the above dates, this would give:

Original value	Date	Time
23/11/2015 13:35:23	23/11/2015	13:35:23
23/11/2015 15:22:08	23/11/2015	15:22:08
24/11/2015 13:35:23	23/11/2015	13:35:23
24/11/2015 15:22:08	23/11/2015	15:22:08
25/11/2015 13:35:23	23/11/2015	13:35:23

This would yield the following column dictionaries:

Column	Dictionary	Values
Date	23/11/2015, 24/11/2015, 25/11/2015	0,0,1,1,2
Time	13:35:23, 15:22:08	0,1,0,1,0

The maximum possible number of dates which can now be stored is 365 or 366 for each year in the model, and the maximum number of times is 24 x 60 x 60 = 86,400 seconds.

Step 7 - consider splitting id columns

Typically when processing a model the most expensive columns are unique id ones. You can reduce the amount of data stored by splitting these - here's a suggestion:

I've split the transaction id into the thousands component and the units component).

This would give the following column dictionaries:

Column	Formula	Distinct values
Thousands	=int([TransactionId]/1000)	113
Units	*=[TransactionId]-1000 int([TransactionId]/1000)**	1000

Of course, you'll then have to reconstruct these when creating DAX expressions: it's the age-old trade-off between processing time and query run-time ...

If you want to measure how long it takes to run queries, DAX Studio includes an excellent Server Timings facility.

Parts of this blog
The theory of processing tables in SSAS (Tabular) The practice of processing in SSAS Tabular Some ideas for speeding up processing (this blog)

Some other pages relevant to the above blogs include:

This blog has 0 threads

Add a new post

BLOGS BY TOPIC

BLOGS BY AUTHOR