How to process tables in your model, and ideas for speeding processing up
Part one of a three-part series of blogs
Processing tables involves loading data into your model, and can take a long time. This blog explains how processing works in SSAS Tabular and the different options available to you, shows how to process tables or models, and gives some ideas on how to make processing run more quickly.
Processing itself in SSAS Tabular is easy; it's understanding what's going on that's hard! This page explains the underlying concepts of processing, and the choices you'll need to make.
To understand how processing works in SSAS Tabular models, and how to tweak it, it's vital to understand how the underlying engine stores data.
Most databases (such as SQL Server) use a row-based storage algorithm:
Typically each row in a table is stored as a record, and is accessed by its primary key (unique identifier).
By contrast, the SSAS database engine stores data by columns:
In tabular models, columns are stored separately (because whole tables are processed in one go, this is an efficient way to store and access data).
Duplicate column values are only ever stored once: each value in a column is replaced by a reference to its entry in a dictionary of the column's distinct values. For example, the Legs column above contains only 3 distinct values: 0, 2 and 4. Here's how SSAS would store the legs for each of the 8 animals:
| Animal | Legs | Position in list | Rank used by SSAS |
| --- | --- | --- | --- |
Putting the last column together gives the values string 2,2,2,1,1,0,2,2 shown in the first table.
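The encoding described above can be sketched in Python. The Legs values below are illustrative stand-ins (only the distinct values 0, 2 and 4 and the resulting values string 2,2,2,1,1,0,2,2 come from the example above); SSAS's actual dictionary format is internal to the engine.

```python
# Toy sketch of xVelocity-style dictionary encoding for one column:
# each value is replaced by its rank in the sorted list of distinct values.

def dictionary_encode(values):
    """Return (dictionary, ranks) for a column of values."""
    dictionary = sorted(set(values))               # distinct values, stored once
    rank = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [rank[v] for v in values]

legs = [4, 4, 4, 2, 2, 0, 4, 4]                    # one value per animal
dictionary, ranks = dictionary_encode(legs)

print(dictionary)  # [0, 2, 4] - the 3 discrete values
print(ranks)       # [2, 2, 2, 1, 1, 0, 2, 2] - the values string
```

Only the dictionary and the (much smaller) ranks need storing, rather than the full column of repeated values.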
This shows that the lower the cardinality of a column (i.e. the fewer distinct values it contains, and hence the more duplication there is), the more efficiently the data will be stored - a point I'll revisit later in this blog.
When you process any model, two things happen:
| Stage | What happens |
| --- | --- |
| Data (loading data for each table) | The xVelocity engine takes each column in each table and builds a dictionary from it, as explained above. |
| Recalc (tidying up loose ends) | The engine then populates any calculated columns, hierarchies, relationships and indexes for the database. |
It's important to bear these two stages in mind when you're considering what your processing strategy should be.
When you process a table or database, SSAS by default does so within a transaction, loading the new set of data while leaving the old set untouched so that users can carry on querying the existing model.
This means that if processing fails at any point, use of the existing model will not be affected - but it also means that the xVelocity engine requires a little over twice as much memory to process a table or database as that table or database takes up.
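The transactional behaviour can be pictured with a toy model (this is an illustration of the idea, not SSAS internals): the old copy stays live and queryable until the new copy has been built in full, which is both why failures are harmless and why peak memory is roughly double.

```python
class TabularTable:
    """Toy model of transactional processing: queries keep seeing
    the old data until the new copy is complete."""

    def __init__(self, data):
        self.live = data

    def process(self, load_new_data):
        new = load_new_data()   # old copy still live here -> ~2x memory
        self.live = new         # swap only once loading has succeeded

table = TabularTable(["old row"])
try:
    table.process(lambda: 1 / 0)   # processing fails part-way through...
except ZeroDivisionError:
    pass
print(table.live)  # ['old row'] - the existing model is untouched
```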
Bearing this in mind, when you choose to process you'll be presented with a choice of options as follows (partitions are explained in this separate blog, and are included in the table below for completeness):
| Mode | Level | What it does |
| --- | --- | --- |
| Process Data | Table or partition | Reads in data and builds dictionaries, but doesn't process any calculated columns, indexes, relationships or hierarchies. Typically a Process Recalc is required after all tables' dictionaries have been updated using Process Data. |
| Process Recalc | Database | Updates all of the calculated columns, indexes, relationships and hierarchies for a model. |
| Process Full | Table, partition or database | Runs a Process Data on the table or partition, followed by a Process Recalc. If you run Process Full for an entire database, SSAS rebuilds all of the tables in the database (running a Process Data on each one), then runs a single Process Recalc. |
| Process Defrag | Table or database | Tidies up an individual table, or all tables. This is particularly useful when you've processed individual partitions for a table, and dictionaries might no longer be efficiently stored. |
Given the above, the most efficient approach is usually to:

1. Run a Process Data on each table in the model; then
2. Run a single Process Recalc for the database.
However, this approach does mean that calculated columns, relationships, hierarchies and indexes won't be available between the start of step 1 and the end of step 2.
One way to get round this is to run steps 1 and 2 as part of a single transaction (equivalent to doing a Process Full on the entire database), but this will use much more memory.
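The two-step strategy can be sketched as follows. The function names here are invented stand-ins for illustration; in practice you would issue the corresponding processing commands (e.g. via SSMS or an XMLA/TMSL script) rather than call Python functions.

```python
log = []

# Hypothetical stand-ins for the real processing commands.
def process_data(table):
    log.append(f"Process Data: {table}")

def process_recalc():
    log.append("Process Recalc")

def efficient_strategy(tables):
    """Step 1: Process Data on each table; step 2: one Process Recalc.
    Cheap on memory, but calculated columns, relationships, hierarchies
    and indexes are unavailable until the Recalc has finished."""
    for t in tables:
        process_data(t)
    process_recalc()

efficient_strategy(["Animal", "Species"])
print(log)
# ['Process Data: Animal', 'Process Data: Species', 'Process Recalc']
```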