BLOGS BY TOPIC
BLOGS BY AUTHOR
BLOGS BY YEAR
Processing tables involves loading data into your model, and can take a long time. This blog explains how processing works in SSAS Tabular and the different options available to you, shows how to process tables or models, and gives some ideas on how to make processing run more quickly.
- The theory of processing tables in SSAS (Tabular) (this blog)
- The practice of processing in SSAS Tabular
- Some ideas for speeding up processing
Posted by Andy Brown on 19 February 2016
You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.
The theory of processing tables in SSAS (Tabular)
Processing itself in SSAS Tabular is easy; it's understanding what's going on which is hard! This page explains the underlying concepts of processing, and the choices you'll need to make.
To understand how processing works in SSAS tabular models, and how to tweak it, it's vital to understand how the underlying engine stores data.
Most databases (such as SQL Server) use a row-based storage algorithm:
Typically each row in a table is stored as a record, and is accessed by its primary key (unique identifier).
By contrast, the SSAS database engine stores data by columns:
In tabular models, columns are stored separately (because whole tables are processed in one go, this is an efficient way to store and access data).
Data compression for column storage
Duplicate column values are only ever stored once. Thus the SpeciesId and Legs columns above would be stored as:
For example there are only 3 discrete values for legs: 0, 2 and 4. Here's how SSAS would store the legs for each of the 8 animals:
|Animal||Legs||Position in list||Rank used by SSAS|
Putting the last column together gives the values string 2,2,2,1,1,0,2,2 shown in the first table.
This shows that the lower the cardinality of a column (ie the smaller the number of distinct values there are, and hence the more duplication there is), the more efficiently the data will be stored - a point I'll revisit later in this blog.
The two stages of processing
When you process any model, two things happen:
|Data (loading data for each table)||The xVelocity engine takes each column in each table and builds a dictionary from it, as explained above.|
|Recalc (tidying up loose ends)||The engine then populates any calculated columns, hierarchies, relationships and indexes for the database.|
It's important to bear these two stages in mind when you're considering what your processing strategy should be.
The types of processing
When you process a table or database, in order to achieve continuity SSAS by default runs a SQL transaction to load the new set of data while leaving the old set unaffected.
This means that if the processing fails at any point use of the existing model will not be affected, but it also means that the xVelocity engine requires a little over twice as much memory to process a table or database as is taken up by the table or database being processed.
Bearing this in mind, when you choose to process you'll be presented with a choice of options as follows (partitions are explained in this separate blog, and are included in the table below for completeness):
|Mode||Level||What it does|
|Process Data||Table or partition||Reads in data and builds dictionaries, but doesn't process any calculated columns, indexes, relationships or hierarchies. Typically a Process Recalc is required after all tables' dictionaries have been updated using Process Data.|
|Process Recalc||Database||Updates all of the calculated columns, indexes, relationships and hierarchies for a model.|
|Process Full||Table, partition or database||Runs a Process Data on the table or partition, followed by a Process Recalc. If you run Process Full for an entire database, SSAS will rebuild all of the tables in the database (running a Process Data on each one), then run a single Process Recalc.|
|Process Defrag||Table or database||Tidies up an individual table, or all tables (this is particularly useful when you've processed individual partitions for a table, and dictionaries might no longer be efficiently stored).|
The most efficient strategy
Given the above, the most efficient approach is usually to:
- Run a Process Data on each table (or on each partition of each table) to create up-to-date dictionaries for all of its columns; then
- When this is finished, run a single Process Recalc on the entire database.
However, this approach does mean that calculated columns, relationships, hierarchies and indexes won't be available between the start of step 1 and the end of step 2.
One way to get round the above is to run steps 1 and 2 as part of a single SQL transaction (equivalent to doing a Process Full on the entire database), but this will use much more memory.