Browse 554 attributed reviews, viewable separately for our classroom and online training
Errors when you delete a primary key in Power BI
Be careful of referencing measures in calculations when you have duplicate rows in a table! This blog explains why you might want to hang on to that unused primary key column.

Posted by Andy Brown on 22 February 2021

You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.

Errors when you delete a primary key in Power BI

I've got a simple Power BI report which shows average sales by supermarket, but it's giving inconsistent data:

Inconsistent averages

There are two measures giving different average sales for Tesco - they can't both be right!

Confession time.  For years I (we) have been telling delegates to remove unnecessary columns in Power BI when loading data:

Primary key

Columns like the PurchaseId above are particularly expensive to load, since the values are all unique (the column storage engine can't therefore compress the data that effectively).

It's not like I'm using this column for any relationship:

Relationship

After all, the PurchaseId column isn't being used in any relationship, so we can't possibly need it - can we?

The problem for the above database is that there is a duplicate row:

Duplicate row

There is a similar problem with the Create a creature database much loved by delegates on our Power BI and DAX courses, which contains exact duplicate purchase records.

 

Let's look at the measures in the report above.  The one summing the sales revenue is simple:

SumSaleRevenue = SUMX(

Purchase,

Purchase[Quantity] * Purchase[Price]

)

The first one calculating the average sales revenue gives the correct answer:

AvgSaleRevenue = AVERAGEX(

Purchase,

Purchase[Quantity] * Purchase[Price]

)

The second one gives the wrong answer:

AvgSaleRevenue2 = AVERAGEX(

Purchase,

[SumSaleRevenue]

)

Here's an Excel worksheet proving that the first answer is correct:

First answer is correct

The average sales figure for Tesco is indeed 16.64 (being total sales for Tesco of 99.84 divided by 6).

What's happening to get the wrong answer is this.  If there's no PurchaseId column in Power BI, the software pre-calculates the [SumSalesRevenue] measure at the time it's created at the lowest level of granularity:

Combining duplicates

These two rows are identical once you've removed the primary key, so they are combined.

So the data stored for the SumSalesRevenue measure is this:

The column stored

Remember that Power BI stores columns, not rows.

When you try to average this, Power BI gets the number of values right but not the total revenue:

Wrong figure

I've shaded the two figures which the measure incorrectly uses.

The big takeaway from this?  Make sure you don't have any duplicate rows in your tables if you're going to reference explicit measures in calculations!

This blog has 0 threads Add post