Read our blogs, tips and tutorials
Try our exercises or test your skills
Watch our tutorial videos or shorts
Take a self-paced course
Read our recent newsletters
License our courseware
Book expert consultancy
Buy our publications
Get help in using our site
549 attributed reviews in the last 3 years
Refreshingly small course sizes
Outstandingly good courseware
Whizzy online classrooms
Wise Owl trainers only (no freelancers)
Almost no cancellations
We have genuine integrity
We invoice after training
Review 30+ years of Wise Owl
View our top 100 clients
Search our website
We also send out useful tips in a monthly email newsletter ...
Errors when you delete a primary key in Power BI |
---|
Be careful of referencing measures in calculations when you have duplicate rows in a table! This blog explains why you might want to hang on to that unused primary key column. |
I've got a simple Power BI report which shows average sales by supermarket, but it's giving inconsistent data:
There are two measures giving different average sales for Tesco - they can't both be right!
Confession time. For years I (we) have been telling delegates to remove unnecessary columns in Power BI when loading data:
Columns like the PurchaseId above are particularly expensive to load, since the values are all unique (the column storage engine can't therefore compress the data that effectively).
It's not like I'm using this column for any relationship:
After all, the PurchaseId column isn't being used in any relationship, so we can't possibly need it - can we?
The problem for the above database is that there is a duplicate row:
There is a similar problem with the Create a creature database much loved by delegates on our Power BI and DAX courses, which contains exact duplicate purchase records.
Let's look at the measures in the report above. The one summing the sales revenue is simple:
SumSaleRevenue = SUMX(
Purchase,
Purchase[Quantity] * Purchase[Price]
)
The first one calculating the average sales revenue gives the correct answer:
AvgSaleRevenue = AVERAGEX(
Purchase,
Purchase[Quantity] * Purchase[Price]
)
The second one gives the wrong answer:
AvgSaleRevenue2 = AVERAGEX(
Purchase,
[SumSaleRevenue]
)
Here's an Excel worksheet proving that the first answer is correct:
The average sales figure for Tesco is indeed 16.64 (being total sales for Tesco of 99.84 divided by 6).
What's happening to get the wrong answer is this. If there's no PurchaseId column in Power BI, the software pre-calculates the [SumSalesRevenue] measure at the time it's created at the lowest level of granularity:
These two rows are identical once you've removed the primary key, so they are combined.
So the data stored for the SumSalesRevenue measure is this:
Remember that Power BI stores columns, not rows.
When you try to average this, Power BI gets the number of values right but not the total revenue:
I've shaded the two figures which the measure incorrectly uses.
The big takeaway from this? Make sure you don't have any duplicate rows in your tables if you're going to reference explicit measures in calculations!
Some other pages relevant to the above blog include:
Kingsmoor House
Railway Street
GLOSSOP
SK13 2AA
Landmark Offices
99 Bishopsgate
LONDON
EC2M 3XD
Holiday Inn
25 Aytoun Street
MANCHESTER
M1 3AE
© Wise Owl Business Solutions Ltd 2024. All Rights Reserved.