Use the GROUP BY clause in SQL for statistics
Part four of a four-part series of blogs

If you want to average, count, sum or perform any other statistic on values in SQL, you need to learn the GROUP BY and HAVING parts of a SQL statement. That's where this online tutorial comes in!

  1. Grouping and Aggregating Data using GROUP BY
  2. Using criteria in a GROUP BY query to restrict results
  3. Special considerations for aggregation statistics
  4. Varying Grouping Results: Rollup, Cube and Compute (this blog)

This blog is part of our much longer full SQL tutorial (although we also run training courses in SQL, for up to 6 people at a time).

Posted by Andy Brown on 14 December 2012

You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.

Varying Grouping Results: Rollup, Cube and Compute

There are a few ways that you can tweak the results of a grouping command in T-SQL - as listed below!

Our example query

Most of the examples on this page begin life with the following query:

SELECT

d.DirectorName,

s.StudioName,

COUNT(*) AS 'Number of films'

FROM

tblFilm AS f

INNER JOIN tblDirector AS d

ON f.FilmDirectorId = d.DirectorId

INNER JOIN tblStudio AS s

ON f.FilmStudioId = s.StudioId

GROUP BY

d.DirectorName,

s.StudioName

ORDER BY

d.DirectorName ASC,

s.StudioName ASC

This query shows the number of films for each combination of director and studio:

Films by director and studio

The query lists the number of films made for each combination of director and studio.

 

Using WITH ROLLUP to combine statistics

Our query above shows the number of films for each director/studio combination.  We might also like to show the number of films for each director, in a separate query:

SELECT

d.DirectorName,

COUNT(*) AS 'Number of films'

FROM

tblFilm AS f

INNER JOIN tblDirector AS d

ON f.FilmDirectorId = d.DirectorId

GROUP BY

d.DirectorName

ORDER BY

d.DirectorName ASC

This would give something like this:

Number of films by director

The start of the output from the above query.

 

However, what happens if you want to show both sets of totals in the same query?  The answer - use WITH ROLLUP:

SELECT

d.DirectorName,

s.StudioName,

COUNT(*) AS 'Number of films'

FROM

tblFilm AS f

INNER JOIN tblDirector AS d

ON f.FilmDirectorId = d.DirectorId

INNER JOIN tblStudio AS s

ON f.FilmStudioId = s.StudioId

GROUP BY

d.DirectorName,

s.StudioName WITH ROLLUP

ORDER BY

d.DirectorName ASC,

s.StudioName ASC

What these two extra words do is to introduce a new level of aggregation:

Number of films by director

You can see a grand total, a total for each director and then the total for each director/studio combination. 

 

Using WITH CUBE to create all possible combinations of subtotals

The above query counts the number of films for each director, but not for each studio.  To show both subtotals yoiu can use the CUBE operator.  You can even specify for exactly which columns you want to agggregate data. 

These grouping lines will show subtotals for each director and for each studio for our example query above:

GROUP BY

CUBE(

d.DirectorName,

s.StudioName

)

This query, by contrast, will only show subtotals for each studio: 

GROUP BY

d.DirectorName,

CUBE(s.StudioName)

The CUBE operator replaces the WITH CUBE keywords (which are now no longer supported).  I confess that I can't see the point of using WITH ROLLUP instead oF using CUBE, but I've left it in this blog just in case I'm missing the point! 

Showing or sorting by the level of grouping 

You can use the GROUPING keyword to see at what level of grouping a statistic is being calculated:

SELECT

d.DirectorName,

s.StudioName,

COUNT(*) AS 'Number of films',

 

GROUPING(d.DirectorName) AS 'Director level',

GROUPING(s.StudioName) AS 'Studio level'

 

FROM

tblFilm AS f

INNER JOIN tblDirector AS d

ON f.FilmDirectorId = d.DirectorId

INNER JOIN tblStudio AS s

ON f.FilmStudioId = s.StudioId

GROUP BY

CUBE(d.DirectorName,s.StudioName)

ORDER BY

'Director Level' DESC,

'Studio Level' DESC

Here are the first few rows returned by this query:

Number of films by studio and director

The grand total is shown first, then each of the director totals.

Using grouping sets 

Still not learnt enough about combining different subtotals?  How about creating grouping sets?  These allow you to combine a number of different grouping levels in the same query - just like the CUBE operator does.  Here's an example:

SELECT

d.DirectorName,

s.StudioName,

COUNT(*) AS 'Number of films'

FROM

tblFilm AS f

INNER JOIN tblDirector AS d

ON f.FilmDirectorId = d.DirectorId

INNER JOIN tblStudio AS s

ON f.FilmStudioId = s.StudioId

GROUP BY

GROUPING SETS (

(d.DirectorName,s.StudioName),

(d.DirectorName),

(s.StudioName),

()

)

In this example, we've specified 4 ways to group data:

Grouping set What grouping by
(d.DirectorName,s.StudioName)  Director and studio combination.
(d.DirectorName)  Each director. 
(s.StudioName)  Each studio.
()  Nothing (just returns a single figure).

Using grouping sets like this allows you to avoid using the UNION operator to combine results from several different queries.

Using COMPUTE to show additional statistics 

One other (odd) way that you can perform grouping is using the COMPUTE keyword.  For example:

SELECT

d.DirectorName,

f.FilmName,

f.FilmOscarWins

FROM

tblFilm AS f

INNER JOIN tblDirector AS d

ON f.FilmDirectorId = d.DirectorId

INNER JOIN tblStudio AS s

ON f.FilmStudioId = s.StudioId

ORDER BY

d.DirectorName

 

-- now show number of films for each director

COMPUTE

COUNT(f.FilmOscarWins)

BY

d.DirectorName

Here's the beginning of what this would show:

Number of films by director

The first couple of output sets for this query.  Each one shows the total number of rows returned for that director.

 

A limitation of COMPUTE is that the query must order the results by the same field that you then compute by.

Quite why you would ever use this I don't know; I've included it for the sake of completeness!

This blog has 0 threads Add post