Use the GROUP BY clause in SQL for statistics
Part three of a four-part series of blogs

If you want to average, count, sum or perform any other statistic on values in SQL, you need to learn the GROUP BY and HAVING parts of a SQL statement. That's where this online tutorial comes in!

1. Grouping and Aggregating Data using GROUP BY
2. Using criteria in a GROUP BY query to restrict results
3. Special considerations for aggregation statistics (this blog)
4. Varying Grouping Results: Rollup, Cube and Compute

This blog is part of our much longer full SQL tutorial (although we also run training courses in SQL, for up to 6 people at a time).

Posted by Andy Brown on 14 December 2012

You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.

# Special considerations for aggregation statistics

This part of the tutorial on grouping in SQL looks at various special cases of aggregation, beginning with counting rows.

## Ways to count rows

You can count rows in 3 different ways:

Method What it will return
COUNT(*) The number of rows
COUNT(ColumnName) Number of rows having non-null column values.
COUNT(DISTINCT ColumnName) The number of unique values for a column.

Here is an example of each type of syntax:

SELECT COUNT(*)

AS 'Number films in total' FROM tblFilm

SELECT COUNT(FilmCertificateId)

AS 'Number films with certificate id' FROM tblFilm

SELECT COUNT(DISTINCT FilmCertificateId)

AS 'Number unique certificate ids' FROM tblFilm

Here's what this returns for the Wise Owl movies database:

There are 261 films in the database, but 101 of these have a null value for the certificate id. There are only 6 distinct certificate values: U, PG, 12, 12A, 15 and 18.

## Converting data types when averaging

Averaging integer fields can be a pain!  Suppose that you want to find out the average number of Oscars won for our database.  You might think this would do it:

SELECT

avg(FilmOscarWins) AS 'Average Oscars won'

FROM

tblFilm

However, it doesn't.  Because the Oscars column is an integer, the data type returned is also an integer - and approximates to zero:

The query returns 0, even though the true answer must be more than this.

The solution is to convert the column being averaged into a non-integer data type before you process it:

SELECT

avg(CAST(FilmOscarWins AS float)) AS 'Average Oscars won'

FROM

tblFilm

However, it's a good idea to then display the result to a smaller number of decimal places:

SELECT

CAST(avg(CAST(FilmOscarWins AS float)) AS decimal(5,2))

AS 'Average Oscars won'

FROM

tblFilm

The before-and-answer results for the above two queries are as follows:

 Before rounding Rounded to 2 decimal places

## Dealing with nulls

When you sum a set of values, any nulls will be ignored.  For example, here are the 3 films in our database whose names end with A:

One of the films (Kagemusha) has a null value for its budget.

The following query would return the value 73,000,000, the sum of 60,000,000 and 13,000,000:

-- total budget for films ending with "A"

SELECT

SUM(FilmBudgetDollars)

FROM

tblFilm

WHERE

FilmName like '%a'

If you look at the results, there's no sign that you may have missed out values:

The total budget returned by this query.

However, the Messages tab tells the full story:

The waning as to what has happened.

Note that Micrsooft Access would return a null value from this query - SQL behaves differently.

When averaging values (some of which may be null), your best bet is to strip out nulls first (see a separate blog for more on ways to deal with nulls).  Consider this query:

SELECT

-- average budget including Kagemusha

AVG(IsNull(FilmBudgetDollars,0)) AS 'Average budget 1',

-- average budget excluding Kagemusha

AVG(FilmBudgetDollars) AS 'Average budget 2'

FROM

tblFilm

WHERE

FilmName like '%a'

This would return two different numbers:

Which is correct? It depends whether you feel that the film Kagemusha should count towards the statistics.

Next: how to use subqueries in grouping queries.