Use the GROUP BY clause in SQL for statistics
Part three of a four-part series of blogs

If you want to average, count, sum or perform any other statistic on values in SQL, you need to learn the GROUP BY and HAVING parts of a SQL statement. That's where this online tutorial comes in!

  1. Grouping and Aggregating Data using GROUP BY
  2. Using criteria in a GROUP BY query to restrict results
  3. Special considerations for aggregation statistics (this blog)
  4. Varying Grouping Results: Rollup, Cube and Compute

This blog is part of our much longer full SQL tutorial (although we also run training courses in SQL, for up to 6 people at a time).

Posted by Andy Brown on 14 December 2012

You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.

Special considerations for aggregation statistics

This part of the tutorial on grouping in SQL looks at various special cases of aggregation, beginning with counting rows.

Ways to count rows

You can count rows in 3 different ways:

Method What it will return
COUNT(*) The number of rows
COUNT(ColumnName) Number of rows having non-null column values.
COUNT(DISTINCT ColumnName) The number of unique values for a column.

Here is an example of each type of syntax:


AS 'Number films in total' FROM tblFilm


SELECT COUNT(FilmCertificateId)

AS 'Number films with certificate id' FROM tblFilm



AS 'Number unique certificate ids' FROM tblFilm

Here's what this returns for the Wise Owl movies database:

Three query results

There are 261 films in the database, but 101 of these have a null value for the certificate id. There are only 6 distinct certificate values: U, PG, 12, 12A, 15 and 18.


Converting data types when averaging

Averaging integer fields can be a pain!  Suppose that you want to find out the average number of Oscars won for our database.  You might think this would do it:


avg(FilmOscarWins) AS 'Average Oscars won'



However, it doesn't.  Because the Oscars column is an integer, the data type returned is also an integer - and approximates to zero:

Average oscars won

The query returns 0, even though the true answer must be more than this.


The solution is to convert the column being averaged into a non-integer data type before you process it:


avg(CAST(FilmOscarWins AS float)) AS 'Average Oscars won'



However, it's a good idea to then display the result to a smaller number of decimal places:


CAST(avg(CAST(FilmOscarWins AS float)) AS decimal(5,2))

AS 'Average Oscars won'



The before-and-answer results for the above two queries are as follows:

Average Oscars without rounding Average Oscars rounded
Before rounding Rounded to 2 decimal places


Dealing with nulls

When you sum a set of values, any nulls will be ignored.  For example, here are the 3 films in our database whose names end with A:

Films ending in A

One of the films (Kagemusha) has a null value for its budget.


The following query would return the value 73,000,000, the sum of 60,000,000 and 13,000,000:

-- total budget for films ending with "A"






FilmName like '%a'

If you look at the results, there's no sign that you may have missed out values:

Results of summing budgets

The total budget returned by this query.


However, the Messages tab tells the full story:

Null value is eliminated by aggregate

The waning as to what has happened.

Note that Micrsooft Access would return a null value from this query - SQL behaves differently.

When averaging values (some of which may be null), your best bet is to strip out nulls first (see a separate blog for more on ways to deal with nulls).  Consider this query:


-- average budget including Kagemusha

AVG(IsNull(FilmBudgetDollars,0)) AS 'Average budget 1',


-- average budget excluding Kagemusha

AVG(FilmBudgetDollars) AS 'Average budget 2'





FilmName like '%a'

This would return two different numbers:

Two different average budget values

Which is correct? It depends whether you feel that the film Kagemusha should count towards the statistics.


Next: how to use subqueries in grouping queries.

This blog has 0 threads Add post