BLOGS BY TOPIC▼
BLOGS BY AUTHOR▼
BLOGS BY YEAR▼
If you want to average, count, sum or perform any other statistic on values in SQL, you need to learn the GROUP BY and HAVING parts of a SQL statement. That's where this online tutorial comes in!
- Grouping and Aggregating Data using GROUP BY (this blog)
- Using criteria in a GROUP BY query to restrict results
- Special considerations for aggregation statistics
- Varying Grouping Results: Rollup, Cube and Compute
This blog is part of our much longer full SQL tutorial (although we also run training courses in SQL, for up to 6 people at a time).
Posted by Andy Brown on 14 December 2012
You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.
Grouping and Aggregating Data using GROUP BY
This part of our SQL training series of blogs looks at the GROUP BY clause.
What does aggregating data involve?
Aggregating data involves taking a set of rows and summarising it. For example, you might take a set of films (movies, to those in the US) ...

A list of films without any aggregation.
... and show the total number of Oscars (wins and nominations) by country:

The aggregated data, showing the total number of Oscar wins and nominations for each film-making country in our database.
One way to think of grouping by a column is that you start with a bucket for each country's data, and then add, count, average or perform some other statistic on all of the films in each country's bucket. Then again, this may not help you!
The above query would group the data by one column (the country). If you don't group it by any columns, you'll get a single row:

The total figure for all films.
What aggregation functions exist?
Here are the possible functions that you can use:
Function | What it does |
---|---|
SUM | Returns the total value of a column or field. |
AVG | Returns the average value of a column or field. |
MIN | Returns the smallest number or date. |
MAX | Returns the largest number or date. |
COUNT | Returns the number of records. |
COUNT_BIG | Returns the number of records as a number of data type bigint. |
STDEV | Returns the standard deviation. |
STDEP | Returns the standard deviation for the population. |
VARP | Returns the variance for the population. |
Syntax of GROUP BY
The basic syntax of the GROUP BY clause is this (where XXX is any of the aggregation functions listed above, such as SUM, COUNT, etc):
SELECT
-- zero, one or more group by columns
FirstColumnGroupingBy,
SecondColumnGroupingBy,
...
-- one or more aggregated columns
XXX(FirstColumnAggregating),
XXX(SecondColumnAggregating),
...
FROM
TableName
GROUP BY
FirstColumnGroupingBy,
SecondColumnGroupingBy,
...
This doesn't mean much on its own, so let's look at a practical example: the query which shows the total number of Oscars by country at the top of this page. Here it is:
SELECT
c.CountryName,
SUM(f.FilmOscarWins) AS 'Wins',
SUM(f.FilmOscarNominations) AS 'Nominations'
FROM
tblFilm AS f
INNER JOIN tblCountry AS c
ON f.FilmCountryId=c.CountryId
GROUP BY
c.CountryName
In a grouping query, the rule is:
What does this mean? In our example, we have 3 columns of output:

The three columns include two aggregated fields plus the column that we're grouping by (the country name).
The following query would not work, and would generate an error:
SELECT
c.CountryName,
SUM(f.FilmOscarWins) AS 'Wins',
SUM(f.FilmOscarNominations) AS 'Nominations'
FROM
tblFilm AS f
INNER JOIN tblCountry AS c
ON f.FilmCountryId=c.CountryId
This is because we're trying to show the country alongside a couple of summed fields, and we haven't said that we want to group by this column.
Grouping by multiple columns
You can group by as many columns as you like. This query shows the number of films in our database for each director and studio (so we're grouping by two columns):
SELECT
d.DirectorName,
s.StudioName,
COUNT(f.FilmId) AS 'Number films'
FROM
tblFilm AS f
INNER JOIN tblDirector AS d
ON f.FilmDirectorId=d.DirectorId
INNER JOIN tblStudio AS s
ON f.FilmStudioId=s.StudioId
GROUP BY
d.DirectorName,
s.StudioName
ORDER BY
'Number films' DESC
Here are the first few rows produced by this query:

What does this show? That Richard Donner is the most loyal person, directing no less than 5 fllms for Warner Brothers?
Grouping by expressions
You can group by any calculated expression, as this example shows:
SELECT
YEAR(FilmReleaseDate) AS 'Release year',
COUNT(FilmId) AS 'Number films'
FROM
tblFilm
GROUP BY
YEAR(FilmReleaseDate)
This would show the number of films released in a given year:

The query shows how many films there are for each year.
Grouping by no columns at all
If you don't group by any columnns, you get a single summary statistics row (as mentioned above):
SELECT
SUM(FilmOscarWins) AS 'Total wins',
SUM(FilmOscarNominations) AS 'Total nominations'
FROM
tblFilm AS f
This query doesn't have any grouping, so produces one row:

if you don't group, you only get one row of data in the results set!
Now that I've covered the basics of grouping, let's look at how you can apply criteria to a grouped query using WHERE and HAVING (then we'll have a look at how to deal with some special cases).
- Grouping and Aggregating Data using GROUP BY (this blog)
- Using criteria in a GROUP BY query to restrict results
- Special considerations for aggregation statistics
- Varying Grouping Results: Rollup, Cube and Compute