This page has 0 threads | Add post

Grouping and Aggregating using GROUP BY

Written by Andy Brown

In this tutorial

What does aggregating data involve?
What aggregation functions exist?
Syntax of GROUP BY
Grouping by multiple columns
Grouping by expressions
Grouping by no columns at all

This part of our SQL training tutorial looks at the GROUP BY clause.

What does aggregating data involve?

Aggregating data involves taking a set of rows and summarising it. For example, you might take a set of films (movies, to those in the US) ...

A list of films without any aggregation.

... and show the total number of Oscars (wins and nominations) by country:

Aggregated data - total Oscars by country

The aggregated data, showing the total number of Oscar wins and nominations for each film-making country in our database.

One way to think of grouping by a column is that you start with a bucket for each country's data, and then add, count, average or perform some other statistic on all of the films in each country's bucket. Then again, this may not help you!

The above query would group the data by one column (the country). If you don't group it by any columns, you'll get a single row:

The total figure for all films.

What aggregation functions exist?

Here are the possible functions that you can use:

Function	What it does
SUM	Returns the total value of a column or field.
AVG	Returns the average value of a column or field.
MIN	Returns the smallest number or date.
MAX	Returns the largest number or date.
COUNT	Returns the number of records.
COUNT_BIG	Returns the number of records as a number of data type bigint.
STDEV	Returns the standard deviation.
STDEP	Returns the standard deviation for the population.
VARP	Returns the variance for the population.

Syntax of GROUP BY

The basic syntax of the GROUP BY clause is this (where XXX is any of the aggregation functions listed above, such as SUM, COUNT, etc):

SELECT

-- zero, one or more group by columns

FirstColumnGroupingBy,

SecondColumnGroupingBy,

...

-- one or more aggregated columns

XXX(FirstColumnAggregating),

XXX(SecondColumnAggregating),

...

FROM

TableName

GROUP BY

FirstColumnGroupingBy,

SecondColumnGroupingBy,

...

This doesn't mean much on its own, so let's look at a practical example: the query which shows the total number of Oscars by country at the top of this page. Here it is:

SELECT

c.CountryName,

SUM(f.FilmOscarWins) AS 'Wins',

SUM(f.FilmOscarNominations) AS 'Nominations'

FROM

tblFilm AS f

INNER JOIN tblCountry AS c

ON f.FilmCountryId=c.CountryId

GROUP BY

c.CountryName

In a grouping query, the rule is:

Every non-aggregated column in your query must be repeated in the GROUP BY clause.

What does this mean? In our example, we have 3 columns of output:

The three columns include two aggregated fields plus the column that we're grouping by (the country name).

The following query would not work, and would generate an error:

SELECT

c.CountryName,

SUM(f.FilmOscarWins) AS 'Wins',

SUM(f.FilmOscarNominations) AS 'Nominations'

FROM

tblFilm AS f

INNER JOIN tblCountry AS c

ON f.FilmCountryId=c.CountryId

This is because we're trying to show the country alongside a couple of summed fields, and we haven't said that we want to group by this column.

Grouping by multiple columns

You can group by as many columns as you like. This query shows the number of films in our database for each director and studio (so we're grouping by two columns):

SELECT

d.DirectorName,

s.StudioName,

COUNT(f.FilmId) AS 'Number films'

FROM

tblFilm AS f

INNER JOIN tblDirector AS d

ON f.FilmDirectorId=d.DirectorId

INNER JOIN tblStudio AS s

ON f.FilmStudioId=s.StudioId

GROUP BY

d.DirectorName,

s.StudioName

ORDER BY

'Number films' DESC

Here are the first few rows produced by this query:

What does this show? That Richard Donner is the most loyal person, directing no less than 5 fllms for Warner Brothers?

Grouping by expressions

You can group by any calculated expression, as this example shows:

SELECT

YEAR(FilmReleaseDate) AS 'Release year',

COUNT(FilmId) AS 'Number films'

FROM

tblFilm

GROUP BY

YEAR(FilmReleaseDate)

This would show the number of films released in a given year:

The query shows how many films there are for each year.

Grouping by no columns at all

If you don't group by any columnns, you get a single summary statistics row (as mentioned above):

SELECT

SUM(FilmOscarWins) AS 'Total wins',

SUM(FilmOscarNominations) AS 'Total nominations'

FROM

tblFilm AS f

This query doesn't have any grouping, so produces one row:

if you don't group, you only get one row of data in the results set!

Now that I've covered the basics of grouping, let's look at how you can apply criteria to a grouped query using WHERE and HAVING.

Previous
tutorial

Joins

Previous
part

Top of page

Next
part

Using criteria in a GROUP BY query to restrict results

Next
tutorial

Views

You can learn more about this topic on the following Wise Owl courses:

This page has 0 threads

Add a new post

Online tutorials

Tutorials on SQL

Grouping data

Grouping and Aggregating using GROUP BY

What does aggregating data involve?

What aggregation functions exist?

Syntax of GROUP BY

Grouping by multiple columns

Grouping by expressions

Grouping by no columns at all

Head office

London

Manchester