SQL subqueries (including correlated subqueries)
Part four of a four-part series of blogs

If you’ve been writing SQL queries for a while you’re probably fairly confident with writing single SELECT statements to return a set of records. If you’re ready for a bit more of a challenge, this blog will teach you how to nest one SELECT statement inside another in order to create a subquery. If you’re thinking “that sounds simple enough” then you might want to read to the end of this series, where we’ll attempt to melt your brain by explaining correlated subqueries!

  1. Creating a Simple Subquery
  2. Subqueries and Aggregate Functions
  3. Using ALL, ANY, SOME and IN with Subqueries
  4. Correlated Subqueries (this blog)

This article is part of our complete SQL Tutorial, which is an alternative to booking classroom-based SQL courses with Wise Owl.

Posted by Andrew Gould on 28 January 2013

You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.

Correlated Subqueries

All of the subqueries we have looked at so far in this blog would have worked perfectly as independent queries - if you executed one of our subqueries by itself it would still return a set of results.  Correlated subqueries don't work in the same way because they depend on the outer query in order to be processed.

Creating a Correlated Subquery

The easiest way to explain a correlated subquery is with an example.  Let's say that we'd like to show the details of the most expensive film made in each year.  We can start by writing a basic query which will show details of the film with the highest budget in the entire table:

SELECT

YEAR(FilmReleaseDate) AS [Year]

,FilmName

,FilmBudgetDollars

FROM

tblFilm

WHERE

FilmBudgetDollars =

(SELECT

MAX(FilmBudgetDollars)

FROM

tblFilm)

The next step is to give the tables in both the outer and inner queries an alias:

SELECT

YEAR(FilmReleaseDate) AS [Year]

,FilmName

,FilmBudgetDollars

FROM

tblFilm AS OuterFilm

WHERE

FilmBudgetDollars =

(SELECT

MAX(FilmBudgetDollars)

FROM

tblFilm AS InnerFilm)

Finally, we can add a WHERE clause to the subquery which compares the year of the film in the outer query with the year of the film in the inner query:

SELECT

YEAR(FilmReleaseDate) AS [Year]

,FilmName

,FilmBudgetDollars

FROM

tblFilm AS OuterFilm

WHERE

FilmBudgetDollars =

(SELECT

MAX(FilmBudgetDollars)

FROM

tblFilm AS InnerFilm

WHERE

YEAR(OuterFilm.FilmReleaseDate) =

YEAR(InnerFilm.FilmReleaseDate))

Hopefully this makes it clear why the table aliases are necessary!  Both the inner and outer query refer to the film table and the inner query must refer to the outer query in order to return the correct result.  When the query is executed the budget of each film is checked to see if it is the highest budget of all films made in the same year and, if so, that film is returned to the result set.  The results of the query can be seen below:

Most expensive film in each year

The results should show one film for each year in the database. I've added an ORDER BY clause to tidy up the results.

 

More Examples of Correlated Subqueries

Once you understand the principle it's relatively simple to compare any value in an outer query with one in the inner query.  The example below looks for the longest film released by each studio:

SELECT

s.StudioName

,f1.FilmName

,f1.FilmRunTimeMinutes

FROM

tblFilm AS f1

JOIN tblStudio AS s ON s.StudioID=f1.FilmStudioID

WHERE

f1.FilmRunTimeMinutes =

(SELECT

MAX(f2.FilmRunTimeMinutes)

FROM

tblFilm AS f2

WHERE

f1.FilmStudioID=f2.FilmStudioID)

ORDER BY

s.StudioName

The result of this query is shown below:

Longest film by studio

We end up with one result for each studio: the longest film that the studio released.

The query below compares each actor to find the oldest person whose name begins the same letter:

SELECT

LEFT(a1.ActorName,1) AS [Initial]

,a1.ActorName

,a1.ActorDOB

FROM

tblActor AS a1

WHERE

a1.ActorDOB =

(SELECT

MIN(a2.ActorDOB)

FROM

tblActor AS a2

WHERE

LEFT(a1.ActorName,1) = LEFT(a2.ActorName,1))

ORDER BY

[Initial]

The result of the query is shown below:

Oldest actor by initial

The result of the query shows us the oldest actor whose name begins with each letter of the alphabet.

 

The common feature of all of these examples is that a field in the outer query is compared against a field in the inner query in order to return a result.  Now it's up to you to find more examples!

Correlated subqueries are famous for running slowly for large tables.  This is because in effect they have to run a separate subquery for every record in the main table.

Correlated Subqueries using ALL and ANY

You can also combine correlated subqueries with other keywords such as ALL and ANY.  In the example below a film will appear in the results if its director is older than ALL of the actors in that film:

SELECT

f1.FilmName

,d.DirectorName

FROM

tblFilm AS f1

JOIN tblDirector AS d ON d.DirectorID=f1.FilmDirectorID

WHERE

d.DirectorDOB < ALL

(

SELECT

a.ActorDOB

FROM

tblActor AS a

JOIN tblCast AS c ON a.ActorID=c.CastActorID

JOIN tblFilm AS f2 ON f2.FilmID=c.CastFilmID

WHERE

f1.FilmID=f2.FilmID

)

And, of course, we could look for films whose director is older than ANY of the actors in that film:

SELECT

f1.FilmName

,d.DirectorName

FROM

tblFilm AS f1

JOIN tblDirector AS d ON d.DirectorID=f1.FilmDirectorID

WHERE

d.DirectorDOB < ANY

(

SELECT

a.ActorDOB

FROM

tblActor AS a

JOIN tblCast AS c ON a.ActorID=c.CastActorID

JOIN tblFilm AS f2 ON f2.FilmID=c.CastFilmID

WHERE

f1.FilmID=f2.FilmID

)

It's a little more difficult to check the results of more complex queries like these, but the best way to gain confidence that your queries are working is to practise using them!

This blog has 0 threads Add post