How to use the LAG and LEAD functions included in SQL Server 2012
SQL Server 2012 introduces new functions to find the previous or subsequent value for a row, without using a complicated self-join. This blog explains how to use LAG and LEAD, including partitioning row sets.

Posted by Andy Brown on 20 September 2013

You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.

The LAG and LEAD functions in SQL

The new (new for SQL Server 2012, that is) LAG and LEAD functions allow you to refer to previous or following rows.  This blog shows how to use these functions!

If you're stuck on older versions of SQL Server, don't despair - I've included a fudge to mimic the LAG function at the end of this blog.

Loading the example data

Suppose that you have a table containing the first 9 X Factor series in the UK:

First X Factor series

The first 9 winners of the X Factor competition in the UK.

You want to list out all of the mentors whose proteges won the competition in consecutive years, to get an answer like this:

List of double-mentors

What the answer should show: series 6 was the only one where the same mentor "won" as in the preceding year.

 

A quick glance at the table above would show you that the answer is that Cheryl Cole is the only person who has achieved this feat, guiding Alexandra Burke and then Joe McElderry to win in consecutive years.  However, this is missing the opportunity to learn some SQL!

To get the examples in this blog to work, run the following script in an existing database to create and populate a table called tblSeries:

-- create table of X Factor series

CREATE TABLE tblSeries(

Series int NULL,

Winner nvarchar(255) NULL,

StartDate date NULL,

EndDate date NULL,

Mentor varchar(50) NULL

)

GO

-- add series winners for the first 9 series

INSERT dbo.tblSeries (Series, Winner, StartDate, EndDate, Mentor)

VALUES (1, N'Steve Brookstein', CAST(0xB32A0B00 AS Date),

CAST(0x152B0B00 AS Date), N'Simon Cowell')

INSERT dbo.tblSeries (Series, Winner, StartDate, EndDate, Mentor)

VALUES (2, N'Shayne Ward', CAST(0x112C0B00 AS Date),

CAST(0x882C0B00 AS Date), N'Louis Walsh')

INSERT dbo.tblSeries (Series, Winner, StartDate, EndDate, Mentor)

VALUES (4, N'Leon Jackson', CAST(0xE92E0B00 AS Date),

CAST(0x602F0B00 AS Date), N'Dannii Minogue')

INSERT dbo.tblSeries (Series, Winner, StartDate, EndDate, Mentor)

VALUES (6, N'Joe McElderry', CAST(0xC8310B00 AS Date),

CAST(0x39320B00 AS Date), N'Cheryl Cole')

INSERT dbo.tblSeries (Series, Winner, StartDate, EndDate, Mentor)

VALUES (7, N'Matt Cardle', CAST(0x34330B00 AS Date),

CAST(0xA5330B00 AS Date), N'Dannii Minogue')

INSERT dbo.tblSeries (Series, Winner, StartDate, EndDate, Mentor)

VALUES (9, N'James Arthur', CAST(0x0C360B00 AS Date),

CAST(0x7D360B00 AS Date), N'Nicole Scherzinger')

INSERT dbo.tblSeries (Series, Winner, StartDate, EndDate, Mentor)

VALUES (3, N'Leona Lewis', CAST(0x7D2D0B00 AS Date),

CAST(0xF42D0B00 AS Date), N'Simon Cowell')

INSERT dbo.tblSeries (Series, Winner, StartDate, EndDate, Mentor)

VALUES (5, N'Alexandra Burke', CAST(0x55300B00 AS Date),

CAST(0xCC300B00 AS Date), N'Cheryl Cole')

INSERT dbo.tblSeries (Series, Winner, StartDate, EndDate, Mentor)

VALUES (8, N'Little Mix (Rhythmix)', CAST(0xA0340B00 AS Date),

CAST(0x11350B00 AS Date), N'Tulisa Contostavlos')

How the LAG and LEAD functions work

The LAG and LEAD functions work in the same way: they look backward or forward n rows.  Here's the syntax of the LAG function, for example:

LAG(Column to show, number periods , what to show if no value)

We want to show the previous winner for each series, so we'll use:

-- get winner of previous series

LAG(Mentor,1,'N/A')

In this code, the arguments to the LAG function are:

Argument Value What it means Default value if omitted
1 Mentor The column in question You can't omit this argument.
2 1 How many rows to go back 1
3 N/A What to show when there are no preceding rows. Null

However, we also need to say what preceding rows actually means.  In our case, we mean for each series we'll take the one which occurred immediately before it in the order in which the series were made.  We do this with an OVER clause, so that the full expression will be:

-- get winner of previous series

LAG(Mentor,1,'N/A') OVER (ORDER BY Series)

That is, show the previous series' mentor for each current series.  Putting this within a full query would give:

SELECT

Series,

Winner,

Mentor,

-- get winner of previous series

LAG(Mentor,1,'N/A') OVER (ORDER BY Series) AS PreviousMentor

FROM

tblseries

ORDER BY

Series

Note that you don't have to LAG over the same column that you ORDER over (although it may be confusing if you don't).

This query would give the following results when run:

X Factor series with lag

The X Factor series in order, showing the previous year's mentor for each.

 

Notice that the previous mentor for the first series shows as N/A, since that's what we've specified should be the default value.

A neat way to show the final answer to our question would be to use a simple common table expression:

WITH Winners AS (

SELECT

Series,

Winner,

Mentor,

 

-- get winner of previous series

LAG(Mentor,1,'N/A') OVER (ORDER BY Series) AS PreviousMentor

FROM

tblseries

)

 

-- based on this set of data, show only the finalists where the

-- mentor is the same as for the previous row

SELECT

Series,

Winner,

Mentor

FROM

Winners

WHERE

Mentor = PreviousMentor

Partitioning

In the previous example, the lagging or leading is done over the entire set of rows returned by the query.  However, you can also partition this set of rows to produce effects like this:

Showing previous winner for each mentor

Here we're showing for each winner the mentor's previous winner (in most cases there won't be one, because it's the first time that particular mentor has won the X Factor competition).

To accomplish this, you can extend the syntax of the LEAD or LAG functions to include a PARTITION BY clause:

LAG(Column,Number periods,Default value) OVER
PARTITION BY Column
ORDER BY Column)

Here's the SQL to show our results above:

SELECT

Series,

Winner,

Mentor,

-- show alongside each winner the previous winner for

-- this mentor

LAG(Winner,1,'First winner')

OVER (PARTITION BY Mentor ORDER BY Series)

AS 'Previous winner for this mentor'

FROM

tblseries

So for each winner we're showing the previous winner, where previous is defined as "the one for the series before the current one for the same mentor".

This has obvious advantages for time series data, allowing you to show (for example) the sales for the same period in the previous year.

Previous versions of SQL Server

If you're reading this jealously because your company doesn't have SQL Server 2012, don't despair!  You can achieve the same effect as for the first example above using the ROW_NUMBER function and a self join:

WITH SeriesWithPrevious AS (

SELECT

Series,

Winner,

Mentor,

-- current row number

ROW_NUMBER() OVER (ORDER BY Series) AS RowNo

FROM

tblSeries

)

SELECT

ThisRow.RowNo AS 'This row no',

PrevRow.RowNo AS 'Prev row no',

ThisRow.Series,

ThisRow.Winner,

ThisRow.Mentor,

IsNull(PrevRow.Mentor,'N/A') AS 'Previous series mentor'

FROM

SeriesWithPrevious AS ThisRow

LEFT OUTER JOIN SeriesWithPrevious AS PrevRow

ON ThisRow.RowNo = (PrevRow.RowNo+1)

It's just much harder!  The query above would return the following results:

Results using ROW_NUMBER

The query shows the current and previous row number for each row.

Time to get SQL Server 2012, perhaps?

This blog has 0 threads Add post