Looping over rows and columns for a Pandas dataframe
This blog shows you various ways in which you can loop over the columns of a pandas dataframe, and also explains how to loop over the rows of a dataframe (together with why you should avoid doing this!).

Posted by Andy Brown on 17 December 2021

Following a question on a course (thanks to my namesake, Andy), I thought I'd summarise how you can loop over the rows and columns of a pandas dataframe in Python.

The example for this blog

This blog loads the following Excel workbook into a dataframe, then reports on it:

Excel movies workbook

You can download this file containing 10 films here.

The first thing to do is to import pandas and load the data above into a dataframe:

import pandas as pd

# import a list of films
df_films = pd.read_excel(
    r"C:\wherever\Movies.xlsx",
    "Sheet1"
)

Looping over columns

You can loop over all of the columns in a dataframe using this beautifully Pythonic construct:

# looping over columns
for col in df_films:
    print(col)

Here's what this would give for our example:

Output from looping over columns
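As a hedged aside: the loop prints column names because iterating over a dataframe iterates over its column labels. The sketch below uses a small stand-in dataframe rather than the workbook (df_demo, its values and every column name other than Title are made up for illustration). It also shows items(), which should give you each column name together with the column itself.

```python
import pandas as pd

# hypothetical stand-in for the films workbook; only "Title" is a column
# name taken from the blog, the rest is placeholder data
df_demo = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C"],
    "Year": [2001, 2002, 2003],
    "Minutes": [100, 110, 120],
})

# iterating over a dataframe yields its column labels
column_names = [col for col in df_demo]
print(column_names)

# items() yields (column label, column as a Series) pairs
column_types = {name: type(series).__name__ for name, series in df_demo.items()}
print(column_types)
```

Looping over df_demo.columns gives the same labels as looping over the dataframe directly.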
 

You can easily adapt this to get at the columns themselves. A dataframe behaves like a dictionary of columns, so using the column name as a key returns that column as a series:

# looping over columns
print("\nFirst 3 rows, all columns\n")

for col in df_films:
    # the column name gives access to the column itself
    # (show first 3 rows only for each column using slicing)
    print(df_films[col][:3])

This would return each column in turn:

The columns in turn
 

You can use the columns property to slice the columns, picking the ones you want to return:

# this gives the same results, but allows slicing
# - showing the columns at positions 1 and 2 only
#   (counting from 0, so the second and third columns)
print("\nFirst 3 rows, columns 1 and 2 only\n")

for col in df_films.columns[1:3]:
    # slice the first 3 rows only
    print(df_films[col][0:3])

The full output from this program would be this:

Rows and columns

The first 3 rows for each of the second and third columns.
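If all you want is that block of data, rather than to process each column separately, positional indexing with iloc should select the same rows and columns without a loop. This is a minimal sketch on a made-up stand-in dataframe (df_demo and every column name other than Title are assumptions, not the contents of the workbook):

```python
import pandas as pd

# hypothetical stand-in for the films workbook (placeholder data)
df_demo = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C", "Film D"],
    "Year": [2001, 2002, 2003, 2004],
    "Minutes": [100, 110, 120, 130],
})

# rows 0 to 2 and the columns at positions 1 and 2, selected in one step
block = df_demo.iloc[0:3, 1:3]
print(block)

# the same values gathered one column at a time, as in the loop above
looped = {
    col: df_demo[col].iloc[0:3].tolist()
    for col in df_demo.columns[1:3]
}
print(looped)
```

The loop is still the better fit when each column needs its own treatment. The iloc version simply returns a smaller dataframe.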

Looping over rows

You can use iterrows to loop over the rows of a dataframe (although see the notes at the bottom of this page for why you might not want to do this):

# loop over first 3 rows
print("\nFirst 3 rows\n")

for index, row in df_films[0:3].iterrows():
    # each row is returned as a pandas series
    print(row)

Each row is - weirdly - returned as a pandas series (so think of this as like transposing each row into a column).  Here's what the above program would return:

The first 3 rows
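One consequence of each row being a series is worth knowing: a series has a single data type, so values from differently typed columns can be converted on the way out. The sketch below illustrates this on a hypothetical numeric-only dataframe (df_numbers and its columns are invented for the example, not taken from the workbook):

```python
import pandas as pd

# hypothetical dataframe: one integer column and one float column
df_numbers = pd.DataFrame({"Year": [2001, 2002], "Rating": [7.5, 8.0]})

# take the first (index, row) pair that iterrows produces
index, row = next(df_numbers.iterrows())

print(type(row).__name__)         # the row is a pandas Series
print(df_numbers["Year"].dtype)   # data type of the Year column
print(row.dtype)                  # single data type shared by the whole row
print(row["Year"])                # the Year value as it appears in the row
```

Here the integer year should come back as a floating-point number, because the row has to hold it alongside the rating.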
 

Because each row is returned as a series, you can use each column's key name to pick out its value - as in this example:

# loop over first 3 rows
print("\nFirst 3 rows\n")

for index, row in df_films[0:3].iterrows():
    # each row is returned as a pandas series
    print(row["Title"])

This would give just the film titles:

The film titles
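If you do need to visit rows one at a time, itertuples is an alternative worth considering: the pandas documentation describes it as generally faster than iterrows, and it keeps each value's data type. A minimal sketch, again on a made-up stand-in dataframe (df_demo and the Year column are assumptions; only Title comes from the example above):

```python
import pandas as pd

# hypothetical stand-in for the films workbook (placeholder data)
df_demo = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C", "Film D"],
    "Year": [2001, 2002, 2003, 2004],
})

# itertuples yields one named tuple per row;
# index=False leaves the row label out of the tuple
titles = [film.Title for film in df_demo[0:3].itertuples(index=False)]

for title in titles:
    print(title)
```

Each value is read as an attribute (film.Title) rather than with a key, so this works best when column names are valid Python identifiers.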
 

Why you might not want to loop over rows

Here are 3 reasons why you might choose not to loop over the rows of a dataframe:

Reason | Notes
--- | ---
It's against dataframe ethos | Just as in SQL you should avoid using cursors to go through a table one row at a time, because the whole ethos of the language is that you should work with blocks of rows, so in pandas you should avoid working with a single row at a time, because most syntax is set up to work with columns of data.
It's slow | The practical upshot of the above reason is that looping over the rows of a dataframe will run much more slowly than referencing whole columns at a time.
You can't (or shouldn't) make changes | When you're looping over any collection of things when programming, attempting to change them while you're iterating over them is usually a recipe for disaster.
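To make these reasons concrete, here is a hedged sketch on a made-up dataframe (df_demo, the Minutes column and the derived Hours column are all assumptions for illustration). It computes a new column from a whole column at once, repeats the calculation row by row for comparison, and then shows that writing to a row inside the loop leaves this dataframe untouched.

```python
import pandas as pd

# hypothetical stand-in for the films workbook (placeholder data)
df_demo = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C"],
    "Minutes": [100, 110, 120],
})

# whole-column arithmetic: no explicit loop over rows
df_demo["Hours"] = df_demo["Minutes"] / 60

# the row-by-row equivalent, shown only for comparison
hours_by_loop = [row["Minutes"] / 60 for _, row in df_demo.iterrows()]

# writing to the row changes a copy here, not the dataframe itself
for _, row in df_demo.iterrows():
    row["Minutes"] = 0

print(df_demo)
```

On three rows the speed difference is invisible, but the column-at-a-time version is the one that should scale as the data grows. If you genuinely need to change values, assigning to a whole column (as with Hours above) avoids the problem.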

There's a passionate argument against looping over dataframe rows at this well-read StackOverflow page (you may need to scroll down a bit to see it) and a warning against it in the pandas documentation.
