WiseOwl Training - Established 1992 Wise Owl Training

Established May 1992
30 years in business
Looping over rows and columns for a Pandas dataframe
This blog shows you various ways in which you can loop over the columns of a pandas dataframe, and also explains how to loop over the rows of a dataframe (together with why you should avoid doing this!).

Posted by Andy Brown on 17 December 2021



Following a question on a course (thanks to my namesake Andy), I thought I'd summarise how you can loop over the rows and columns of a pandas dataframe in Python.

The example for this blog

This blog loads the following Excel workbook into a dataframe, then reports on it:

Excel movies workbook

You can download this file containing 10 films here.

The first thing to do is to import pandas and load the data above into a dataframe:

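import pandas as pd

# import a list of films from the worksheet called Sheet1
# (reading .xlsx files needs the openpyxl package to be installed)
df_films = pd.read_excel(
    r"C:\wherever\Movies.xlsx",
    sheet_name="Sheet1",
)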
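If you don't have the workbook to hand, a minimal sketch like the one below builds a small stand-in dataframe in memory so that you can try out the loops that follow. The column names (`Title`, `Year`, `Minutes`) and the placeholder values are hypothetical; the real Movies.xlsx workbook contains 10 films and its other columns may differ.

```python
import pandas as pd

# Hypothetical stand-in for the Movies.xlsx workbook: the column names
# and values below are placeholders, not the real file's contents
df_films = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C", "Film D"],
        "Year": [2001, 2005, 2010, 2015],
        "Minutes": [95, 120, 101, 134],
    }
)

# Quick checks on what was loaded: (rows, columns) and the column headers
print(df_films.shape)             # (4, 3)
print(df_films.columns.tolist())  # ['Title', 'Year', 'Minutes']
```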

Looping over columns

You can loop over all of the columns in a dataframe using this beautifully Pythonic construct:

# looping over columns
for col in df_films:
    print(col)

Here's what this would give for our example:

Output from looping over columns
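Iterating over a dataframe directly gives you its column labels, in exactly the same order as iterating over `df_films.columns`. Here's a short sketch confirming this, using a small hypothetical dataframe in place of the workbook:

```python
import pandas as pd

# Hypothetical two-column dataframe standing in for the film list
df_films = pd.DataFrame({"Title": ["Film A", "Film B"], "Year": [2001, 2005]})

# Iterating over the dataframe itself yields the column labels...
labels_from_frame = [col for col in df_films]

# ...which is the same sequence you get from the columns property
labels_from_columns = list(df_films.columns)

print(labels_from_frame)                         # ['Title', 'Year']
print(labels_from_frame == labels_from_columns)  # True
```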
 

You can easily adapt this to get at the columns in the dataframe (using the column name as a key, much as you would with a dictionary, returns the column as a series):

# looping over columns
print("\nFirst 3 rows, all columns\n")
for col in df_films:
    # the column name gives access to the column itself (a series)
    # (show the first 3 rows only for each column, slicing by position)
    print(df_films[col].iloc[:3])

This would print the first 3 rows of each column in turn:

The columns in turn
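If you want each column's name and its data at the same time, pandas also provides `DataFrame.items()`, which yields `(column name, series)` pairs, so you don't need to look each column up by name inside the loop. A sketch with a hypothetical stand-in dataframe:

```python
import pandas as pd

# Hypothetical stand-in data (not the real workbook)
df_films = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C", "Film D"],
        "Year": [2001, 2005, 2010, 2015],
    }
)

first_three = {}
for name, column in df_films.items():
    # column is a pandas series; iloc[:3] takes the first 3 rows by position
    first_three[name] = column.iloc[:3].tolist()
    print(name, first_three[name])

# Title ['Film A', 'Film B', 'Film C']
# Year [2001, 2005, 2010]
```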
 

You can use the columns property to slice the columns, picking the ones you want to return:

# this loops in the same way, but lets you slice the columns
# - showing the columns at positions 1 and 2 (the second and third) only
print("\nFirst 3 rows, columns 1 and 2 only\n")
for col in df_films.columns[1:3]:
    # slice the first 3 rows only
    print(df_films[col].iloc[0:3])

The full output from this program would be this:

Rows and columns

The first 3 rows for each of the second and third columns.
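Because `df_films.columns[1:3]` selects columns by position, you can also take the same block of data (the first 3 rows of the second and third columns) in a single step with `iloc`, without any loop. A sketch, again using hypothetical stand-in columns:

```python
import pandas as pd

# Hypothetical stand-in data with three columns
df_films = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C", "Film D"],
        "Year": [2001, 2005, 2010, 2015],
        "Minutes": [95, 120, 101, 134],
    }
)

# Rows at positions 0 to 2 and columns at positions 1 to 2 (end points excluded)
block = df_films.iloc[0:3, 1:3]
print(block)

# The column labels selected are the same ones the loop above visits
print(list(df_films.columns[1:3]))  # ['Year', 'Minutes']
```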

Looping over rows

You can use iterrows to loop over the rows of a dataframe (although see the notes at the bottom of this page for why you might not want to do this):

# loop over first 3 rows
print("\nFirst 3 rows\n")
for index, row in df_films[0:3].iterrows():
    # each row is returned as a pandas series
    print(row)

Each row is (perhaps surprisingly) returned as a pandas series whose index is the column names, so think of it as transposing each row into a column. Here's what the above program would return:

The first 3 rows
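To see what this transposing means in practice, the sketch below (hypothetical data) checks that each row's index is simply the dataframe's column names. One side effect worth knowing: a series holds a single data type, so a row mixing text and numbers comes back with the `object` dtype, and `iterrows` does not preserve each column's own type.

```python
import pandas as pd

# Hypothetical stand-in data mixing text and numbers
df_films = pd.DataFrame(
    {"Title": ["Film A", "Film B", "Film C"], "Year": [2001, 2005, 2010]}
)

for index, row in df_films.iterrows():
    # The row's index holds the column names: the row has been "transposed"
    assert list(row.index) == list(df_films.columns)
    # Text and integers share one series, so the row's dtype is object
    print(index, row.dtype)
```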
 

Because each row is returned as a series, you can use a column's name as a key to pick out its value, as in this example:

# loop over first 3 rows
print("\nFirst 3 rows\n")
for index, row in df_films[0:3].iterrows():
    # each row is returned as a pandas series
    print(row["Title"])

This would give just the film titles:

The film titles
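If you only need the values, `DataFrame.itertuples()` is a commonly recommended alternative to `iterrows`: it yields a named tuple for each row, so a column such as `Title` can be read as an attribute, and it is generally faster than `iterrows`. A sketch with hypothetical data (attribute access only works for column names that are valid Python identifiers):

```python
import pandas as pd

# Hypothetical stand-in data (the real workbook has 10 films)
df_films = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C", "Film D"],
        "Year": [2001, 2005, 2010, 2015],
    }
)

titles = []
# index=False leaves the row label out of each named tuple
for film in df_films.iloc[0:3].itertuples(index=False):
    titles.append(film.Title)
    print(film.Title)
```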
 

Why you might not want to loop over rows

Here are 3 reasons why you might choose not to loop over the rows of a dataframe:

It's against the dataframe ethos: just as in SQL you should avoid using cursors to step through a table one row at a time (the whole ethos of the language is that you work with sets of rows), so in pandas you should avoid working with a single row at a time, because most of the syntax is designed to work with whole columns of data.

It's slow: the practical upshot of the above is that looping over the rows of a dataframe runs much more slowly than working with whole columns at a time.

You can't (or shouldn't) make changes: when you're looping over any collection of things, changing them while you're iterating over them is usually a recipe for disaster. In pandas, iterrows may hand you a copy of each row rather than a view of it, in which case changes you make to the row never reach the dataframe at all.
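The last two points can be illustrated with a minimal sketch, assuming a hypothetical `Minutes` column: a single whole-column expression gives the same result as the row-by-row loop, and assigning to `row` inside `iterrows` leaves the dataframe unchanged (with mixed column types like these, each row is a copy).

```python
import pandas as pd

# Hypothetical stand-in data
df_films = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C"],
        "Minutes": [95, 120, 101],
    }
)

# Row-by-row approach: build up a Python list one row at a time
hours_loop = []
for index, row in df_films.iterrows():
    hours_loop.append(row["Minutes"] / 60)

# Column-based approach: one expression over the whole column
df_films["Hours"] = df_films["Minutes"] / 60
print(df_films["Hours"].tolist() == hours_loop)  # True

# Assigning to the row changes only the row copy, not the dataframe
for index, row in df_films.iterrows():
    row["Minutes"] = 0
print(df_films["Minutes"].tolist())  # [95, 120, 101]
```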

There's a passionate argument against looping over dataframe rows on this well-read StackOverflow page (you may need to scroll down a bit to see it), and a warning against it in the pandas documentation.
