Looping over rows and columns for a Pandas dataframe |
---|
This blog shows you various ways in which you can loop over the columns of a pandas dataframe, and also explains how to loop over the rows of a dataframe (together with why you should avoid doing this!). |
Following a question from a course (thanks Andy namesake) I thought I'd summarise how you can loop over the rows and columns of a pandas dataframe in Python.
This blog loads the following Excel workbook into a dataframe, then reports on it:
You can download this file containing 10 films here.
The first thing to do is to import pandas and load the data above into a dataframe:
import pandas as pd
# import a list of films
df_films = pd.read_excel(
    r"C:\wherever\Movies.xlsx",
    sheet_name="Sheet1"
)
You can loop over all of the columns in a dataframe using this beautifully Pythonic construct:
# looping over columns
for col in df_films:
    print(col)
For our example, this prints the name of each column in the dataframe, one per line.
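To try the same loop without the workbook, here is a minimal sketch using a small stand-in dataframe. Only the "Title" column name comes from this blog; the name df_demo, the other column names and all of the values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the films workbook: only the "Title" column
# name comes from the blog; the other names and all values are made up.
df_demo = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C", "Film D"],
    "Year": [2001, 2002, 2003, 2004],
    "Minutes": [90, 105, 120, 95],
})

# Iterating over a dataframe yields its column labels ...
names_from_loop = [col for col in df_demo]

# ... which are the same labels held in the columns property.
names_from_property = list(df_demo.columns)

print(names_from_loop)
print(names_from_loop == names_from_property)
```

The loop yields labels only, not the data, which is why the next step uses each label to fetch its column.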
You can easily adapt this to get at the columns themselves. Using the column name as a key, much as you would with a dictionary, returns the column as a series:
# looping over columns
print("\nFirst 3 rows, all columns\n")
for col in df_films:
    # the column name gives access to the column itself
    # (show first 3 rows only for each column using slicing)
    print(df_films[col][:3])
This would print the first 3 rows of each column in turn, each as a pandas series.
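If you want the name and the column together, the dataframe's items method yields both as a pair. The sketch below assumes a made-up stand-in dataframe (df_demo and its values are hypothetical) rather than the films workbook.

```python
import pandas as pd

# Hypothetical stand-in for the films workbook (values are made up).
df_demo = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C", "Film D"],
    "Year": [2001, 2002, 2003, 2004],
    "Minutes": [90, 105, 120, 95],
})

first_three = {}
for name, series in df_demo.items():
    # series is the whole column; iloc[:3] keeps its first 3 rows by position
    first_three[name] = series.iloc[:3].tolist()
    print(name, first_three[name])
```

This saves the separate lookup by column name inside the loop, but otherwise does the same job as the loop above.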
You can use the columns property to slice the columns, picking the ones you want to return:
# this gives the same results, but allows slicing
# - showing columns 1 and 2 only (numbering from 0,
#   so the second and third columns)
print("\nFirst 3 rows, columns 1 and 2 only\n")
for col in df_films.columns[1:3]:
    # slice the first 3 rows only
    print(df_films[col][0:3])
The output from this loop would be the first 3 rows for each of the second and third columns.
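When you only need a block of rows and columns by position, you may not need a loop at all: iloc takes a row slice and a column slice together. This is a sketch on a made-up stand-in dataframe (df_demo and its values are hypothetical), not the films workbook.

```python
import pandas as pd

# Hypothetical stand-in for the films workbook (values are made up).
df_demo = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C", "Film D"],
    "Year": [2001, 2002, 2003, 2004],
    "Minutes": [90, 105, 120, 95],
})

# First 3 rows, columns at positions 1 and 2 (second and third columns).
subset = df_demo.iloc[:3, 1:3]

# The same selection built from the sliced columns property.
same = df_demo[df_demo.columns[1:3]].head(3)

print(subset)
print(subset.equals(same))
```

The result is a dataframe rather than a sequence of printed series, so it can be passed straight on to further processing.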
You can use iterrows to loop over the rows of a dataframe (although see the notes at the bottom of this page for why you might not want to do this):
# loop over first 3 rows
print("\nFirst 3 rows\n")
for index, row in df_films[0:3].iterrows():
    # each row is returned as a pandas series
    print(row)
Each row is - weirdly - returned as a pandas series (so think of this as like transposing each row into a column). The above program would print three series, one per row, each labelled by the column names.
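To see what iterrows hands back, the sketch below inspects just the first pair it yields. It assumes a made-up stand-in dataframe (df_demo and its values are hypothetical).

```python
import pandas as pd

# Hypothetical stand-in for the films workbook (values are made up).
df_demo = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C", "Film D"],
    "Year": [2001, 2002, 2003, 2004],
    "Minutes": [90, 105, 120, 95],
})

# iterrows yields (index label, row) pairs; take just the first one.
index, row = next(df_demo.iterrows())

print(type(row).__name__)   # the row is a pandas Series
print(list(row.index))      # its labels are the dataframe's column names
print(row.name)             # its name is the row's index label
```

So the column names of the dataframe become the index of each row series, which is what makes lookups such as row["Title"] possible.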
Because each row is returned as a series, you can use each column's key name to pick out its value - as in this example:
# loop over first 3 rows
print("\nFirst 3 rows\n")
for index, row in df_films[0:3].iterrows():
    # each row is returned as a pandas series
    print(row["Title"])
This would print just the titles of the first 3 films.
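If you do need to loop over rows, pandas also provides itertuples, which returns each row as a named tuple instead of a series and is generally faster than iterrows. The sketch below is one possible alternative, not the approach this blog uses; df_demo and its values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the films workbook (values are made up).
df_demo = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C", "Film D"],
    "Year": [2001, 2002, 2003, 2004],
    "Minutes": [90, 105, 120, 95],
})

titles = []
# index=False leaves the row index out of each named tuple
for film in df_demo.iloc[:3].itertuples(index=False):
    # column values are read as attributes rather than by key
    titles.append(film.Title)

print(titles)
```

Attribute access only works cleanly when column names are valid Python identifiers, so a column name containing spaces would need a different approach.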
Here are 3 reasons why you might choose not to loop over the rows of a dataframe:
Reason | Notes |
---|---|
It's against dataframe ethos | Just as in SQL you should avoid using cursors to go through a table one row at a time, because the whole ethos of the language is that you should work with blocks of rows, so in pandas you should avoid working with a single row at a time, because most syntax is set up to work with columns of data. |
It's slow | The practical upshot of the above reason is that looping over the rows of a dataframe will run much more slowly than referencing whole columns at a time. |
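You can't (or shouldn't) make changes | When you're looping over any collection of things when programming, attempting to change them while you're iterating over them is usually a recipe for disaster. With iterrows in particular, the row you are given may be a copy of the data rather than a view of it, so writing to it may have no effect on the dataframe. |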
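The reasons in the table can be illustrated with a short sketch. It assumes a made-up stand-in dataframe (df_demo, its columns other than "Title" and all of its values are hypothetical) and compares a row loop with the equivalent whole-column expression, then shows a write to a row being lost.

```python
import pandas as pd

# Hypothetical stand-in for the films workbook (values are made up).
df_demo = pd.DataFrame({
    "Title": ["Film A", "Film B", "Film C", "Film D"],
    "Year": [2001, 2002, 2003, 2004],
    "Minutes": [90, 105, 120, 95],
})

# Row by row: build a derived value one row at a time.
hours_loop = []
for index, row in df_demo.iterrows():
    hours_loop.append(row["Minutes"] / 60)

# Column-wise: one expression over the whole column.
hours_column = df_demo["Minutes"] / 60
print(hours_loop == hours_column.tolist())

# Writing to the row from iterrows does not reach the dataframe here:
# with mixed column types, each row is a copy of the data.
for index, row in df_demo.iterrows():
    row["Minutes"] = 0
after_loop = df_demo["Minutes"].tolist()
print(after_loop)

# Assigning to the column itself does change the dataframe.
df_demo["Minutes"] = df_demo["Minutes"] + 5
print(df_demo["Minutes"].tolist())
```

On four rows the speed difference is invisible, but the column-wise form is the one that scales and the one that reliably changes the data.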
There's a passionate argument against looping over dataframe rows at this well-read StackOverflow page (you may need to scroll down a bit to see it) and a warning against it in the pandas documentation for iterrows.
© Wise Owl Business Solutions Ltd 2024. All Rights Reserved.