Pandas: 6 Different ways to iterate over rows in a Dataframe & Update while iterating row by row

In this tutorial, we will go through six different techniques to iterate over the rows of a Dataframe. Later we will also explain how to update the contents of a Dataframe while iterating over it row by row.

Iterate over rows of a dataframe using DataFrame.iterrows()
Iterate over rows of a dataframe using DataFrame.itertuples()
Named Tuples without index
Named Tuples with custom names
Iterate over rows in dataframe as Dictionary
Iterate over rows in dataframe using index position and iloc
Iterate over rows in dataframe in reverse using index position and iloc
Iterate over rows in dataframe using index labels and loc[]
Update contents of a dataframe while iterating row by row

Let’s first create a dataframe which we will use in our example,

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])
print(empDfObj)

Output:

      Name  Age    City  Experience
a   Shikha   34  Mumbai           5
b    Rekha   31   Delhi           7
c  Shishir   16  Punjab          11

Iterate over rows of a dataframe using DataFrame.iterrows()

The DataFrame class provides a member function iterrows(), i.e. DataFrame.iterrows(). We will now use this function to iterate over the rows of a dataframe.

DataFrame.iterrows()

DataFrame.iterrows() returns an iterator that iterates over all the rows of a dataframe.

For each row, it yields a tuple containing the index label and the row contents as a Series.

Let’s use it in an example,

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

for (index_label, row_series) in empDfObj.iterrows():
   print('Row Index label : ', index_label)
   print('Row Content as Series : ', row_series.values)

Output:

Row Index label : a
Row Content as Series : ['Shikha' 34 'Mumbai' 5]
Row Index label : b
Row Content as Series : ['Rekha' 31 'Delhi' 7]
Row Index label : c
Row Content as Series : ['Shishir' 16 'Punjab' 11]

Note:

Data types are not preserved: iterrows() returns the contents of each row as a single Series, so the values in a row are converted to one common dtype and do not keep the dtypes of their columns.
We should not modify the row Series while iterating over the rows with iterrows(). Depending on the data types, the Series can be a copy of the row rather than a view, so changes made to it may not affect the original dataframe.
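
The two points in the note can be checked with a small sketch. The two-column frame `df` below is a hypothetical example (it is not the employee data used in this tutorial), chosen because mixing an integer column with a float column makes the dtype change easy to see:

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column
df = pd.DataFrame({'int_col': [1, 2], 'float_col': [1.5, 2.5]})

# Take the first (index_label, row_series) pair from iterrows()
index_label, row_series = next(df.iterrows())

# The column holds integers, but the row Series has one common float dtype
print(df['int_col'].dtype)
print(row_series.dtype)

# Changing the row Series here does not change the dataframe
row_series['int_col'] = 100
print(df.at[0, 'int_col'])
```

With the employee dataframe used in this tutorial, each row mixes strings and integers, so the row Series comes back with the generic object dtype instead.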

Iterate over rows of a dataframe using DataFrame.itertuples()

DataFrame.itertuples()

DataFrame.itertuples() yields a named tuple for each row. By default the first field is the index label, followed by one field per column, named after the column and holding its value for that row.

Let’s use it,

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# Iterate over the Dataframe rows as named tuples
for namedTuple in empDfObj.itertuples():
   #Print row contents inside the named tuple
   print(namedTuple)

Output:

Pandas(Index='a', Name='Shikha', Age=34, City='Mumbai', Experience=5)
Pandas(Index='b', Name='Rekha', Age=31, City='Delhi', Experience=7)
Pandas(Index='c', Name='Shishir', Age=16, City='Punjab', Experience=11)

So we can see that for every row it returned a named tuple. We can access the individual values by indexing, like,

For the first value, which is the index label,

namedTuple[0]

For the second value, which is the value in column 'Name',

namedTuple[1]
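
Because each row is a named tuple, the fields can also be read as attributes named after the columns. The sketch below compares both kinds of access on the same employee data; the list names `names_by_position` and `names_by_attribute` are only illustrative:

```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

names_by_position = []
names_by_attribute = []
for namedTuple in empDfObj.itertuples():
    # Position 0 holds the index label, so position 1 is the 'Name' column
    names_by_position.append(namedTuple[1])
    # The same value is available as an attribute named after the column
    names_by_attribute.append(namedTuple.Name)

print(names_by_position)
print(names_by_attribute)
```

Both lists contain the same three names. Attribute access tends to be easier to read than positional access when a dataframe has many columns.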

Named Tuples without index

If we pass the argument index=False, then the named tuple contains only the column values and the index label is left out.

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# Iterate over the Dataframe rows as named tuples without index
for namedTuple in empDfObj.itertuples(index=False):
   # Print row contents inside the named tuple
   print(namedTuple)

Output:

Pandas(Name='Shikha', Age=34, City='Mumbai', Experience=5)
Pandas(Name='Rekha', Age=31, City='Delhi', Experience=7)
Pandas(Name='Shishir', Age=16, City='Punjab', Experience=11)

Named Tuples with custom names

By default the named tuples are called Pandas. If we don't want that name, we can pass a custom name through the name argument:

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# Give Custom Name to the tuple while Iterating over the Dataframe rows
for row in empDfObj.itertuples(name='Employee'):
   # Print row contents inside the named tuple
   print(row)

Output:

Employee(Index='a', Name='Shikha', Age=34, City='Mumbai', Experience=5)
Employee(Index='b', Name='Rekha', Age=31, City='Delhi', Experience=7)
Employee(Index='c', Name='Shishir', Age=16, City='Punjab', Experience=11)
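
The index and name arguments can also be combined. As a minimal sketch, passing name=None makes itertuples() return regular tuples instead of named tuples, and index=False drops the index label from them; the list name `plain_rows` is only illustrative:

```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

plain_rows = []
# name=None gives regular tuples, index=False leaves out the index label
for row in empDfObj.itertuples(index=False, name=None):
    plain_rows.append(row)

for row in plain_rows:
    print(row)
```

Regular tuples have no field names, so the values can only be read by position.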

Iterate over rows in dataframe as Dictionary

With this method we again iterate over the rows of the dataframe using itertuples(), and convert each named tuple to a dictionary so that the values can be accessed by column label.

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# itertuples() yields a named tuple for each row
for row in empDfObj.itertuples(name='Employee'):
   # Convert named tuple to dictionary
   dictRow = row._asdict()
   # Print dictionary
   print(dictRow)
   # Access elements from dict i.e. row contents
   print(dictRow['Name'], 'is from', dictRow['City'])

Output:

{'Index': 'a', 'Name': 'Shikha', 'Age': 34, 'City': 'Mumbai', 'Experience': 5}
Shikha is from Mumbai
{'Index': 'b', 'Name': 'Rekha', 'Age': 31, 'City': 'Delhi', 'Experience': 7}
Rekha is from Delhi
{'Index': 'c', 'Name': 'Shishir', 'Age': 16, 'City': 'Punjab', 'Experience': 11}
Shishir is from Punjab
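
As an alternative to the approach shown in this section, a dataframe can be converted to a list of dictionaries in one call with to_dict(orient='records'). The sketch below is offered only for comparison; the variable name `records` is illustrative:

```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# One dictionary per row, keyed by column label
records = empDfObj.to_dict(orient='records')

for dictRow in records:
    print(dictRow['Name'], 'is from', dictRow['City'])
```

Unlike the dictionaries built from itertuples(), these dictionaries do not contain the index label.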

Iterate over rows in dataframe using index position and iloc

We will loop from index position 0 to the position of the last row and access each row by its position using iloc[].

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# Loop through rows of dataframe by index position i.e. from 0 to number of rows - 1
for i in range(0, empDfObj.shape[0]):
   # get row contents as series using iloc[] and index position of row
   rowSeries = empDfObj.iloc[i]
   # print row contents
   print(rowSeries.values)

Output:

['Shikha' 34 'Mumbai' 5]
['Rekha' 31 'Delhi' 7]
['Shishir' 16 'Punjab' 11]
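
When only one column is needed, iloc[] also accepts a row position and a column position together. The following is a small sketch under that assumption; `city_pos` and `cities` are illustrative names, and columns.get_loc() is used to look up the position of the 'City' column:

```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# Position of the 'City' column among the columns
city_pos = empDfObj.columns.get_loc('City')

cities = []
for i in range(len(empDfObj)):
    # Read a single cell by row position and column position
    cities.append(empDfObj.iloc[i, city_pos])

print(cities)
```

Here len(empDfObj) gives the same row count as empDfObj.shape[0].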

Iterate over rows in dataframe in reverse using index position and iloc

Here we will loop from the position of the last row down to index position 0 and access each row by its position using iloc[].

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# Loop through rows of dataframe by index in reverse i.e. from last row to row at 0th index.
for i in range(empDfObj.shape[0] - 1, -1, -1):
   # get row contents as series using iloc[] and index position of row
   rowSeries = empDfObj.iloc[i]
   # print row contents
   print(rowSeries.values)

Output:

['Shishir' 16 'Punjab' 11]
['Rekha' 31 'Delhi' 7]
['Shikha' 34 'Mumbai' 5]
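
The same reverse order can also be obtained by slicing with iloc[::-1] and then iterating as usual. This is a minimal sketch of that alternative, not the method used in the example of this section; `reversed_rows` is an illustrative name:

```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

reversed_rows = []
# iloc[::-1] selects all rows in reverse order; iterrows() then walks over them
for index_label, row_series in empDfObj.iloc[::-1].iterrows():
    reversed_rows.append((index_label, row_series['Name']))

print(reversed_rows)
```

The index labels stay attached to their rows, so the first pair printed is the one for label 'c'.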

Iterate over rows in dataframe using index labels and loc[]

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# loop through all the names in index label sequence of dataframe
for index in empDfObj.index:
   # For each index label, access the row contents as series
   rowSeries = empDfObj.loc[index]
   # print row contents
   print(rowSeries.values)

Output:

['Shikha' 34 'Mumbai' 5]
['Rekha' 31 'Delhi' 7]
['Shishir' 16 'Punjab' 11]
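
loc[] also accepts an index label and a column label together, which is handy when only one value per row is needed. A short sketch, assuming the same employee dataframe; `city_by_label` is an illustrative name:

```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

city_by_label = {}
for index in empDfObj.index:
    # Read a single cell by index label and column label
    city_by_label[index] = empDfObj.loc[index, 'City']

print(city_by_label)
```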

Update contents of a dataframe while iterating row by row

Dataframe.iterrows() returns each row as a separate Series inside a tuple, and depending on the data types that Series can be a copy, so updating it is not guaranteed to have any effect on the actual dataframe. So, to update the contents of the dataframe, we iterate over the rows using iterrows() and then write to the dataframe itself through the at[] accessor, using the index label of the row and the column name.

Let’s see an example,

Suppose we have a dataframe, i.e.

import pandas as pd


# List of Tuples
salaries = [(11, 5, 70000, 1000) ,
           (12, 7, 72200, 1100) ,
           (13, 11, 84999, 1000)
           ]
# Create a DataFrame object
salaryDfObj = pd.DataFrame(salaries, columns=['ID', 'Experience' , 'Salary', 'Bonus'])
print(salaryDfObj)

Output:

   ID  Experience  Salary  Bonus
0  11           5   70000   1000
1  12           7   72200   1100
2  13          11   84999   1000

Now we will update each value in column 'Bonus' by multiplying it by 2 while iterating over the dataframe row by row.

import pandas as pd


# List of Tuples
salaries = [(11, 5, 70000, 1000) ,
           (12, 7, 72200, 1100) ,
           (13, 11, 84999, 1000)
           ]
# Create a DataFrame object
salaryDfObj = pd.DataFrame(salaries, columns=['ID', 'Experience' , 'Salary', 'Bonus'])
# Iterate over the dataframe row by row
for index_label, row_series in salaryDfObj.iterrows():
   # For each row, update the 'Bonus' value to its double
   salaryDfObj.at[index_label , 'Bonus'] = row_series['Bonus'] * 2
print(salaryDfObj)

Output:

   ID  Experience  Salary  Bonus
0  11           5   70000   2000
1  12           7   72200   2200
2  13          11   84999   2000
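
For a simple update like this one, the same result can be reached without a loop by operating on the whole column at once. The sketch below is shown only for comparison with the row-by-row approach above and uses the same salary data:

```python
import pandas as pd

# List of Tuples
salaries = [(11, 5, 70000, 1000),
            (12, 7, 72200, 1100),
            (13, 11, 84999, 1000)]
salaryDfObj = pd.DataFrame(salaries, columns=['ID', 'Experience', 'Salary', 'Bonus'])

# Double every value in the 'Bonus' column in one column-wise operation
salaryDfObj['Bonus'] = salaryDfObj['Bonus'] * 2
print(salaryDfObj)
```

Column-wise operations are usually shorter and faster than a Python loop, so iterating row by row is best kept for cases where the update cannot be expressed this way.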

Conclusion:

So in this article, you have seen different ways to iterate over the rows of a dataframe and how to update its contents while iterating row by row. Keep following BtechGeeks for more concepts of Python and various other programming languages.