In this tutorial, we will review six different techniques to iterate over the rows of a DataFrame. Later, we will also explain how to update the contents of a DataFrame while iterating over it row by row.
- Iterate over rows of a dataframe using DataFrame.iterrows()
- Iterate over rows of a dataframe using DataFrame.itertuples()
  - Named Tuples without index
  - Named Tuples with custom names
- Iterate over rows in dataframe as Dictionary
- Iterate over rows in dataframe using index position and iloc
- Iterate over rows in dataframe in reverse using index position and iloc
- Iterate over rows in dataframe using index labels and loc[]
- Update the contents of a dataframe while iterating row by row
Let’s first create a DataFrame that we will use in our examples:
```python
import pandas as pd

# List of tuples: (Name, Age, City, Experience)
employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]

# Create a DataFrame object
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])
print(empDfObj)
```
Output:
```
      Name  Age    City  Experience
a   Shikha   34  Mumbai           5
b    Rekha   31   Delhi           7
c  Shishir   16  Punjab          11
```
Iterate over rows of a dataframe using DataFrame.iterrows()
The DataFrame class provides a member function iterrows(), i.e. DataFrame.iterrows(). We will use this function to iterate over the rows of a DataFrame.
DataFrame.iterrows()
DataFrame.iterrows() returns an iterator that iterates over all the rows of a DataFrame.
For each row, it yields a tuple containing the index label and the row contents as a Series.
Let’s use it in an example:
```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

# iterrows() yields (index label, row contents as a Series) for each row
for (index_label, row_series) in empDfObj.iterrows():
    print('Row Index label :', index_label)
    print('Row Content as Series :', row_series.values)
```
Output:
```
Row Index label : a
Row Content as Series : ['Shikha' 34 'Mumbai' 5]
Row Index label : b
Row Content as Series : ['Rekha' 31 'Delhi' 7]
Row Index label : c
Row Content as Series : ['Shishir' 16 'Punjab' 11]
```
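Since each row arrives as a Series indexed by the column names, individual values can also be read by column label inside the loop. A minimal sketch reusing the employee data above; the summaries list is only for illustration:

```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

summaries = []
for index_label, row_series in empDfObj.iterrows():
    # row_series is indexed by column name, so values are looked up by label
    summaries.append(f"{index_label}: {row_series['Name']} is {row_series['Age']}")

print(summaries)
```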
Note:
- Data types are not preserved: iterrows() returns each row as a Series, and a Series holds a single dtype, so values in a row may be converted to a common type (for example, integers become floats when the row also contains floats).
- Do not modify the rows you are iterating over with iterrows(). Depending on the data types, each row may be a copy rather than a view, so changes made to it are not guaranteed to affect the original DataFrame.
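The first note is easy to observe with a frame that mixes integer and float columns. In this sketch (the units and price columns are hypothetical), iterrows() upcasts the integer value to float because the row Series must hold a single dtype, while itertuples(), which reads the data column by column, keeps the integers intact:

```python
import pandas as pd

# Hypothetical frame mixing an int64 column and a float64 column
df = pd.DataFrame({'units': [1, 2], 'price': [1.5, 2.5]})

iterrows_units = []
for _, row_series in df.iterrows():
    # The row Series has one common dtype (float64), so 'units' becomes a float
    iterrows_units.append(row_series['units'])

itertuples_units = []
for row in df.itertuples(index=False):
    # Each field comes from its own column, so 'units' stays an integer
    itertuples_units.append(row.units)

print(df.dtypes.to_dict())
print(iterrows_units)
print(itertuples_units)
```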
Iterate over rows of a dataframe using DataFrame.itertuples()
DataFrame.itertuples()
DataFrame.itertuples() yields a named tuple for each row. By default, the first field is the index label (named Index), followed by one field per column holding that row's value.
Let’s use it:
```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

# Iterate over the DataFrame rows as named tuples
for namedTuple in empDfObj.itertuples():
    # Print the row contents inside the named tuple
    print(namedTuple)
```
Output:
```
Pandas(Index='a', Name='Shikha', Age=34, City='Mumbai', Experience=5)
Pandas(Index='b', Name='Rekha', Age=31, City='Delhi', Experience=7)
Pandas(Index='c', Name='Shishir', Age=16, City='Punjab', Experience=11)
```
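So for every row, it returned a named tuple. We can access individual values by position, as with any tuple. Because the index is included by default, position 0 holds the index label and the column values start at position 1.
For the first value (the index label),
namedTuple[0]
For the second value (the Name column),
namedTuple[1]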
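A short sketch of both access styles on the same employee data. Positional access works as for any tuple; because the fields are named after the columns, attribute access such as namedTuple.Name also works, and the index label is available as namedTuple.Index:

```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

rows = []
for namedTuple in empDfObj.itertuples():
    # Position 0 is the index label because itertuples() includes the index by default
    label = namedTuple[0]
    # Position 1 is the first column, 'Name'; the same value is available as an attribute
    rows.append((label, namedTuple[1], namedTuple.Name, namedTuple.Index))

print(rows)
```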
Named Tuples without index
If we pass the argument index=False, the index label is left out and each named tuple contains only the column values.
```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

# Iterate over the DataFrame rows as named tuples without the index
for namedTuple in empDfObj.itertuples(index=False):
    # Print the row contents inside the named tuple
    print(namedTuple)
```
Output:
```
Pandas(Name='Shikha', Age=34, City='Mumbai', Experience=5)
Pandas(Name='Rekha', Age=31, City='Delhi', Experience=7)
Pandas(Name='Shishir', Age=16, City='Punjab', Experience=11)
```
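With index=False the fields are named purely after the columns, which only works when each column name is a valid Python identifier. According to the pandas documentation, names that are invalid identifiers, repeated, or start with an underscore are replaced with positional names. A sketch with a hypothetical 'First Name' column:

```python
import pandas as pd

# Hypothetical frame whose first column name contains a space
df = pd.DataFrame({'First Name': ['Shikha', 'Rekha'], 'Age': [34, 31]})

for row in df.itertuples(index=False):
    # 'First Name' is not a valid identifier, so the field is renamed to '_0'
    print(row._fields, row._0, row.Age)

# Field names of the first tuple, kept for inspection
fields = next(df.itertuples(index=False))._fields
```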
Named Tuples with custom names
By default, each tuple is named Pandas. If we don’t want that name, we can pass a custom one with the name argument:
```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

# Give the tuple a custom name while iterating over the DataFrame rows
for row in empDfObj.itertuples(name='Employee'):
    # Print the row contents inside the named tuple
    print(row)
```
Output:
```
Employee(Index='a', Name='Shikha', Age=34, City='Mumbai', Experience=5)
Employee(Index='b', Name='Rekha', Age=31, City='Delhi', Experience=7)
Employee(Index='c', Name='Shishir', Age=16, City='Punjab', Experience=11)
```
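The name parameter also accepts None, in which case itertuples() yields ordinary tuples with no field names. A minimal sketch, useful when only positional access is needed:

```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

# name=None returns plain tuples: (index label, Name, Age, City, Experience)
plain_rows = list(empDfObj.itertuples(name=None))

for row in plain_rows:
    print(row)
```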
Iterate over rows in dataframe as Dictionary
We can also iterate over the rows with the same itertuples() function and convert each named tuple to a dictionary with its _asdict() method, so that values can be accessed by column label.
```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

# itertuples() yields a named tuple for each row
for row in empDfObj.itertuples(name='Employee'):
    # Convert the named tuple to a dictionary
    dictRow = row._asdict()
    print(dictRow)
    # Access the row contents by column label
    print(dictRow['Name'], 'is from', dictRow['City'])
```
Output:
```
{'Index': 'a', 'Name': 'Shikha', 'Age': 34, 'City': 'Mumbai', 'Experience': 5}
Shikha is from Mumbai
{'Index': 'b', 'Name': 'Rekha', 'Age': 31, 'City': 'Delhi', 'Experience': 7}
Rekha is from Delhi
{'Index': 'c', 'Name': 'Shishir', 'Age': 16, 'City': 'Punjab', 'Experience': 11}
Shishir is from Punjab
```
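If every row is needed as a dictionary anyway, DataFrame.to_dict() can build them in one call. This is an alternative to the itertuples() approach, not part of it; orient='index' keys each row's dictionary by its index label (which requires the index labels to be unique):

```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

# orient='index' gives {index label: {column label: value}}
rows_by_label = empDfObj.to_dict(orient='index')

for label, rowDict in rows_by_label.items():
    print(label, rowDict['Name'], 'is from', rowDict['City'])
```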
Iterate over rows in dataframe using index position and iloc
We will loop from position 0 to the last row and access each row by its index position using iloc[].
```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

# Loop through row positions from 0 to (number of rows - 1)
for i in range(0, empDfObj.shape[0]):
    # Get the row contents as a Series using iloc[] and the row's position
    rowSeries = empDfObj.iloc[i]
    print(rowSeries.values)
```
Output:
```
['Shikha' 34 'Mumbai' 5]
['Rekha' 31 'Delhi' 7]
['Shishir' 16 'Punjab' 11]
```
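When the loop needs both the position and the index label, the label can be looked up with empDfObj.index[i], and a single cell can be read by position with iat[]. A sketch; column position 0 is 'Name' in this frame, and the visited list is only for illustration:

```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

visited = []
# len(empDfObj) is the number of rows, the same as empDfObj.shape[0]
for i in range(len(empDfObj)):
    label = empDfObj.index[i]
    # iat[row position, column position] reads one cell; column 0 is 'Name'
    name = empDfObj.iat[i, 0]
    visited.append((i, label, name))

print(visited)
```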
Iterate over rows in dataframe in reverse using index position and iloc
Here we will loop from the last position back to position 0 and access each row by its index position using iloc[].
```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

# Loop through row positions in reverse, from the last row to position 0
for i in range(empDfObj.shape[0] - 1, -1, -1):
    # Get the row contents as a Series using iloc[] and the row's position
    rowSeries = empDfObj.iloc[i]
    print(rowSeries.values)
```
Output:
```
['Shishir' 16 'Punjab' 11]
['Rekha' 31 'Delhi' 7]
['Shikha' 34 'Mumbai' 5]
```
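The same reverse order can be written with reversed(range(...)), or by reversing the frame once with iloc[::-1] and iterating the result. Two equivalent sketches:

```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

# reversed(range(n)) yields n-1, ..., 0, the same as range(n - 1, -1, -1)
names_by_position = []
for i in reversed(range(len(empDfObj))):
    names_by_position.append(empDfObj.iloc[i]['Name'])

# iloc[::-1] selects all rows in reverse order; itertuples() then walks that frame
names_by_tuple = [row.Name for row in empDfObj.iloc[::-1].itertuples()]

print(names_by_position)
print(names_by_tuple)
```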
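Iterate over rows in dataframe using index labels and loc[]
We will loop through the index labels of the DataFrame and access each row by its label using loc[].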
```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees,
                        columns=['Name', 'Age', 'City', 'Experience'],
                        index=['a', 'b', 'c'])

# Loop through all the labels in the index of the DataFrame
for index in empDfObj.index:
    # For each index label, access the row contents as a Series
    rowSeries = empDfObj.loc[index]
    print(rowSeries.values)
```
Output:
```
['Shikha' 34 'Mumbai' 5]
['Rekha' 31 'Delhi' 7]
['Shishir' 16 'Punjab' 11]
```
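One caveat with this approach: loc[] returns a Series only when the label is unique. If an index label is repeated, loc[label] returns a DataFrame with all matching rows, and looping over the raw index visits that label once per occurrence. A sketch with a hypothetical duplicated label 'a':

```python
import pandas as pd

# Hypothetical frame in which the index label 'a' appears twice
dupDfObj = pd.DataFrame({'Name': ['Shikha', 'Rekha', 'Shishir'],
                         'Age': [34, 31, 16]},
                        index=['a', 'a', 'c'])

result_types = []
# unique() visits each label once, even when it is repeated in the index
for label in dupDfObj.index.unique():
    selection = dupDfObj.loc[label]
    result_types.append((label, type(selection).__name__))

print(result_types)
```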
Update the contents of a dataframe while iterating row by row
DataFrame.iterrows() may return a copy of each row rather than a view, so writing to the row Series is not guaranteed to update the original DataFrame. To update the contents of the DataFrame, we iterate over its rows with iterrows() and write each new value back through the DataFrame.at accessor, using the row's index label and the column name.
Let’s see an example.
Suppose we have the following DataFrame:
```python
import pandas as pd

# List of tuples: (ID, Experience, Salary, Bonus)
salaries = [(11, 5, 70000, 1000),
            (12, 7, 72200, 1100),
            (13, 11, 84999, 1000)]

# Create a DataFrame object
salaryDfObj = pd.DataFrame(salaries, columns=['ID', 'Experience', 'Salary', 'Bonus'])
print(salaryDfObj)
```
Output:
```
   ID  Experience  Salary  Bonus
0  11           5   70000   1000
1  12           7   72200   1100
2  13          11   84999   1000
```
Now we will update each value in the ‘Bonus’ column by multiplying it by 2 while iterating over the DataFrame row by row.
```python
import pandas as pd

# List of tuples: (ID, Experience, Salary, Bonus)
salaries = [(11, 5, 70000, 1000),
            (12, 7, 72200, 1100),
            (13, 11, 84999, 1000)]
salaryDfObj = pd.DataFrame(salaries, columns=['ID', 'Experience', 'Salary', 'Bonus'])

# Iterate over the DataFrame row by row
for index_label, row_series in salaryDfObj.iterrows():
    # Write each row's doubled 'Bonus' back into the DataFrame with .at
    salaryDfObj.at[index_label, 'Bonus'] = row_series['Bonus'] * 2

print(salaryDfObj)
```
Output:
```
   ID  Experience  Salary  Bonus
0  11           5   70000   2000
1  12           7   72200   2200
2  13          11   84999   2000
```
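The same update can be written with itertuples(), using each tuple's Index field as the label for .at. For a uniform change like this one, a whole-column operation gives the same result without a Python-level loop. A sketch comparing both (loopDfObj and vecDfObj are illustrative names):

```python
import pandas as pd

salaries = [(11, 5, 70000, 1000),
            (12, 7, 72200, 1100),
            (13, 11, 84999, 1000)]
columns = ['ID', 'Experience', 'Salary', 'Bonus']

# Row-by-row update: row.Index is the index label, so .at writes back into the frame
loopDfObj = pd.DataFrame(salaries, columns=columns)
for row in loopDfObj.itertuples():
    loopDfObj.at[row.Index, 'Bonus'] = row.Bonus * 2

# Whole-column update: multiply the 'Bonus' column in one step
vecDfObj = pd.DataFrame(salaries, columns=columns)
vecDfObj['Bonus'] = vecDfObj['Bonus'] * 2

print(loopDfObj)
print(loopDfObj.equals(vecDfObj))
```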
Conclusion:
In this article, you have seen different ways to iterate over the rows of a DataFrame and to update it while iterating row by row. Keep following BtechGeeks for more Python concepts and other programming languages.