Pandas: Sum rows in Dataframe ( all or certain rows)

Sum rows in Dataframe ( all or certain rows) in Python

In this article we will discuss how we can merge rows into a dataframe and add values as a new queue to the same dataframe.

So, let’s start exploring the topic.

First, we will build a Dataframe,

import pandas as pd
import numpy as np
# The List of Tuples
salary_of_employees = [('Amit', 2000, 2050, 1099, 2134, 2111),
                    ('Rabi', 2122, 3022, 3456, 3111, 2109),
                    ('Abhi', np.NaN, 2334, 2077, np.NaN, 3122),
                    ('Naresh', 3050, 3050, 2010, 2122, 1111),
                    ('Suman', 2023, 2232, 3050, 2123, 1099),
                    ('Viroj', 2050, 2510, np.NaN, 3012, 2122),
                    ('Nabin', 4000, 2000, 2050, np.NaN, 2111)]
# By Creating a DataFrame object from list of tuples
test = pd.DataFrame(salary_of_employees,
                  columns=['Name',  'Jan', 'Feb', 'March', 'April', 'May'])
# To Set column Name as the index of dataframe
test.set_index('Name', inplace=True)
print(test)

Output :
             Jan           Feb        March    April         May
Name 
Amit     2000.0     2050       1099.0    2134.0     2111
Rabi      2122.0    3022       3456.0     3111.0    2109
Abhi      NaN       2334       2077.0     NaN       3122
Naresh  3050.0     3050      2010.0     2122.0    1111
Suman  2023.0     2232      3050.0     2123.0    1099
Viroj     2050.0     2510       NaN        3012.0    2122
Nabin   4000.0    2000       2050.0     NaN        2111

This Dataframe contains employee salaries from January to May. We’ve created a column name as a data name index. Each line of this dataframe contains the employee’s salary from January to May.

Get the sum of all rows in a Pandas Dataframe :

Let’s say in the above dataframe, we want to get details about the total salary paid each month. Basically, we want a Series that contains the total number of rows and columns eg. each item in the Series should contain a total column value.

Let’s see how we can find that series,

import pandas as pd
import numpy as np
# The List of Tuples
salary_of_employees = [('Amit', 2000, 2050, 1099, 2134, 2111),
                    ('Rabi', 2122, 3022, 3456, 3111, 2109),
                    ('Abhi', np.NaN, 2334, 2077, np.NaN, 3122),
                    ('Naresh', 3050, 3050, 2010, 2122, 1111),
                    ('Suman', 2023, 2232, 3050, 2123, 1099),
                    ('Viroj', 2050, 2510, np.NaN, 3012, 2122),
                    ('Nabin', 4000, 2000, 2050, np.NaN, 2111)]
# By Creating a DataFrame object from list of tuples
test = pd.DataFrame(salary_of_employees,
                  columns=['Name',  'Jan', 'Feb', 'March', 'April', 'May'])
# To Set column Name as the index of dataframe
test.set_index('Name', inplace=True)


#By getting sum of all rows in the Dataframe as a Series
total = test.sum()
print('Total salary paid in each month:')
print(total)

Output :
Total salary paid in each month:
Jan 15245.0
Feb 17198.0
March 13742.0
April 12502.0
May 13785.0
dtype: float64

We have called the sum() function in dataframe without parameter. So, it automatically considered the axis as 0 and added all the columns wisely i.e. added all values to each column and returned a string item containing those values. Each item in this series item contains the total amount paid in monthly installments and the name of the month in the index label for that entry.

We can add this Series as a new line to the dataframe i.e.

import pandas as pd
import numpy as np
# The List of Tuples
salary_of_employees = [('Amit', 2000, 2050, 1099, 2134, 2111),
                    ('Rabi', 2122, 3022, 3456, 3111, 2109),
                    ('Abhi', np.NaN, 2334, 2077, np.NaN, 3122),
                    ('Naresh', 3050, 3050, 2010, 2122, 1111),
                    ('Suman', 2023, 2232, 3050, 2123, 1099),
                    ('Viroj', 2050, 2510, np.NaN, 3012, 2122),
                    ('Nabin', 4000, 2000, 2050, np.NaN, 2111)]
# By Creating a DataFrame object from list of tuples
test = pd.DataFrame(salary_of_employees,
                  columns=['Name',  'Jan', 'Feb', 'March', 'April', 'May'])
# To Set column Name as the index of dataframe
test.set_index('Name', inplace=True)




# By getting sum of all rows as a new row in Dataframe
total = test.sum()
total.name = 'Total'
# By assignimg sum of all rows of DataFrame as a new Row
test = test.append(total.transpose())
print(test)

Output :
                 Jan        Feb         March         April          May
Name 
Amit        2000.0   2050.0    1099.0        2134.0     2111.0
Rabi        2122.0    3022.0    3456.0       3111.0     2109.0
Abhi       NaN       2334.0     2077.0       NaN        3122.0
Naresh   3050.0    3050.0     2010.0       2122.0    1111.0
Suman    2023.0   2232.0     3050.0       2123.0    1099.0
Viroj      2050.0     2510.0     NaN          3012.0     2122.0
Nabin    4000.0    2000.0     2050.0       NaN         2111.0
Total     15245.0   17198.0   13742.0    12502.0    13785.0

Added a new line to the dataframe and ‘Total’ reference label. Each entry in this line contains the amount of details paid per month.

How did it work?

We have passed the Series to create a Dataframe in one line. All references in the series became columns in the new dataframe. Then add this new data name to the original dataframe. The result was that I added a new line to the dataframe.

Get Sum of certain rows in Dataframe by row numbers :

In the previous example we added all the rows of data but what if we want to get a total of only a few rows of data? As with the data above we want the total value in the top 3 lines eg to get the total monthly salary for only 3 employees from the top,

import pandas as pd
import numpy as np
# The List of Tuples
salary_of_employees = [('Amit', 2000, 2050, 1099, 2134, 2111),
                    ('Rabi', 2122, 3022, 3456, 3111, 2109),
                    ('Abhi', np.NaN, 2334, 2077, np.NaN, 3122),
                    ('Naresh', 3050, 3050, 2010, 2122, 1111),
                    ('Suman', 2023, 2232, 3050, 2123, 1099),
                    ('Viroj', 2050, 2510, np.NaN, 3012, 2122),
                    ('Nabin', 4000, 2000, 2050, np.NaN, 2111)]
# By Creating a DataFrame object from list of tuples
test = pd.DataFrame(salary_of_employees,
                  columns=['Name',  'Jan', 'Feb', 'March', 'April', 'May'])
# To Set column Name as the index of dataframe
test.set_index('Name', inplace=True)


#By getting sum of values of top 3 DataFrame rows,
sumtabOf = test.iloc[0:3].sum()
print(sumtabOf)

Output :
Jan      4122.0
Feb      7406.0
March    6632.0
April    5245.0
May      7342.0
dtype: float64

We selected the first 3 lines of the data file and called the total () for that. Returns a series containing the total monthly salary paid to selected employees only which means for the first three lines of the actual data list.

Get the sum of specific rows in Pandas Dataframe by index/row label :

Unlike the previous example, we can select specific lines with the reference label and find the value of values in those selected lines only i.e.

import pandas as pd
import numpy as np
# The List of Tuples
salary_of_employees = [('Amit', 2000, 2050, 1099, 2134, 2111),
                    ('Rabi', 2122, 3022, 3456, 3111, 2109),
                    ('Abhi', np.NaN, 2334, 2077, np.NaN, 3122),
                    ('Naresh', 3050, 3050, 2010, 2122, 1111),
                    ('Suman', 2023, 2232, 3050, 2123, 1099),
                    ('Viroj', 2050, 2510, np.NaN, 3012, 2122),
                    ('Nabin', 4000, 2000, 2050, np.NaN, 2111)]
# By Creating a DataFrame object from list of tuples
test = pd.DataFrame(salary_of_employees,
                  columns=['Name',  'Jan', 'Feb', 'March', 'April', 'May'])
# To Set column Name as the index of dataframe
test.set_index('Name', inplace=True)


# By getting sum of 3 DataFrame rows (selected by index labels)
sumtabOf = test.loc[['Amit', 'Naresh', 'Viroj']].sum()
print(sumtabOf)

Output :
Jan      7100.0
Feb      7610.0
March    3109.0
April    7268.0
May      5344.0
dtype: float64

We have selected 3 lines of data name with the reference label namely ‘Amit’, ‘Naresh’ and ‘Viroj’. We then added the queue values for these selected employees only. Return a series with the total amount of salary paid per month to those selected employees per month only wisely.

Conclusion:

So in the above cases we found out that to sum the multiple rows given in a dataframe.