Python: Count NaN and Missing Values in a DataFrame Using Pandas

Methods to count NaN and missing values in DataFrames using pandas

In this article, we will discuss null values in DataFrames and count them per column, per row, and in total. Let's start with what NaN or missing values are.

NaN or Missing values

NaN stands for Not a Number. pandas uses it to represent missing values in a DataFrame. Let's see this with an example.

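import numpy as np
import pandas as pd

# Two of the five students have no recorded mark
students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
            'Marks': [90, np.nan, 87, np.nan, 19]}
df = pd.DataFrame(students, columns=['students', 'Marks'])
print(df)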

Output

   students  Marks
0       Raj   90.0
1     Rahul    NaN
2    Mayank   87.0
3      Ajay    NaN
4      Amar   19.0

Here the NaN entries in the Marks column represent missing values. Because NaN is a floating-point value, the whole Marks column is displayed as floats (90.0 rather than 90).
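One property of NaN is worth checking before counting anything: NaN is not equal to any value, including itself, so an equality test cannot find it. The following is a minimal sketch of that behaviour; the Series name marks is only an illustrative name and reuses the Marks values from the example.

```python
import numpy as np
import pandas as pd

# Same Marks values as in the example; "marks" is just an illustrative name
marks = pd.Series([90, np.nan, 87, np.nan, 19])

# NaN compares unequal to everything, including itself
nan_equals_itself = (np.nan == np.nan)

# So an equality test finds no missing values at all
matches_by_equality = int((marks == np.nan).sum())

# isnull() is the reliable way to detect them
matches_by_isnull = int(marks.isnull().sum())

print(nan_equals_itself, matches_by_equality, matches_by_isnull)
# prints: False 0 2
print(marks.dtype)
# prints: float64
```

This is why the counting methods in this article are built on isnull() rather than on comparisons with np.nan.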

Reasons to count missing or NaN values in a DataFrame

One of the main reasons to count missing values is that they affect the accuracy of any prediction or analysis built on the DataFrame. The more values are missing, the more the results are affected. Counting them first tells us how to proceed: if a row or column has a high count of missing values we can drop it; otherwise we can leave the data as it is.
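The drop-or-keep decision described above can be sketched by looking at the share of missing values per column. The 0.5 cut-off below is a hypothetical value chosen only for illustration; a sensible threshold depends on the data and the task.

```python
import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
            'Marks': [90, np.nan, 87, np.nan, 19]}
df = pd.DataFrame(students, columns=['students', 'Marks'])

# Fraction of missing values per column (the mean of True/False values)
missing_fraction = df.isnull().mean()   # students: 0.0, Marks: 0.4

# Hypothetical cut-off: drop a column only if more than half of it is missing
threshold = 0.5
cols_to_drop = missing_fraction[missing_fraction > threshold].index.tolist()
print(cols_to_drop)
# prints: []

# Alternatively, drop only the rows where Marks is missing
rows_kept = df.dropna(subset=['Marks'])
print(len(rows_kept))
# prints: 3
```

With 40% of Marks missing, no column crosses the assumed threshold, so the column is kept and only the affected rows would be dropped if required.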

Method to count NaN or missing values

To count NaN or missing values, we first use the isnull() function. It returns a DataFrame of the same shape in which every NaN value is replaced by True and every non-NaN value by False, which makes the missing values easy to count. Let's see this with an example.

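import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
            'Marks': [90, np.nan, 87, np.nan, 19]}
df = pd.DataFrame(students, columns=['students', 'Marks'])
# True marks a missing value, False a present one
print(df.isnull())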

Output

   students  Marks
0     False  False
1     False   True
2     False  False
3     False   True
4     False  False

The result is a DataFrame of True and False values like this. Now we can easily count the NaN or missing values in the DataFrame.

Count NaN or missing values in columns

With .isnull().sum(), we can easily count the NaN or missing values in each column. isnull() converts NaN values to True and non-NaN values to False, and sum() then adds up the True values (each True counts as 1) in the respective column. Let's see this with an example.

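import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
            'Marks': [90, np.nan, 87, np.nan, 19]}
df = pd.DataFrame(students, columns=['students', 'Marks'])
# sum() adds up the True values column by column
print(df.isnull().sum())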

Output

students    0
Marks       2
dtype: int64

As the DataFrame itself shows, there are no NaN or missing values in the students column and 2 in the Marks column.
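The same idea works for a single column, and count() gives the complementary number of non-missing values. This is a small sketch on the same data; the variable names marks_missing and non_null_counts are only illustrative.

```python
import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
            'Marks': [90, np.nan, 87, np.nan, 19]}
df = pd.DataFrame(students, columns=['students', 'Marks'])

# Missing values in one specific column
marks_missing = int(df['Marks'].isnull().sum())
print(marks_missing)
# prints: 2

# count() reports the non-missing values per column instead
non_null_counts = df.count()
print(non_null_counts['students'], non_null_counts['Marks'])
# prints: 5 3
```

For each column, the missing count and the count() value add up to the number of rows, which is a quick way to cross-check the result.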

Count NaN or missing values in rows

For this, we can iterate over the rows with a for loop and call isnull().sum() on each row to count its NaN or missing values. Let's see this with an example.

import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
            'Marks': [90, np.nan, 87, np.nan, 19]}
df = pd.DataFrame(students, columns=['students', 'Marks'])
# df.iloc[i] is row i as a Series; count its missing entries
for i in range(len(df.index)):
    print(f"NaN in row {i}: {df.iloc[i].isnull().sum()}")

Output

NaN in row 0: 0
NaN in row 1: 1
NaN in row 2: 0
NaN in row 3: 1
NaN in row 4: 0
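The explicit loop is easy to follow, but the same per-row counts can also be obtained without a loop by summing along axis=1. The sketch below is an alternative to the loop, not a replacement for it; per_row_counts and rows_with_missing are illustrative names.

```python
import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
            'Marks': [90, np.nan, 87, np.nan, 19]}
df = pd.DataFrame(students, columns=['students', 'Marks'])

# axis=1 sums across the columns, giving one count per row
per_row_counts = df.isnull().sum(axis=1)
print(per_row_counts.tolist())
# prints: [0, 1, 0, 1, 0]

# Select only the rows that contain at least one missing value
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing.index.tolist())
# prints: [1, 3]
```

On larger DataFrames the axis=1 form is usually the more practical choice, since it avoids a Python-level loop over every row.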

Count total NaN or missing values in dataframe

In the two examples above, we saw how to count missing values or NaN per column and per row. Now let's see how to count the total number of missing values in the DataFrame. For this we simply use isnull().sum().sum(): the first sum() gives the per-column counts and the second adds them together. Let's see this with an example.

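import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
            'Marks': [90, np.nan, 87, np.nan, 19]}
df = pd.DataFrame(students, columns=['students', 'Marks'])
# First sum(): per-column counts; second sum(): grand total
print("Total NaN values: ", df.isnull().sum().sum())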

Output

Total NaN values:  2
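As a cross-check, the total can be computed in two other ways that should agree with isnull().sum().sum(). This is a minimal sketch on the same data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
            'Marks': [90, np.nan, 87, np.nan, 19]}
df = pd.DataFrame(students, columns=['students', 'Marks'])

# Reference value from chaining sum() twice
total_chained = int(df.isnull().sum().sum())

# Sum the boolean mask as a single NumPy array
total_numpy = int(df.isnull().to_numpy().sum())

# All cells minus the non-missing cells
total_by_difference = int(df.size - df.count().sum())

print(total_chained, total_numpy, total_by_difference)
# prints: 2 2 2
```

All three give 2 here: the DataFrame has 10 cells in total, of which 8 hold a value.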

So these are the methods to count NaN or missing values in DataFrames.
