Pandas: Get unique values in columns of a DataFrame in Python

How to get unique values in the columns of a DataFrame in Python?

To find the unique values in a DataFrame we can use:

  1. Series.unique() - Returns a NumPy array of the unique values in a Series, in order of first appearance. NaN is included in the result.
  2. Series.nunique(dropna=True) - Returns the number of unique values in a Series. DataFrame.nunique(axis=0, dropna=True) does the same for a whole DataFrame: with axis=0 (the default) it counts the unique values in each column, and with axis=1 it counts them in each row.
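As a minimal sketch of how these calls differ, the snippet below runs them on a small made-up frame. The names df, x and y are illustrative only and are not part of the dataset used in the rest of this article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({'x': [1, 1, 2, np.nan], 'y': ['a', 'b', 'b', 'b']})

values = df['x'].unique()       # NumPy array of distinct values, NaN included
count = df['x'].nunique()       # number of distinct values in one column, NaN excluded
per_column = df.nunique()       # axis=0 (default): one count per column
per_row = df.nunique(axis=1)    # axis=1: one count per row

print(values)
print(count)
print(per_column)
print(per_row)
```

Note that the axis argument exists only on the DataFrame version; a Series has a single axis, so Series.nunique() accepts only dropna.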

To test these functions, let's use the following data:

     Name     Age    City     Experience
a    jack     34.0   Sydney   5
b    Riti     31.0   Delhi    7
c    Aadi     16.0   NaN      11
d    Mohit    31.0   Delhi    7
e    Veena    NaN    Delhi    4
f    Shaunak  35.0   Mumbai   5
g    Shaun    35.0   Colombo  11

Finding unique values in a single column :

To get the unique values in a column (here Age), we call the unique() function on that column.

CODE:-

#Program :

import numpy as np
import pandas as pd
# Data list
# np.nan is used for missing values (the np.NaN alias was removed in NumPy 2.0)
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)
       ]
# Object of Dataframe class created
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
# empObj['Age'] returns the column 'Age' as a Series object;
# unique() then returns the unique values in that column
uValues = empObj['Age'].unique()
print('The unique values in column "Age" are ')
print(uValues)
Output :
The unique values in column "Age" are
[34. 31. 16. nan 35.]
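The result above keeps NaN and lists values in order of first appearance. If a NaN-free or sorted result is wanted, one possible approach (a sketch, not part of the original program) is to drop missing values before calling unique() and then sort the array. The Series name ages is an assumption made here to keep the example short; it holds the same values as the 'Age' column.

```python
import numpy as np
import pandas as pd

# Same values as the 'Age' column in the article's data
ages = pd.Series([34, 31, 16, 31, np.nan, 35, 35], name='Age')

# Drop missing values first, so NaN never reaches unique()
unique_no_nan = ages.dropna().unique()

# unique() preserves order of appearance; sort explicitly if order matters
sorted_unique = np.sort(unique_no_nan)

print(unique_no_nan)
print(sorted_unique)
```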

Counting unique values in a single column :

If we want the number of unique values rather than the values themselves, we can use the nunique() function.

CODE:-

#Program :

import numpy as np
import pandas as pd
# Data list
# np.nan is used for missing values (the np.NaN alias was removed in NumPy 2.0)
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)
       ]
# Object of Dataframe class created
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
# Counting the unique values in column 'Age' of the dataframe (NaN is excluded by default)
uValues = empObj['Age'].nunique()
print("Number of unique values in 'Age' column :")
print(uValues)
Output :
Number of unique values in 'Age' column :
4

Including NaN while counting the Unique values in a column :

NaN values are not counted by default by the nunique() function. To include NaN in the count, pass the argument dropna=False.

CODE:-

#Program :

import numpy as np
import pandas as pd
# Data list
# np.nan is used for missing values (the np.NaN alias was removed in NumPy 2.0)
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)
       ]
# Object of Dataframe class created
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
# Counting the unique values in column 'Age', this time including NaN
uValues = empObj['Age'].nunique(dropna=False)
print("Number of unique values in 'Age' column including NaN:")
print(uValues)
Output :
Number of unique values in 'Age' column including NaN:
5
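The two counts differ by exactly one here because NaN is treated as a single extra value. As a quick check under that assumption, the sketch below compares both counts with the length of the array returned by unique(), which always includes NaN. The Series name ages is introduced only for this example.

```python
import numpy as np
import pandas as pd

# Same values as the 'Age' column in the article's data
ages = pd.Series([34, 31, 16, 31, np.nan, 35, 35], name='Age')

without_nan = ages.nunique()               # NaN ignored (default dropna=True)
with_nan = ages.nunique(dropna=False)      # NaN counted as one value
length_of_unique = len(ages.unique())      # unique() keeps NaN in its result

print(without_nan, with_nan, length_of_unique)
```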

Counting unique values in each column of the dataframe :

To count the number of unique values in each column, call nunique() on the DataFrame itself instead of on a single column.

CODE:-

#Program :

import numpy as np
import pandas as pd
# Data list
# np.nan is used for missing values (the np.NaN alias was removed in NumPy 2.0)
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)
       ]
# Object of Dataframe class created
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
# Counting the unique values in each column
uValues = empObj.nunique()
print('In each column the number of unique values are')
print(uValues)
Output :
In each column the number of unique values are
Name          7
Age           4
City          4
Experience    4
dtype: int64

To include NaN in these counts, pass dropna=False to the function, i.e. empObj.nunique(dropna=False).
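A short sketch of that call is given here. It rebuilds the same data from a dictionary under the name df (an assumption made only to keep the example compact) and counts unique values per column with NaN included.

```python
import numpy as np
import pandas as pd

# Same data as the article's empObj, built from a dict for brevity
df = pd.DataFrame({
    'Name': ['jack', 'Riti', 'Aadi', 'Mohit', 'Veena', 'Shaunak', 'Shaun'],
    'Age': [34, 31, 16, 31, np.nan, 35, 35],
    'City': ['Sydney', 'Delhi', np.nan, 'Delhi', 'Delhi', 'Mumbai', 'Colombo'],
    'Experience': [5, 7, 11, 7, 4, 5, 11],
}, index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# dropna=False makes NaN count as one more distinct value per column
counts = df.nunique(dropna=False)
print(counts)
```

Only 'Age' and 'City' contain NaN, so only those two counts grow by one compared with the default call.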

Get Unique values in multiple columns :

To get the unique values across multiple columns, we combine the contents of those columns into a single Series object and call the unique() function on it.

CODE:-

#program :

import numpy as np
import pandas as pd
# Data list
# np.nan is used for missing values (the np.NaN alias was removed in NumPy 2.0)
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)
       ]
# Object of Dataframe class created
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
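# Obtain the unique values in multiple columns i.e. Name & Age
# pd.concat() joins the two columns into one Series (Series.append() was removed in pandas 2.0)
uValues = pd.concat([empObj['Name'], empObj['Age']]).unique()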
print('The unique values in column "Name" & "Age" :')
print(uValues)
Output :
The unique values in column "Name" & "Age" :
['jack' 'Riti' 'Aadi' 'Mohit' 'Veena' 'Shaunak' 'Shaun' 34.0 31.0 16.0 nan
35.0]
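The same idea extends to any number of columns. The sketch below wraps it in a small helper; the function name unique_across and the frame name df are hypothetical, introduced only for this illustration. It uses pd.concat() to stack the columns, which is one possible way to build the combined Series.

```python
import numpy as np
import pandas as pd

def unique_across(df, columns):
    """Return the distinct values found in any of the given columns."""
    # Stack the selected columns end to end into one Series, then deduplicate
    combined = pd.concat([df[col] for col in columns], ignore_index=True)
    return combined.unique()

# Subset of the article's data, built from a dict for brevity
df = pd.DataFrame({
    'Name': ['jack', 'Riti', 'Aadi', 'Mohit', 'Veena', 'Shaunak', 'Shaun'],
    'Age': [34, 31, 16, 31, np.nan, 35, 35],
})

result = unique_across(df, ['Name', 'Age'])
print(result)
```

Because the columns hold different types (strings and floats), the combined Series and the returned array have object dtype.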
