How to get unique values in columns of a DataFrame in Python?
To find the unique values in a DataFrame we can use:
- Series.unique(): returns a NumPy array of the unique values in the Series, in order of appearance.
- Series.nunique(dropna=True): returns the number of unique values in the Series. NaN is excluded by default.
- DataFrame.nunique(axis=0, dropna=True): returns the number of unique values along the given axis. With axis=0 (the default) it counts per column; with axis=1 it counts per row.
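As a quick orientation before the full example, here is a minimal sketch of these calls on a tiny frame. The frame `df` and its columns `x` and `y` are made up for illustration and are not part of the employee data used in the rest of this article.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame used only for this sketch
df = pd.DataFrame({"x": [1, 1, np.nan], "y": [1, 2, 3]})

# Unique values of one column (NaN is kept by unique())
unique_x = df["x"].unique()

# Number of unique values in one column (NaN is skipped by default)
count_x = df["x"].nunique()

# Number of unique values per column (axis=0 is the default)
per_column = df.nunique()

# Number of unique values per row
per_row = df.nunique(axis=1)

print(unique_x)
print(count_x)
print(per_column)
print(per_row)
```

Note that the axis argument only exists on the DataFrame method; a Series has a single axis, so Series.nunique() takes only dropna.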
To test these functions let's use the following data:
      Name   Age     City  Experience
a     jack  34.0   Sydney           5
b     Riti  31.0    Delhi           7
c     Aadi  16.0      NaN          11
d    Mohit  31.0    Delhi           7
e    Veena   NaN    Delhi           4
f  Shaunak  35.0   Mumbai           5
g    Shaun  35.0  Colombo          11
Finding unique values in a single column:
To get the unique values of a column (here Age), call the unique() function on that column.
CODE:-
#Program :
import numpy as np
import pandas as pd

# Data list
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]

# Create the DataFrame object
empObj = pd.DataFrame(emp,
                      columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# Obtain the unique values in column 'Age' of the DataFrame.
# empObj['Age'] returns a Series object for the column 'Age'.
uValues = empObj['Age'].unique()

print('The unique values in column "Age" are ')
print(uValues)
Output :
The unique values in column "Age" are
[34. 31. 16. nan 35.]
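The output above contains nan because unique() has no dropna argument. If the missing value is not wanted, one option is to drop it before calling unique(). The sketch below assumes a standalone Series named `ages` (a hypothetical name) holding the same Age values.

```python
import numpy as np
import pandas as pd

# Same Age values as in the example data, as a standalone Series
ages = pd.Series([34, 31, 16, 31, np.nan, 35, 35])

# Remove missing values first, then take the unique values
unique_ages = ages.dropna().unique()

print(unique_ages)
```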
Counting unique values in a single column:
If we want the number of unique values rather than the values themselves, we can use the nunique() function.
CODE:-
#Program :
import numpy as np
import pandas as pd

# Data list
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]

# Create the DataFrame object
empObj = pd.DataFrame(emp,
                      columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# Count the unique values in column 'Age' of the DataFrame
uValues = empObj['Age'].nunique()

print("Number of unique values in 'Age' column :")
print(uValues)
Output :
Number of unique values in 'Age' column :
4
Including NaN while counting the unique values in a column:
By default, nunique() does not count NaN. To include NaN in the count, pass dropna=False.
CODE:-
#Program :
import numpy as np
import pandas as pd

# Data list
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]

# Create the DataFrame object
empObj = pd.DataFrame(emp,
                      columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# Count the unique values in column 'Age', also including NaN
uValues = empObj['Age'].nunique(dropna=False)

print("Number of unique values in 'Age' column including NaN:")
print(uValues)
Output :
Number of unique values in 'Age' column including NaN:
5
Counting unique values in each column of the DataFrame:
To count the number of unique values in each column, call nunique() on the DataFrame itself.
CODE:-
#Program :
import numpy as np
import pandas as pd

# Data list
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]

# Create the DataFrame object
empObj = pd.DataFrame(emp,
                      columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# Count the unique values in each column
uValues = empObj.nunique()

print('In each column the number of unique values are')
print(uValues)
Output :
In each column the number of unique values are
Name          7
Age           4
City          4
Experience    4
dtype: int64
To include NaN in these counts, pass dropna=False to the function.
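A minimal sketch of that variant, reusing the same employee data. The variable name `countsWithNaN` is only illustrative. Columns that contain a NaN (Age and City) should each report one more unique value than before.

```python
import numpy as np
import pandas as pd

# Same employee data as in the examples above
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]

empObj = pd.DataFrame(emp,
                      columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# Count unique values per column, treating NaN as a value of its own
countsWithNaN = empObj.nunique(dropna=False)

print(countsWithNaN)
```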
Get unique values in multiple columns:
To get the unique values across multiple columns, combine the contents of those columns into a single Series object and call the unique() function on it.
CODE:-
#Program :
import numpy as np
import pandas as pd

# Data list
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]

# Create the DataFrame object
empObj = pd.DataFrame(emp,
                      columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# Obtain the unique values in multiple columns, i.e. Name & Age.
# Series.append() was removed in pandas 2.0, so pd.concat() is used
# to join the two columns into one Series.
uValues = pd.concat([empObj['Name'], empObj['Age']]).unique()

print('The unique values in column "Name" & "Age" :')
print(uValues)
Output :
The unique values in column "Name" & "Age" :
['jack' 'Riti' 'Aadi' 'Mohit' 'Veena' 'Shaunak' 'Shaun' 34.0 31.0 16.0 nan 35.0]
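Another possible route, sketched here under the assumption that the columns are selected by name, is to flatten the selected columns into a one-dimensional array and pass it to pd.unique(). The helper `unique_across` is a hypothetical name introduced for this example, not a pandas function.

```python
import numpy as np
import pandas as pd

def unique_across(df, cols):
    """Return the unique values found in the given columns of df.

    Hypothetical helper: flattens the selected columns column by column
    (order='F') so values appear in the same order as when the columns
    are joined one after another.
    """
    flat = df[cols].to_numpy().ravel(order='F')
    return pd.unique(flat)

# Same employee data as in the examples above
emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]

empObj = pd.DataFrame(emp,
                      columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

result = unique_across(empObj, ['Name', 'Age'])
print(result)
```

Taking a list of column names keeps the helper usable for any number of columns, at the cost of producing an object array when the columns have mixed types.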