In Python, How Do You Get Unique Values from a DataFrame?

Pandas DataFrames are remarkably useful: they make data manipulation in Python very user-friendly.

Pandas allows you to import large datasets and then manipulate them effectively. For example, CSV data can be easily imported into a Pandas DataFrame.

What are Python Dataframes?

Dataframes are two-dimensional labeled data structures with columns of various types.
DataFrames can be used for a wide range of analyses.

Often, the dataset is too large, and it is impossible to examine the entire dataset at once. Instead, we’d like to see the Dataframe’s summary.
We can get the first five rows of the dataset as well as a quick statistical summary of the data. Aside from that, we can gain information about the types of columns in our dataset.
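
As a rough sketch of the summary steps described above, assuming a small made-up DataFrame (the column names and values are invented for illustration and are not taken from any real dataset):

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D", "E", "F"],
        "calories": [70, 120, 70, 50, 110, 110],
        "rating": [68.4, 33.9, 59.4, 93.7, 34.3, 29.5],
    }
)

first_rows = df.head()      # first five rows by default
summary = df.describe()     # count, mean, std, min, quartiles, max of numeric columns
column_types = df.dtypes    # data type of each column

print(first_rows)
print(summary)
print(column_types)
```

Here head(), describe() and dtypes are standard DataFrame members; df.info() is another option when a compact overview of columns and non-null counts is enough.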

DataFrame is a data structure offered by the Pandas module for working with large tabular datasets, such as large CSV or Excel files.

Because we may store a huge volume of data in a data frame, we frequently encounter situations where we need to find the unique data values from a dataset that may contain redundant or repeated values.

This is where the pandas.unique() function and the Series.unique() method come in.

pandas.unique() Function in Python

The pandas.unique() function returns the dataset’s unique values.

It uses a hash table-based technique to return the non-redundant values from a one-dimensional set of values, such as a Series or a single DataFrame column.

For Example:

Let dataset values = 5, 6, 7, 5, 2, 6

The output we get by applying the unique function = 5, 6, 7, 2

We were able to readily find the dataset’s unique values this way.

Syntax:

pandas.unique(data)

The above syntax is used when dealing with 1-dimensional data. It returns the unique values among the 1-dimensional data values (Series data structure).

But what if the data has more than one dimension, such as rows and columns? In that case, we can use the syntax below:

Syntax For Multidimensional data:

dataframe.column_name.unique()

The above syntax allows us to extract unique values from a specific column of a dataset.

The unique function works on a column of any type, but it is most useful for categorical-style columns that hold a small number of distinct values. Furthermore, the unique values are returned in the order in which they first appear in the dataset; they are not sorted.
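
A minimal sketch of column-level unique(), assuming a made-up DataFrame with a text column and a numeric column (the names "mfr" and "shelf" and all values here are invented for illustration). It shows that the result follows the order of first appearance and that sorting, if wanted, is a separate step:

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame(
    {
        "mfr": ["K", "G", "K", "N", "G", "P"],
        "shelf": [3, 1, 3, 2, 1, 3],
    }
)

# Bracket access works for any column name, including names with spaces.
unique_mfr = df["mfr"].unique()
unique_shelf = df["shelf"].unique()

print(unique_mfr)     # values in order of first appearance
print(unique_shelf)   # numeric columns work the same way

# Sort explicitly if an ordered result is needed.
sorted_mfr = sorted(unique_mfr)
print(sorted_mfr)
```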

unique() function with Pandas Series

Example

Approach:

  • Import pandas module using the import keyword.
  • Give the list as static input and store it in a variable.
  • Pass the given list as an argument to the pandas.Series() function and store it in another variable.
  • Since the list has only one dimension, we turned it into a series data structure.
  • Pass the above data as an argument to the pandas.unique() function to get all the unique values from the given list(data).
  • Store it in another variable.
  • Print all Unique elements from the given list.
  • End of the program.

Below is the implementation:

# Import pandas module using the import keyword
import pandas
# Give the list as static input and store it in a variable.
gvn_lst = [5, 6, 7, 5, 2, 6]
# Pass the given list as an argument to the pandas.Series() function and
# store it in another variable.
# Since the list has only one dimension, we turn it into a Series data structure.
gvn_series = pandas.Series(gvn_lst)
# Pass the above Series as an argument to the pandas.unique() function to
# get all the unique values from the given list(data).
# Store it in another variable
uniqval_lst = pandas.unique(gvn_series)
# Print all Unique elements from the given list
print("All unique elements from the given list = ")
print(uniqval_lst)

Output:

All unique elements from the given list = 
[5 6 7 2]

unique() function with Pandas DataFrame

Importing the Dataset:

First, import the dataset into a Pandas DataFrame.

Approach:

  • Import pandas module as pd using the import keyword.
  • Import dataset using read_csv() function by passing the dataset name as an argument to it.
  • Store it in a variable.
  • Print the dataset if you want to inspect it (here we only import it).
  • End of the program.

Below is the implementation:

# Import pandas module as pd using the import keyword
import pandas as pd
# Import dataset using read_csv() function by passing the dataset name as
# an argument to it.
# Store it in a variable.
cereal_dataset = pd.read_csv('cereal.csv')

This stores the dataset in the variable ‘cereal_dataset’ as a DataFrame.
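
If no CSV file is at hand, read_csv() can also be tried on in-memory text, since it accepts file-like objects as well as paths. The sketch below assumes a tiny made-up CSV; the rows and the variable name sample_dataset are hypothetical stand-ins and are not the contents of cereal.csv:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a real file on disk.
csv_text = """name,mfr,vitamins
Cereal A,P,25
Cereal B,K,25
Cereal C,G,100
Cereal D,Q,0
"""

# read_csv() accepts a file path or any file-like object such as StringIO.
sample_dataset = pd.read_csv(io.StringIO(csv_text))

print(sample_dataset.shape)   # (rows, columns)
print(sample_dataset.head())
```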

pandas.DataFrame.nunique() function:

The pandas.DataFrame.nunique() function returns the number of unique values present in each column of the dataframe.

Apply the nunique() function to the given dataset to count the unique values present in each column of the dataframe.

cereal_dataset.nunique()

Example:

# Import pandas module as pd using the import keyword
import pandas as pd
# Import dataset using read_csv() function by passing the dataset name as
# an argument to it.
# Store it in a variable.
cereal_dataset = pd.read_csv('cereal.csv')
# Apply nunique() function to the given dataset to get the number of unique
# values present in each column of the dataframe.
cereal_dataset.nunique()

Output:

name        77
mfr          7
type         2
calories    11
protein      6
fat          5
sodium      27
fiber       13
carbo       22
sugars      17
potass      36
vitamins     3
shelf        3
weight       7
cups        12
rating      77
dtype: int64
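
To make the difference between counting and listing concrete, here is a small sketch on made-up data (the columns and values are invented for illustration, and None marks a missing value). nunique() gives a count per column, while unique() on a single column gives the values themselves:

```python
import pandas as pd

# Hypothetical data, invented for illustration; None marks a missing value.
df = pd.DataFrame(
    {
        "mfr": ["K", "G", "K", None],
        "vitamins": [25, 0, 25, 100],
    }
)

# Number of distinct values per column; missing values are ignored by default.
counts = df.nunique()

# With dropna=False, a missing value is counted as one more distinct entry.
counts_with_missing = df.nunique(dropna=False)

# The distinct values themselves, for one column.
vitamin_values = df["vitamins"].unique()

print(counts)
print(counts_with_missing)
print(vitamin_values)
```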

Below is the code to get the unique values in the column ‘vitamins’.

cereal_dataset.vitamins.unique()

Example

# Import pandas module as pd using the import keyword
import pandas as pd
# Import dataset using read_csv() function by passing the dataset name as
# an argument to it.
# Store it in a variable.
cereal_dataset = pd.read_csv('cereal.csv')
# Apply unique() function to the vitamins column in the given dataset to 
# get all the unique values in the column 'vitamins'.
cereal_dataset.vitamins.unique()

Output:

array([ 25,   0, 100])
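
The same idea can be extended to every column at once. The following is only a sketch on made-up data (the values are invented and merely echo the column names used above); it also checks that the method form and the pandas.unique() function form give the same result for a column:

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame(
    {
        "vitamins": [25, 0, 25, 100, 25],
        "shelf": [3, 1, 2, 3, 3],
    }
)

# Two equivalent ways to get the unique values of one column.
via_method = df["vitamins"].unique()
via_function = pd.unique(df["vitamins"])

# Collect the unique values of every column as plain Python lists.
unique_per_column = {col: df[col].unique().tolist() for col in df.columns}

print(via_method)
print(via_function)
print(unique_per_column)
```

The length of each list in unique_per_column matches the count that nunique() reports for that column when no values are missing.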