Pandas loc Indexer in Python: Extracting Values from a Dataset

Pandas loc Indexer:

Python offers many modules with built-in functions for handling and manipulating data values.

Pandas is one such module.

The Pandas module lets us manage large data sets and process them all at once.

This is where the loc indexer of Pandas comes into play. It makes it simple to retrieve data values from a dataset.

Although it is often called the loc() function, loc is a property of a DataFrame that is used with square brackets, not parentheses. It returns the data values in a specific row or column based on the index label passed to it.

Syntax:

pandas.DataFrame.loc[index label]

We must supply the index labels of the rows whose data we want shown in the output.

The index label can be one of the following:

  • Single label, for example a string
  • List of labels
  • Slice object with labels
  • Array of labels, etc.
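
The forms listed above can be sketched as follows. This is a minimal illustration, assuming a small hypothetical DataFrame named cereals (not part of the original example) whose index holds string labels.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the name `cereals` is illustrative only.
cereals = pd.DataFrame(
    [[110, 2], [100, 3], [115, 1], [90, 5]],
    index=["Almond Delight", "Clusters", "Corn Chex", "Cocoa Puffs"],
    columns=["calories", "vitamins"],
)

# Single label: returns the matching row as a Series.
single = cereals.loc["Clusters"]

# List of labels: returns a DataFrame with rows in the requested order.
listed = cereals.loc[["Corn Chex", "Clusters"]]

# Slice with labels: both endpoints are included in the result.
sliced = cereals.loc["Clusters":"Corn Chex"]

# Array of labels: a NumPy array behaves like a list here.
from_array = cereals.loc[np.array(["Almond Delight", "Cocoa Puffs"])]

print(single, listed, sliced, from_array, sep="\n\n")
```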

Using loc, we can extract a specific record from a dataset by its index label.

If the provided label is not present in the index, loc raises a KeyError.
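
One way to handle a missing label is sketched below. The DataFrame and the label 'Granola' are hypothetical and chosen only to trigger the error; catching the exception or testing membership in the index first are both reasonable options.

```python
import pandas as pd

# Hypothetical data with two labelled rows.
cereals = pd.DataFrame(
    {"calories": [110, 100]},
    index=["Almond Delight", "Clusters"],
)

label = "Granola"  # assumed label that is not in the index

# Option 1: catch the KeyError raised for an unknown label.
try:
    row = cereals.loc[label]
except KeyError:
    row = None
    print(f"No row labelled {label!r}")

# Option 2: check the index before looking the label up.
found = label in cereals.index
print(found)
```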

Example

# Import pandas module using the import keyword
import pandas as pd
# Pass a list of rows to the DataFrame() constructor and store the result in a variable
gvn_data = pd.DataFrame([[110, 2, 25, 14], [100, 3, 22, 10], [115, 1, 27, 9], [90, 5, 12, 14]],
     index=['Almond Delight', 'Clusters', 'Corn Chex', 'Cocoa Puffs'],
     columns=['calories', 'vitamins', 'fats','carboydrates'])
# Print the above dataframe
print("The given input Dataframe: ")
print(gvn_data)

Output:

The given input Dataframe: 
                calories  vitamins  fats  carboydrates
Almond Delight       110         2    25            14
Clusters             100         3    22            10
Corn Chex            115         1    27             9
Cocoa Puffs           90         5    12            14

Extraction of a Row from the Given Dataframe

Get all of the data values linked with the index label 'Clusters' as shown below:

# Import pandas module using the import keyword
import pandas as pd
# Pass a list of rows to the DataFrame() constructor and store the result in a variable
gvn_data = pd.DataFrame([[110, 2, 25, 14], [100, 3, 22, 10], [115, 1, 27, 9], [90, 5, 12, 14]],
     index=['Almond Delight', 'Clusters', 'Corn Chex', 'Cocoa Puffs'],
     columns=['calories', 'vitamins', 'fats','carboydrates'])
# Get all of the data values linked with the index label 'Clusters'
# using the loc indexer and print them.
print(gvn_data.loc['Clusters'])

Output:

calories        100
vitamins          3
fats             22
carboydrates     10
Name: Clusters, dtype: int64
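
The output above is a Series because a single label was passed. As a small sketch, assuming a hypothetical DataFrame named cereals, wrapping the same label in a list keeps the result as a one-row DataFrame instead:

```python
import pandas as pd

# Hypothetical data; the name `cereals` is illustrative only.
cereals = pd.DataFrame(
    [[110, 2], [100, 3]],
    index=["Almond Delight", "Clusters"],
    columns=["calories", "vitamins"],
)

# A single label returns a Series whose index is the column names.
as_series = cereals.loc["Clusters"]

# A one-element list returns a DataFrame with a single row.
as_frame = cereals.loc[["Clusters"]]

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```

The list form can be convenient when later code expects a DataFrame regardless of how many rows were selected.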

Extraction of Multiple Rows from the Given Dataframe

We can also get multiple rows from the given dataframe.

Get all of the data values linked with the index labels 'Clusters' and 'Almond Delight' as shown below:

# Import pandas module using the import keyword
import pandas as pd
# Pass a list of rows to the DataFrame() constructor and store the result in a variable
gvn_data = pd.DataFrame([[110, 2, 25, 14], [100, 3, 22, 10], [115, 1, 27, 9], [90, 5, 12, 14]],
     index=['Almond Delight', 'Clusters', 'Corn Chex', 'Cocoa Puffs'],
     columns=['calories', 'vitamins', 'fats','carboydrates'])
# Extracting multiple rows from the given dataframe.
# Get all of the data values linked with the index labels 'Clusters' and 'Almond Delight'
# using the loc indexer and print them.
print(gvn_data.loc[['Clusters', 'Almond Delight']])

Output:

                calories  vitamins  fats  carboydrates
Clusters             100         3    22            10
Almond Delight       110         2    25            14

Extraction of a Range of Rows from the Given Dataframe

We can retrieve the data values of a range of rows using the loc indexer and the slicing operator as shown below. Unlike ordinary Python slices, a label slice includes both the start and the end label.

# Import pandas module using the import keyword
import pandas as pd
# Pass a list of rows to the DataFrame() constructor and store the result in a variable
gvn_data = pd.DataFrame([[110, 2, 25, 14], [100, 3, 22, 10], [115, 1, 27, 9], [90, 5, 12, 14]],
     index=['Almond Delight', 'Clusters', 'Corn Chex', 'Cocoa Puffs'],
     columns=['calories', 'vitamins', 'fats','carboydrates'])
# Extracting a range of rows from the given dataframe.
# Get all of the data values linked with the index labels 'Clusters' to 'Cocoa Puffs'
# using the loc indexer with the slicing operator and print them.
print(gvn_data.loc['Clusters': 'Cocoa Puffs'])

Output:

             calories  vitamins  fats  carboydrates
Clusters          100         3    22            10
Corn Chex         115         1    27             9
Cocoa Puffs        90         5    12            14
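
The introduction notes that loc can also return a specific column. A minimal sketch of that, assuming a hypothetical DataFrame named cereals, passes a row selector and a column selector separated by a comma:

```python
import pandas as pd

# Hypothetical data; the name `cereals` is illustrative only.
cereals = pd.DataFrame(
    [[110, 2, 25], [100, 3, 22], [115, 1, 27], [90, 5, 12]],
    index=["Almond Delight", "Clusters", "Corn Chex", "Cocoa Puffs"],
    columns=["calories", "vitamins", "fats"],
)

# Row label and column label: returns a single value.
cell = cereals.loc["Clusters", "fats"]

# All rows (":") and one column label: returns that column as a Series.
column = cereals.loc[:, "calories"]

# A label slice of rows combined with a list of column labels.
block = cereals.loc["Clusters":"Corn Chex", ["calories", "fats"]]

print(cell)
print(column)
print(block)
```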