Pandas between() Method:
The Python Pandas module is mainly used to work with data stored in rows and columns, i.e. in a table/matrix format. Such tables frequently contain columns of numeric values.
Before any downstream task, such as modeling, the data must be analyzed and transformed.
Put simply, the Pandas between() function makes range comparisons and quick sanity checks on a column easy.
The between() function checks, for each element of a Series, whether the value lies between the given start and end values.
It returns a boolean Series of the same length: True where the element falls within the range and False otherwise.
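As a minimal sketch of this behaviour, the snippet below compares between() with the equivalent pair of comparison operators. The Series values are made up for illustration and do not come from the examples in this article.

```python
import pandas as pd

# Illustrative values (an assumption, not data from the article)
values = pd.Series([5, 10, 15, 20, 25])

# Element-wise range check; both boundaries are included by default
mask = values.between(10, 20)

# The same check written with comparison operators
manual = (values >= 10) & (values <= 20)

print(mask.tolist())        # [False, True, True, True, False]
print(mask.equals(manual))  # True
```

In this sketch between() reads as a shorthand for the two chained comparisons.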
Syntax:
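series.between(left, right, inclusive="both")
left: The start value, i.e. the lower boundary of the range.
right: The end value, i.e. the upper boundary of the range.
inclusive: Controls which boundaries are included in the check. "both" (the default) includes the start and end values, "neither" excludes both, "left" includes only the start value, and "right" includes only the end value. Pandas versions before 1.3 took True/False here instead; the boolean form was deprecated in 1.3 and removed in 2.0.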
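To make the effect of the inclusive parameter concrete, here is a small sketch that runs the same range check with each of the four string options. It assumes pandas 1.3 or newer, where the string values are accepted; the Series values are illustrative.

```python
import pandas as pd

# Illustrative values chosen so that two of them sit exactly on the boundaries
values = pd.Series([10, 15, 20])

results = {}
for option in ("both", "neither", "left", "right"):
    # Assumes pandas >= 1.3, where inclusive takes these string values
    results[option] = values.between(10, 20, inclusive=option).tolist()
    print(f"{option:8s}", results[option])

# both     [True, True, True]
# neither  [False, True, False]
# left     [True, True, False]
# right    [False, True, True]
```

Only the boundary values 10 and 20 change between the options; the interior value 15 is True in every case.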
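In addition, between() is a Series method, so it is applied to a single column (1-dimensional data) rather than to a whole DataFrame. It works with any values that can be ordered, such as numbers, strings, and dates, although it is most often used on numeric columns.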
1) between() function in Python with both boundaries included:
Example:
Here we use the pandas.DataFrame() function to create a DataFrame with three columns.
# Import pandas module using the import keyword
import pandas as pd

# Give some random list of data and store it in a variable
gvn_data = {"ID": [11, 12, 13, 14, 15, 16],
            "Name": ["peter", "irfan", "mary", "riya", "virat", "sunny"],
            "salary": [10000, 25000, 15000, 50000, 30000, 22000]}

# Pass the given data to the DataFrame() function and store it in another variable
block_data = pd.DataFrame(gvn_data)

# Print the above result
print("The given input Dataframe: ")
print(block_data)
Output:
The given input Dataframe: 
   ID   Name  salary
0  11  peter   10000
1  12  irfan   25000
2  13   mary   15000
3  14   riya   50000
4  15  virat   30000
5  16  sunny   22000
Next, we apply the between() method to the "salary" column of the DataFrame.
Because both boundaries are included, it checks which salaries fall between 10000 and 25000 (including 10000 and 25000 themselves) and returns True for the indices whose salary lies within that range.
# Import pandas module using the import keyword
import pandas as pd

# Give some random list of data and store it in a variable
gvn_data = {"ID": [11, 12, 13, 14, 15, 16],
            "Name": ["peter", "irfan", "mary", "riya", "virat", "sunny"],
            "salary": [10000, 25000, 15000, 50000, 30000, 22000]}

# Pass the given data to the DataFrame() function and store it in another variable
block_data = pd.DataFrame(gvn_data)

# Print the above result
print("The given input Dataframe: ")
print(block_data)
print()

# Pass the lower and upper limits of the range to the between() function and
# apply it to the salary column of the given data.
# inclusive="both" includes both limits (pandas >= 1.3; older versions used inclusive=True)
# Store the result in another variable
rslt_data = block_data["salary"].between(10000, 25000, inclusive="both")

# Print the salaries that falls between the given range
print("The salaries that falls between the given range:")
print(rslt_data)
Output:
The given input Dataframe: 
   ID   Name  salary
0  11  peter   10000
1  12  irfan   25000
2  13   mary   15000
3  14   riya   50000
4  15  virat   30000
5  16  sunny   22000

The salaries that falls between the given range:
0     True
1     True
2     True
3    False
4    False
5     True
Name: salary, dtype: bool
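Explanation:
It returns False for indexes 3 and 4 because their salaries (50000 and 30000) lie outside the range of 10000 to 25000. Indexes 0 and 1 return True because the boundary values 10000 and 25000 are included.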
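2) between() function in Python with a string (categorical) variable:
Let's check what it produces for string or categorical data.
When we apply the Pandas between() function to a string column, it compares each value with the start and end values lexicographically (in dictionary order) and returns True for every value that sorts between them, boundaries included. In the data below, "peter" and "riya" are the only names that fall in the range "peter" to "riya", so only the two boundary values themselves return True.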
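The string comparison is an ordering check, not an exact match against the two boundaries. The sketch below illustrates this with a hypothetical extra name, "quinn", which is not part of the article's data but sorts between "peter" and "riya".

```python
import pandas as pd

# "quinn" is a hypothetical name added for illustration only
names = pd.Series(["peter", "irfan", "quinn", "riya", "sunny"])

# Lexicographic range check with both boundaries included (the default)
in_range = names.between("peter", "riya")

print(in_range.tolist())  # [True, False, True, True, False]
```

Here "quinn" is True even though it equals neither boundary, while "irfan" sorts before "peter" and "sunny" sorts after "riya".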
# Import pandas module using the import keyword
import pandas as pd

# Give some random list of data and store it in a variable
gvn_data = {"ID": [11, 12, 13, 14, 15, 16],
            "Name": ["peter", "irfan", "mary", "riya", "virat", "sunny"],
            "salary": [10000, 25000, 15000, 50000, 30000, 22000]}

# Pass the given data to the DataFrame() function and store it in another variable
block_data = pd.DataFrame(gvn_data)

# Print the above result
print("The given input Dataframe: ")
print(block_data)
print()

# Pass the two names to the between() function and apply it to the "Name"
# column of the given data.
# inclusive="both" includes both names (pandas >= 1.3; older versions used inclusive=True)
# Store the result in another variable
rslt_data = block_data["Name"].between("peter", "riya", inclusive="both")

# Print the above result
print(rslt_data)
Output:
The given input Dataframe: 
   ID   Name  salary
0  11  peter   10000
1  12  irfan   25000
2  13   mary   15000
3  14   riya   50000
4  15  virat   30000
5  16  sunny   22000

0     True
1    False
2    False
3     True
4    False
5    False
Name: Name, dtype: bool
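How to print the rows selected by the between() function?
The boolean Series returned by between() can be used as a mask to index the DataFrame, which keeps only the rows where the mask is True.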
# Import pandas module using the import keyword
import pandas as pd

# Give some random list of data and store it in a variable
gvn_data = {"ID": [11, 12, 13, 14, 15, 16],
            "Name": ["peter", "irfan", "mary", "riya", "virat", "sunny"],
            "salary": [10000, 25000, 15000, 50000, 30000, 22000]}

# Pass the given data to the DataFrame() function and store it in another variable
block_data = pd.DataFrame(gvn_data)

# Print the above result
print("The given input Dataframe: ")
print(block_data)
print()

# Pass the lower and upper limits of the range to the between() function and
# apply it to the salary column of the given data.
# inclusive="both" includes both limits (pandas >= 1.3; older versions used inclusive=True)
# Store the resulting boolean mask in another variable
rslt_data = block_data["salary"].between(10000, 25000, inclusive="both")

# Use the mask to print only the rows whose salary falls in the given range
print("The data of salaries that falls between the given range:")
print(block_data[rslt_data])
Output:
The given input Dataframe: 
   ID   Name  salary
0  11  peter   10000
1  12  irfan   25000
2  13   mary   15000
3  14   riya   50000
4  15  virat   30000
5  16  sunny   22000

The data of salaries that falls between the given range:
   ID   Name  salary
0  11  peter   10000
1  12  irfan   25000
2  13   mary   15000
5  16  sunny   22000
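A possible follow-up, sketched here with the same data as above: the mask can also be inverted with ~ to select the rows outside the range, combined with .loc to pick specific columns, or summed to count the matches. The variable names in_range, inside, and outside are chosen for this illustration.

```python
import pandas as pd

block_data = pd.DataFrame({
    "ID": [11, 12, 13, 14, 15, 16],
    "Name": ["peter", "irfan", "mary", "riya", "virat", "sunny"],
    "salary": [10000, 25000, 15000, 50000, 30000, 22000],
})

# Boolean mask: True where the salary lies in [10000, 25000]
in_range = block_data["salary"].between(10000, 25000)

# Rows inside the range, restricted to two columns
inside = block_data.loc[in_range, ["Name", "salary"]]

# Rows outside the range, obtained by inverting the mask
outside = block_data.loc[~in_range, ["Name", "salary"]]

print(inside)
print(outside)

# True counts as 1, so the sum is the number of matching rows
print(int(in_range.sum()))  # 4
```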