Python astype() Method with Examples

In this tutorial, we look in detail at converting the data type of columns in a DataFrame using the pandas astype() method.

Python is a superb language for data analysis, owing to its fantastic ecosystem of data-centric Python packages. Pandas is one of these packages, and it greatly simplifies data import and analysis.

astype() Method:

The DataFrame.astype() method converts a pandas object to a given data type. It can also convert any suitable existing column to a categorical type.

In data science and machine learning, we frequently reach a stage where the data must be pre-processed and transformed before it can be used for modeling.
This is when data column conversion comes into play.

The pandas astype() method allows us to convert the data type of an existing column in a dataset or data frame.

Using the astype() function, we can change the data type of a single column or of multiple columns to a completely different type.
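As a minimal sketch of the idea, the snippet below converts one column and then every column of a small frame. The frame `df` and its column names are hypothetical toy data, not part of the examples later in this tutorial.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Convert a single column (a Series) to float64
a_as_float = df["a"].astype("float64")

# Convert every column of the frame to float64 in one call
df_as_float = df.astype("float64")

print(a_as_float.dtype)
print(df_as_float.dtypes)
```

Both calls return new objects, so `df` itself keeps its original integer columns.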

Syntax:

DataFrame.astype(dtype, copy=True, errors='raise')

Parameters

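dtype: The data type to apply to the entire data frame, or a dictionary that maps column names to data types when only some columns should be converted.
copy: If set to True (the default), it returns a new copy of the dataset with the changes incorporated.
errors: If set to ‘raise’ (the default), the function raises an exception when a conversion fails. If set to ‘ignore’, exceptions are suppressed and the original object is returned on error.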
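The parameters above can be sketched as follows. The frame `df` and its columns `qty` and `code` are hypothetical, chosen so that one conversion succeeds and the other fails; the exact exception message depends on the pandas version.

```python
import pandas as pd

# Hypothetical toy data: "code" contains a value that is not a number
df = pd.DataFrame({"qty": ["1", "2", "3"], "code": ["7", "8", "x"]})

# dtype given as a dictionary: only the listed column is converted
converted = df.astype({"qty": "int64"})
print(converted["qty"].tolist())

# By default a new object is returned, so the original is left untouched
original_is_unchanged = df["qty"].tolist() == ["1", "2", "3"]

# errors='raise' (the default): an invalid conversion raises an exception
try:
    df.astype({"code": "int64"}, errors="raise")
    conversion_failed = False
except (ValueError, TypeError) as exc:
    conversion_failed = True
    print("Conversion failed:", exc)
```

Catching the exception like this is one way to handle bad values explicitly rather than silencing them with errors='ignore'.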

1) astype() – with DataFrame

Below is the implementation:

# Import the pandas module
import pandas as pd

# Give the data as static input in dictionary form and store it in a variable
gvn_data = {
    "ID": [11, 12, 13, 14, 15, 16],
    "Name": ["peter", "irfan", "mary", "riya", "virat", "sunny"],
    "salary": [10000, 25000, 15000, 50000, 30000, 22000],
}
# Pass the given data to the DataFrame() function and store it in another variable
block_data = pd.DataFrame(gvn_data)
# Print the DataFrame
print("The given input Dataframe: ")
print(block_data)
print()
# Print the data type of each column
print(block_data.dtypes)

Output:

The given input Dataframe: 
   ID   Name  salary
0  11  peter   10000
1  12  irfan   25000
2  13   mary   15000
3  14   riya   50000
4  15  virat   30000
5  16  sunny   22000

ID         int64
Name      object
salary     int64
dtype: object

Now, apply the astype() method to the ‘Name’ column to change its data type to ‘category’:

# Import the pandas module
import pandas as pd

# Give the data as static input in dictionary form and store it in a variable
gvn_data = {
    "ID": [11, 12, 13, 14, 15, 16],
    "Name": ["peter", "irfan", "mary", "riya", "virat", "sunny"],
    "salary": [10000, 25000, 15000, 50000, 30000, 22000],
}
# Pass the given data to the DataFrame() function and store it in another variable
block_data = pd.DataFrame(gvn_data)
# Apply the astype() method on the 'Name' column to change the data type to 'category'
block_data['Name'] = block_data['Name'].astype('category')
# Print the data type of each column
print(block_data.dtypes)

Output:

ID           int64
Name      category
salary       int64
dtype: object
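To see what the ‘category’ conversion stores, here is a small sketch that uses the same names as above. It inspects the `.cat` accessor, which pandas provides on categorical Series; the variable names `names` and `names_cat` are introduced here only for illustration.

```python
import pandas as pd

# Same names as in the example above, as a standalone Series
names = pd.Series(["peter", "irfan", "mary", "riya", "virat", "sunny"])
names_cat = names.astype("category")

# The distinct values, stored once (sorted for string data)
print(names_cat.cat.categories.tolist())
# → ['irfan', 'mary', 'peter', 'riya', 'sunny', 'virat']

# Each row is stored as an integer code pointing into the categories
print(names_cat.cat.codes.tolist())
# → [2, 0, 1, 3, 5, 4]
```

Because every name here is unique, the example shows only the mechanics; the representation tends to pay off when a column repeats a small set of values many times.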

Note:

You can convert the column to the 'string' data type in the same way.
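A minimal sketch of that conversion, assuming a pandas version that provides the dedicated 'string' dtype (1.0 or later). The Series `names` is a shortened, standalone copy of the ‘Name’ column.

```python
import pandas as pd

# A few of the names from the example above
names = pd.Series(["peter", "irfan", "mary"])

# Convert to the dedicated pandas string dtype
names_str = names.astype("string")

# The exact text printed for the dtype can vary between pandas versions
print(names_str.dtype)
```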

2) astype() Method – with a Dataset in Python

Use the pandas.read_csv() function to import the dataset. The dataset can be found here.

Approach:

  • Import the pandas library using the import keyword.
  • Import a dataset using the pandas.read_csv() function by passing the filename as an argument to it.
  • Store it in a variable.
  • Print the dtypes attribute of the dataset to see the data type of each column.
  • Exit the program.

Below is the implementation:

# Import the pandas library
import pandas

# Import the dataset using the pandas.read_csv() function by passing
# the filename as an argument to it.
# Store it in a variable.
cereal_dataset = pandas.read_csv("cereal.csv")
# Print the data type of each column
print(cereal_dataset.dtypes)

Output:

name         object
mfr          object
type         object
calories      int64
protein       int64
fat           int64
sodium        int64
fiber       float64
carbo       float64
sugars        int64
potass        int64
vitamins      int64
shelf         int64
weight      float64
cups        float64
rating      float64
dtype: object

Now change the data types of the columns ‘name’ and ‘fat’ to string and float64 respectively. Passing a dictionary of column names and data types to astype() lets us change the data types of multiple columns in one go.

# Import the pandas library
import pandas

# Import the dataset using the pandas.read_csv() function by passing
# the filename as an argument to it.
# Store it in a variable.
cereal_dataset = pandas.read_csv("cereal.csv")
# Change the data types of the columns 'name' and 'fat' using the astype() function
cereal_dataset = cereal_dataset.astype({"name": 'string', "fat": 'float64'})
# Print the data type of each column
print("The dataset after changing datatypes:")
print(cereal_dataset.dtypes)

Output:

The dataset after changing datatypes:
name         string
mfr          object
type         object
calories      int64
protein       int64
fat         float64
sodium        int64
fiber       float64
carbo       float64
sugars        int64
potass        int64
vitamins      int64
shelf         int64
weight      float64
cups        float64
rating      float64
dtype: object
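If cereal.csv is not at hand, the same multi-column conversion can be tried on an in-memory CSV. This is only a sketch: the two rows below are made-up placeholder values, not data from the real cereal dataset, and `csv_text` and `cereal_sample` are names introduced for this illustration.

```python
import io

import pandas as pd

# Made-up placeholder rows that mimic a few columns of the dataset
csv_text = """name,mfr,fat,rating
Cereal A,X,1,50.5
Cereal B,Y,0,42.0
"""

# read_csv() accepts a file-like object as well as a filename
cereal_sample = pd.read_csv(io.StringIO(csv_text))

# Convert two columns in one call by passing a dictionary to astype()
cereal_sample = cereal_sample.astype({"name": "string", "fat": "float64"})

print(cereal_sample.dtypes)
```

Columns that are not listed in the dictionary, such as ‘mfr’ and ‘rating’ here, keep the data types that read_csv() inferred for them.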