In this tutorial, we will look in detail at converting the data types of DataFrame columns using the pandas astype() method.
Python is a superb language for data analysis, owing to its fantastic ecosystem of data-centric Python packages. Pandas is one of these packages, and it greatly simplifies data import and analysis.
astype() Method:
The DataFrame.astype() method casts a pandas object to a specified data type. It can also convert any suitable existing column to a categorical type.
In Data Science and Machine Learning, we frequently reach a stage where the data must be pre-processed and transformed. Getting the data values into the right form is usually one of the first steps toward modeling.
This is when data column conversion comes into play.
The astype() method lets us change the data type of an existing column in a dataset or DataFrame.
With it, we can convert the values of a single column, or of several columns at once, to a different data type.
Syntax:
DataFrame.astype(dtype, copy=True, errors='raise')
Parameters
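dtype: The data type to cast to. Pass a single type (for example 'int64' or 'category') to apply it to the entire DataFrame, or a dictionary mapping column names to types to convert only the selected columns.
copy: If set to True (the default), the method returns a new copy of the data with the conversion applied.
errors: With 'raise' (the default), an invalid conversion raises an exception. With 'ignore', the exception is suppressed and the original object is returned unchanged.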
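As a rough illustration of the dtype and errors parameters, here is a minimal sketch on a small made-up DataFrame. The names df, a_int and converted are hypothetical and exist only for this example. It relies on the default errors='raise' behaviour and catches the resulting exception.

```python
import pandas as pd

# Hypothetical two-column frame used only for this illustration
df = pd.DataFrame({"a": ["1", "2", "3"], "b": ["x", "y", "z"]})

# A single dtype applied to one column (a Series)
a_int = df["a"].astype("int64")

# A dictionary converts only the listed columns and leaves the rest alone
converted = df.astype({"a": "float64"})

# With the default errors='raise', an impossible conversion raises an exception
conversion_failed = False
try:
    df.astype({"b": "int64"})
except (ValueError, TypeError):
    conversion_failed = True

print(a_int.tolist())
print(converted.dtypes)
print("Conversion of column 'b' failed:", conversion_failed)
```

Catching the exception explicitly, rather than suppressing it, makes it obvious when a column could not be converted.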
1) astype() – with DataFrame
Below is the implementation:
# Import pandas module using the import keyword
import pandas as pd

# Give the dictionary as static input and store it in a variable.
# (data given in the dictionary form)
gvn_data = {"ID": [11, 12, 13, 14, 15, 16],
            "Name": ["peter", "irfan", "mary", "riya", "virat", "sunny"],
            "salary": [10000, 25000, 15000, 50000, 30000, 22000]}

# Pass the given data to the DataFrame() function and store it in another variable
block_data = pd.DataFrame(gvn_data)

# Print the above result
print("The given input Dataframe: ")
print(block_data)
print()

# Print the data type of each column of the above block data
print(block_data.dtypes)
Output:
The given input Dataframe: 
   ID   Name  salary
0  11  peter   10000
1  12  irfan   25000
2  13   mary   15000
3  14   riya   50000
4  15  virat   30000
5  16  sunny   22000

ID         int64
Name      object
salary     int64
dtype: object
Now apply the astype() method to the 'Name' column to change its data type to 'category'.
# Import pandas module using the import keyword
import pandas as pd

# Give the dictionary as static input and store it in a variable.
# (data given in the dictionary form)
gvn_data = {"ID": [11, 12, 13, 14, 15, 16],
            "Name": ["peter", "irfan", "mary", "riya", "virat", "sunny"],
            "salary": [10000, 25000, 15000, 50000, 30000, 22000]}

# Pass the given data to the DataFrame() function and store it in another variable
block_data = pd.DataFrame(gvn_data)

# Apply the astype() method on the 'Name' column to change the data type to 'category'
block_data['Name'] = block_data['Name'].astype('category')

# Print the data type of each column of the above block data
print(block_data.dtypes)
Output:
ID           int64
Name      category
salary       int64
dtype: object
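To see what the 'category' type stores, the following sketch inspects a converted column through the .cat accessor. The series uses a few repeated names chosen for illustration, not the exact data above, and the variable names are hypothetical.

```python
import pandas as pd

# Illustrative series with repeated values (not the tutorial's exact data)
names = pd.Series(["peter", "irfan", "mary", "peter", "mary"], name="Name")

# Convert to the categorical type
as_category = names.astype("category")

# Each distinct label is stored once ...
print(as_category.cat.categories)

# ... and every row holds a small integer code pointing at its label
print(as_category.cat.codes)
```

Because each distinct label is stored only once, a categorical column can use less memory than a plain object column when values repeat often.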
Note:
You can also convert a column to the 'string' data type in the same way.
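A minimal sketch of that conversion, assuming a pandas version that supports the 'string' extension dtype (1.0 or later). The DataFrame here is a shortened, hypothetical version of the one above.

```python
import pandas as pd

# Shortened, hypothetical version of the tutorial's DataFrame
block_data = pd.DataFrame({"ID": [11, 12, 13],
                           "Name": ["peter", "irfan", "mary"]})

# Convert the 'Name' column to the dedicated string dtype
block_data["Name"] = block_data["Name"].astype("string")

# Check the result without relying on the exact dtype label
name_is_string = pd.api.types.is_string_dtype(block_data["Name"])
print(block_data.dtypes)
print("Name is a string column:", name_is_string)
```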
2) astype() Method – with a Dataset in Python
Use the pandas.read_csv() function to import the dataset. The dataset can be found here.
Approach:
- Import pandas library using the import keyword.
- Import the dataset using the pandas.read_csv() function by passing the filename as an argument to it.
- Store it in a variable.
- Use the dtypes attribute of the dataset to view the data type of each column.
- Exit the program.
Below is the implementation:
# Import pandas library using the import keyword
import pandas

# Import the dataset using the pandas.read_csv() function by passing
# the filename as an argument to it.
# Store it in a variable.
cereal_dataset = pandas.read_csv("cereal.csv")

# Print the data type of each column of the above dataset
print(cereal_dataset.dtypes)
Output:
name         object
mfr          object
type         object
calories      int64
protein       int64
fat           int64
sodium        int64
fiber       float64
carbo       float64
sugars        int64
potass        int64
vitamins      int64
shelf         int64
weight      float64
cups        float64
rating      float64
dtype: object
Now change the data types of the 'name' and 'fat' columns to string and float64 respectively. Passing a dictionary to astype() lets us change the data types of multiple columns in one go.
# Import pandas library using the import keyword
import pandas

# Import the dataset using the pandas.read_csv() function by passing
# the filename as an argument to it.
# Store it in a variable.
cereal_dataset = pandas.read_csv("cereal.csv")

# Change the datatype of the variables 'name' and 'fat' using the astype() function
cereal_dataset = cereal_dataset.astype({"name": 'string', "fat": 'float64'})

# Print the data type of each column of the converted dataset
print("The dataset after changing datatypes:")
print(cereal_dataset.dtypes)
Output:
The dataset after changing datatypes:
name         string
mfr          object
type         object
calories      int64
protein       int64
fat         float64
sodium        int64
fiber       float64
carbo       float64
sugars        int64
potass        int64
vitamins      int64
shelf         int64
weight      float64
cups        float64
rating      float64
dtype: object
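If cereal.csv is not at hand, the same multi-column conversion can be tried on an in-memory stand-in. The sketch below reads a tiny CSV from a string. Its three rows are made-up placeholder values, not taken from the real dataset, and the names csv_text, sample and converted are hypothetical.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for cereal.csv; the values are made up
csv_text = """name,fat,calories
Cereal A,1,70
Cereal B,5,120
Cereal C,0,110
"""

# read_csv() accepts a file-like object as well as a filename
sample = pd.read_csv(io.StringIO(csv_text))

# Convert two columns in one call by passing a column-to-dtype dictionary
converted = sample.astype({"name": "string", "fat": "float64"})

# 'calories' is not listed in the dictionary, so it keeps its original dtype
print(converted.dtypes)
```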