Changing the data type of single or multiple columns of a DataFrame in Python
In this article we will see how to change the data type of one or more columns of a pandas DataFrame in Python.
Change Data Type of a Single Column :
We will use Series.astype() to change the data type of a column.
Syntax:- Series.astype(self, dtype, copy=True, errors='raise', **kwargs)
Arguments:
- dtype : The Python type or NumPy dtype to which the whole Series will be converted.
- errors : How to handle errors; either 'raise' or 'ignore', with 'raise' as the default. ('raise': raise an exception if the conversion fails; 'ignore': suppress the exception and return the original object unchanged.)
- copy : bool (default True). If True, a copy is returned. If False, pandas may avoid copying the data where it can, so changes to the result may then propagate to the original. astype() never converts the original object in place.
Returns: A new Series with the updated data type. The original Series is left unchanged, so assign the result back if you want to keep it.
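As a minimal sketch of the return behaviour, the snippet below casts a small Series and checks both objects afterwards. The Series `ages` and the name `ages_float` are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
ages = pd.Series([34, 25, 26], name='Age')

# astype() returns a new Series with the requested dtype
ages_float = ages.astype('float64')

print(ages.dtype)        # the original Series keeps its dtype
print(ages_float.dtype)  # the returned Series has the new dtype
```

Because the original is not modified, the usual pattern is to assign the result back, as the examples in this article do.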
import pandas as pd

# List of tuples
students = [('Rohit', 34, 'Swimming', 155),
            ('Ritik', 25, 'Cricket', 179),
            ('Salim', 26, 'Music', 187),
            ('Rani', 29, 'Sleeping', 154),
            ('Sonu', 17, 'Singing', 184),
            ('Madhu', 20, 'Travelling', 165),
            ('Devi', 22, 'Art', 141)]

# Create a DataFrame object with columns of different data types
studObj = pd.DataFrame(students, columns=['Name', 'Age', 'Hobby', 'Height'])

print(studObj)
print(studObj.dtypes)
Output :
    Name  Age       Hobby  Height
0  Rohit   34    Swimming     155
1  Ritik   25     Cricket     179
2  Salim   26       Music     187
3   Rani   29    Sleeping     154
4   Sonu   17     Singing     184
5  Madhu   20  Travelling     165
6   Devi   22         Art     141
Name      object
Age        int64
Hobby     object
Height     int64
dtype: object
Change data type of a column from int64 to float64 :
We can change the data type of a single column. For example, let's change the data type of the 'Age' column from int64 to float64. To do this, we pass 'float64' to astype() and assign the result back to the column so that the change is reflected in the DataFrame.
import pandas as pd

# List of tuples
students = [('Rohit', 34, 'Swimming', 155),
            ('Ritik', 25, 'Cricket', 179),
            ('Salim', 26, 'Music', 187),
            ('Rani', 29, 'Sleeping', 154),
            ('Sonu', 17, 'Singing', 184),
            ('Madhu', 20, 'Travelling', 165),
            ('Devi', 22, 'Art', 141)]

# Create a DataFrame object with columns of different data types
studObj = pd.DataFrame(students, columns=['Name', 'Age', 'Hobby', 'Height'])

# Change data type of column 'Age' to float64
studObj['Age'] = studObj['Age'].astype('float64')

print(studObj)
print(studObj.dtypes)
Output :
    Name   Age       Hobby  Height
0  Rohit  34.0    Swimming     155
1  Ritik  25.0     Cricket     179
2  Salim  26.0       Music     187
3   Rani  29.0    Sleeping     154
4   Sonu  17.0     Singing     184
5  Madhu  20.0  Travelling     165
6   Devi  22.0         Art     141
Name       object
Age       float64
Hobby      object
Height      int64
dtype: object
Change data type of a column from int64 to string :
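Let's change the data type of the 'Height' column from int64 to object, the dtype pandas commonly uses for strings. Since the default value of the copy argument of astype() is True, it returns a copy of the Series with the changed data type, which we assign back to studObj['Height']. Note that astype('object') changes only the column's dtype, while the stored values remain Python integers; to convert the values themselves to strings, use astype(str).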
import pandas as pd

# List of tuples
students = [('Rohit', 34, 'Swimming', 155),
            ('Ritik', 25, 'Cricket', 179),
            ('Salim', 26, 'Music', 187),
            ('Rani', 29, 'Sleeping', 154),
            ('Sonu', 17, 'Singing', 184),
            ('Madhu', 20, 'Travelling', 165),
            ('Devi', 22, 'Art', 141)]

studObj = pd.DataFrame(students, columns=['Name', 'Age', 'Hobby', 'Height'])

# Change data type of column 'Age' from int64 to float64
studObj['Age'] = studObj['Age'].astype('float64')

# Change data type of column 'Height' from int64 to object
studObj['Height'] = studObj['Height'].astype('object')

print(studObj)
print(studObj.dtypes)
Output :
    Name   Age       Hobby Height
0  Rohit  34.0    Swimming    155
1  Ritik  25.0     Cricket    179
2  Salim  26.0       Music    187
3   Rani  29.0    Sleeping    154
4   Sonu  17.0     Singing    184
5  Madhu  20.0  Travelling    165
6   Devi  22.0         Art    141
Name       object
Age       float64
Hobby      object
Height     object
dtype: object
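Casting to 'object' and casting to str are not the same thing, which the following sketch tries to make visible. The Series `heights` and the names `as_object` and `as_str` are hypothetical and not part of the example data above.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
heights = pd.Series([155, 179, 187], name='Height')

# 'object' changes the column dtype, but each value is still an integer
as_object = heights.astype('object')

# str converts every value to its string representation
as_str = heights.astype(str)

print(type(as_object.iloc[0]))
print(type(as_str.iloc[0]))
```

If later code relies on string methods such as .str.len(), the astype(str) form is likely the one you want.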
Change Data Type of Multiple Columns in Dataframe :
To change the data type of multiple columns in a DataFrame, we will use DataFrame.astype(), which can be applied to the whole DataFrame or to selected columns.
Syntax:- DataFrame.astype(self, dtype, copy=True, errors='raise', **kwargs)
Arguments:
- dtype : A Python type or NumPy dtype to which the whole DataFrame will be converted, or a dictionary of column names and data types, in which case each given column is converted to its corresponding type.
- errors : How to handle errors; either 'raise' or 'ignore', with 'raise' as the default.
  - raise : Raise an exception if the conversion fails
  - ignore : Suppress the exception and return the original object unchanged
- copy : bool (default True). If True, a copy is returned. If False, pandas may avoid copying the data where it can, so changes to the result may then propagate to the original. astype() never converts the original object in place.
Returns: A new DataFrame with the updated data types. The original DataFrame is left unchanged, so assign the result back if you want to keep it.
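When dtype is a single type rather than a dictionary, every column is cast to it. A minimal sketch, assuming a hypothetical all-numeric DataFrame named `measurements` (a whole-frame cast to a numeric type fails by default if any column cannot be converted):

```python
import pandas as pd

# Hypothetical all-numeric sample data, used only for this illustration
measurements = pd.DataFrame({'Age': [34, 25, 26],
                             'Height': [155, 179, 187]})

# A single dtype is applied to every column of the DataFrame
as_float = measurements.astype('float64')

print(as_float.dtypes)
```

For a DataFrame that mixes text and numeric columns, the dictionary form is usually the safer choice because it touches only the columns you name.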
Change Data Type of two Columns at same time :
Let's convert the columns 'Age' and 'Height', both of int64 data type, to float64 and object respectively. We will pass a dictionary to DataFrame.astype() that contains column names as keys and new data types as values.
import pandas as pd

# List of tuples
students = [('Rohit', 34, 'Swimming', 155),
            ('Ritik', 25, 'Cricket', 179),
            ('Salim', 26, 'Music', 187),
            ('Rani', 29, 'Sleeping', 154),
            ('Sonu', 17, 'Singing', 184),
            ('Madhu', 20, 'Travelling', 165),
            ('Devi', 22, 'Art', 141)]

# Create a DataFrame object with columns of different data types
studObj = pd.DataFrame(students, columns=['Name', 'Age', 'Hobby', 'Height'])

# Convert the data type of column 'Age' to float64 and column 'Height' to object
studObj = studObj.astype({'Age': 'float64', 'Height': 'object'})

print(studObj)
print(studObj.dtypes)
Output :
    Name   Age       Hobby Height
0  Rohit  34.0    Swimming    155
1  Ritik  25.0     Cricket    179
2  Salim  26.0       Music    187
3   Rani  29.0    Sleeping    154
4   Sonu  17.0     Singing    184
5  Madhu  20.0  Travelling    165
6   Devi  22.0         Art    141
Name       object
Age       float64
Hobby      object
Height     object
dtype: object
Handle errors while converting Data Types of Columns :
When using astype() to convert one or more columns, we cannot request a conversion that is impossible, such as an unknown data type or values that cannot be cast to the target type. Otherwise, an error will be raised.
import pandas as pd

# List of tuples
students = [('Rohit', 34, 'Swimming', 155),
            ('Ritik', 25, 'Cricket', 179),
            ('Salim', 26, 'Music', 187),
            ('Rani', 29, 'Sleeping', 154),
            ('Sonu', 17, 'Singing', 184),
            ('Madhu', 20, 'Travelling', 165),
            ('Devi', 22, 'Art', 141)]

# Create a DataFrame object with columns of different data types
studObj = pd.DataFrame(students, columns=['Name', 'Age', 'Hobby', 'Height'])

# Try to change the data type of a column to an unknown data type
try:
    studObj['Name'] = studObj['Name'].astype('xyz')
except TypeError as ex:
    print(ex)
Output : data type "xyz" not understood
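An unknown dtype is one kind of failure; values that cannot be cast are another. The sketch below uses a hypothetical Series `names` and assumes a pandas version whose astype() accepts errors='ignore', as in the syntax shown earlier. With the default errors='raise', the failed conversion surfaces as a ValueError.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
names = pd.Series(['Rohit', 'Ritik', 'Salim'], name='Name')

# Default errors='raise': text that cannot be parsed as integers raises ValueError
try:
    names.astype('int64')
    conversion_failed = False
except ValueError as ex:
    conversion_failed = True
    print('Conversion failed:', ex)

# errors='ignore': the exception is suppressed and the original data is returned
unchanged = names.astype('int64', errors='ignore')
print(unchanged.equals(names))
```

Note that errors='ignore' hides the failure silently, so checking the resulting dtype afterwards is a reasonable precaution.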