Converting Dataframe column type from string to date time
In this article we will learn to convert data type of dataframe column to from string to datetime where the data can be custom string formats or embedded in big texts. We will also learn how we can handle the error while converting data types.
A function provided by Python’s Pandas module is used to convert a given argument to datetime.
Synatx : pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, box=True, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)
where,
Parameters:
- arg : Element that is to be converted to a datetime with type like int, float, string, datetime, list, 1-d array or Series.
- errors : It is a way to handle error which can be ‘ignore’, ‘raise’, ‘coerce’. Whereas, default value is ‘raise’ (‘raise’: Raise exception in invalid parsing , ‘coerce’: Set as NaT in case of invalid parsing , ‘ignore’: Return the input if invalid parsing found)
- format : string, default Nonedate & time string in format eg “%d/%m/%Y” etc.
Returns:
- It converts and return the value as date time format based on input.
- A series of datetime64 type will be returned, if a series of string is passed.
- A datetime64 object will be returned, if scalar entity is passed
Convert the Data type of a column from string to datetime64 :
Let’s create a dataframe where column ‘DOB’ has dates in string format i.e. DD/MM/YYYY’.
import pandas as sc # List of Tuples players = [('Jason', '31/01/1978', 'Delhi', 155) , ('Johny', '26/05/1980', 'Hyderabad' , 15) , ('Darren', '03/01/1992', 'Jamaica',222) , ('Finch', '22/12/1994','Pune' , 12) , ('Krunal', '16/08/1979', 'Mumbai' , 58) , ('Ravindra', '04/06/1985', 'Chennai', 99 ), ('Dinesh', '23/02/1985', 'Kolkata', 10) ] # Creation of DataFrame object PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey']) print(PlayerObj) print('Datatype of players dataframe is:') print(PlayerObj.dtypes)
Output : Name DOB Teams Jersey 0 Jason 31/01/1978 Delhi 155 1 Johny 26/05/1980 Hyderabad 15 2 Darren 03/01/1992 Jamaica 222 3 Finch 22/12/1994 Pune 12 4 Krunal 16/08/1979 Mumbai 58 5 Ravindra 04/06/1985 Chennai 99 6 Dinesh 23/02/1985 Kolkata 10 Datatype of players dataframe is: Name object DOB object Teams object Jersey int64 dtype: object
Now let’s try to convert data type of column ‘DOB’ to datetime64.
import pandas as sc # List of Tuples players = [('Jason', '31/01/1978', 'Delhi', 155) , ('Johny', '26/05/1980', 'Hyderabad' , 15) , ('Darren', '03/01/1992', 'Jamaica',222) , ('Finch', '22/12/1994','Pune' , 12) , ('Krunal', '16/08/1979', 'Mumbai' , 58) , ('Ravindra', '04/06/1985', 'Chennai', 99 ), ('Dinesh', '23/02/1985', 'Kolkata', 10) ] # Creation of DataFrame object PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey']) print(PlayerObj) print('Datatype of players dataframe is:') # Convert the column 'DOB' to datetime64 data type PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB']) print(PlayerObj.dtypes)
Output :
Name DOB Teams Jersey 0 Jason 31/01/1978 Delhi 155 1 Johny 26/05/1980 Hyderabad 15 2 Darren 03/01/1992 Jamaica 222 3 Finch 22/12/1994 Pune 12 4 Krunal 16/08/1979 Mumbai 58 5 Ravindra 04/06/1985 Chennai 99 6 Dinesh 23/02/1985 Kolkata 10 Datatype of players dataframe is: Name object DOB datetime64[ns] Teams object Jersey int64 dtype: object
to_datetime()
also converts the DOB strings in ISO8601 format to datetime64 type which automatically handles string types like. Henceforth, let’s try to convert data type of string to datetime64:
DD-MM-YYYY HH:MM AM/PM’
‘YYYY-MM-DDTHH:MM:SS’
‘YYYY-MM-DDT HH:MM:SS.ssssss’, etc.
import pandas as sc # List of Tuples players = [('Jason', '31/01/1978 12:00 AM', 'Delhi', 155) , ('Johny', '26/05/1980 02:00:55', 'Hyderabad' , 15) , ('Darren', '03/01/1992', 'Jamaica',222) , ('Finch', '22/12/1994 T23:11:25Z','Pune' , 12) ] # Creation of DataFrame object PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey']) print(PlayerObj) print('Datatype of players dataframe is:') # Convert the column 'DOB' to datetime64 datatype PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB']) print(PlayerObj.dtypes)
Output : Name DOB Teams Jersey 0 Jason 31/01/1978 12:00 AM Delhi 155 1 Johny 26/05/1980 02:00:55 Hyderabad 15 2 Darren 03/01/1992 Jamaica 222 3 Finch 22/12/1994 T23:11:25Z Pune 12 Datatype of players dataframe is: Name object DOB datetime64[ns] Teams object Jersey int64 dtype: object
Convert the Data type of a column from custom format string to datetime64 :
We can also have case where the dataframe have columns having dates in custom format like DDMMYYYY, DD–MM–YY and then try to convert string format of custom format to datetime64.
import pandas as sc # List of Tuples players = [('Jason', '08091986', 'Delhi', 155), ('Johny', '11101988', 'Hyderabad', 15) ] # Creation of DataFrame object PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey']) print(PlayerObj) print('Datatype of players dataframe is:') # Convert the column 'DOB' to datetime64 datatype PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y') print(PlayerObj.dtypes)
Output : Name DOB Teams Jersey 0 Jason 08091986 Delhi 155 1 Johny 11101988 Hyderabad 15 Datatype of players dataframe is: Name object DOB datetime64[ns] Teams object Jersey int64 dtype: object
Convert the Data type of a column from string to datetime by extracting date & time strings from big string :
There may be a case where columns may contain: date of birth is 28101982 OR 17071990 is DOB. We have to pass the argument in pd.to_dataframe()
, if passed as False it will try to match the format anywhere in string. After that let’s convert data type of column DOB as string to datatime64.
import pandas as sc # List of Tuples players = [('Jason', 'date of birth is 08091986', 'Delhi', 155), ('Johny', '11101988 is DOB', 'Hyderabad', 15) ] # Creation of DataFrame object PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey']) print('Datatype of players dataframe is:') # Convert the column 'DOB' to datetime64 data type PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y', exact=False) print(PlayerObj) print(PlayerObj.dtypes)
Output : Datatype of players dataframe is: Name DOB Teams Jersey 0 Jason 1986-08-09 Delhi 155 1 Johny 1988-11-10 Hyderabad 15 Name object DOB datetime64[ns] Teams object Jersey int64 dtype: object
Another Example : Extract date & time from big string in a column and add new columns of datetime64 format :
import pandas as sc # List of Tuples players = [('Jason', '12:00 AM on the date 08091986', 'Delhi', 155), ('Johny', '11101988 and evening 07:00 PM', 'Hyderabad', 15) ] # Creation of DataFrame object PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey']) print('Datatype of players dataframe is:') # Convert the column 'DOB' to datetime64 data type PlayerObj['DOB_time'] = sc.to_datetime(PlayerObj['DOB'], format='%H:%M %p', exact=False) PlayerObj['DOB_date'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y', exact=False) print('New dataframe is:') print(PlayerObj)
Output : Datatype of players dataframe is: New dataframe is: Name DOB ... DOB_time DOB_date 0 Jason 12:00 AM on the date 08091986 ... 1900-01-01 12:00:00 1986-08-09 1 Johny 11101988 and evening 07:00 PM ... 1900-01-01 07:00:00 1988-11-10
In DOB_time column as we provided time only so it took date as default i.e. 1900-01-01, whereas DOB_date contains the date onle. But both the columns i.e. DOB_time & DOB_date have same data type i.e. datetime64.
Handle error while Converting the Data type of a column from string to datetime :
To handle the errors while converting data type of column we can pass error arguments like ‘raise’, ‘coerce’, ‘ignore’ to customize the behavior.
import pandas as sc # List of Tuples players = [('Jason', '08091986', 'Delhi', 155), ('Johny', '11101988', 'Hyderabad', 15) ] # Creation of DataFrame object PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Team', 'Jersey']) print("Contents of the original Dataframe : ") print(PlayerObj) print('Data types of columns in original dataframe') print(PlayerObj.dtypes) # Ignores errors while converting the type PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], errors='ignore') print("Contents of the Dataframe : ") print(PlayerObj) print('Data types of columns in modified dataframe') print(PlayerObj.dtypes)
Output : Contents of the original Dataframe : Name DOB Team Jersey 0 Jason 08091986 Delhi 155 1 Johny 11101988 Hyderabad 15 Data types of columns in original dataframe Name object DOB object Team object Jersey int64 dtype: object Contents of the Dataframe : Name DOB Team Jersey 0 Jason 08091986 Delhi 155 1 Johny 11101988 Hyderabad 15 Data types of columns in modified dataframe Name object DOB object Team object Jersey int64 dtype: object
Want to expert in the python programming language? Exploring Python Data Analysis using Pandas tutorial changes your knowledge from basic to advance level in python concepts.
Read more Articles on Python Data Analysis Using Padas – Modify a Dataframe
- pandas.apply(): Apply a function to each row/column in Dataframe
- Pandas: Sort rows or columns in Dataframe based on values using Dataframe.sort_values()
- Apply a function to single or selected columns or rows in Dataframe
- Sort a DataFrame based on column names or row index labels using Dataframe.sort_index() in Pandas
- Change data type of single or multiple columns of Dataframe in Python
- Change Column & Row names in DataFrame
- Convert Dataframe column into to the Index of Dataframe
- Convert Dataframe indexes into columns