How to convert Dataframe column type from string to date time

Converting Dataframe column type from string to date time

In this article we will learn to convert data type of dataframe column to from string to datetime where the data can be custom string formats or embedded in big texts. We will also learn how we can handle the error while converting data types.

A function provided by Python’s Pandas module is used to convert a given argument to datetime.

Synatx : pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, box=True, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)

where,

Parameters:

  • arg : Element that is to be converted to a datetime with type like int, float, string, datetime, list, 1-d array or Series.
  • errors : It is a way to handle error which can be ‘ignore’, ‘raise’, ‘coerce’. Whereas, default value is ‘raise’ (‘raise’: Raise exception in invalid parsing , ‘coerce’: Set as NaT in case of invalid parsing , ‘ignore’: Return the input if invalid parsing found)
  • format : string, default Nonedate & time string in format eg “%d/%m/%Y” etc.

 Returns:

  • It converts and return the value as date time format based on input.
  1. A series of datetime64 type will be returned, if a series of string is passed.
  2. A datetime64 object will be returned, if scalar entity is passed

Convert the Data type of a column from string to datetime64 :

Let’s create a dataframe where column ‘DOB’ has dates in string format i.e. DD/MM/YYYY’.

import pandas as sc
# List of Tuples
players = [('Jason', '31/01/1978', 'Delhi', 155) ,
            ('Johny', '26/05/1980', 'Hyderabad' , 15) ,
            ('Darren', '03/01/1992', 'Jamaica',222) ,
            ('Finch', '22/12/1994','Pune' , 12) ,
            ('Krunal', '16/08/1979', 'Mumbai' , 58) ,
            ('Ravindra', '04/06/1985', 'Chennai', 99 ),
            ('Dinesh', '23/02/1985', 'Kolkata', 10)
           ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print(PlayerObj)
print('Datatype of players dataframe is:')
print(PlayerObj.dtypes)
Output :
       Name         DOB      Teams  Jersey
0     Jason  31/01/1978      Delhi     155
1     Johny  26/05/1980  Hyderabad      15
2    Darren  03/01/1992    Jamaica     222
3     Finch  22/12/1994       Pune      12
4    Krunal  16/08/1979     Mumbai      58
5  Ravindra  04/06/1985    Chennai      99
6    Dinesh  23/02/1985    Kolkata      10
Datatype of players dataframe is:
Name      object
DOB       object
Teams     object
Jersey     int64
dtype: object

Now let’s try to convert data type of column ‘DOB’ to datetime64.

import pandas as sc
# List of Tuples
players = [('Jason', '31/01/1978', 'Delhi', 155) ,
            ('Johny', '26/05/1980', 'Hyderabad' , 15) ,
            ('Darren', '03/01/1992', 'Jamaica',222) ,
            ('Finch', '22/12/1994','Pune' , 12) ,
            ('Krunal', '16/08/1979', 'Mumbai' , 58) ,
            ('Ravindra', '04/06/1985', 'Chennai', 99 ),
            ('Dinesh', '23/02/1985', 'Kolkata', 10)
           ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print(PlayerObj)
print('Datatype of players dataframe is:')
# Convert the column 'DOB' to datetime64 data type
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'])
print(PlayerObj.dtypes)

Output :

Name         DOB      Teams  Jersey
0     Jason  31/01/1978      Delhi     155
1     Johny  26/05/1980  Hyderabad      15
2    Darren  03/01/1992    Jamaica     222
3     Finch  22/12/1994       Pune      12
4    Krunal  16/08/1979     Mumbai      58
5  Ravindra  04/06/1985    Chennai      99
6    Dinesh  23/02/1985    Kolkata      10
Datatype of players dataframe is:
Name              object
DOB       datetime64[ns]
Teams             object
Jersey             int64
dtype: object

to_datetime() also converts the DOB strings in ISO8601 format to datetime64 type which automatically handles string types like. Henceforth, let’s try to convert data type of string to datetime64:

DD-MM-YYYY HH:MM AM/PM’

‘YYYY-MM-DDTHH:MM:SS’

‘YYYY-MM-DDT HH:MM:SS.ssssss’, etc.

import pandas as sc
# List of Tuples
players = [('Jason', '31/01/1978 12:00 AM', 'Delhi', 155) ,
            ('Johny', '26/05/1980 02:00:55', 'Hyderabad' , 15) ,
            ('Darren', '03/01/1992', 'Jamaica',222) ,
            ('Finch', '22/12/1994 T23:11:25Z','Pune' , 12)
           ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print(PlayerObj)
print('Datatype of players dataframe is:')
# Convert the column 'DOB' to datetime64 datatype
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'])
print(PlayerObj.dtypes)
Output :
Name                    DOB      Teams  Jersey
0   Jason    31/01/1978 12:00 AM      Delhi     155
1   Johny    26/05/1980 02:00:55  Hyderabad      15
2  Darren             03/01/1992    Jamaica     222
3   Finch  22/12/1994 T23:11:25Z       Pune      12
Datatype of players dataframe is:
Name              object
DOB       datetime64[ns]
Teams             object
Jersey             int64
dtype: object

Convert the Data type of a column from custom format string to datetime64 :

We can also have case where the dataframe have columns having dates in custom format like DDMMYYYY, DD–MM–YY and then try to convert string format of custom format to datetime64.

import pandas as sc
# List of Tuples
players = [('Jason', '08091986', 'Delhi', 155),
            ('Johny', '11101988', 'Hyderabad', 15)
            ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print(PlayerObj)
print('Datatype of players dataframe is:')
# Convert the column 'DOB' to datetime64 datatype
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y')
print(PlayerObj.dtypes)
Output :
Name       DOB      Teams  Jersey
0  Jason  08091986      Delhi     155
1  Johny  11101988  Hyderabad      15
Datatype of players dataframe is:
Name              object
DOB       datetime64[ns]
Teams             object
Jersey             int64
dtype: object

Convert the Data type of a column from string to datetime by extracting date & time strings from big string :

 There may be a case where columns may contain: date of birth is 28101982 OR 17071990 is DOB. We have to pass the argument in pd.to_dataframe(), if passed as False it will try to match the format  anywhere in string. After that let’s convert data type of column DOB as string to datatime64.

import pandas as sc
# List of Tuples
players = [('Jason', 'date of birth is 08091986', 'Delhi', 155),
            ('Johny', '11101988 is DOB', 'Hyderabad', 15)
            ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print('Datatype of players dataframe is:')
# Convert the column 'DOB' to datetime64 data type
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y', exact=False)
print(PlayerObj)
print(PlayerObj.dtypes)
Output :
Datatype of players dataframe is:
    Name        DOB      Teams  Jersey
0  Jason 1986-08-09      Delhi     155
1  Johny 1988-11-10  Hyderabad      15
Name              object
DOB       datetime64[ns]
Teams             object
Jersey             int64
dtype: object

Another Example : Extract date & time from big string in a column and add new columns of datetime64 format :

import pandas as sc

# List of Tuples

players = [('Jason', '12:00 AM on the date 08091986', 'Delhi', 155),

            ('Johny', '11101988 and evening 07:00 PM', 'Hyderabad', 15)

            ]

# Creation of DataFrame object

PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])

print('Datatype of players dataframe is:')

# Convert the column 'DOB' to datetime64 data type

PlayerObj['DOB_time'] = sc.to_datetime(PlayerObj['DOB'], format='%H:%M %p', exact=False)

PlayerObj['DOB_date'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y', exact=False)

print('New dataframe is:')

print(PlayerObj)
Output :
Datatype of players dataframe is:
New dataframe is:
Name DOB ... DOB_time DOB_date
0 Jason 12:00 AM on the date 08091986 ... 1900-01-01 12:00:00 1986-08-09
1 Johny 11101988 and evening 07:00 PM ... 1900-01-01 07:00:00 1988-11-10

In DOB_time column as we provided time only so it took date as default i.e. 1900-01-01, whereas DOB_date contains the date onle. But both the columns i.e. DOB_time & DOB_date have same data type i.e. datetime64.

Handle error while Converting the Data type of a column from string to datetime :

To handle the errors while converting data type of column we can pass error arguments like ‘raise’, ‘coerce’, ‘ignore’ to customize the behavior.

import pandas as sc
# List of Tuples
players = [('Jason', '08091986', 'Delhi', 155),
            ('Johny', '11101988', 'Hyderabad', 15)
            ]

# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Team', 'Jersey'])
print("Contents of the original Dataframe : ")
print(PlayerObj)
print('Data types of columns in original dataframe')
print(PlayerObj.dtypes)
# Ignores errors while converting the type
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], errors='ignore')
print("Contents of the Dataframe : ")
print(PlayerObj)
print('Data types of columns in modified dataframe')
print(PlayerObj.dtypes)
Output :
Contents of the original Dataframe : 
Name DOB Team Jersey
0 Jason 08091986 Delhi 155
1 Johny 11101988 Hyderabad 15
Data types of columns in original dataframe
Name object
DOB object
Team object
Jersey int64
dtype: object
Contents of the Dataframe : 
Name DOB Team Jersey
0 Jason 08091986 Delhi 155
1 Johny 11101988 Hyderabad 15
Data types of columns in modified dataframe
Name object
DOB object
Team object
Jersey int64
dtype: object

Want to expert in the python programming language? Exploring Python Data Analysis using Pandas tutorial changes your knowledge from basic to advance level in python concepts.

Read more Articles on Python Data Analysis Using Padas – Modify a Dataframe