{"id":7289,"date":"2023-11-03T08:13:17","date_gmt":"2023-11-03T02:43:17","guid":{"rendered":"https:\/\/python-programs.com\/?p=7289"},"modified":"2023-11-10T12:15:17","modified_gmt":"2023-11-10T06:45:17","slug":"how-to-convert-dataframe-column-type-from-string-to-date-time","status":"publish","type":"post","link":"https:\/\/python-programs.com\/how-to-convert-dataframe-column-type-from-string-to-date-time\/","title":{"rendered":"How to convert Dataframe column type from string to date time"},"content":{"rendered":"
In this article we will learn to convert data type of dataframe column to from string to datetime where the data can be custom string formats or embedded in big texts. We will also learn how we can handle the error while converting data types.<\/p>\n
A function provided by Python’s Pandas module is used to convert a given argument to datetime.<\/p>\n
Synatx : pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, box=True, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)<\/pre>\nwhere,<\/p>\n
Parameters:<\/strong><\/p>\n
\n
- arg<\/strong> : Element that is to be converted to a datetime with type like int, float, string, datetime, list, 1-d array or Series.<\/li>\n
- errors<\/strong> : It is a way to handle error which can be \u2018ignore\u2019, \u2018raise\u2019, \u2018coerce\u2019. Whereas, default value is \u2018raise\u2019 (\u2018raise<\/strong>\u2019: Raise exception in invalid parsing , \u2018coerce<\/strong>\u2019: Set as NaT in case of invalid parsing , \u2018ignore<\/strong>\u2019: Return the input if invalid parsing found)<\/li>\n
- format<\/strong> : string, default Nonedate & time string in format eg \u201c%d\/%m\/%Y\u201d etc.<\/li>\n<\/ul>\n
\u00a0Returns:<\/strong><\/p>\n
\n
- It converts and return the value as date time format based on input.<\/li>\n<\/ul>\n
\n
- A series of datetime64 type will be returned, if a series of string is passed.<\/li>\n
- A datetime64 object will be returned, if scalar entity is passed<\/li>\n<\/ol>\n
Convert the Data type of a column from string to datetime64 :<\/h3>\n
Let’s create a dataframe where column ‘DOB’ has dates in string format i.e. DD\/MM\/YYYY\u2019.<\/p>\n
import pandas as sc\r\n# List of Tuples\r\nplayers = [('Jason', '31\/01\/1978', 'Delhi', 155) ,\r\n ('Johny', '26\/05\/1980', 'Hyderabad' , 15) ,\r\n ('Darren', '03\/01\/1992', 'Jamaica',222) ,\r\n ('Finch', '22\/12\/1994','Pune' , 12) ,\r\n ('Krunal', '16\/08\/1979', 'Mumbai' , 58) ,\r\n ('Ravindra', '04\/06\/1985', 'Chennai', 99 ),\r\n ('Dinesh', '23\/02\/1985', 'Kolkata', 10)\r\n ]\r\n# Creation of DataFrame object\r\nPlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])\r\nprint(PlayerObj)\r\nprint('Datatype of players dataframe is:')\r\nprint(PlayerObj.dtypes)\r\n<\/pre>\nOutput :\r\n \u00a0\u00a0\u00a0\u00a0\u00a0 Name\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 DOB\u00a0\u00a0\u00a0\u00a0\u00a0 Teams\u00a0 Jersey\r\n0\u00a0\u00a0\u00a0\u00a0 Jason\u00a0 31\/01\/1978\u00a0\u00a0\u00a0\u00a0\u00a0 Delhi\u00a0\u00a0\u00a0\u00a0 155\r\n1\u00a0\u00a0\u00a0\u00a0 Johny\u00a0 26\/05\/1980\u00a0 Hyderabad\u00a0\u00a0\u00a0\u00a0\u00a0 15\r\n2\u00a0\u00a0\u00a0 Darren\u00a0 03\/01\/1992\u00a0\u00a0\u00a0 Jamaica\u00a0\u00a0\u00a0\u00a0 222\r\n3\u00a0\u00a0\u00a0\u00a0 Finch\u00a0 22\/12\/1994\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Pune\u00a0\u00a0\u00a0\u00a0\u00a0 12\r\n4\u00a0\u00a0\u00a0 Krunal\u00a0 16\/08\/1979\u00a0\u00a0\u00a0\u00a0 Mumbai\u00a0\u00a0\u00a0\u00a0\u00a0 58\r\n5\u00a0 Ravindra\u00a0 04\/06\/1985\u00a0\u00a0\u00a0 Chennai\u00a0\u00a0\u00a0\u00a0\u00a0 99\r\n6\u00a0\u00a0\u00a0 Dinesh\u00a0 23\/02\/1985\u00a0\u00a0\u00a0 Kolkata\u00a0\u00a0\u00a0\u00a0\u00a0 10\r\nDatatype of players dataframe is:\r\nName\u00a0\u00a0\u00a0\u00a0\u00a0 object\r\nDOB\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 object\r\nTeams\u00a0\u00a0\u00a0\u00a0 object\r\nJersey\u00a0\u00a0 \u00a0\u00a0int64\r\ndtype: object<\/pre>\nNow let’s try to convert data type of column ‘DOB’ to datetime64.<\/p>\n
import pandas as sc\r\n# List of Tuples\r\nplayers = [('Jason', '31\/01\/1978', 'Delhi', 155) ,\r\n ('Johny', '26\/05\/1980', 'Hyderabad' , 15) ,\r\n ('Darren', '03\/01\/1992', 'Jamaica',222) ,\r\n ('Finch', '22\/12\/1994','Pune' , 12) ,\r\n ('Krunal', '16\/08\/1979', 'Mumbai' , 58) ,\r\n ('Ravindra', '04\/06\/1985', 'Chennai', 99 ),\r\n ('Dinesh', '23\/02\/1985', 'Kolkata', 10)\r\n ]\r\n# Creation of DataFrame object\r\nPlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])\r\nprint(PlayerObj)\r\nprint('Datatype of players dataframe is:')\r\n# Convert the column 'DOB' to datetime64 data type\r\nPlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'])\r\nprint(PlayerObj.dtypes)\r\n<\/pre>\nOutput :<\/p>\n
Name\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 DOB\u00a0\u00a0\u00a0\u00a0\u00a0 Teams\u00a0 Jersey\r\n0\u00a0\u00a0\u00a0\u00a0 Jason\u00a0 31\/01\/1978\u00a0\u00a0\u00a0\u00a0\u00a0 Delhi\u00a0\u00a0\u00a0\u00a0 155\r\n1\u00a0\u00a0\u00a0\u00a0 Johny\u00a0 26\/05\/1980\u00a0 Hyderabad\u00a0\u00a0\u00a0\u00a0\u00a0 15\r\n2\u00a0\u00a0\u00a0 Darren\u00a0 03\/01\/1992\u00a0\u00a0\u00a0 Jamaica\u00a0\u00a0\u00a0\u00a0 222\r\n3\u00a0\u00a0\u00a0\u00a0 Finch\u00a0 22\/12\/1994\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Pune\u00a0\u00a0\u00a0\u00a0\u00a0 12\r\n4\u00a0\u00a0\u00a0 Krunal\u00a0 16\/08\/1979\u00a0\u00a0\u00a0\u00a0 Mumbai\u00a0\u00a0\u00a0\u00a0\u00a0 58\r\n5\u00a0 Ravindra\u00a0 04\/06\/1985\u00a0\u00a0\u00a0 Chennai\u00a0\u00a0\u00a0\u00a0\u00a0 99\r\n6\u00a0\u00a0\u00a0 Dinesh\u00a0 23\/02\/1985\u00a0\u00a0\u00a0 Kolkata\u00a0\u00a0\u00a0\u00a0\u00a0 10\r\nDatatype of players dataframe is:\r\nName\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 object\r\nDOB\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 datetime64[ns]\r\nTeams\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 object\r\nJersey\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 int64\r\ndtype: object<\/pre>\n
to_datetime()<\/code> also converts the DOB strings in ISO8601 format to datetime64 type which automatically handles string types like. Henceforth, let’s try to convert data type of string to datetime64:<\/p>\n
DD-MM-YYYY HH:MM AM\/PM\u2019<\/p>\n
\u2018YYYY-MM-DDTHH:MM:SS\u2019<\/p>\n
\u2018YYYY-MM-DDT HH:MM:SS.ssssss\u2019, etc.<\/p>\n
import pandas as sc\r\n# List of Tuples\r\nplayers = [('Jason', '31\/01\/1978 12:00 AM', 'Delhi', 155) ,\r\n ('Johny', '26\/05\/1980 02:00:55', 'Hyderabad' , 15) ,\r\n ('Darren', '03\/01\/1992', 'Jamaica',222) ,\r\n ('Finch', '22\/12\/1994 T23:11:25Z','Pune' , 12)\r\n ]\r\n# Creation of DataFrame object\r\nPlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])\r\nprint(PlayerObj)\r\nprint('Datatype of players dataframe is:')\r\n# Convert the column 'DOB' to datetime64 datatype\r\nPlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'])\r\nprint(PlayerObj.dtypes)\r\n<\/pre>\nOutput :\r\nName\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 DOB\u00a0\u00a0\u00a0\u00a0\u00a0 Teams\u00a0 Jersey\r\n0\u00a0\u00a0 Jason\u00a0\u00a0\u00a0 31\/01\/1978 12:00 AM\u00a0\u00a0\u00a0\u00a0\u00a0 Delhi\u00a0\u00a0\u00a0\u00a0 155\r\n1\u00a0\u00a0 Johny\u00a0\u00a0\u00a0 26\/05\/1980 02:00:55\u00a0 Hyderabad\u00a0\u00a0\u00a0\u00a0\u00a0 15\r\n2\u00a0 Darren\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 03\/01\/1992\u00a0\u00a0\u00a0 Jamaica\u00a0\u00a0\u00a0\u00a0 222\r\n3\u00a0\u00a0 Finch\u00a0 22\/12\/1994 T23:11:25Z\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Pune\u00a0\u00a0\u00a0\u00a0\u00a0 12\r\nDatatype of players dataframe is:\r\nName\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 object\r\nDOB\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 datetime64[ns]\r\nTeams\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 object\r\nJersey\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 int64\r\ndtype: object<\/pre>\nConvert the Data type of a column from custom format string to datetime64 :<\/h3>\n
We can also have case where the dataframe have columns having dates in custom format like DDMMYYYY, DD\u2013MM\u2013YY and then try to convert string format of custom format to datetime64.<\/p>\n
import pandas as sc\r\n# List of Tuples\r\nplayers = [('Jason', '08091986', 'Delhi', 155),\r\n ('Johny', '11101988', 'Hyderabad', 15)\r\n ]\r\n# Creation of DataFrame object\r\nPlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])\r\nprint(PlayerObj)\r\nprint('Datatype of players dataframe is:')\r\n# Convert the column 'DOB' to datetime64 datatype\r\nPlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y')\r\nprint(PlayerObj.dtypes)\r\n<\/pre>\nOutput :\r\nName\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 DOB\u00a0\u00a0\u00a0\u00a0\u00a0 Teams\u00a0 Jersey\r\n0\u00a0 Jason\u00a0 08091986\u00a0\u00a0\u00a0\u00a0\u00a0 Delhi\u00a0\u00a0\u00a0\u00a0 155\r\n1\u00a0 Johny\u00a0 11101988\u00a0 Hyderabad\u00a0\u00a0\u00a0\u00a0\u00a0 15\r\nDatatype of players dataframe is:\r\nName\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 object\r\nDOB\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 datetime64[ns]\r\nTeams\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 object\r\nJersey\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 int64\r\ndtype: object<\/pre>\nConvert the Data type of a column from string to datetime by extracting date & time strings from big string :<\/h3>\n
\u00a0<\/strong>There may be a case where columns may contain: date of birth is 28101982 OR 17071990 is DOB. We have to pass the argument in
pd.to_dataframe()<\/code>, if passed as False <\/em>it will try to match the format\u00a0 anywhere in string. After that let’s convert data type of column DOB as string to datatime64.<\/p>\n
import pandas as sc\r\n# List of Tuples\r\nplayers = [('Jason', 'date of birth is 08091986', 'Delhi', 155),\r\n ('Johny', '11101988 is DOB', 'Hyderabad', 15)\r\n ]\r\n# Creation of DataFrame object\r\nPlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])\r\nprint('Datatype of players dataframe is:')\r\n# Convert the column 'DOB' to datetime64 data type\r\nPlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y', exact=False)\r\nprint(PlayerObj)\r\nprint(PlayerObj.dtypes)\r\n<\/pre>\nOutput :\r\nDatatype of players dataframe is:\r\n \u00a0\u00a0 Name\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 DOB\u00a0\u00a0\u00a0\u00a0\u00a0 Teams\u00a0 Jersey\r\n0\u00a0 Jason 1986-08-09\u00a0\u00a0\u00a0\u00a0\u00a0 Delhi\u00a0\u00a0\u00a0\u00a0 155\r\n1\u00a0 Johny 1988-11-10\u00a0 Hyderabad\u00a0\u00a0\u00a0\u00a0\u00a0 15\r\nName\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 object\r\nDOB\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 datetime64[ns]\r\nTeams\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 object\r\nJersey\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 int64\r\ndtype: object<\/pre>\nAnother Example : Extract date & time from big string in a column and add new columns of datetime64 format :<\/h3>\n
import pandas as sc\r\n\r\n# List of Tuples\r\n\r\nplayers = [('Jason', '12:00 AM on the date 08091986', 'Delhi', 155),\r\n\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ('Johny', '11101988 and evening 07:00 PM', 'Hyderabad', 15)\r\n\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ]\r\n\r\n# Creation of DataFrame object\r\n\r\nPlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])\r\n\r\nprint('Datatype of players dataframe is:')\r\n\r\n# Convert the column 'DOB' to datetime64 data type\r\n\r\nPlayerObj['DOB_time'] = sc.to_datetime(PlayerObj['DOB'], format='%H:%M %p', exact=False)\r\n\r\nPlayerObj['DOB_date'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y', exact=False)\r\n\r\nprint('New dataframe is:')\r\n\r\nprint(PlayerObj)<\/pre>\nOutput :\r\nDatatype of players dataframe is:\r\nNew dataframe is:\r\nName DOB ... DOB_time DOB_date\r\n0 Jason 12:00 AM on the date 08091986 ... 1900-01-01 12:00:00 1986-08-09\r\n1 Johny 11101988 and evening 07:00 PM ... 1900-01-01 07:00:00 1988-11-10<\/pre>\nIn DOB_time column as we provided time only so it took date as default i.e. 1900-01-01, whereas DOB_date contains the date onle. But both the columns i.e. DOB_time & DOB_date have same data type i.e. datetime64.<\/em><\/p>\n
Handle error while Converting the Data type of a column from string to datetime :<\/h3>\n
To handle the errors while converting data type of column we can pass error arguments like ‘raise’, ‘coerce’, ‘ignore’ to customize the behavior.<\/p>\n
import pandas as sc\r\n# List of Tuples\r\nplayers = [('Jason', '08091986', 'Delhi', 155),\r\n ('Johny', '11101988', 'Hyderabad', 15)\r\n ]\r\n\r\n# Creation of DataFrame object\r\nPlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Team', 'Jersey'])\r\nprint(\"Contents of the original Dataframe : \")\r\nprint(PlayerObj)\r\nprint('Data types of columns in original dataframe')\r\nprint(PlayerObj.dtypes)\r\n# Ignores errors while converting the type\r\nPlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], errors='ignore')\r\nprint(\"Contents of the Dataframe : \")\r\nprint(PlayerObj)\r\nprint('Data types of columns in modified dataframe')\r\nprint(PlayerObj.dtypes)\r\n<\/pre>\nOutput :\r\nContents of the original Dataframe : \r\nName DOB Team Jersey\r\n0 Jason 08091986 Delhi 155\r\n1 Johny 11101988 Hyderabad 15\r\nData types of columns in original dataframe\r\nName object\r\nDOB object\r\nTeam object\r\nJersey int64\r\ndtype: object\r\nContents of the Dataframe : \r\nName DOB Team Jersey\r\n0 Jason 08091986 Delhi 155\r\n1 Johny 11101988 Hyderabad 15\r\nData types of columns in modified dataframe\r\nName object\r\nDOB object\r\nTeam object\r\nJersey int64\r\ndtype: object<\/pre>\nWant to expert in the python programming language? Exploring\u00a0Python Data Analysis using Pandas<\/a>\u00a0tutorial changes your knowledge from basic to advance level in python concepts.<\/p>\n
Read more Articles on Python Data Analysis Using Padas \u2013 Modify a Dataframe<\/strong><\/p>\n
\n
- pandas.apply(): Apply a function to each row\/column in Dataframe<\/a><\/li>\n
- Pandas: Sort rows or columns in Dataframe based on values using Dataframe.sort_values()<\/a><\/li>\n
- Apply a function to single or selected columns or rows in Dataframe<\/a><\/li>\n
- Sort a DataFrame based on column names or row index labels using Dataframe.sort_index() in Pandas<\/a><\/li>\n
- Change data type of single or multiple columns of Dataframe in Python<\/a><\/li>\n
- Change Column & Row names in DataFrame<\/a><\/li>\n
- Convert Dataframe column into to the Index of Dataframe<\/a><\/li>\n
- Convert Dataframe indexes into columns<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"
Converting Dataframe column type from string to date time In this article we will learn to convert data type of dataframe column to from string to datetime where the data can be custom string formats or embedded in big texts. We will also learn how we can handle the error while converting data types. A …<\/p>\n