{"id":7177,"date":"2021-05-30T09:15:29","date_gmt":"2021-05-30T03:45:29","guid":{"rendered":"https:\/\/python-programs.com\/?p=7177"},"modified":"2021-11-22T18:40:43","modified_gmt":"2021-11-22T13:10:43","slug":"read-csv-file-to-dataframe-with-custom-delimiter-in-python","status":"publish","type":"post","link":"https:\/\/python-programs.com\/read-csv-file-to-dataframe-with-custom-delimiter-in-python\/","title":{"rendered":"Read Csv File to Dataframe With Custom Delimiter in Python"},"content":{"rendered":"

Different methods to read CSV files with custom delimiter in python<\/h2>\n

In this article, we will see what are CSV files, how to use them in pandas, and then we see how and why to use custom delimiter with CSV files in pandas.<\/p>\n

CSV file<\/h3>\n

A simple way to store big data sets is to use CSV files (comma-separated files).CSV files contain plain text and is a well know format that can be read by everyone including Pandas. Generally, CSV files contain columns separated by commas, but they can also contain content separated by a tab, or underscore or hyphen, etc. Generally, CSV files look like this:-<\/p>\n

total_bill,tip,sex,smoker,day,time,size\r\n16.99,1.01,Female,No,Sun,Dinner,2\r\n10.34,1.66,Male,No,Sun,Dinner,3\r\n21.01,3.5,Male,No,Sun,Dinner,3\r\n23.68,3.31,Male,No,Sun,Dinner,2\r\n24.59,3.61,Female,No,Sun,Dinner,4<\/pre>\n

Here we see different columns and their values are separated by commas.<\/p>\n

Use CSV file in pandas<\/h3>\n

read_csv() method is used to import and read CSV files in pandas. After this step, a CSV file act as a normal dataframe and we can use operation in CSV file as we use in dataframe.<\/p>\n

syntax:\u00a0 pandas.<\/span>read_csv<\/span>(<\/span>filepath_or_buffer, sep=<\/span>‘, ‘<\/span>, delimiter=<\/span>None<\/span>, header=<\/span>‘infer’<\/span>, names=<\/span>None<\/span>, index_col=<\/span>None<\/span>, ….<\/span>)<\/span><\/p>\n

','<\/code> is default separator in read_csv() method.<\/p>\n

Let see this with an example<\/p>\n

import pandas as pd\r\ndata=pd.read_csv('example1.csv')\r\ndata.head()<\/pre>\n

Output<\/p>\n\n\n\n\n\n\n\n\n\n
<\/th>\ntotal_bill<\/th>\ntip<\/th>\nsex<\/th>\nsmoker<\/th>\nday<\/th>\ntime<\/th>\nsize<\/th>\n<\/tr>\n<\/thead>\n
0<\/th>\n16.99<\/td>\n1.01<\/td>\nFemale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n2<\/td>\n<\/tr>\n
1<\/th>\n10.34<\/td>\n1.66<\/td>\nMale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n3<\/td>\n<\/tr>\n
2<\/th>\n21.01<\/td>\n3.50<\/td>\nMale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n3<\/td>\n<\/tr>\n
3<\/th>\n23.68<\/td>\n3.31<\/td>\nMale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n2<\/td>\n<\/tr>\n
4<\/th>\n24.59<\/td>\n3.61<\/td>\nFemale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n4<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n

Why use separator or delimiter with read_csv() method<\/h3>\n

Till now we understand that generally, CSV files contain data separated data that is separated by comma but sometimes it can contain data separated by tab or hyphen, etc. So to handle this we use a seperator. Let understand this with the help of an example. Suppose we have a CSV file separated by an underscore and we try to read that CSV file without using a separator or with using default separator i.e. comma. So let see what happens in this case.<\/p>\n

\"total_bill\"_tip_sex_smoker_day_time_size\r\n16.99_1.01_Female_No_Sun_Dinner_2\r\n10.34_1.66_Male_No_Sun_Dinner_3\r\n21.01_3.5_Male_No_Sun_Dinner_3\r\n23.68_3.31_Male_No_Sun_Dinner_2\r\n24.59_3.61_Female_No_Sun_Dinner_4\r\n25.29_4.71_Male_No_Sun_Dinner_4\r\n8.77_2_Male_No_Sun_Dinner_2<\/pre>\n

Suppose this is our CSV file separated by an underscore.<\/p>\n\n\n\n\n\n\n\n\n\n
<\/th>\ntotal_bill_tip_sex_smoker_day_time_size<\/th>\n<\/tr>\n<\/thead>\n
0<\/th>\n16.99_1.01_Female_No_Sun_Dinner_2<\/td>\n<\/tr>\n
1<\/th>\n10.34_1.66_Male_No_Sun_Dinner_3<\/td>\n<\/tr>\n
2<\/th>\n21.01_3.5_Male_No_Sun_Dinner_3<\/td>\n<\/tr>\n
3<\/th>\n23.68_3.31_Male_No_Sun_Dinner_2<\/td>\n<\/tr>\n
4<\/th>\n24.59_3.61_Female_No_Sun_Dinner_4<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n

Now see when we didn’t use a default separator here how unordered our data look like. So to solve this issue we use Separator. Now we will see when we use a separator to underscore how we get the same data in an ordered manner.<\/p>\n

import pandas as pd \r\ndata=pd.read_csv('example2.csv',sep = '_',engine = 'python') \r\ndata.head()<\/pre>\n

Output<\/p>\n\n\n\n\n\n\n\n\n\n
<\/th>\ntotal_bill<\/th>\ntip<\/th>\nsex<\/th>\nsmoker<\/th>\nday<\/th>\ntime<\/th>\nsize<\/th>\n<\/tr>\n<\/thead>\n
0<\/th>\n16.99<\/td>\n1.01<\/td>\nFemale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n2<\/td>\n<\/tr>\n
1<\/th>\n10.34<\/td>\n1.66<\/td>\nMale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n3<\/td>\n<\/tr>\n
2<\/th>\n21.01<\/td>\n3.50<\/td>\nMale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n3<\/td>\n<\/tr>\n
3<\/th>\n23.68<\/td>\n3.31<\/td>\nMale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n2<\/td>\n<\/tr>\n
4<\/th>\n24.59<\/td>\n3.61<\/td>\nFemale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n4<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n

So this example is sufficient to understand why there is a need of using a separator of delimiter in pandas while working on a CSV file.<\/p>\n

Now suppose there is a CSV file in while data is separated by multiple separators. For example:-<\/p>\n

totalbill_tip,sex:smoker,day_time,size\r\n16.99,1.01:Female|No,Sun,Dinner,2\r\n10.34,1.66,Male,No|Sun:Dinner,3\r\n21.01:3.5_Male,No:Sun,Dinner,3\r\n23.68,3.31,Male|No,Sun_Dinner,2\r\n24.59:3.61,Female_No,Sun,Dinner,4\r\n25.29,4.71|Male,No:Sun,Dinner,4<\/pre>\n

Here we see there are multiple seperator used. So here we can not use any custom delimiter. To solve this problem regex or regular expression is used. Let see with the help of an example.<\/p>\n

import pandas as pd \r\ndata=pd.read_csv('example4.csv',sep = '[:, |_]') \r\ndata.head()<\/pre>\n

Output<\/p>\n\n\n\n\n\n\n\n\n\n
<\/th>\ntotalbill<\/th>\ntip<\/th>\nsex<\/th>\nsmoker<\/th>\nday<\/th>\ntime<\/th>\nsize<\/th>\n<\/tr>\n<\/thead>\n
0<\/th>\n16.99<\/td>\n1.01<\/td>\nFemale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n2<\/td>\n<\/tr>\n
1<\/th>\n10.34<\/td>\n1.66<\/td>\nMale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n3<\/td>\n<\/tr>\n
2<\/th>\n21.01<\/td>\n3.50<\/td>\nMale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n3<\/td>\n<\/tr>\n
3<\/th>\n23.68<\/td>\n3.31<\/td>\nMale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n2<\/td>\n<\/tr>\n
4<\/th>\n24.59<\/td>\n3.61<\/td>\nFemale<\/td>\nNo<\/td>\nSun<\/td>\nDinner<\/td>\n4<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n

When we notice we pass a list of separators in the sep parameter that is contained in our CSV file.<\/p>\n

Want to expert in the python programming language? Exploring\u00a0Python Data Analysis using Pandas<\/a>\u00a0tutorial changes your knowledge from basic to advance level in python concepts.<\/p>\n