Read Csv File to Dataframe With Custom Delimiter in Python

Different methods to read CSV files with custom delimiter in python

In this article, we will see what are CSV files, how to use them in pandas, and then we see how and why to use custom delimiter with CSV files in pandas.

CSV file

A simple way to store big data sets is to use CSV files (comma-separated files).CSV files contain plain text and is a well know format that can be read by everyone including Pandas. Generally, CSV files contain columns separated by commas, but they can also contain content separated by a tab, or underscore or hyphen, etc. Generally, CSV files look like this:-

total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2
24.59,3.61,Female,No,Sun,Dinner,4

Here we see different columns and their values are separated by commas.

Use CSV file in pandas

read_csv() method is used to import and read CSV files in pandas. After this step, a CSV file act as a normal dataframe and we can use operation in CSV file as we use in dataframe.

syntax:  pandas.read_csv(filepath_or_buffer, sep=‘, ‘, delimiter=None, header=‘infer’, names=None, index_col=None, ….)

',' is default separator in read_csv() method.

Let see this with an example

import pandas as pd
data=pd.read_csv('example1.csv')
data.head()

Output

total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

Why use separator or delimiter with read_csv() method

Till now we understand that generally, CSV files contain data separated data that is separated by comma but sometimes it can contain data separated by tab or hyphen, etc. So to handle this we use a seperator. Let understand this with the help of an example. Suppose we have a CSV file separated by an underscore and we try to read that CSV file without using a separator or with using default separator i.e. comma. So let see what happens in this case.

"total_bill"_tip_sex_smoker_day_time_size
16.99_1.01_Female_No_Sun_Dinner_2
10.34_1.66_Male_No_Sun_Dinner_3
21.01_3.5_Male_No_Sun_Dinner_3
23.68_3.31_Male_No_Sun_Dinner_2
24.59_3.61_Female_No_Sun_Dinner_4
25.29_4.71_Male_No_Sun_Dinner_4
8.77_2_Male_No_Sun_Dinner_2

Suppose this is our CSV file separated by an underscore.

total_bill_tip_sex_smoker_day_time_size
0 16.99_1.01_Female_No_Sun_Dinner_2
1 10.34_1.66_Male_No_Sun_Dinner_3
2 21.01_3.5_Male_No_Sun_Dinner_3
3 23.68_3.31_Male_No_Sun_Dinner_2
4 24.59_3.61_Female_No_Sun_Dinner_4

Now see when we didn’t use a default separator here how unordered our data look like. So to solve this issue we use Separator. Now we will see when we use a separator to underscore how we get the same data in an ordered manner.

import pandas as pd 
data=pd.read_csv('example2.csv',sep = '_',engine = 'python') 
data.head()

Output

total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

So this example is sufficient to understand why there is a need of using a separator of delimiter in pandas while working on a CSV file.

Now suppose there is a CSV file in while data is separated by multiple separators. For example:-

totalbill_tip,sex:smoker,day_time,size
16.99,1.01:Female|No,Sun,Dinner,2
10.34,1.66,Male,No|Sun:Dinner,3
21.01:3.5_Male,No:Sun,Dinner,3
23.68,3.31,Male|No,Sun_Dinner,2
24.59:3.61,Female_No,Sun,Dinner,4
25.29,4.71|Male,No:Sun,Dinner,4

Here we see there are multiple seperator used. So here we can not use any custom delimiter. To solve this problem regex or regular expression is used. Let see with the help of an example.

import pandas as pd 
data=pd.read_csv('example4.csv',sep = '[:, |_]') 
data.head()

Output

totalbill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

When we notice we pass a list of separators in the sep parameter that is contained in our CSV file.

Want to expert in the python programming language? Exploring Python Data Analysis using Pandas tutorial changes your knowledge from basic to advance level in python concepts.