Mayank Gupta

Append/Add Row to Dataframe in Pandas – dataframe.append() | How to Insert Rows to Pandas Dataframe?

Worried about how to append or add rows to a dataframe in Pandas? This tutorial will guide you through appending rows to a dataframe in Pandas Python using the dataframe.append() function. We have listed the various methods for appending rows to a dataframe and will discuss each of them in turn. Before going to the main concept, let us discuss some basic concepts about pandas and dataframes.

Pandas – Definition

Pandas is a package in Python that is used to analyze data in a very easy way. The reason pandas is so popular is that it is very easy to use. However, we cannot use the pandas package directly in our program: to use it, we first have to import it.

Dataframe – Definition

A dataframe is a 2-D data structure that stores and represents data in two dimensions, or simply put, in tabular form. The tabular form consists of rows, columns, and the actual data. Using pandas we can manipulate the data as we want, i.e. we can view as many columns or rows as we want, and we can group or filter the data.

Let us understand both dataframes and pandas with an easy example:

import pandas as pd
d={"Name":["Mayank","Raj","Rahul","Samar"],
   "Marks":[90,88,97,78]
  }
df=pd.DataFrame(d)
print(df)

Output

    Name  Marks
0  Mayank     90
1     Raj     88
2   Rahul     97
3   Samar     78

Here we see that first we import the pandas package, then we create a dictionary, and from this dictionary we create our dataframe. Looking at the dataframe, we see that it consists of rows, columns, and data. There are many ways to create a dataframe, such as importing Excel or CSV files or using a dictionary, but that is not the main concern of this article.

Before appending rows to a dataframe, we first have to know a little bit about the append() method.

append() method

The append() method is used to append the rows of another dataframe to the end of the original or given dataframe. It returns a new dataframe object. If some columns are not present in the original dataframe but are present in the new dataframe, those columns are also added to the result, and the data of those columns for the original rows becomes NaN.
Syntax: DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=None)
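
As a minimal sketch of this behaviour (assuming a pandas version in which DataFrame.append() is still available; it was deprecated in pandas 1.4 and removed in pandas 2.0), appending a dataframe that carries an extra column fills that column with NaN for the original rows:

import pandas as pd

df = pd.DataFrame({"Name": ["Mayank", "Raj"], "Marks": [90, 88]})
other = pd.DataFrame({"Name": ["Gaurav"], "Marks": [76], "City": ["Delhi"]})
# 'City' is not present in df, so the result gains a 'City' column
# and the original rows get NaN in it
print(df.append(other, ignore_index=True))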

Ways to Append a Row to a Dataframe in Pandas

Method 1 – Add a dictionary as a row to the dataframe

In this method, we see how we can append a dictionary as a row to a pandas dataframe. It is pretty simple: we pass a dictionary to the append() method (as the other argument) and our work is done. Let us see this with an example.

d={"Name":["Mayank","Raj","Rahul","Samar"],
   "Marks":[90,88,97,78]
  }
df=pd.DataFrame(d)
print(df)
print("---------------")
new_d={"Name":"Gaurav",
      "Marks":76}
new_df=df.append(new_d,ignore_index=True)
print(new_df)

Output:

     Name  Marks
0  Mayank     90
1     Raj     88
2   Rahul     97
3   Samar     78
---------------
     Name  Marks
0  Mayank     90
1     Raj     88
2   Rahul     97
3   Samar     78
4  Gaurav     76
Explanation:
In this example, we see how we can append a dictionary to our original dataframe. With this method the original dataframe is not affected, which is why we store the result in a new variable so that we can analyze the changes.
If we instead assign the result back to the original variable, the original dataframe is replaced with the new one. In other words, the append() method is not in-place.
Note: Passing ignore_index=True is necessary when appending a dictionary or Series, otherwise a TypeError will be raised.
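
Since DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, here is a hedged sketch of the same dictionary append using pd.concat(), the recommended replacement:

import pandas as pd

df = pd.DataFrame({"Name": ["Mayank", "Raj", "Rahul", "Samar"],
                   "Marks": [90, 88, 97, 78]})
new_d = {"Name": "Gaurav", "Marks": 76}
# wrap the dictionary in a one-row dataframe and concatenate it
new_df = pd.concat([df, pd.DataFrame([new_d])], ignore_index=True)
print(new_df)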

Method 2 – Add Series as a row in the dataframe

This is another method to append rows in the dataframe. Let us see why this method is needed.

d={"Name":["Mayank","Raj","Rahul","Samar"],
   "Marks":[90,88,97,78]
  }
df=pd.DataFrame(d)
print(df)
print("---------------")
new_d={"Name":["Gaurav","Vijay"],
      "Marks":[76,88]}
new_df=df.append(new_d,ignore_index=True)
print(new_df)

Output:

    Name  Marks
0  Mayank     90
1     Raj     88
2   Rahul     97
3   Samar     78
---------------
              Name     Marks
0           Mayank        90
1              Raj        88
2            Rahul        97
3            Samar        78
4  [Gaurav, Vijay]  [76, 88]

If we want to add multiple rows at one time and we try to do it using a dictionary, we get the output shown above: the whole lists end up inside a single row instead of becoming separate rows.

To solve this issue we use series. Let us understand what series means.

Series

Series is a 1-D array that stores a single column or row of data in a dataframe.

syntax: pandas.Series( data, index, dtype, copy)

series=pd.Series(['Ajay','Vijay'])
print(series)
print(type(series))

Output

0     Ajay
1    Vijay
dtype: object
<class 'pandas.core.series.Series'>

That is how we can create a Series in pandas. Now let us see how we can append Series objects to a pandas dataframe. It is similar to passing a dictionary: we simply pass the Series (or a list of Series) as the argument to the append() function. Let us see this with an example.

d={"Name":["Mayank","Raj","Rahul","Samar"],
   "Marks":[90,88,97,78]
  }
df=pd.DataFrame(d)
print(df)
print("---------------")
series=[pd.Series(['Gaurav',88], index=df.columns ) ,
        pd.Series(['Vijay', 99], index=df.columns )]
new_df=df.append(series,ignore_index=True)
print(new_df)

Output:

     Name  Marks
0  Mayank     90
1     Raj     88
2   Rahul     97
3   Samar     78
---------------
     Name  Marks
0  Mayank     90
1     Raj     88
2   Rahul     97
3   Samar     78
4  Gaurav     88
5   Vijay     99

We see that this method solves the problem of adding multiple rows at a time that we faced with the dictionary approach.

Method 3 – Add rows from one dataframe to another dataframe

To understand this method, we first have to understand the concept of loc[].

loc[ ]

loc[] is used to access a group of rows and columns by their labels. Let us understand this concept with the help of an example.

students = [ ('Mayank',98) ,
             ('Raj', 75) ,
             ('Rahul', 87) ,
             ('Samar', 78)]
df = pd.DataFrame(  students, 
                    columns = ['Name' , 'Marks'],
                    index=['a', 'b', 'c' , 'd']) 
print(df)
print("------------------")
# If we want only row 'c' and all columns
print(df.loc[['c'],:])
print("------------------")
# If we want only row 'c' and only column 'Name'
print(df.loc['c']['Name'])
print("------------------")
# If we want only row 'c' and 'd' and all columns
print(df.loc[['c','d'],:])
print("------------------")
# If we want only row 'c' and 'd' and only column 'Name'
print(df.loc[['c','d'],['Name']])
print("------------------")

Output:

     Name  Marks
a  Mayank     98
b     Raj     75
c   Rahul     87
d   Samar     78
------------------
    Name  Marks
c  Rahul     87
------------------
Rahul
------------------
    Name  Marks
c  Rahul     87
d  Samar     78
------------------
    Name
c  Rahul
d  Samar
------------------

This example is very helpful to understand how loc works in pandas.

Now it can be very easy to understand how we can add rows of one dataframe to another dataframe. Let us see this with an example.

students1 = [ ('Mayank',98) ,
             ('Raj', 75) ,
             ('Rahul', 87) ,
             ('Samar', 78)]
df1 = pd.DataFrame( students1,
                    columns = ['Name' , 'Marks'],
                    index=['a', 'b', 'c' , 'd'])
print(df1)
print("------------------")
students2 = [ ('Vijay',94) ,
             ('Sunil', 76),
             ('Sanjay', 80)
            ]
df2= pd.DataFrame(  students2, 
                    columns = ['Name' , 'Marks'],
                    index=['a', 'b','c']) 
print(df2)

print("------------------")
new_df=df1.append(df2.loc[['a','c'],:],ignore_index=True)
print(new_df)

Output:

     Name  Marks
a  Mayank     98
b     Raj     75
c   Rahul     87
d   Samar     78
------------------
     Name  Marks
a   Vijay     94
b   Sunil     76
c  Sanjay     80
------------------
     Name  Marks
0  Mayank     98
1     Raj     75
2   Rahul     87
3   Samar     78
4   Vijay     94
5  Sanjay     80

In this example, we see how easily we append rows 'a' and 'c' of df2 to df1.

Method 4 – How to Add a row in the dataframe at index position using iloc[]

iloc[]

iloc[] in pandas allows us to retrieve a particular value belonging to a row and column using the integer positions assigned to them. It will raise an IndexError if a requested indexer is out-of-bounds.

students1 = [ ('Mayank',98) ,
             ('Raj', 75) ,
             ('Rahul', 87) ,
             ('Samar', 78)]
df1 = pd.DataFrame( students1,
                    columns = ['Name' , 'Marks'],
                    index=['a', 'b', 'c' , 'd'])
print(df1.iloc[0])

Output

Name     Mayank
Marks        98
Name: a, dtype: object

This example shows how we can access any row using an index.

Note: iloc uses integer index positions, not labels or column names.

Now let us see how we can place a row in the dataframe at an index position using iloc.

students1 = [ ('Mayank',98) ,
             ('Raj', 75) ,
             ('Rahul', 87) ,
             ('Samar', 78)]
df1 = pd.DataFrame( students1,
                    columns = ['Name' , 'Marks'],
                    index=['a', 'b', 'c' , 'd'])
print("Original dataframe")
print(df1)
print("------------------")
df1.iloc[2] = ['Vijay', 80]
print("New dataframe")
print(df1)

Output:

Original dataframe
     Name  Marks
a  Mayank     98
b     Raj     75
c   Rahul     87
d   Samar     78
------------------
New dataframe
     Name  Marks
a  Mayank     98
b     Raj     75
c   Vijay     80
d   Samar     78

This example shows how we set the row of the dataframe at a specific index position using iloc. Note that this overwrites the existing row at that position rather than inserting a new one; a sketch of true insertion follows below.
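
If the goal is to insert a brand new row at a given position instead of overwriting one, a minimal sketch (using iloc slicing together with pd.concat; the position 2 is just an assumption for illustration) is:

import pandas as pd

students1 = [('Mayank', 98), ('Raj', 75), ('Rahul', 87), ('Samar', 78)]
df1 = pd.DataFrame(students1, columns=['Name', 'Marks'])

new_row = pd.DataFrame([['Vijay', 80]], columns=df1.columns)
# rows before position 2 + the new row + rows from position 2 onwards
inserted = pd.concat([df1.iloc[:2], new_row, df1.iloc[2:]], ignore_index=True)
print(inserted)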

So these are the methods to add or append rows in the dataframe.

Read more Articles on Python Data Analysis Using Pandas – Add Contents to a Dataframe

Pandas: Drop Rows With NaN/Missing Values in any or Selected Columns of Dataframe

Pandas provides several data structures and operations to manipulate data and time series. There can be instances in which some data goes missing, and pandas uses two values to denote missing data, namely None and NaN. In this tutorial we will discuss the dropna() function, why it is necessary to remove rows that contain missing values or NaN, and different methods to drop rows with NaN or missing values in any or selected columns of a dataframe.

dropna() function

The dropna() function is used to analyze and drop rows or columns having NaN or missing values in different ways.

syntax:  DataFrameName.dropna(axis, how, thresh, subset, inplace)

Parameters:

1) axis: If axis is 0, rows with missing or NaN values are dropped; if axis is 1, columns with NaN or missing values are dropped.

2) how: how takes a string as a parameter, either 'any' or 'all'. With 'any' a row or column is dropped if any of its values is NaN, while with 'all' it is dropped only if all of its values are NaN.

3) thresh: the minimum number of non-NaN values a row or column must contain in order to be kept (see the sketch after this list).

4) inplace: If inplace is True, the changes are made in the existing dataframe; otherwise a new dataframe with the changes is returned.
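
The thresh parameter is not used in the methods below, so here is a minimal sketch of it (the small dataframe is made up for illustration): a row is kept only if it has at least thresh non-NaN values.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Raj', 'Abhay', np.nan],
                   'Age': [24, np.nan, np.nan],
                   'Marks': [95, np.nan, np.nan]})
# keep only the rows that contain at least 2 non-NaN values
print(df.dropna(thresh=2))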

The Necessity to remove NaN or Missing values

NaN stands for Not a Number. It signifies that a particular cell does not contain usable data. When we work on different datasets we find that some cells may have NaN or missing values. If we work on that type of dataset, the chances are high that we will not get accurate results. Hence, while working on any dataset, we check whether it contains any missing values; if it contains NaN values, we remove them so as to get results with more accuracy.
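
A quick way to perform that check, shown here as a small sketch on a made-up dataframe, is to combine isna() with sum() to count the missing values in each column:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Raj', 'Aadi', 'Abhay'],
                   'City': ['Mumbai', np.nan, 'Rajasthan'],
                   'Marks': [95, 81, np.nan]})
# number of missing values per column
print(df.isna().sum())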

How to drop rows of Pandas DataFrame whose value in a certain column is NaN or a Missing Value?

There are different methods to drop rows of Pandas Dataframe whose value is missing or Nan. All 4 methods are explained with enough examples so that you can better understand the concept and apply the conceptual knowledge to other programs on your own.

Method 1: Drop Rows with missing value / NaN in any column

In this method, we will see how to drop rows with missing or NaN values in any column. As we know, the dropna() function is used in all our methods, so we only have to play with its parameters. By default the value of axis is 0 and how is 'any', so the dropna() function without any arguments drops rows with missing or NaN values in any column. Let us see this with the help of an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 'Mumbai', 95) ,
            ('Rahul', 21, 'Delhi' , 97) ,
            ('Aadi', 22, np.NaN, 81) ,
            ('Abhay', np.NaN,'Rajasthan' , np.NaN) ,
            ('Ajjet', 21, 'Delhi' , 74)]
# Create a DataFrame object
df = pd.DataFrame(  students, 
                    columns=['Name', 'Age', 'City', 'Marks'])
print("Original Dataframe\n")
print(df,'\n')
new_df=df.dropna()
print("New Dataframe\n")
print(new_df)

Output

Original Dataframe

    Name   Age       City  Marks
0    Raj  24.0     Mumbai   95.0
1  Rahul  21.0      Delhi   97.0
2   Aadi  22.0        NaN   81.0
3  Abhay   NaN  Rajasthan    NaN
4  Ajjet  21.0      Delhi   74.0 

New Dataframe

    Name   Age    City  Marks
0    Raj  24.0  Mumbai   95.0
1  Rahul  21.0   Delhi   97.0
4  Ajjet  21.0   Delhi   74.0

Here we see that we get only those rows that don’t have any NaN or missing value.

Method 2: Drop Rows in dataframe which has all values as NaN

In this method, we drop only those rows in which all the values are NaN or missing. Hence we only have to pass how='all' as an argument, and all the other parameters keep their default values. Let us see this with an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 'Mumbai', 95) ,
            ('Rahul', 21, 'Delhi' , 97) ,
            ('Aadi', 22, np.NaN, 81) ,
            ('Abhay', np.NaN,'Rajasthan' , np.NaN) ,
            ('Ajjet', 21, 'Delhi' , 74),
            (np.NaN,np.NaN,np.NaN,np.NaN),
            ('Aman',np.NaN,np.NaN,76)]
# Create a DataFrame object
df = pd.DataFrame(  students, 
                    columns=['Name', 'Age', 'City', 'Marks'])
print("Original Dataframe\n")
print(df,'\n')
new_df=df.dropna(how='all')
print("New Dataframe\n")
print(new_df)

 

Output

Original Dataframe

    Name   Age       City  Marks
0    Raj  24.0     Mumbai   95.0
1  Rahul  21.0      Delhi   97.0
2   Aadi  22.0        NaN   81.0
3  Abhay   NaN  Rajasthan    NaN
4  Ajjet  21.0      Delhi   74.0
5    NaN   NaN        NaN    NaN
6   Aman   NaN        NaN   76.0 

New Dataframe

    Name   Age       City  Marks
0    Raj  24.0     Mumbai   95.0
1  Rahul  21.0      Delhi   97.0
2   Aadi  22.0        NaN   81.0
3  Abhay   NaN  Rajasthan    NaN
4  Ajjet  21.0      Delhi   74.0
6   Aman   NaN        NaN   76.0

Here we see that row 5 is dropped because it has all the values as NaN.

Method 3: Drop Rows with any missing value in selected columns only

In this method, we see how to drop rows that have a NaN value in any of the selected columns only. Here axis and how keep their default values, but we have to pass the list of columns to check in the subset parameter. Let us see this with the help of an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 'Mumbai', 95) ,
            ('Rahul', 21, 'Delhi' , 97) ,
            ('Aadi', 22, np.NaN, 81) ,
            ('Abhay', np.NaN,'Rajasthan' , np.NaN) ,
            ('Ajjet', 21, 'Delhi' , 74),
            (np.NaN,np.NaN,np.NaN,np.NaN),
            ('Aman',np.NaN,np.NaN,76)]
# Create a DataFrame object
df = pd.DataFrame(  students, 
                    columns=['Name', 'Age', 'City', 'Marks'])
print("Original Dataframe\n")
print(df,'\n')
new_df=df.dropna(subset=['Name', 'Age'])
print("New Dataframe\n")
print(new_df)

Output

Original Dataframe

    Name   Age       City  Marks
0    Raj  24.0     Mumbai   95.0
1  Rahul  21.0      Delhi   97.0
2   Aadi  22.0        NaN   81.0
3  Abhay   NaN  Rajasthan    NaN
4  Ajjet  21.0      Delhi   74.0
5    NaN   NaN        NaN    NaN
6   Aman   NaN        NaN   76.0 

New Dataframe

    Name   Age    City  Marks
0    Raj  24.0  Mumbai   95.0
1  Rahul  21.0   Delhi   97.0
2   Aadi  22.0     NaN   81.0
4  Ajjet  21.0   Delhi   74.0

Here we see that rows 3, 5, and 6 have NaN or missing values in the 'Name' or 'Age' columns, so these rows are dropped.

Method 4: Drop Rows with missing values or NaN in all the selected columns

In this method we see how to drop rows that have NaN or missing values in all of the selected columns, i.e. if we select two columns 'A' and 'B', a row is dropped only when both of those columns are missing. Here we pass the list of columns in subset and 'all' in how. Let us see this with the help of an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 'Mumbai', 95) ,
            ('Rahul', 21, 'Delhi' , 97) ,
            ('Aadi', 22, np.NaN, 81) ,
            ('Abhay', np.NaN,'Rajasthan' , np.NaN) ,
            ('Ajjet', 21, 'Delhi' , 74),
            (np.NaN,np.NaN,np.NaN,np.NaN),
            ('Aman',np.NaN,np.NaN,76)]
# Create a DataFrame object
df = pd.DataFrame(  students, 
                    columns=['Name', 'Age', 'City', 'Marks'])
print("Original Dataframe\n")
print(df,'\n')
new_df=df.dropna(how='all',subset=['Name', 'Age'])
print("New Dataframe\n")
print(new_df)

Output

Original Dataframe

    Name   Age       City  Marks
0    Raj  24.0     Mumbai   95.0
1  Rahul  21.0      Delhi   97.0
2   Aadi  22.0        NaN   81.0
3  Abhay   NaN  Rajasthan    NaN
4  Ajjet  21.0      Delhi   74.0
5    NaN   NaN        NaN    NaN
6   Aman   NaN        NaN   76.0 

New Dataframe

    Name   Age       City  Marks
0    Raj  24.0     Mumbai   95.0
1  Rahul  21.0      Delhi   97.0
2   Aadi  22.0        NaN   81.0
3  Abhay   NaN  Rajasthan    NaN
4  Ajjet  21.0      Delhi   74.0
6   Aman   NaN        NaN   76.0

Here we see that only row 5 has NaN values in both of the selected columns, hence it is dropped, while rows 3 and 6 have a NaN value only in the Age column and so are not dropped.

So these are the methods to drop rows having NaN in all columns or only in selected columns.

Read more Articles on Python Data Analysis Using Pandas – Remove Contents from a Dataframe

Python: Add a Column to an Existing CSV File

Methods to add a column to an existing CSV File

In this article, we will discuss how to add a column to an existing CSV file using the csv.reader and csv.writer classes. Apart from appending columns, we will also discuss how to insert a column in between the other columns of the existing CSV file.

Original CSV file content

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
  • Method 1-Add a column with the same values to an existing CSV file

In this method, we see how to make one new column and add it to our CSV file, where all the values in this column are the same.

The steps to append a column to a CSV file are:

  1. Open 'input.csv' in read mode and create a csv.reader object for this CSV file.
  2. Open 'output.csv' in write mode and create a csv.writer object for this CSV file.
  3. Using the reader object, read the 'input.csv' file line by line.
  4. For each row (read as a list), append the default text to the list.
  5. Write this updated row / list to 'output.csv' using the csv.writer object for that file.
  6. Close both the input.csv and output.csv files.

Let us see this with the help of an example.

import pandas as pd   # pandas is used below only to display the result
from csv import writer
from csv import reader
default_text = 'New column'
# Open the input_file in read mode and output_file in write mode
with open('example1.csv', 'r') as read_obj, \
        open('output_1.csv', 'w', newline='') as write_obj:
    # Create a csv.reader object from the input file object
    csv_reader = reader(read_obj)
    # Create a csv.writer object from the output file object
    csv_writer = writer(write_obj)
    # Read each row of the input csv file as list
    for row in csv_reader:
        # Append the default text in the row / list
        row.append(default_text)
        # Add the updated row / list to the output file
        csv_writer.writerow(row)
output_data=pd.read_csv('output_1.csv')
output_data.head()

Output

   total_bill   tip     sex smoker  day    time  size  New column
0       16.99  1.01  Female     No  Sun  Dinner     2  New column
1       10.34  1.66    Male     No  Sun  Dinner     3  New column
2       21.01  3.50    Male     No  Sun  Dinner     3  New column
3       23.68  3.31    Male     No  Sun  Dinner     2  New column
4       24.59  3.61  Female     No  Sun  Dinner     4  New column

Here we see that the new column is added, but every value in this column is the same.

Now let us see how we can add different values in the column.

  •  Method 2-Add a column to an existing CSV file, based on values from other columns

In this method, we make a new column whose values are a combination of two or more existing columns. As there is no direct function to achieve this, we have to write our own function for the task. Let us see the code for this.

import pandas as pd   # pandas is used below only to display the result
from csv import writer
from csv import reader
def add_column_in_csv(input_file, output_file, transform_row):
    """ Append a column in existing csv using csv.reader / csv.writer classes"""
    # Open the input_file in read mode and output_file in write mode
    with open(input_file, 'r') as read_obj, \
            open(output_file, 'w', newline='') as write_obj:
        # Create a csv.reader object from the input file object
        csv_reader = reader(read_obj)
        # Create a csv.writer object from the output file object
        csv_writer = writer(write_obj)
        # Read each row of the input csv file as list
        for row in csv_reader:
            # Pass the list / row in the transform function to add column text for this row
            transform_row(row, csv_reader.line_num)
            # Write the updated row / list to the output file
            csv_writer.writerow(row)
add_column_in_csv('example1.csv', 'output_2.csv', lambda row, line_num: row.append(row[0] + '__' + row[1]))
output_data=pd.read_csv('output_2.csv')
output_data.head()

Output

   total_bill   tip     sex smoker  day    time  size  total_bill__tip
0       16.99  1.01  Female     No  Sun  Dinner     2      16.99__1.01
1       10.34  1.66    Male     No  Sun  Dinner     3      10.34__1.66
2       21.01  3.50    Male     No  Sun  Dinner     3       21.01__3.5
3       23.68  3.31    Male     No  Sun  Dinner     2      23.68__3.31
4       24.59  3.61  Female     No  Sun  Dinner     4      24.59__3.61

Here we see the new column is formed as the combination of the values of the 1st and 2nd column.

Explanation:

In the lambda function, we receive each row as a list along with the line number. It then adds a value to the list; the value is a merger of the first and second values of the list. In this way the column is appended to the contents of example1.csv by merging the values of the first and second columns, and the changes are saved as output_2.csv.

  • Method 3-Add a list as a column to an existing csv file

In this method, we add our own values to the column by making a list of values and passing it into the function that we defined. Let us see the code for this.

import pandas as pd   # pandas is used below to count rows and display the result
from csv import writer
from csv import reader
def add_column_in_csv(input_file, output_file, transform_row):
    """ Append a column in existing csv using csv.reader / csv.writer classes"""
    # Open the input_file in read mode and output_file in write mode
    with open(input_file, 'r') as read_obj, \
            open(output_file, 'w', newline='') as write_obj:
        # Create a csv.reader object from the input file object
        csv_reader = reader(read_obj)
        # Create a csv.writer object from the output file object
        csv_writer = writer(write_obj)
        # Read each row of the input csv file as list
        for row in csv_reader:
            # Pass the list / row in the transform function to add column text for this row
            transform_row(row, csv_reader.line_num)
            # Write the updated row / list to the output file
            csv_writer.writerow(row)
l = []
l.append("New Column")
# read the original csv once with pandas just to count its data rows
data = pd.read_csv('example1.csv')
rows = len(data.axes[0])
for i in range(rows):
    val = i + 1
    l.append(val)
add_column_in_csv('example1.csv', 'output_3.csv', lambda row, line_num: row.append(l[line_num - 1]))
output_data=pd.read_csv('output_3.csv')
output_data.head()

Output

   total_bill   tip     sex smoker  day    time  size  New Column
0       16.99  1.01  Female     No  Sun  Dinner     2           1
1       10.34  1.66    Male     No  Sun  Dinner     3           2
2       21.01  3.50    Male     No  Sun  Dinner     3           3
3       23.68  3.31    Male     No  Sun  Dinner     2           4
4       24.59  3.61  Female     No  Sun  Dinner     4           5

Explanation

In the lambda function, we receive each row as a list along with the line number. It then adds a value to the list; the value is the entry of our list l at index line_num - 1. Thus all the entries of the list l are added as a column in the CSV.
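
The introduction also mentioned inserting a column in between the other columns. Since csv.writer works on plain lists, a hedged, self-contained sketch can use list.insert() at the desired position (the output file name 'output_4.csv' and the position 2 are assumptions for illustration):

from csv import reader, writer

with open('example1.csv', 'r') as read_obj, \
        open('output_4.csv', 'w', newline='') as write_obj:
    csv_reader = reader(read_obj)
    csv_writer = writer(write_obj)
    for row in csv_reader:
        if csv_reader.line_num == 1:
            # header row: insert the new column name at position 2
            row.insert(2, 'Inserted column')
        else:
            # data rows: insert a running row number at position 2
            row.insert(2, csv_reader.line_num - 1)
        csv_writer.writerow(row)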

So these are some of the methods to add a new column to a CSV file.

Python: How to create a zip archive from multiple files or Directory

Methods to create a zip archive from multiple files or a directory in Python

In this article, we discuss how we can create a zip archive of multiple files or directories in Python. To understand this, let us first look at the ZipFile class.

ZipFile class

To use it, we have to import the ZipFile class from the zipfile module. ZipFile is the class of the zipfile module used for reading and writing zip files.

syntax: zipfile.ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, compresslevel=None, *, strict_timestamps=True)

Create a zip archive of multiple files

    • Method 1: Without using the with statement

Let us first write the program and then see how code works.

from zipfile import ZipFile
# create a ZipFile object
zipObj = ZipFile('sample.zip', 'w')
# Add multiple files to the zip
zipObj.write('file1.txt')
zipObj.write('file2.txt')
# close the Zip File
zipObj.close()

Output

Directory structure before the execution of the program

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 10-06-2021 18:41 14 file1.txt
-a---- 10-06-2021 18:41 15 file2.txt
-a---- 10-06-2021 19:06 216 zip.py

Directory structure after the execution of the program

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 10-06-2021 18:41 14 file1.txt
-a---- 10-06-2021 18:41 15 file2.txt
-a---- 10-06-2021 19:09 239 sample.zip
-a---- 10-06-2021 19:09 216 zip.py

Here we clearly see that a zip file is created.

Let us see how the program works. First, we create a ZipFile object by passing the new file name and mode 'w' (write mode). This creates a new zip file and opens it within the ZipFile object. Then we call the write() function on the ZipFile object to add files to it, and finally we call close() on the ZipFile object to close the zip file.

  • Method 2: Using the with statement

The difference between this and the previous method is that without the with statement we have to close the zip file ourselves, whereas when we use the with statement the zip file is closed automatically when the ZipFile object goes out of scope. Let us see the code for this.

from zipfile import ZipFile
# Create a ZipFile Object
with ZipFile('sample2.zip', 'w') as zipObj:
   # Add multiple files to the zip
   zipObj.write('file1.txt')
   zipObj.write('file2.txt')

Output

Directory structure before the execution of the program

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 10-06-2021 18:41 14 file1.txt
-a---- 10-06-2021 18:41 15 file2.txt
-a---- 10-06-2021 19:09 239 sample.zip
-a---- 10-06-2021 19:21 429 zip.py

Directory structure after the execution of the program

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 10-06-2021 18:41 14 file1.txt
-a---- 10-06-2021 18:41 15 file2.txt
-a---- 10-06-2021 19:09 239 sample.zip
-a---- 10-06-2021 19:24 239 sample2.zip
-a---- 10-06-2021 19:24 429 zip.py

Here we see that another zip file is created.

Create a zip archive of the directory

To zip only selected files from a directory we need to check a condition on each file path during iteration before adding it to the zip file (a sketch of this follows the example below). As we work with directories and files here, we also have to import the os module. Let us see the code for this.

from zipfile import ZipFile
import os
from os.path import basename
# create a ZipFile object
dirName="../zip"
with ZipFile('sampleDir.zip', 'w') as zipObj:
   # Iterate over all the files in directory
   for folderName, subfolders, filenames in os.walk(dirName):
       for filename in filenames:
           #create complete filepath of file in directory
           filePath = os.path.join(folderName, filename)
           # Add file to zip
           zipObj.write(filePath, basename(filePath))

Output

Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 10-06-2021 18:41 14 file1.txt
-a---- 10-06-2021 18:41 15 file2.txt
-a---- 10-06-2021 19:09 239 sample.zip
-a---- 10-06-2021 19:24 239 sample2.zip
-a---- 10-06-2021 21:44 2796 sampleDir.zip
-a---- 10-06-2021 19:24 429 zip.py
-a---- 10-06-2021 21:44 506 zipdir.py

Here we see that the sampleDir.zip file is created.
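
To zip only selected files from the directory, as mentioned earlier, we can check a condition on each file path inside the loop. A minimal sketch (the .txt extension filter and the 'sampleDirTxt.zip' name are assumptions for illustration) looks like this:

from zipfile import ZipFile
import os
from os.path import basename

dirName = "../zip"
with ZipFile('sampleDirTxt.zip', 'w') as zipObj:
    for folderName, subfolders, filenames in os.walk(dirName):
        for filename in filenames:
            # only add files that satisfy the condition
            if filename.endswith('.txt'):
                filePath = os.path.join(folderName, filename)
                zipObj.write(filePath, basename(filePath))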

Python : Create an Empty 2D Numpy Array and Append Rows or Columns to it

Create an empty 2-D NumPy array and append rows and columns

In this article, we will discuss what a NumPy array is, how to create a NumPy array in Python, and how to append rows and columns to an empty NumPy array. First, let us see what a NumPy array is and how we can create it.

NumPy

NumPy is a library in Python that was created to work efficiently with arrays. It is fast, easy to learn, and provides efficient storage; it also provides a better way of handling data for processing. We can create n-dimensional arrays in NumPy. To use NumPy we simply have to import it into our program, and then we can easily use its functionality. Let us see how NumPy works with the help of an example.

import numpy as np

#0-D array
arr = np.array(1)
print(arr)
print(type(arr))
print()

#1-D array
arr_1d = np.array([1, 2, 3, 4, 5])
print(arr_1d)
print(type(arr_1d))
print()

#2-D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d)
print(type(arr_2d))

Output

1
<class 'numpy.ndarray'>

[1 2 3 4 5]
<class 'numpy.ndarray'>

[[1 2 3]
 [4 5 6]]
<class 'numpy.ndarray'>

Here we see how we can easily work with an n-dimensional array in python using NumPy.

Let us come to the main topic of the article, i.e. how to create an empty 2-D array and append rows and columns to it.

Create an empty NumPy array

To create an empty array there is a built-in function in NumPy through which we can easily create one: the numpy.empty() function.

Syntax: numpy.empty(shape, dtype=float, order='C')

It accepts a shape and a data type as arguments and returns a new array of the given shape and data type. Let us understand this with the help of an example.

array = np.empty((0, 4), int)
print(array)
print(type(array))

Output

[]
<class 'numpy.ndarray'>

This is the NumPy array consisting of 0 rows and 4 columns.

Now that we understand how to create an empty 2-D NumPy array, let us see how to append rows and columns to this empty array.

To append rows and columns there is also a built-in function in NumPy for this task: numpy.append().

Syntax: numpy.append(arr, values, axis=None)

It takes three parameters. The first is arr, the NumPy array to which we append; the second is values, i.e. the values we want appended; and the third is axis, the axis along which the values are appended. To append as rows the axis is 0, whereas to append as columns it is 1.

Append rows to empty NumPy array

With the help of the append() method we can do this task, but we need to take care of some points before using the append function.

  1. As we append data row-wise, we need to pass axis=0.
  2. The row to be appended must have the same shape (number of columns) as the NumPy array, otherwise we get an error; since we created an empty array with 4 columns, each appended row must contain exactly 4 elements.

Let us see how this function works with the help of an example.

array = np.empty((0, 4), int)
array = np.append(array, np.array([[1,2,3,4], [5,6,7,8]]), axis=0)
print(array)
type(array)

Output

[[1 2 3 4]
 [5 6 7 8]]
numpy.ndarray

Here we see with the help of append() we easily append rows in our empty 2-D NumPy array.

Append columns to empty NumPy array

With the help of the append() method we can do this task, but here also we need to take care of some points before using the append function.

  1. As we append data column-wise, we need to pass axis=1.
  2. The column to be appended must have the same shape (number of rows) as the NumPy array, otherwise we get an error.

Let us see how this function works with the help of an example.

# Create an empty 2D numpy array with 4 rows and 0 column
array = np.empty((4, 0), int)

columns = np.array([[1,2,3,4], [5,6,7,8]])
array = np.append(array, columns.transpose(), axis=1)
print(array)

Output

[[1 5]
 [2 6]
 [3 7]
 [4 8]]

Here we see how, with the help of the append() method, we are able to append columns to our empty 2-D NumPy array. Notice that this example uses the transpose() function, so let us understand why the transpose() function is needed here.

transpose() Function

Here transpose() has the same meaning as the transpose of a matrix in mathematics. We created our array row-wise, but we want to pass it to the append() function column-wise; hence transpose() is used to swap rows and columns, as the short sketch below shows.
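
A tiny sketch makes the effect of transpose() visible: the 2 x 4 array of values becomes a 4 x 2 array, so each original row turns into a column that matches the 4-row empty array.

import numpy as np

columns = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(columns.shape)              # (2, 4) -- two rows of four values
print(columns.transpose().shape)  # (4, 2) -- rows and columns swapped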

Python: Find Unique Values in a Numpy Array With Frequency and Indices

Methods to find unique values in a numpy array with frequency and indices

In this article, we will discuss how to find unique values, rows, and columns in 1-D and 2-D NumPy arrays. Before going through the methods, let us first look at the numpy.unique() method, because it is used throughout.

numpy.unique() method

The numpy.unique() method helps us get the unique values from a given array.

Syntax: numpy.unique(array, return_index=False, return_inverse=False, return_counts=False, axis=None)

Parameters

  1. array: the array from which we want to get the unique values.
  2. return_index: if this parameter is True, the method also returns the array of indices of the first occurrence of each unique value. By default it is False.
  3. return_counts: if this parameter is True, the method also returns the array of counts of the occurrences of each unique value. By default it is False.
  4. axis: used in the case of an n-d array, not a 1-D array. axis=0 means the operation is done row-wise and axis=1 means it is done column-wise.

Now we will see different methods to find unique value with their indices and frequencies in a numpy array.

Case 1 – When our array is 1-D

  • Method 1-Find unique value from the array

As we only need the unique values and not their frequencies or indices, we simply pass our NumPy array to the unique() method; the default value of the other parameters is False, so we don't need to change them. Let us see this with the help of an example.

import numpy as np
arr = np.array([1, 1, 2, 3, 4, 5, 6, 7, 2, 3, 1, 4, 7])
unique_values=np.unique(arr)
print("Original array is")
print(arr)
print("------------------")
print("Unique values are")
print(unique_values)

Output

Original array is
[1 1 2 3 4 5 6 7 2 3 1 4 7]
------------------
Unique values are
[1 2 3 4 5 6 7]
  • Method 2-Find unique value from the array along with their indices

In this method, as we want to get the unique values along with their indices, we set the return_index parameter to True and pass our array. Let us see this with the help of an example.

import numpy as np
arr = np.array([1, 1, 2, 3, 4, 5, 6, 7, 2, 3, 1, 4, 7])
unique_values,index=np.unique(arr,return_index=True)
print("Original array is")
print(arr)
print("------------------")
print("Unique values are")
print(unique_values)
print("First index of unique values are:")
print(index)

Output

Original array is
[1 1 2 3 4 5 6 7 2 3 1 4 7]
------------------
Unique values are
[1 2 3 4 5 6 7]
First index of unique values are:
[0 2 3 4 5 6 7]
  • Method 3-Find unique value from the array along with their frequencies

In this method, as we want to get the unique values along with their frequencies, we set the return_counts parameter to True and pass our array. Let us see this with the help of an example.

import numpy as np
arr = np.array([1, 1, 2, 3, 4, 5, 6, 7, 2, 3, 1, 4, 7])
unique_values,count=np.unique(arr,return_counts=True)
print("Original array is")
print(arr)
print("------------------")
print("Unique values are")
print(unique_values)
print("Count of unique values are:")
for i in range(0,len(unique_values)):
  print("count of ",unique_values[i]," is ",count[i])

Output

Original array is
[1 1 2 3 4 5 6 7 2 3 1 4 7]
------------------
Unique values are
[1 2 3 4 5 6 7]
Count of unique values are:
count of  1  is  3
count of  2  is  2
count of  3  is  2
count of  4  is  2
count of  5  is  1
count of  6  is  1
count of  7  is  2

Case 2: When our array is 2-D

  • Method 1-Find unique value from the array

Here we simply pass our array and all the parameters keep their default values. We don't make any changes because we want the unique values across the whole array, considering both rows and columns. Let us see this with the help of an example.

import numpy as np
arr = np.array([[1, 1, 2,1] ,[ 3, 1, 2,1] , [ 6, 1, 2, 1],  [1, 1, 2, 1]])
unique_values=np.unique(arr)
print("Original array is")
print(arr)
print("------------------")
print("Unique values are")
print(unique_values)

Output

Original array is
[[1 1 2 1]
 [3 1 2 1]
 [6 1 2 1]
 [1 1 2 1]]
------------------
Unique values are
[1 2 3 6]

Method 2-Get unique rows

As we want to work only on rows, we set axis=0 and simply pass our array. Let us see this with the help of an example.

import numpy as np
arr = np.array([[1, 1, 2,1] ,[ 3, 1, 2,1] , [ 6, 1, 2, 1],  [1, 1, 2, 1]])
unique_values=np.unique(arr,axis=0)
print("Original array is")
print(arr)
print("------------------")
print("Unique rows are")
print(unique_values)

Output

Original array is
[[1 1 2 1]
 [3 1 2 1]
 [6 1 2 1]
 [1 1 2 1]]
------------------
Unique rows are
[[1 1 2 1]
 [3 1 2 1]
 [6 1 2 1]]

Method 3-Get unique columns

As we want to work only on columns, we set axis=1 and simply pass our array. Let us see this with the help of an example.

import numpy as np
arr = np.array([[1, 1, 2,1] ,[ 3, 1, 2,1] , [ 6, 1, 2, 1],  [1, 1, 2, 1]])
unique_values=np.unique(arr,axis=1)
print("Original array is")
print(arr)
print("------------------")
print("Unique columns are")
print(unique_values)

Output

Original array is
[[1 1 2 1]
 [3 1 2 1]
 [6 1 2 1]
 [1 1 2 1]]
------------------
Unique columns are
[[1 1 2]
 [1 3 2]
 [1 6 2]
 [1 1 2]]

So these are the methods to find unique values in a NumPy array along with their frequencies and indices.

 

Pandas: Add Two Columns into a New Column in Dataframe

Methods to add two columns into a new column in Dataframe

In this article, we discuss how to add one column to another existing column of a dataframe and how to add two columns together to make a new column in the dataframe, using pandas. We will also discuss how to deal with NaN values.

  • Method 1-Sum two columns together to make a new series

In this method, we simply select two columns by their column names and then add them. Let us see this with the help of an example.

import pandas as pd 
import numpy as np 
students = [('Raj', 24, 'Mumbai', 95) , 
            ('Rahul', 21, 'Delhi' , 97) , 
            ('Aadi', 22, np.NaN, 81) , 
            ('Abhay', 25,'Rajasthan' , 90) , 
            ('Ajjet', 21, 'Delhi' , 74)] 
# Create a DataFrame object 
df = pd.DataFrame( students, columns=['Name', 'Age', 'City', 'Marks']) 
print("Original Dataframe\n") 
print(df,'\n') 
total = df['Age'] + df['Marks']
print("New Series \n") 
print(total)
print(type(total))

Output

Original Dataframe

    Name  Age       City  Marks
0    Raj   24     Mumbai     95
1  Rahul   21      Delhi     97
2   Aadi   22        NaN     81
3  Abhay   25  Rajasthan     90
4  Ajjet   21      Delhi     74 

New Series 

0    119
1    118
2    103
3    115
4     95
dtype: int64
<class 'pandas.core.series.Series'>

Here we see that when we add two columns, a Series is formed.

Note: We can’t add a string with int or float. We can only add a string with a string or a number with a number.

Let us see an example of adding a string with a string.

import pandas as pd 
import numpy as np 
students = [('Raj', 24, 'Mumbai', 95) , 
            ('Rahul', 21, 'Delhi' , 97) , 
            ('Aadi', 22, 'Kolkata', 81) , 
            ('Abhay', 25,'Rajasthan' , 90) , 
            ('Ajjet', 21, 'Delhi' , 74)] 
# Create a DataFrame object 
df = pd.DataFrame( students, columns=['Name', 'Age', 'City', 'Marks']) 
print("Original Dataframe\n") 
print(df,'\n') 
total = df['Name'] + " "+df['City']
print("New Series \n") 
print(total)
print(type(total))

Output

Original Dataframe

    Name  Age       City  Marks
0    Raj   24     Mumbai     95
1  Rahul   21      Delhi     97
2   Aadi   22    Kolkata     81
3  Abhay   25  Rajasthan     90
4  Ajjet   21      Delhi     74 

New Series 

0         Raj Mumbai
1        Rahul Delhi
2       Aadi Kolkata
3    Abhay Rajasthan
4        Ajjet Delhi
dtype: object
<class 'pandas.core.series.Series'>
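
If a numeric column really has to be combined with a string column, one common workaround (a sketch, not used in the rest of this article) is to convert the numbers to strings first with astype(str):

import pandas as pd

df = pd.DataFrame({'Name': ['Raj', 'Rahul'], 'Marks': [95, 97]})
# convert the numeric column to strings before concatenating
combined = df['Name'] + " scored " + df['Marks'].astype(str)
print(combined)
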
  • Method 2-Sum two columns together having NaN values to make a new series

In the previous method there were no NaN or missing values, but in this case we also have NaN values. When we add two columns and one or both of them contain a NaN value in a row, the result for that row is also NaN. Let us see this with the help of an example.

import pandas as pd 
import numpy as np 
students = [('Raj', 24, 'Mumbai', 95) , 
            ('Rahul', 21, 'Delhi' , 97) , 
            ('Aadi', 22, 'Kolkata', np.NaN) , 
            ('Abhay', np.NaN,'Rajasthan' , 90) , 
            ('Ajjet', 21, 'Delhi' , 74)] 
# Create a DataFrame object 
df = pd.DataFrame( students, columns=['Name', 'Age', 'City', 'Marks']) 
print("Original Dataframe\n") 
print(df,'\n') 
total = df['Marks'] + df['Age']
print("New Series \n") 
print(total)
print(type(total))

Output

Original Dataframe

    Name   Age       City  Marks
0    Raj  24.0     Mumbai   95.0
1  Rahul  21.0      Delhi   97.0
2   Aadi  22.0    Kolkata    NaN
3  Abhay   NaN  Rajasthan   90.0
4  Ajjet  21.0      Delhi   74.0 

New Series 

0    119.0
1    118.0
2      NaN
3      NaN
4     95.0
dtype: float64
<class 'pandas.core.series.Series'>
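
If the NaN results are not acceptable, one common way to deal with them (a hedged sketch that simply treats a missing value as 0) is to use Series.add() with fill_value instead of the + operator:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Age': [24, 22, np.nan], 'Marks': [95, np.nan, 90]})
# fill_value=0 treats a missing value on either side as 0,
# so the result is NaN only when both values are missing
total = df['Marks'].add(df['Age'], fill_value=0)
print(total)
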
  • Method 3-Add two columns to make a new column

We know that a dataframe is a group of Series. We saw that when we add two columns we get a Series, which we store in a variable. If we make that variable a column of the dataframe, our work is easily done. Let us see this with the help of an example.

import pandas as pd 
import numpy as np 
students = [('Raj', 24, 'Mumbai', 95) , 
('Rahul', 21, 'Delhi' , 97) , 
('Aadi', 22, 'Kolkata',76) , 
('Abhay',23,'Rajasthan' , 90) , 
('Ajjet', 21, 'Delhi' , 74)] 
# Create a DataFrame object 
df = pd.DataFrame( students, columns=['Name', 'Age', 'City', 'Marks']) 
print("Original Dataframe\n") 
print(df,'\n') 
df['total'] = df['Marks'] + df['Age']
print("New Dataframe \n") 
print(df)

Output

Original Dataframe

    Name  Age       City  Marks
0    Raj   24     Mumbai     95
1  Rahul   21      Delhi     97
2   Aadi   22    Kolkata     76
3  Abhay   23  Rajasthan     90
4  Ajjet   21      Delhi     74 

New Dataframe 

    Name  Age       City  Marks  total
0    Raj   24     Mumbai     95    119
1  Rahul   21      Delhi     97    118
2   Aadi   22    Kolkata     76     98
3  Abhay   23  Rajasthan     90    113
4  Ajjet   21      Delhi     74     95
  • Method 4-Add two columns with NaN values to make a new column

The same approach works when NaN values are present, but here the NaN values will show up in the new column. Let us see this with the help of an example.

import pandas as pd 
import numpy as np 
students = [('Raj', 24, 'Mumbai', 95) , 
            ('Rahul', 21, 'Delhi' , 97) , 
            ('Aadi', 22, 'Kolkata', np.NaN) , 
            ('Abhay', np.NaN,'Rajasthan' , 90) , 
            ('Ajjet', 21, 'Delhi' , 74)] 
# Create a DataFrame object 
df = pd.DataFrame( students, columns=['Name', 'Age', 'City', 'Marks']) 
print("Original Dataframe\n") 
print(df,'\n') 
df['total'] = df['Marks'] + df['Age']
print("New Dataframe \n") 
print(df)

Output

Original Dataframe

    Name   Age       City  Marks
0    Raj  24.0     Mumbai   95.0
1  Rahul  21.0      Delhi   97.0
2   Aadi  22.0    Kolkata    NaN
3  Abhay   NaN  Rajasthan   90.0
4  Ajjet  21.0      Delhi   74.0 

New Dataframe 

    Name   Age       City  Marks  total
0    Raj  24.0     Mumbai   95.0  119.0
1  Rahul  21.0      Delhi   97.0  118.0
2   Aadi  22.0    Kolkata    NaN    NaN
3  Abhay   NaN  Rajasthan   90.0    NaN
4  Ajjet  21.0      Delhi   74.0   95.0

So these are the methods to add two columns in the dataframe.

Pandas: Create Dataframe from List of Dictionaries

Methods of creating a dataframe from a list of dictionaries

In this article, we discuss different methods by which we can create a dataframe from a list of dictionaries. Before getting to the actual methods, let us make some observations that help to understand the concept easily. Suppose we have a list of dictionaries:

list_of_dict = [
{'Name': 'Mayank' , 'Age': 25, 'Marks': 91},
{'Name': 'Raj', 'Age': 21, 'Marks': 97},
{'Name': 'Rahul', 'Age': 23, 'Marks': 79},
{'Name': 'Manish' , 'Age': 23},
]

We know that dictionaries consist of key-value pairs. So we can see that if we make the keys our column names and the values the column values, a dataframe is easily created; and since we have a list of dictionaries, we get a dataframe with multiple rows as well.

pandas.DataFrame

This method helps us to create a dataframe in Python.

syntax: pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Let us see different methods to create dataframe from a list of dictionaries

  • Method 1-Create Dataframe from list of dictionaries with default indexes

As we see, the pandas.DataFrame() method has a parameter named data. We simply have to pass our list of dictionaries to this method and it returns the dataframe. Let us see this with the help of an example.

import pandas as pd
import numpy as np

list_of_dict = [
    {'Name': 'Mayank' ,  'Age': 25,  'Marks': 91},
    {'Name': 'Raj',  'Age': 21,  'Marks': 97},
    {'Name': 'Rahul',  'Age': 23,  'Marks': 79},
    {'Name': 'Manish' ,  'Age': 23,  'Marks': 86},
]
#create dataframe
df=pd.DataFrame(list_of_dict)
print(df)

Output

   Age  Marks    Name
0   25     91  Mayank
1   21     97     Raj
2   23     79   Rahul
3   23     86  Manish

Here we see that dataframe is created with default indexes 0,1,2,3….

Now a question may arise: what happens if one dictionary has fewer key-value pairs than the others? Let us understand this with the help of an example.

import pandas as pd
import numpy as np

list_of_dict = [
    {'Name': 'Mayank' ,  'Age': 25,  'Marks': 91},
    {'Name': 'Raj',  'Age': 21,  'Marks': 97},
    {'Name': 'Rahul',  'Marks': 79},
    {'Name': 'Manish' ,  'Age': 23},
]
#create dataframe
df=pd.DataFrame(list_of_dict)
print(df)

Output

    Age  Marks    Name
0  25.0   91.0  Mayank
1  21.0   97.0     Raj
2   NaN   79.0   Rahul
3  23.0    NaN  Manish

Here we see that in the case of a missing key-value pair, a NaN value appears in the output.

  • Method 2- Create Dataframe from list of dictionary with custom indexes

Unlike the previous method where we have default indexes, we can also give custom indexes by passing a list of indexes to the index parameter of the pandas.DataFrame() function. Let us see this with the help of an example.

import pandas as pd
import numpy as np

list_of_dict = [
    {'Name': 'Mayank' ,  'Age': 25,  'Marks': 91},
    {'Name': 'Raj',  'Age': 21,  'Marks': 97},
    {'Name': 'Rahul',  'Marks': 79},
    {'Name': 'Manish' ,  'Age': 23},
]
#create dataframe
df=pd.DataFrame(list_of_dict,index=['a','b','c','d'])
print(df)

Output

    Age  Marks    Name
a  25.0   91.0  Mayank
b  21.0   97.0     Raj
c   NaN   79.0   Rahul
d  23.0    NaN  Manish

Here we see that instead of the default indexes 0, 1, 2, 3 we now have the indexes a, b, c, d.

  • Method 3-Create Dataframe from list of dictionaries with changed order of columns

With the help of the pandas.DataFrame() method we can easily arrange the order of the columns by simply passing a list of columns to the columns parameter, in the order in which we want them displayed in our dataframe. Let us see this with the help of an example.

import pandas as pd
import numpy as np

list_of_dict = [
    {'Name': 'Mayank' ,  'Age': 25,  'Marks': 91},
    {'Name': 'Raj',  'Age': 21,  'Marks': 97},
    {'Name': 'Rahul',  'Age': 23,  'Marks': 79},
    {'Name': 'Manish' ,  'Age': 23,  'Marks': 86},
]
#create dataframe
df=pd.DataFrame(list_of_dict,columns=['Name', 'Marks', 'Age'])
print(df)

Output

     Name  Marks  Age
0  Mayank     91   25
1     Raj     97   21
2   Rahul     79   23
3  Manish     86   23

Here also a question may arise: what happens if we pass fewer columns, or more columns, in the columns parameter? Let us see this with the help of an example.

Case 1: Fewer columns in the columns parameter

In this case the columns which we don't pass are dropped from the dataframe. Let us see this with the help of an example.

import pandas as pd
import numpy as np

list_of_dict = [
    {'Name': 'Mayank' ,  'Age': 25,  'Marks': 91},
    {'Name': 'Raj',  'Age': 21,  'Marks': 97},
    {'Name': 'Rahul',  'Age': 23,  'Marks': 79},
    {'Name': 'Manish' ,  'Age': 23,  'Marks': 86},
]
#create dataframe
df=pd.DataFrame(list_of_dict,columns=['Name', 'Marks'])
print(df)

Output

     Name  Marks
0  Mayank     91
1     Raj     97
2   Rahul     79
3  Manish     86

Here we see that we didn't pass the Age column, which is why the Age column is not in our dataframe.

Case 2: More columns in the columns parameter

In this case a new column is added to the dataframe, but all its values are NaN. Let us see this with the help of an example.

import pandas as pd
import numpy as np

list_of_dict = [
    {'Name': 'Mayank' ,  'Age': 25,  'Marks': 91},
    {'Name': 'Raj',  'Age': 21,  'Marks': 97},
    {'Name': 'Rahul',  'Age': 23,  'Marks': 79},
    {'Name': 'Manish' ,  'Age': 23,  'Marks': 86},
]
#create dataframe
df=pd.DataFrame(list_of_dict,columns=['Name', 'Marks', 'Age','city'])
print(df)

Output

     Name  Marks  Age  city
0  Mayank     91   25   NaN
1     Raj     97   21   NaN
2   Rahul     79   23   NaN
3  Manish     86   23   NaN

So these are the methods to create a dataframe from a list of dictionaries in pandas.

Matplotlib: Line plot with markers

Methods to draw line plot with markers with the help of Matplotlib

In this article, we will discuss some basics of matplotlib and then discuss how to draw line plots with markers.

Matplotlib

We know that data in the form of raw numbers is difficult and boring to analyze, but if we convert those numbers into graphs, bar plots, pie charts, etc., the data becomes easy and interesting to visualize. This is where the Matplotlib library of Python comes into use. Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

To use this library we first have to import it into the program. For importing it we can use:

from matplotlib import pyplot as plt  or  import matplotlib.pyplot as plt

In this article, we only discuss the line plot. So let see the function in matplotlib to draw a line plot.

Syntax: plt.plot(x, y, scalex=True, scaley=True, data=None, marker='marker style', **kwargs)

Parameters

  1. x, y: the data coordinates; x is plotted along the horizontal axis and y along the vertical axis.
  2. scalex, scaley: These parameters determine if the view limits are adapted to the data limits. The default value is True.
  3. marker: It contains types of markers that can be used. Like point marker, circle marker, etc.

Here is the list of marker styles that can be used:

  • '.'   point marker
  • ','   pixel marker
  • 'o'   circle marker
  • 'v'   triangle_down marker
  • '^'   triangle_up marker
  • '<'   triangle_left marker
  • '>'   triangle_right marker
  • '1'   tri_down marker
  • '2'   tri_up marker
  • '3'   tri_left marker
  • '4'   tri_right marker
  • 's'   square marker
  • 'p'   pentagon marker
  • '*'   star marker
  • 'h'   hexagon1 marker
  • 'H'   hexagon2 marker
  • '+'   plus marker
  • 'x'   x marker
  • 'D'   diamond marker
  • 'd'   thin_diamond marker
  • '|'   vline marker
  • '_'   hline marker

Examples of Line plot with markers in matplotlib

  • Line Plot with the Point marker

Here we use marker='.'. Let us see this with the help of an example.

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-5,40,.5)
y = np.sin(x)
plt.plot(x,y, marker='.')
plt.title('Sin Function')
plt.xlabel('x values')
plt.ylabel('y= sin(x)')
plt.show()

Output: a sine curve drawn with a point marker at each data point.

  • Line Plot with the Point marker and give marker some color

In the above example, we see that the color of the marker is the same as the color of the line plot. The plt.plot() function has an attribute, marker face color or mfc, which is used to give the marker its own color. Let us see this with the help of an example.

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-5,40,.5)
y = np.sin(x)
plt.plot(x,y, marker='.',mfc='red')
plt.title('Sin Function')
plt.xlabel('x values')
plt.ylabel('y= sin(x)')
plt.show()

Output

Here we see that the color of the marker changes to red.

  • Line Plot with the Point marker and change the size of the marker

To change the size of the marker there is another attribute in the plt.plot() function used to achieve this: marker size or ms. We can pass a number to ms and the marker size increases or decreases accordingly. Let us see this with the help of an example.

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-5,40,.5)
y = np.sin(x)
plt.plot(x,y, marker='.', mfc='red', ms=17)
plt.title('Sin Function')
plt.xlabel('x values')
plt.ylabel('y= sin(x)')
plt.show()

Output

Here we see that the size of the marker changes.

  • Line Plot with the Point marker and change the color of the edge of the marker

We can also change the color of the edge of the marker with the help of the markeredgecolor or mec attribute. Let us see this with the help of an example.

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-5,40,.5)
y = np.sin(x)
plt.plot(x,y, marker='.', mfc='red', ms=17, mec='yellow')
plt.title('Sin Function')
plt.xlabel('x values')
plt.ylabel('y= sin(x)')
plt.show()

Output

Here we see that the color of the edge of the marker changes to yellow.

So here are some examples of how we can work with markers in line plots.

Note: These examples are applicable to any of the marker styles.

Read Csv File to Dataframe With Custom Delimiter in Python

Different methods to read CSV files with custom delimiter in python

In this article, we will see what CSV files are, how to use them in pandas, and then how and why to use a custom delimiter with CSV files in pandas.

CSV file

A simple way to store big data sets is to use CSV files (comma-separated values files). CSV files contain plain text and are a well-known format that can be read by almost everything, including pandas. Generally, CSV files contain columns separated by commas, but the content can also be separated by a tab, an underscore, a hyphen, etc. A typical CSV file looks like this:

total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2
24.59,3.61,Female,No,Sun,Dinner,4

Here we see different columns and their values are separated by commas.

Use CSV file in pandas

The read_csv() method is used to import and read CSV files in pandas. After this step the CSV file's contents act as a normal dataframe, and we can use the same operations on them that we use on any dataframe.

Syntax: pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, ....)

',' is the default separator in the read_csv() method.

Let us see this with an example.

import pandas as pd
data=pd.read_csv('example1.csv')
data.head()

Output

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

Why use separator or delimiter with read_csv() method

So far we understand that CSV files generally contain data separated by commas, but sometimes the data is separated by a tab, a hyphen, etc. To handle this we use a separator (delimiter). Let us understand this with the help of an example. Suppose we have a CSV file whose fields are separated by underscores and we try to read that CSV file without specifying a separator, i.e. using the default comma. Let us see what happens in this case.

"total_bill"_tip_sex_smoker_day_time_size
16.99_1.01_Female_No_Sun_Dinner_2
10.34_1.66_Male_No_Sun_Dinner_3
21.01_3.5_Male_No_Sun_Dinner_3
23.68_3.31_Male_No_Sun_Dinner_2
24.59_3.61_Female_No_Sun_Dinner_4
25.29_4.71_Male_No_Sun_Dinner_4
8.77_2_Male_No_Sun_Dinner_2

Suppose this is our CSV file separated by an underscore.

total_bill_tip_sex_smoker_day_time_size
0 16.99_1.01_Female_No_Sun_Dinner_2
1 10.34_1.66_Male_No_Sun_Dinner_3
2 21.01_3.5_Male_No_Sun_Dinner_3
3 23.68_3.31_Male_No_Sun_Dinner_2
4 24.59_3.61_Female_No_Sun_Dinner_4

Notice how, when we read this file with the default separator, everything ends up in a single unordered column. To solve this issue we specify the separator. Now we will see how, when we set the separator to an underscore, we get the same data back in an ordered manner.

import pandas as pd 
data=pd.read_csv('example2.csv',sep = '_',engine = 'python') 
data.head()

Output

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

So this example is sufficient to understand why there is a need to specify a separator or delimiter in pandas while working with a CSV file.

Now suppose there is a CSV file in which the data is separated by multiple different separators. For example:

totalbill_tip,sex:smoker,day_time,size
16.99,1.01:Female|No,Sun,Dinner,2
10.34,1.66,Male,No|Sun:Dinner,3
21.01:3.5_Male,No:Sun,Dinner,3
23.68,3.31,Male|No,Sun_Dinner,2
24.59:3.61,Female_No,Sun,Dinner,4
25.29,4.71|Male,No:Sun,Dinner,4

Here we see that multiple separators are used, so we cannot use any single custom delimiter. To solve this problem a regex (regular expression) is used. Let us see this with the help of an example.

import pandas as pd 
data=pd.read_csv('example4.csv', sep='[:, |_]', engine='python')
data.head()

Output

   totalbill   tip     sex smoker  day    time  size
0      16.99  1.01  Female     No  Sun  Dinner     2
1      10.34  1.66    Male     No  Sun  Dinner     3
2      21.01  3.50    Male     No  Sun  Dinner     3
3      23.68  3.31    Male     No  Sun  Dinner     2
4      24.59  3.61  Female     No  Sun  Dinner     4

Notice that we pass, in the sep parameter, a regular expression character class containing all the separators used in our CSV file.
