Pandas : How to Merge Dataframes using Dataframe.merge() in Python – Part 1

Merging Dataframes using Dataframe.merge() in Python

In this article, we will learn to merge two different DataFrames into a single one using function Dataframe.merge().

Dataframe.merge() :

Dataframe class of Python’s Pandas library provide a function i.e. merge() which helps in merging of two DataFrames.

Syntax:- DataFrame.merge(right, how='inner', on=None, leftOn=None, rightOn=None, left_index=False, right_index=False, sort=False, suffix=('_x', '_y'), copy=True, indicate=False, validate=None)

Arguments:-

  • right : A dataframe that is to be merged with the calling dataframe.
  • how : (Merge type). Some values are : left, right, outer, inner. It’s default value is ‘inner’. If the two dataframes contains different columns, then based how value, columns will be considered accordingly for merging.
  • on : It is the column name on which merge will be done. If not provided then merged done on basis of indexes.
  • left_on : Column in left dataframe where merging is to be done.
  • right_on : Column in right datframe, where merging is to be done.
  • left_index : (bool), default is False (If found True index index from left dataframe selected as join key)
  • right_index : (bool), default is False (If found True index index from right dataframe selecte as join key)
  • suffixes : tuple of (str, str), default (‘_x’, ‘_y’)
  • Suffix that is to be applied on overlapping columns in left and right dataframes respectively.

Let’s see one by one

Merge DataFrames on common columns (Default Inner Join) :

If we have two DataFrames of two common columns, by directly calling merge()  function the two columns will be merged considering common columns as join keys and the dissimilar columns would just be copied from one dataframe to another dataframe.

Let’s see the below program to understand it clearly.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
           (99, 2, 195200, 2000) ,
           (51, 7, 15499, 25640) ,
           (31, 17, 654000, 85410) ,
           (12, 5, 201000, 63180) ,
           (35, 14, 741000, 62790)
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common column by default INNER JOIN
mergedDataf = playDFObj.merge(moreinfoObj)
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000      12000
II       99             2  195200       2000
III      51             7   15499      25640
IV       31            17  654000      85410
V        12             5  201000      63180
VI       35            14  741000      62790
   JersyN     Name       Team  Age  Sponsered  PLayingSince  Salary
0      15    Smith       Pune   17      12000            13  180000
1      99     Rana     Mumbai   20       2000             2  195200
2      51   Jaydev    Kolkata   22      25640             7   15499
3      31  Shikhar  Hyderabad   28      85410            17  654000
4      12    Sanju  Rajasthan   21      63180             5  201000
5      35    Raina    Gujarat   18      62790            14  741000

What is Inner Join ?

In above case, inner join occured for key columns i.e. ‘JersyN’ & ‘Sponsered’. During inner join the common columns of two dataframes are picked and merged. We can also explicitly do inner join by passing how argument with values as inner. After implementing both the cases will have same result.

Merge Dataframes using Left Join :

What is left join ?

While merging columns we can include all rows from left DataFrame and NaN from which values are missing in right DataFrame.

Let’s see the below program to understand it clearly.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
           (99, 2, 2000) ,
           (51, 7, 15499, 25640) ,
           (31, 17, 654000) ,
           (12, 5, 201000, 63180) ,
           (35, 14, 741000, 62790)
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common column by default INNER JOIN
mergedDataf = playDFObj.merge(moreinfoObj, how='left')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000    12000.0
II       99             2    2000        NaN
III      51             7   15499    25640.0
IV       31            17  654000        NaN
V        12             5  201000    63180.0
VI       35            14  741000    62790.0
After merging:
   JersyN     Name       Team  Age Sponsered  PLayingSince    Salary
0      15    Smith       Pune   17     12000          13.0  180000.0
1      99     Rana     Mumbai   20      2000           NaN       NaN
2      51   Jaydev    Kolkata   22     25640           7.0   15499.0
3      31  Shikhar  Hyderabad   28     85410           NaN       NaN
4      12    Sanju  Rajasthan   21     63180           5.0  201000.0
5      35    Raina    Gujarat   18     62790          14.0  741000.0

Merge DataFrames using Right Join :

What is Right join ?

While merging columns we can include all rows from right DataFrame and NaN from which values are missing in left DataFrame.

Let’s see the below program to understand it clearly.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
           (99, 2, 2000) ,
           (51, 7, 15499, 25640) ,
           (31, 17, 654000) ,
           (12, 5, 201000, 63180) ,
           (35, 14, 741000, 62790)
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common column by default INNER JOIN
mergedDataf = playDFObj.merge(moreinfoObj, how='right')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000    12000.0
II       99             2    2000        NaN
III      51             7   15499    25640.0
IV       31            17  654000        NaN
V        12             5  201000    63180.0
VI       35            14  741000    62790.0
After merging:
   JersyN    Name       Team   Age  Sponsered  PLayingSince  Salary
0      15   Smith       Pune  17.0    12000.0            13  180000
1      51  Jaydev    Kolkata  22.0    25640.0             7   15499
2      12   Sanju  Rajasthan  21.0    63180.0             5  201000
3      35   Raina    Gujarat  18.0    62790.0            14  741000
4      99     NaN        NaN   NaN        NaN             2    2000
5      31     NaN        NaN   NaN        NaN            17  654000

Merge DataFrames using Outer Join :

What is Outer join ?

While merging columns of two dataframes, we can even include all rows of two DataFrames and add NaN for the values missing in left or right DataFrame.

Let’s see the below program to understand it clearly.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
           (99, 2, 2000) ,
           (51, 7, 15499, 25640) ,
           (31, 17, 654000) ,
           (12, 5, 201000, 63180) ,
           (35, 14, 741000, 62790)
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common column by default INNER JOIN
mergedDataf = playDFObj.merge(moreinfoObj, how='outer')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000    12000.0
II       99             2    2000        NaN
III      51             7   15499    25640.0
IV       31            17  654000        NaN
V        12             5  201000    63180.0
VI       35            14  741000    62790.0
After merging:
   JersyN     Name       Team   Age  Sponsered  PLayingSince    Salary
0      15    Smith       Pune  17.0    12000.0          13.0  180000.0
1      99     Rana     Mumbai  20.0     2000.0           NaN       NaN
2      51   Jaydev    Kolkata  22.0    25640.0           7.0   15499.0
3      31  Shikhar  Hyderabad  28.0    85410.0           NaN       NaN
4      12    Sanju  Rajasthan  21.0    63180.0           5.0  201000.0
5      35    Raina    Gujarat  18.0    62790.0          14.0  741000.0
6      99      NaN        NaN   NaN        NaN           2.0    2000.0
7      31      NaN        NaN   NaN        NaN          17.0  654000.0

Want to expert in the python programming language? Exploring Python Data Analysis using Pandas tutorial changes your knowledge from basic to advance level in python concepts.

Read more Articles on Python Data Analysis Using Padas