Merging Dataframes using Dataframe.merge() in Python
In this article, we will learn to merge two different DataFrames into a single one using function Dataframe.merge()
.
Dataframe.merge() :
Dataframe class of Python’s Pandas library provide a function i.e. merge()
which helps in merging of two DataFrames.
Syntax:- DataFrame.merge(right, how='inner', on=None, leftOn=None, rightOn=None, left_index=False, right_index=False, sort=False, suffix=('_x', '_y'), copy=True, indicate=False, validate=None)
Arguments:-
- right : A dataframe that is to be merged with the calling dataframe.
- how : (Merge type). Some values are : left, right, outer, inner. It’s default value is ‘inner’. If the two dataframes contains different columns, then based how value, columns will be considered accordingly for merging.
- on : It is the column name on which merge will be done. If not provided then merged done on basis of indexes.
- left_on : Column in left dataframe where merging is to be done.
- right_on : Column in right datframe, where merging is to be done.
- left_index : (bool), default is False (If found True index index from left dataframe selected as join key)
- right_index : (bool), default is False (If found True index index from right dataframe selecte as join key)
- suffixes : tuple of (str, str), default (‘_x’, ‘_y’)
- Suffix that is to be applied on overlapping columns in left and right dataframes respectively.
Let’s see one by one
Merge DataFrames on common columns (Default Inner Join) :
If we have two DataFrames of two common columns, by directly calling merge()
 function the two columns will be merged considering common columns as join keys and the dissimilar columns would just be copied from one dataframe to another dataframe.
Let’s see the below program to understand it clearly.
import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
(99,'Rana', 'Mumbai', 20,2000),
(51,'Jaydev','Kolkata', 22,25640),
(31,'Shikhar','Hyderabad', 28,85410),
(12,'Sanju','Rajasthan', 21,63180),
(35,'Raina','Gujarat', 18,62790)
]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
(99, 2, 195200, 2000) ,
(51, 7, 15499, 25640) ,
(31, 17, 654000, 85410) ,
(12, 5, 201000, 63180) ,
(35, 14, 741000, 62790)
]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common column by default INNER JOIN
mergedDataf = playDFObj.merge(moreinfoObj)
print(mergedDataf)
Output :
DataFrame 1 :
   JersyN    Name      Team Age Sponsered
I       15   Smith      Pune  17     12000
II      99    Rana    Mumbai  20      2000
III     51  Jaydev   Kolkata  22     25640
IV      31 Shikhar Hyderabad  28     85410
V       12   Sanju Rajasthan  21     63180
VI      35   Raina   Gujarat  18     62790
DataFrame 2 :
   JersyN PLayingSince Salary Sponsered
IÂ Â Â Â Â Â Â 15Â Â Â Â Â Â Â Â Â Â Â 13Â 180000Â Â Â Â Â 12000
IIÂ Â Â Â Â Â 99Â Â Â Â Â Â Â Â Â Â Â Â 2Â 195200Â Â Â Â Â Â 2000
IIIÂ Â Â Â Â 51Â Â Â Â Â Â Â Â Â Â Â Â 7Â Â 15499Â Â Â Â Â 25640
IVÂ Â Â Â Â Â 31Â Â Â Â Â Â Â Â Â Â Â 17Â 654000Â Â Â Â Â 85410
VÂ Â Â Â Â Â Â 12Â Â Â Â Â Â Â Â Â Â Â Â 5Â 201000Â Â Â Â Â 63180
VIÂ Â Â Â Â Â 35Â Â Â Â Â Â Â Â Â Â Â 14Â 741000Â Â Â Â Â 62790
 JersyN    Name      Team Age Sponsered PLayingSince Salary
0     15   Smith      Pune  17     12000           13 180000
1     99    Rana    Mumbai  20      2000            2 195200
2     51  Jaydev   Kolkata  22     25640            7  15499
3     31 Shikhar Hyderabad  28     85410           17 654000
4     12   Sanju Rajasthan  21     63180            5 201000
5     35   Raina   Gujarat  18     62790           14 741000
What is Inner Join ?
In above case, inner join occured for key columns i.e. ‘JersyN’ & ‘Sponsered’. During inner join the common columns of two dataframes are picked and merged. We can also explicitly do inner join by passing how
argument with values as inner
. After implementing both the cases will have same result.
Merge Dataframes using Left Join :
What is left join ?
While merging columns we can include all rows from left DataFrame and NaN from which values are missing in right DataFrame.
Let’s see the below program to understand it clearly.
import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
(99,'Rana', 'Mumbai', 20,2000),
(51,'Jaydev','Kolkata', 22,25640),
(31,'Shikhar','Hyderabad', 28,85410),
(12,'Sanju','Rajasthan', 21,63180),
(35,'Raina','Gujarat', 18,62790)
]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
(99, 2, 2000) ,
(51, 7, 15499, 25640) ,
(31, 17, 654000) ,
(12, 5, 201000, 63180) ,
(35, 14, 741000, 62790)
]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common column by default INNER JOIN
mergedDataf = playDFObj.merge(moreinfoObj, how='left')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
   JersyN    Name      Team Age Sponsered
I       15   Smith      Pune  17     12000
II      99    Rana    Mumbai  20      2000
III     51  Jaydev   Kolkata  22     25640
IV      31 Shikhar Hyderabad  28     85410
V       12   Sanju Rajasthan  21     63180
VI      35   Raina   Gujarat  18     62790
DataFrame 2 :
   JersyN PLayingSince Salary Sponsered
IÂ Â Â Â Â Â Â 15Â Â Â Â Â Â Â Â Â Â Â 13Â 180000Â Â Â 12000.0
IIÂ Â Â Â Â Â 99Â Â Â Â Â Â Â Â Â Â Â Â 2Â Â Â 2000Â Â Â Â Â Â Â NaN
IIIÂ Â Â Â Â 51Â Â Â Â Â Â Â Â Â Â Â Â 7Â Â 15499Â Â Â 25640.0
IVÂ Â Â Â Â Â 31Â Â Â Â Â Â Â Â Â Â Â 17Â 654000Â Â Â Â Â Â Â NaN
VÂ Â Â Â Â Â Â 12Â Â Â Â Â Â Â Â Â Â Â Â 5Â 201000Â Â Â 63180.0
VIÂ Â Â Â Â Â 35Â Â Â Â Â Â Â Â Â Â Â 14Â 741000Â Â Â 62790.0
After merging:
 JersyN    Name      Team Age Sponsered PLayingSince   Salary
0      15   Smith      Pune  17    12000         13.0 180000.0
1     99    Rana    Mumbai  20     2000          NaN      NaN
2     51  Jaydev   Kolkata  22    25640          7.0  15499.0
3     31 Shikhar Hyderabad  28    85410          NaN      NaN
4     12   Sanju Rajasthan  21    63180          5.0 201000.0
5     35   Raina   Gujarat  18    62790         14.0 741000.0
Merge DataFrames using Right Join :
What is Right join ?
While merging columns we can include all rows from right DataFrame and NaN from which values are missing in left DataFrame.
Let’s see the below program to understand it clearly.
import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
(99,'Rana', 'Mumbai', 20,2000),
(51,'Jaydev','Kolkata', 22,25640),
(31,'Shikhar','Hyderabad', 28,85410),
(12,'Sanju','Rajasthan', 21,63180),
(35,'Raina','Gujarat', 18,62790)
]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
(99, 2, 2000) ,
(51, 7, 15499, 25640) ,
(31, 17, 654000) ,
(12, 5, 201000, 63180) ,
(35, 14, 741000, 62790)
]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common column by default INNER JOIN
mergedDataf = playDFObj.merge(moreinfoObj, how='right')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
   JersyN    Name      Team Age Sponsered
I       15   Smith      Pune  17     12000
II      99    Rana    Mumbai  20      2000
III     51  Jaydev   Kolkata  22     25640
IV      31 Shikhar Hyderabad  28     85410
V       12   Sanju Rajasthan  21     63180
VI      35   Raina   Gujarat  18     62790
DataFrame 2 :
   JersyN PLayingSince Salary Sponsered
IÂ Â Â Â Â Â Â 15Â Â Â Â Â Â Â Â Â Â Â 13Â 180000Â Â Â 12000.0
IIÂ Â Â Â Â Â 99Â Â Â Â Â Â Â Â Â Â Â Â 2Â Â Â 2000Â Â Â Â Â Â Â NaN
IIIÂ Â Â Â Â 51Â Â Â Â Â Â Â Â Â Â Â Â 7Â Â 15499Â Â Â 25640.0
IVÂ Â Â Â Â Â 31Â Â Â Â Â Â Â Â Â Â Â 17Â 654000Â Â Â Â Â Â Â NaN
VÂ Â Â Â Â Â Â 12Â Â Â Â Â Â Â Â Â Â Â Â 5Â 201000Â Â Â 63180.0
VIÂ Â Â Â Â Â 35Â Â Â Â Â Â Â Â Â Â Â 14Â 741000Â Â Â 62790.0
After merging:
 JersyN   Name      Team  Age Sponsered PLayingSince Salary
0     15  Smith      Pune 17.0   12000.0           13 180000
1     51 Jaydev   Kolkata 22.0   25640.0            7  15499
2     12  Sanju Rajasthan 21.0   63180.0            5 201000
3     35  Raina   Gujarat 18.0   62790.0           14 741000
4Â Â Â Â Â 99Â Â Â Â NaNÂ Â Â Â Â Â Â NaNÂ Â NaNÂ Â Â Â Â Â Â NaNÂ Â Â Â Â Â Â Â Â Â Â Â 2Â Â Â 2000
5Â Â Â Â Â 31Â Â Â Â NaNÂ Â Â Â Â Â Â NaNÂ Â NaNÂ Â Â Â Â Â Â NaNÂ Â Â Â Â Â Â Â Â Â Â 17Â 654000
Merge DataFrames using Outer Join :
What is Outer join ?
While merging columns of two dataframes, we can even include all rows of two DataFrames and add NaN for the values missing in left or right DataFrame.
Let’s see the below program to understand it clearly.
import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
(99,'Rana', 'Mumbai', 20,2000),
(51,'Jaydev','Kolkata', 22,25640),
(31,'Shikhar','Hyderabad', 28,85410),
(12,'Sanju','Rajasthan', 21,63180),
(35,'Raina','Gujarat', 18,62790)
]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
(99, 2, 2000) ,
(51, 7, 15499, 25640) ,
(31, 17, 654000) ,
(12, 5, 201000, 63180) ,
(35, 14, 741000, 62790)
]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common column by default INNER JOIN
mergedDataf = playDFObj.merge(moreinfoObj, how='outer')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
   JersyN    Name      Team Age Sponsered
I       15   Smith      Pune  17     12000
II      99    Rana    Mumbai  20      2000
III     51  Jaydev   Kolkata  22     25640
IV      31 Shikhar Hyderabad  28     85410
V       12   Sanju Rajasthan  21     63180
VI      35   Raina   Gujarat  18     62790
DataFrame 2 :
   JersyN PLayingSince Salary Sponsered
IÂ Â Â Â Â Â Â 15Â Â Â Â Â Â Â Â Â Â Â 13Â 180000Â Â Â 12000.0
IIÂ Â Â Â Â Â 99Â Â Â Â Â Â Â Â Â Â Â Â 2Â Â Â 2000Â Â Â Â Â Â Â NaN
IIIÂ Â Â Â Â 51Â Â Â Â Â Â Â Â Â Â Â Â 7Â Â 15499Â Â Â 25640.0
IVÂ Â Â Â Â Â 31Â Â Â Â Â Â Â Â Â Â Â 17Â 654000Â Â Â Â Â Â Â NaN
VÂ Â Â Â Â Â Â 12Â Â Â Â Â Â Â Â Â Â Â Â 5Â 201000Â Â Â 63180.0
VIÂ Â Â Â Â Â 35Â Â Â Â Â Â Â Â Â Â Â 14Â 741000Â Â Â 62790.0
After merging:
 JersyN    Name      Team  Age Sponsered PLayingSince   Salary
0     15   Smith      Pune 17.0   12000.0         13.0 180000.0
1     99    Rana    Mumbai 20.0    2000.0          NaN      NaN
2     51  Jaydev   Kolkata 22.0   25640.0          7.0  15499.0
3     31 Shikhar Hyderabad 28.0   85410.0          NaN      NaN
4     12   Sanju Rajasthan 21.0   63180.0          5.0 201000.0
5     35   Raina   Gujarat 18.0   62790.0         14.0 741000.0
6Â Â Â Â Â 99Â Â Â Â Â NaNÂ Â Â Â Â Â Â NaNÂ Â NaNÂ Â Â Â Â Â Â NaNÂ Â Â Â Â Â Â Â Â Â 2.0Â Â Â 2000.0
7Â Â Â Â Â 31Â Â Â Â Â NaNÂ Â Â Â Â Â Â NaNÂ Â NaNÂ Â Â Â Â Â Â NaNÂ Â Â Â Â Â Â Â Â 17.0Â 654000.0
Want to expert in the python programming language? Exploring Python Data Analysis using Pandas tutorial changes your knowledge from basic to advance level in python concepts.
Read more Articles on Python Data Analysis Using Padas