Merging Dataframes using Dataframe.merge() in Python
In this article, we will learn to merge two different DataFrames into a single one using function Dataframe.merge()
.
Dataframe.merge() :
Dataframe class of Python’s Pandas library provide a function i.e. merge()
which helps in merging of two DataFrames.
Syntax:- DataFrame.merge(right, how='inner', on=None, leftOn=None, rightOn=None, left_index=False, right_index=False, sort=False, suffix=('_x', '_y'), copy=True, indicate=False, validate=None)
Arguments:-
- right : A dataframe that is to be merged with the calling dataframe.
- how : (Merge type). Some values are : left, right, outer, inner. It’s default value is ‘inner’. If the two dataframes contains different columns, then based how value, columns will be considered accordingly for merging.
- on : It is the column name on which merge will be done. If not provided then merged done on basis of indexes.
- left_on : Column in left dataframe where merging is to be done.
- right_on : Column in right datframe, where merging is to be done.
- left_index : (bool), default is False (If found True index index from left dataframe selected as join key)
- right_index : (bool), default is False (If found True index index from right dataframe selecte as join key)
- suffixes : tuple of (str, str), default (‘_x’, ‘_y’)
- Suffix that is to be applied on overlapping columns in left and right dataframes respectively.
Let’s see one by one
- Merge DataFrames on common columns (Default Inner Join)
- Merge Dataframes using Left Join
- Merge DataFrames using Right Join
- Merge DataFrames using Outer Join
Merge DataFrames on common columns (Default Inner Join) :
If we have two DataFrames of two common columns, by directly calling merge()
function the two columns will be merged considering common columns as join keys and the dissimilar columns would just be copied from one dataframe to another dataframe.
Let’s see the below program to understand it clearly.
import pandas as sc # List of Tuples players = [(15,'Smith','Pune', 17,12000), (99,'Rana', 'Mumbai', 20,2000), (51,'Jaydev','Kolkata', 22,25640), (31,'Shikhar','Hyderabad', 28,85410), (12,'Sanju','Rajasthan', 21,63180), (35,'Raina','Gujarat', 18,62790) ] # Creation of DataFrame object playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 1 : ') print(playDFObj) moreInfo = [(15, 13, 180000, 12000) , (99, 2, 195200, 2000) , (51, 7, 15499, 25640) , (31, 17, 654000, 85410) , (12, 5, 201000, 63180) , (35, 14, 741000, 62790) ] # Creation of DataFrame object moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 2 : ') print(moreinfoObj) # Merge two Dataframes on basis of common column by default INNER JOIN mergedDataf = playDFObj.merge(moreinfoObj) print(mergedDataf)
Output : DataFrame 1 : JersyN Name Team Age Sponsered I 15 Smith Pune 17 12000 II 99 Rana Mumbai 20 2000 III 51 Jaydev Kolkata 22 25640 IV 31 Shikhar Hyderabad 28 85410 V 12 Sanju Rajasthan 21 63180 VI 35 Raina Gujarat 18 62790 DataFrame 2 : JersyN PLayingSince Salary Sponsered I 15 13 180000 12000 II 99 2 195200 2000 III 51 7 15499 25640 IV 31 17 654000 85410 V 12 5 201000 63180 VI 35 14 741000 62790 JersyN Name Team Age Sponsered PLayingSince Salary 0 15 Smith Pune 17 12000 13 180000 1 99 Rana Mumbai 20 2000 2 195200 2 51 Jaydev Kolkata 22 25640 7 15499 3 31 Shikhar Hyderabad 28 85410 17 654000 4 12 Sanju Rajasthan 21 63180 5 201000 5 35 Raina Gujarat 18 62790 14 741000
What is Inner Join ?
In above case, inner join occured for key columns i.e. ‘JersyN’ & ‘Sponsered’. During inner join the common columns of two dataframes are picked and merged. We can also explicitly do inner join by passing how
argument with values as inner
. After implementing both the cases will have same result.
Merge Dataframes using Left Join :
What is left join ?
While merging columns we can include all rows from left DataFrame and NaN from which values are missing in right DataFrame.
Let’s see the below program to understand it clearly.
import pandas as sc # List of Tuples players = [(15,'Smith','Pune', 17,12000), (99,'Rana', 'Mumbai', 20,2000), (51,'Jaydev','Kolkata', 22,25640), (31,'Shikhar','Hyderabad', 28,85410), (12,'Sanju','Rajasthan', 21,63180), (35,'Raina','Gujarat', 18,62790) ] # Creation of DataFrame object playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 1 : ') print(playDFObj) moreInfo = [(15, 13, 180000, 12000) , (99, 2, 2000) , (51, 7, 15499, 25640) , (31, 17, 654000) , (12, 5, 201000, 63180) , (35, 14, 741000, 62790) ] # Creation of DataFrame object moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 2 : ') print(moreinfoObj) # Merge two Dataframes on basis of common column by default INNER JOIN mergedDataf = playDFObj.merge(moreinfoObj, how='left') print('After merging: ') print(mergedDataf)
Output : DataFrame 1 : JersyN Name Team Age Sponsered I 15 Smith Pune 17 12000 II 99 Rana Mumbai 20 2000 III 51 Jaydev Kolkata 22 25640 IV 31 Shikhar Hyderabad 28 85410 V 12 Sanju Rajasthan 21 63180 VI 35 Raina Gujarat 18 62790 DataFrame 2 : JersyN PLayingSince Salary Sponsered I 15 13 180000 12000.0 II 99 2 2000 NaN III 51 7 15499 25640.0 IV 31 17 654000 NaN V 12 5 201000 63180.0 VI 35 14 741000 62790.0 After merging: JersyN Name Team Age Sponsered PLayingSince Salary 0 15 Smith Pune 17 12000 13.0 180000.0 1 99 Rana Mumbai 20 2000 NaN NaN 2 51 Jaydev Kolkata 22 25640 7.0 15499.0 3 31 Shikhar Hyderabad 28 85410 NaN NaN 4 12 Sanju Rajasthan 21 63180 5.0 201000.0 5 35 Raina Gujarat 18 62790 14.0 741000.0
Merge DataFrames using Right Join :
What is Right join ?
While merging columns we can include all rows from right DataFrame and NaN from which values are missing in left DataFrame.
Let’s see the below program to understand it clearly.
import pandas as sc # List of Tuples players = [(15,'Smith','Pune', 17,12000), (99,'Rana', 'Mumbai', 20,2000), (51,'Jaydev','Kolkata', 22,25640), (31,'Shikhar','Hyderabad', 28,85410), (12,'Sanju','Rajasthan', 21,63180), (35,'Raina','Gujarat', 18,62790) ] # Creation of DataFrame object playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 1 : ') print(playDFObj) moreInfo = [(15, 13, 180000, 12000) , (99, 2, 2000) , (51, 7, 15499, 25640) , (31, 17, 654000) , (12, 5, 201000, 63180) , (35, 14, 741000, 62790) ] # Creation of DataFrame object moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 2 : ') print(moreinfoObj) # Merge two Dataframes on basis of common column by default INNER JOIN mergedDataf = playDFObj.merge(moreinfoObj, how='right') print('After merging: ') print(mergedDataf)
Output : DataFrame 1 : JersyN Name Team Age Sponsered I 15 Smith Pune 17 12000 II 99 Rana Mumbai 20 2000 III 51 Jaydev Kolkata 22 25640 IV 31 Shikhar Hyderabad 28 85410 V 12 Sanju Rajasthan 21 63180 VI 35 Raina Gujarat 18 62790 DataFrame 2 : JersyN PLayingSince Salary Sponsered I 15 13 180000 12000.0 II 99 2 2000 NaN III 51 7 15499 25640.0 IV 31 17 654000 NaN V 12 5 201000 63180.0 VI 35 14 741000 62790.0 After merging: JersyN Name Team Age Sponsered PLayingSince Salary 0 15 Smith Pune 17.0 12000.0 13 180000 1 51 Jaydev Kolkata 22.0 25640.0 7 15499 2 12 Sanju Rajasthan 21.0 63180.0 5 201000 3 35 Raina Gujarat 18.0 62790.0 14 741000 4 99 NaN NaN NaN NaN 2 2000 5 31 NaN NaN NaN NaN 17 654000
Merge DataFrames using Outer Join :
What is Outer join ?
While merging columns of two dataframes, we can even include all rows of two DataFrames and add NaN for the values missing in left or right DataFrame.
Let’s see the below program to understand it clearly.
import pandas as sc # List of Tuples players = [(15,'Smith','Pune', 17,12000), (99,'Rana', 'Mumbai', 20,2000), (51,'Jaydev','Kolkata', 22,25640), (31,'Shikhar','Hyderabad', 28,85410), (12,'Sanju','Rajasthan', 21,63180), (35,'Raina','Gujarat', 18,62790) ] # Creation of DataFrame object playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 1 : ') print(playDFObj) moreInfo = [(15, 13, 180000, 12000) , (99, 2, 2000) , (51, 7, 15499, 25640) , (31, 17, 654000) , (12, 5, 201000, 63180) , (35, 14, 741000, 62790) ] # Creation of DataFrame object moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 2 : ') print(moreinfoObj) # Merge two Dataframes on basis of common column by default INNER JOIN mergedDataf = playDFObj.merge(moreinfoObj, how='outer') print('After merging: ') print(mergedDataf)
Output : DataFrame 1 : JersyN Name Team Age Sponsered I 15 Smith Pune 17 12000 II 99 Rana Mumbai 20 2000 III 51 Jaydev Kolkata 22 25640 IV 31 Shikhar Hyderabad 28 85410 V 12 Sanju Rajasthan 21 63180 VI 35 Raina Gujarat 18 62790 DataFrame 2 : JersyN PLayingSince Salary Sponsered I 15 13 180000 12000.0 II 99 2 2000 NaN III 51 7 15499 25640.0 IV 31 17 654000 NaN V 12 5 201000 63180.0 VI 35 14 741000 62790.0 After merging: JersyN Name Team Age Sponsered PLayingSince Salary 0 15 Smith Pune 17.0 12000.0 13.0 180000.0 1 99 Rana Mumbai 20.0 2000.0 NaN NaN 2 51 Jaydev Kolkata 22.0 25640.0 7.0 15499.0 3 31 Shikhar Hyderabad 28.0 85410.0 NaN NaN 4 12 Sanju Rajasthan 21.0 63180.0 5.0 201000.0 5 35 Raina Gujarat 18.0 62790.0 14.0 741000.0 6 99 NaN NaN NaN NaN 2.0 2000.0 7 31 NaN NaN NaN NaN 17.0 654000.0
Want to expert in the python programming language? Exploring Python Data Analysis using Pandas tutorial changes your knowledge from basic to advance level in python concepts.
Read more Articles on Python Data Analysis Using Padas
- How to merge Dataframes on specific columns or on index in Python?
- How to merge Dataframes by index using Dataframe.merge()?
- Count NaN or missing values in DataFrame
- Count rows in a dataframe | all or those only that satisfy a condition
- 6 Different ways to iterate over rows in a Dataframe & Update while iterating row by row
- Loop or Iterate over all or certain columns of a DataFrame
- How to display full Dataframe i.e. print all rows & columns without truncation