Merge Dataframes on specific columns or on index in Python
In this article, we will learn to merge dataframes on basis of given columns or index.
- Merging Dataframe on a given column name as join key
- Merging Dataframe on a given column with suffix for similar column names
- Merging Dataframe different columns
Dataframe.merge() :
Dataframe class of Python’s Pandas library provide a function i.e. merge() which helps in merging of two DataFrames.
Syntax: DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Arguments:-
- right : A dataframe that is to be merged with the calling dataframe.
- how : (Merge type). Some values are : left, right, outer, inner. Its default value is ‘inner’. If the two dataframes contains different columns, then based
how
value, columns will be considered accordingly for merging. - on : It is the column name on which merge will be done. If not provided then merged done on basis of indexes.
- left_on : Column in left dataframe where merging is to be done.
- right_on : Column in right dataframe, where merging is to be done.
- left_index : (bool), default is False (If found True index index from left dataframe selected as join key)
- right_index : (bool), default is False (If found True index index from right dataframe selected as join key)
- suffixes : tuple of (str, str), default (‘_x’, ‘_y’) (Suffix that is to be applied on overlapping columns in left and right dataframes respectively.)
Merging Dataframe on a given column name as join key :
Let’s take a scenario where the columns names are same, but contents are different i.e. one column data is of int type and other column data is of string type. And if we apply merge() on them without passing any argument, it wouldn’t merge here. Here, we can merge dataframes on a single column by passing on argument in merge() function.
And as both dataframes have common column i.e. sponsered, so after merging columns are named by default. It will splitted by taking a suffix i.e. Sponsered_x and Sponsered_y as left and right dataframe respectively.
import pandas as sc # List of Tuples players = [(15,'Smith','Pune', 17,12000), (99,'Rana', 'Mumbai', 20,2000), (51,'Jaydev','Kolkata', 22,25640), (31,'Shikhar','Hyderabad', 28,85410), (12,'Sanju','Rajasthan', 21,63180), (35,'Raina','Gujarat', 18,62790) ] # Creation of DataFrame object playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Salary'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 1 : ') print(playDFObj) # List of Tuples moreInfo = [(15, 13, 180000, 'Nissin') , (99, 2, 195200, 'Jio') , (51, 7, 15499, 'Lays') , (31, 17, 654000, 'AmbujaC') , (12, 5, 201000, 'AsianP') , (35, 14, 741000, 'Airtel') ] # Creation of DataFrame object moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 2 : ') print(moreinfoObj) # Merge two Dataframes on basis of common column by default INNER JOIN mergedDataf = playDFObj.merge(moreinfoObj, on='JersyN') print('After merging: ') print(mergedDataf)
Output : DataFrame 1 : JersyN Name Team Age Salary I 15 Smith Pune 17 12000 II 99 Rana Mumbai 20 2000 III 51 Jaydev Kolkata 22 25640 IV 31 Shikhar Hyderabad 28 85410 V 12 Sanju Rajasthan 21 63180 VI 35 Raina Gujarat 18 62790 DataFrame 2 : JersyN PLayingSince Salary Sponsered I 15 13 180000 Nissin II 99 2 195200 Jio III 51 7 15499 Lays IV 31 17 654000 AmbujaC V 12 5 201000 AsianP VI 35 14 741000 Airtel After merging: JersyN Name Team Age Salary_x PLayingSince Salary_y Sponsered 0 15 Smith Pune 17 12000 13 180000 Nissin 1 99 Rana Mumbai 20 2000 2 195200 Jio 2 51 Jaydev Kolkata 22 25640 7 15499 Lays 3 31 Shikhar Hyderabad 28 85410 17 654000 AmbujaC 4 12 Sanju Rajasthan 21 63180 5 201000 AsianP 5 5 Raina Gujarat 18 62790 14 741000 Airtel
Merging Dataframe on a given column with suffix for similar column names :
In previous example, for common columns with dissimilar contents suffix x & y are added. We can also add our own custom suffix.
import pandas as sc # List of Tuples players = [(15,'Smith','Pune', 17,12000), (99,'Rana', 'Mumbai', 20,2000), (51,'Jaydev','Kolkata', 22,25640), (31,'Shikhar','Hyderabad', 28,85410), (12,'Sanju','Rajasthan', 21,63180), (35,'Raina','Gujarat', 18,62790) ] # Creation of DataFrame object playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 1 : ') print(playDFObj) # List of Tuples moreInfo = [(15, 13, 180000, 'Nissin') , (99, 2, 195200, 'Jio') , (51, 7, 15499, 'Lays') , (31, 17, 654000, 'AmbujaC') , (12, 5, 201000, 'AsianP') , (35, 14, 741000, 'Airtel') ] # Creation of DataFrame object moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 2 : ') print(moreinfoObj) # Merge two Dataframes on basis of common column by default INNER JOIN mergedDataf = playDFObj.merge(moreinfoObj, on='JersyN',suffixes=('_Price', '_Companies')) print('After merging: ') print(mergedDataf)
Output : DataFrame 1 : JersyN Name Team Age Sponsered I 15 Smith Pune 17 12000 II 99 Rana Mumbai 20 2000 III 51 Jaydev Kolkata 22 25640 IV 31 Shikhar Hyderabad 28 85410 V 12 Sanju Rajasthan 21 63180 VI 35 Raina Gujarat 18 62790 DataFrame 2 : JersyN PLayingSince Salary Sponsered I 15 13 180000 Nissin II 99 2 195200 Jio III 51 7 15499 Lays IV 31 17 654000 AmbujaC V 12 5 201000 AsianP VI 35 14 741000 Airtel After merging: JersyN Name Team ... PLayingSince Salary Sponsered_Companies 0 15 Smith Pune ... 13 180000 Nissin 1 99 Rana Mumbai ... 2 195200 Jio 2 51 Jaydev Kolkata ... 7 15499 Lays 3 31 Shikhar Hyderabad ... 17 654000 AmbujaC 4 12 Sanju Rajasthan ... 5 201000 AsianP 5 35 Raina Gujarat ... 14 741000 Airtel
Merging Dataframe different columns :
Now let’s take a scenario of changing name of JersyN column of a dataframe and try to merge it with another dataframe.
import pandas as sc # List of Tuples players = [(15,'Smith','Pune', 17,12000), (99,'Rana', 'Mumbai', 20,2000), (51,'Jaydev','Kolkata', 22,25640), (31,'Shikhar','Hyderabad', 28,85410), (12,'Sanju','Rajasthan', 21,63180), (35,'Raina','Gujarat', 18,62790) ] # Creation of DataFrame object playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Salary'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 1 : ') print(playDFObj) # List of Tuples moreInfo = [(15, 13, 180000, 'Nissin') , (99, 2, 195200, 'Jio') , (51, 7, 15499, 'Lays') , (31, 17, 654000, 'AmbujaC') , (12, 5, 201000, 'AsianP') , (35, 14, 741000, 'Airtel') ] # Creation of DataFrame object moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI']) print('DataFrame 2 : ') print(moreinfoObj) # Rename column JersyN to ShirtN moreinfoObj.rename(columns={'JersyN': 'ShirtN'}, inplace=True) # Merge two Dataframes on basis of common column by default INNER JOIN mergedDataf = playDFObj.merge(moreinfoObj, left_on='JersyN', right_on='ShirtN') print('After merging: ') print(mergedDataf)
Output ; DataFrame 1 : JersyN Name Team Age Salary I 15 Smith Pune 17 12000 II 99 Rana Mumbai 20 2000 III 51 Jaydev Kolkata 22 25640 IV 31 Shikhar Hyderabad 28 85410 V 12 Sanju Rajasthan 21 63180 VI 35 Raina Gujarat 18 62790 DataFrame 2 : JersyN PLayingSince Salary Sponsered I 15 13 180000 Nissin II 99 2 195200 Jio III 51 7 15499 Lays IV 31 17 654000 AmbujaC V 12 5 201000 AsianP VI 35 14 741000 Airtel After merging: JersyN Name Team Age ... ShirtN PLayingSince Salary Sponsered_y 0 15 Smith Pune 17 ... 15 13 180000 Nissin 1 99 Rana Mumbai 20 ... 99 2 195200 Jio 2 51 Jaydev Kolkata 22 ... 51 7 15499 Lays 3 31 Shikhar Hyderabad 28 ... 31 17 654000 AmbujaC 4 12 Sanju Rajasthan 21 ... 12 5 201000 AsianP 5 35 Raina Gujarat 18 ... 35 14 741000 Airtel [6 rows x 9 columns]
Want to expert in the python programming language? Exploring Python Data Analysis using Pandas tutorial changes your knowledge from basic to advance level in python concepts.
Read more Articles on Python Data Analysis Using Padas
- How to merge Dataframes using Dataframe.merge() in Python?
- How to merge Dataframes by index using Dataframe.merge()?
- Count NaN or missing values in DataFrame
- Count rows in a dataframe | all or those only that satisfy a condition
- 6 Different ways to iterate over rows in a Dataframe & Update while iterating row by row
- Loop or Iterate over all or certain columns of a DataFrame
- How to display full Dataframe i.e. print all rows & columns without truncation