How to merge Dataframes by index using Dataframe.merge() in Python ?
In this article we are going to see how we can merge two dataframes by using index of both the dataframes or by suing index of one dataframe and some columns of the other dataframe, and how we can keep a merged dataframe with similar indices. So, let’s start the exploring the topic.
DataFrame.merge()
SYNTAX :
DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, index_left=False, index_right=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
For this short example, we will only focus on some arguments :
- On : This is the column name on which the merge is supposed to be done.
- Left_on : The column names of the left dataframe which are to be merged
- Right_on : The column names of the right dataframe which are to be merged
- index_left : It takes a boolean value whose default values is false. If it is true then it will choose indices from the left dataframe as join key.
- index_right : It takes a boolean value whose default values is false. If it is true then it will choose indices from the right dataframe as join key.
To demonstrate we will be taking the following two dataframes :
Left dataframe :
  Regd    Name  Age    City     Exp 0   10    Jill     16.0    Tokyo     10 1   11    Rachel  38.0    Texas      5 2   12    Kirti    39.0   New York   7 3   13    Veena   40.0    Texas     21 4   14    Lucifer  NaN    Texas     30 5   15    Pablo   30.0  New York   7 6   16    Lionel   45.0  Colombia  11
Right dataframe :
  Regd    Exp    Wage   Bonus 0   10    Junior  75000   2000 1   11    Senior  72200   1000 2   12    Expert  90999   1100 3   13    Expert  90000   1000 4   14    Junior  20000   2000 5   15    Junior  50000   1500 6   16    Senior  81000   1000
Merging two Dataframes by index of both the dataframes :
Here you might have noticed we have a common column named ‘Regd’ . So we can merge both the dataframes by passing left_index and right_index as true in the function.
# Program : # Importing the module import pandas as pd import numpy as np #Left Dataframe students = [(10,'Jill', 16, 'Tokyo', 10), (11,'Rachel', 38, 'Texas', 5), (12,'Kirti', 39, 'New York', 7), (13,'Veena', 40, 'Texas', 21), (14,'Lucifer', np.NaN, 'Texas', 30), (15,'Pablo', 30, 'New York', 7), (16,'Lionel', 45, 'Colombia', 11) ] lDfObj = pd.DataFrame(students, columns=['Regd','Name','Age','City','Exp'],index=['a', 'b', 'c', 'd', 'e', 'f', 'g']) #Right dataframe wage = [(10, 'Junior', 75000, 2000) , (11, 'Senior', 72200, 1000) , (12, 'Expert', 90999, 1100) , (13, 'Expert', 90000, 1000) , (14, 'Junior', 20000, 2000) , (15, 'Junior', 50000, 1500) , (16, 'Senior', 81000, 1000)] rDfObj = pd.DataFrame(wage, columns=['Regd','Exp','Wage','Bonus'] , index=['a', 'b', 'c', 'd', 'e', 'f', 'g']) #Merging both the dataframes newDF = lDfObj.merge(rDfObj, left_index=True, right_index=True) #printing the merged dataframe print("The merged dataframe is-") print(newDF)
Output : The merged dataframe is-   Regd_x  Name   Age    City     Exp_x Regd_y  Exp_y   Wage   Bonus a     10     Jill     16.0   Tokyo     10     10     Junior   75000   2000 b     11    Rachel  38.0   Texas      5    11     Senior   72200   1000 c     12     Kirti    39.0   New York  7    12      Expert  90999   1100 d     13    Veena  40.0   Texas      21     13     Expert   90000   1000 e     14    Lucifer  NaN    Texas      30     14     Junior   20000   2000 f     15     Pablo   30.0   New York  7    15     Junior   50000   1500 g     16    Lionel   45.0   Colombia  11     16     Senior   81000   1000
Finally, if we want to merge a dataframe by the index of the first dataframe with some column from the other dataframe, we can also do that.
Want to expert in the python programming language? Exploring Python Data Analysis using Pandas tutorial changes your knowledge from basic to advance level in python concepts.
Read more Articles on Python Data Analysis Using Padas
- How to merge Dataframes using Dataframe.merge() in Python?
- How to merge Dataframes on specific columns or on index in Python?
- Count NaN or missing values in DataFrame
- Count rows in a dataframe | all or those only that satisfy a condition
- 6 Different ways to iterate over rows in a Dataframe & Update while iterating row by row
- Loop or Iterate over all or certain columns of a DataFrame
- How to display full Dataframe i.e. print all rows & columns without truncation