Mayank Gupta

Pandas: Dataframe.fillna()

Dataframe.fillna() in Dataframes using Python

In this article, we will discuss how to use the DataFrame.fillna() method with examples, such as how to replace NaN values in a complete dataframe or only in specific rows or columns.

Dataframe.fillna()

DataFrame.fillna() fills the NaN values in a dataframe with other values. It is most useful when a column has only a few NaN values: instead of dropping the whole column, we replace its NaN or missing values with some other values.

Syntax: DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)

Parameters

1) value: The value used to fill the NaN entries. It can be a scalar, or a dict, Series, or DataFrame that gives a different fill value per column. The default is None.

2) method: Used when no value is passed. It can be 'backfill' or 'bfill' (use the next valid observation) or 'pad' or 'ffill' (carry the last valid observation forward). The default is None.

3) axis: The axis along which to fill. axis=0 fills along the index (down each column) and axis=1 fills along the columns (across each row).

4) inplace: A boolean. If True, the dataframe itself is modified instead of a new dataframe being returned.
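
As a minimal sketch of the directional fills that the method and axis parameters describe, the example below uses DataFrame.ffill() and DataFrame.bfill(), which behave like method='ffill' and method='bfill'. The small dataframe is made up for illustration and is not part of the examples that follow.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only to illustrate the fill direction
df = pd.DataFrame({"Age": [24, np.nan, np.nan, 21],
                   "Marks": [95, np.nan, 81, np.nan]})

# Forward fill: each NaN takes the last valid value above it in the same column
forward = df.ffill()

# Backward fill: each NaN takes the next valid value below it in the same column
backward = df.bfill()

# axis=1 fills across each row instead, from left to right
across = df.ffill(axis=1)

print(forward)
print(backward)
print(across)
```

A NaN that has no valid value in the fill direction stays NaN, for example the last Marks entry after a backward fill.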

Different methods to use Dataframe.fillna() method

  • Method 1: Replace all NaN values in Dataframe

In this method, we pass a value in the value parameter, and all the NaN values are replaced with that value. Let us see this with the help of an example.

import pandas as pd
import numpy as np
# np.nan marks a missing value (the np.NaN alias was removed in NumPy 2.0)
students = [('Raj', 24, 95),
            ('Rahul', np.nan, 97),
            ('Aadi', 22, 81),
            ('Abhay', np.nan, np.nan),
            ('Ajjet', 21, 74),
            ('Amar', np.nan, np.nan),
            ('Aman', np.nan, 76)]
# Create a DataFrame object
df = pd.DataFrame(  students, 
                    columns=['Name', 'Age','Marks'])
print("Original Dataframe\n")
print(df,'\n')
new_df=df.fillna(0)
print("New Dataframe\n")
print(new_df)

Output

Original Dataframe

    Name   Age  Marks
0    Raj  24.0   95.0
1  Rahul   NaN   97.0
2   Aadi  22.0   81.0
3  Abhay   NaN    NaN
4  Ajjet  21.0   74.0
5   Amar   NaN    NaN
6   Aman   NaN   76.0 

New Dataframe

    Name   Age  Marks
0    Raj  24.0   95.0
1  Rahul   0.0   97.0
2   Aadi  22.0   81.0
3  Abhay   0.0    0.0
4  Ajjet  21.0   74.0
5   Amar   0.0    0.0
6   Aman   0.0   76.0

Here we see that we replace all NaN values with 0.

  • Method 2: Replace all NaN values in specific columns

In this method, we replace all NaN values with some other value, but only in specific columns and not in the whole dataframe.

import pandas as pd
import numpy as np
# np.nan marks a missing value (the np.NaN alias was removed in NumPy 2.0)
students = [('Raj', 24, 95),
            ('Rahul', np.nan, 97),
            ('Aadi', 22, 81),
            ('Abhay', np.nan, np.nan),
            ('Ajjet', 21, 74),
            ('Amar', np.nan, np.nan),
            ('Aman', np.nan, 76)]
# Create a DataFrame object
df = pd.DataFrame(  students, 
                    columns=['Name', 'Age','Marks'])
print("Original Dataframe\n")
print(df,'\n')
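# Assign the result back to the column. The older form
# df['Age'].fillna(0, inplace=True) relies on chained assignment,
# which recent pandas versions warn about.
df['Age'] = df['Age'].fillna(0)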
print("New Dataframe\n")
print(df)

Output

Original Dataframe

    Name   Age  Marks
0    Raj  24.0   95.0
1  Rahul   NaN   97.0
2   Aadi  22.0   81.0
3  Abhay   NaN    NaN
4  Ajjet  21.0   74.0
5   Amar   NaN    NaN
6   Aman   NaN   76.0 

New Dataframe

    Name   Age  Marks
0    Raj  24.0   95.0
1  Rahul   0.0   97.0
2   Aadi  22.0   81.0
3  Abhay   0.0    NaN
4  Ajjet  21.0   74.0
5   Amar   0.0    NaN
6   Aman   0.0   76.0

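Here we see that only the NaN values in the Age column are replaced with 0. We want the change to be made in the original dataframe, so the result must either be assigned back to the column or be written in place with inplace=True (the boolean True, not the string 'true').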
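
When several columns need different replacements, the value parameter also accepts a dict that maps column names to fill values. The following is a minimal sketch under that assumption; the data and the choice of the column mean as a fill value are illustrative, not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Name": ["Raj", "Rahul", "Aadi", "Abhay"],
    "Age": [24, np.nan, 22, np.nan],
    "Marks": [95, 97, np.nan, np.nan],
})

# Each key is a column name and each value is the replacement for that column:
# Age gets 0 and Marks gets the mean of the known marks
fill_values = {"Age": 0, "Marks": df["Marks"].mean()}
new_df = df.fillna(fill_values)
print(new_df)
```

Columns that are not listed in the dict, such as Name here, are left untouched.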

  • Method 3: Replace NaN values of one column with values from another column

Here we pass another column as the value parameter, and each NaN is replaced with the value from the same row of that column. Let us see this with the help of an example.

import pandas as pd
import numpy as np
# np.nan marks a missing value (the np.NaN alias was removed in NumPy 2.0)
students = [('Raj', 24, 95),
            ('Rahul', np.nan, 97),
            ('Aadi', 22, 81),
            ('Abhay', np.nan, 87),
            ('Ajjet', 21, 74),
            ('Amar', np.nan, 76),
            ('Aman', np.nan, 76)]
# Create a DataFrame object
df = pd.DataFrame(  students, 
                    columns=['Name', 'Age','Marks'])
print("Original Dataframe\n")
print(df,'\n')
# Fill each missing Age with the Marks value from the same row and assign it back
df['Age'] = df['Age'].fillna(value=df['Marks'])
print("New Dataframe\n")
print(df)

Output

Original Dataframe

    Name   Age  Marks
0    Raj  24.0     95
1  Rahul   NaN     97
2   Aadi  22.0     81
3  Abhay   NaN     87
4  Ajjet  21.0     74
5   Amar   NaN     76
6   Aman   NaN     76 

New Dataframe

    Name   Age  Marks
0    Raj  24.0     95
1  Rahul  97.0     97
2   Aadi  22.0     81
3  Abhay  87.0     87
4  Ajjet  21.0     74
5   Amar  76.0     76
6   Aman  76.0     76

Here we see that the NaN values of the Age column are replaced with the values from the same rows of the Marks column.

  • Method 4: Replace NaN values in specific rows

To replace NaN values in a row, we use .loc[index_label] to access that row of the dataframe and then call the fillna() function on it. Let us see this with the help of an example.

import pandas as pd
import numpy as np
# np.nan marks a missing value (the np.NaN alias was removed in NumPy 2.0)
students = [('Raj', 24, 95),
            ('Rahul', np.nan, 97),
            ('Aadi', 22, 81),
            ('Abhay', np.nan, 87),
            ('Ajjet', 21, 74),
            ('Amar', np.nan, 76),
            ('Aman', np.nan, 76)]
# Create a DataFrame object
df = pd.DataFrame(  students, 
                    columns=['Name', 'Age','Marks'])
print("Original Dataframe\n")
print(df,'\n')
df.loc[1]=df.loc[1].fillna(value=0)
print("New Dataframe\n")
print(df)

Output

Original Dataframe

    Name   Age  Marks
0    Raj  24.0     95
1  Rahul   NaN     97
2   Aadi  22.0     81
3  Abhay   NaN     87
4  Ajjet  21.0     74
5   Amar   NaN     76
6   Aman   NaN     76 

New Dataframe

    Name   Age  Marks
0    Raj  24.0     95
1  Rahul   0.0     97
2   Aadi  22.0     81
3  Abhay   NaN     87
4  Ajjet  21.0     74
5   Amar   NaN     76
6   Aman   NaN     76
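
The same idea can be extended to several rows at once by passing a list of index labels to loc. This is a minimal sketch with made-up data; the name rows_to_fill is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Name": ["Raj", "Rahul", "Aadi", "Abhay"],
    "Age": [24, np.nan, 22, np.nan],
    "Marks": [95.0, np.nan, 81.0, 87.0],
})

# Fill NaN values only in the rows with index labels 1 and 2
rows_to_fill = [1, 2]
df.loc[rows_to_fill] = df.loc[rows_to_fill].fillna(0)
print(df)
```

Row 3 is not in the list, so its missing Age stays NaN.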

So these are some of the ways to use Dataframe.fillna().

Get Rows And Columns Names In Dataframe Using Python

Methods to get row and column names in a dataframe

In this article, we will study different methods to get the row and column names of a dataframe.

Methods to get column name in dataframe

  • Method 1: By iterating over columns

In this method, we simply iterate over all the columns and print the name of each one. The point to remember is that dataframe_name.columns gives an Index object that holds the column names. Let us see this with the help of an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 'Mumbai', 95) , 
            ('Rahul', 21, 'Delhi' , 97) , 
            ('Aadi', 22, 'Kolkata', 81) , 
            ('Abhay', 24,'Rajasthan' ,76) , 
              ('Ajjet', 21, 'Delhi' , 74)] 
# Create a DataFrame object 
df = pd.DataFrame( students, columns=['Name', 'Age', 'City', 'Marks']) 
print("Original Dataframe\n") 
print(df,'\n')
print(df.columns,'\n')
print("columns are:")
for column in df.columns:
  print(column,end=" ")

Output

Original Dataframe

    Name  Age       City  Marks
0    Raj   24     Mumbai     95
1  Rahul   21      Delhi     97
2   Aadi   22    Kolkata     81
3  Abhay   24  Rajasthan     76
4  Ajjet   21      Delhi     74 

Index(['Name', 'Age', 'City', 'Marks'], dtype='object') 

columns are:
Name Age City Marks 

Here we see that df.columns gives an Index of column names, and by iterating over it we can easily get each column name.
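
If a plain Python list of names is needed rather than a printed loop, the same iteration can be collected directly. A minimal sketch, using a smaller made-up dataframe:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"Name": ["Raj", "Rahul"],
                   "Age": [24, 21],
                   "City": ["Mumbai", "Delhi"]})

# df.columns is an Index; list() copies its labels into a plain Python list
column_names = list(df.columns)

# Iterating over the dataframe itself also yields the column labels
names_from_iteration = [column for column in df]

print(column_names)
print(names_from_iteration)
```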

  • Method 2: Using columns.values

columns.values returns a NumPy array of column names. Let us see this with the help of an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 'Mumbai', 95) , 
            ('Rahul', 21, 'Delhi' , 97) , 
            ('Aadi', 22, 'Kolkata', 81) , 
            ('Abhay', 24,'Rajasthan' ,76) , 
              ('Ajjet', 21, 'Delhi' , 74)] 
# Create a DataFrame object 
df = pd.DataFrame( students, columns=['Name', 'Age', 'City', 'Marks']) 
print("Original Dataframe\n") 
print(df,'\n')
print("columns are:")
print(df.columns.values,'\n')

Output

Original Dataframe

    Name  Age       City  Marks
0    Raj   24     Mumbai     95
1  Rahul   21      Delhi     97
2   Aadi   22    Kolkata     81
3  Abhay   24  Rajasthan     76
4  Ajjet   21      Delhi     74 

columns are:
['Name' 'Age' 'City' 'Marks'] 

  • Method 3: Using the tolist() method

Calling tolist() on columns.values gives the column names as a Python list. Let us see this with the help of an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 'Mumbai', 95) , 
            ('Rahul', 21, 'Delhi' , 97) , 
            ('Aadi', 22, 'Kolkata', 81) , 
            ('Abhay', 24,'Rajasthan' ,76) , 
              ('Ajjet', 21, 'Delhi' , 74)] 
# Create a DataFrame object 
df = pd.DataFrame( students, columns=['Name', 'Age', 'City', 'Marks']) 
print("Original Dataframe\n") 
print(df,'\n')
print("columns are:")
print(df.columns.values.tolist(),'\n')

Output

Original Dataframe

    Name  Age       City  Marks
0    Raj   24     Mumbai     95
1  Rahul   21      Delhi     97
2   Aadi   22    Kolkata     81
3  Abhay   24  Rajasthan     76
4  Ajjet   21      Delhi     74 

columns are:
['Name', 'Age', 'City', 'Marks'] 

  • Method 4: Access a specific column name using an index

As we know, columns.values gives an array of column names, and we can access array elements by position. This method uses that idea. Note that positions start at 0, so index 2 refers to the third column. Let us see this with the help of an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 'Mumbai', 95) , 
            ('Rahul', 21, 'Delhi' , 97) , 
            ('Aadi', 22, 'Kolkata', 81) , 
            ('Abhay', 24,'Rajasthan' ,76) , 
              ('Ajjet', 21, 'Delhi' , 74)] 
# Create a DataFrame object 
df = pd.DataFrame( students, columns=['Name', 'Age', 'City', 'Marks']) 
print("Original Dataframe\n") 
print(df,'\n')
print("columns at second index:")
print(df.columns.values[2],'\n')

Output

Original Dataframe

    Name  Age       City  Marks
0    Raj   24     Mumbai     95
1  Rahul   21      Delhi     97
2   Aadi   22    Kolkata     81
3  Abhay   24  Rajasthan     76
4  Ajjet   21      Delhi     74 

columns at second index:
City 

So these are the methods to get column names.

Methods to get row names in a dataframe

  • Method 1: Using index.values

Just as columns.values gives an array of column names, index.values gives an array of row labels (indexes). Let us see this with the help of an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 'Mumbai', 95) , 
            ('Rahul', 21, 'Delhi' , 97) , 
            ('Aadi', 22, 'Kolkata', 81) , 
            ('Abhay', 24,'Rajasthan' ,76) , 
              ('Ajjet', 21, 'Delhi' , 74)] 
# Create a DataFrame object 
df = pd.DataFrame( students, columns=['Name', 'Age', 'City', 'Marks']) 
print("Original Dataframe\n") 
print(df,'\n')
print("Rows are:")
print(df.index.values,'\n')

Output

Original Dataframe

    Name  Age       City  Marks
0    Raj   24     Mumbai     95
1  Rahul   21      Delhi     97
2   Aadi   22    Kolkata     81
3  Abhay   24  Rajasthan     76
4  Ajjet   21      Delhi     74 

Rows are:
[0 1 2 3 4] 

  • Method 2: Get the row name at a specific index

As we know, index.values gives an array of row labels, and we can access array elements by position. This method uses that idea. Let us see this with the help of an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 'Mumbai', 95) , 
('Rahul', 21, 'Delhi' , 97) , 
('Aadi', 22, 'Kolkata', 81) , 
('Abhay', 24,'Rajasthan' ,76) , 
('Ajjet', 21, 'Delhi' , 74)] 
# Create a DataFrame object 
df = pd.DataFrame( students, columns=['Name', 'Age', 'City', 'Marks']) 
print("Original Dataframe\n") 
print(df,'\n')
print("Row at index 2:")
print(df.index.values[2],'\n')

Output

Original Dataframe

    Name  Age       City  Marks
0    Raj   24     Mumbai     95
1  Rahul   21      Delhi     97
2   Aadi   22    Kolkata     81
3  Abhay   24  Rajasthan     76
4  Ajjet   21      Delhi     74 

Row at index 2:
2 

  • Method 3: By iterating over indices

Just as dataframe_name.columns gives the column names, dataframe_name.index gives the row labels. Hence we can simply iterate over the index and print the row names. Let us see this with the help of an example.

import pandas as pd
import numpy as np
students = [('Raj', 24, 'Mumbai', 95) , 
            ('Rahul', 21, 'Delhi' , 97) , 
            ('Aadi', 22, 'Kolkata', 81) , 
            ('Abhay', 24,'Rajasthan' ,76) , 
              ('Ajjet', 21, 'Delhi' , 74)] 
# Create a DataFrame object 
df = pd.DataFrame( students, columns=['Name', 'Age', 'City', 'Marks']) 
print("Original Dataframe\n") 
print(df,'\n')
print("List of indexes:")
print(df.index,'\n')
print("Indexes or rows names are:")
for row in df.index:
  print(row,end=" ")

Output

Original Dataframe

    Name  Age       City  Marks
0    Raj   24     Mumbai     95
1  Rahul   21      Delhi     97
2   Aadi   22    Kolkata     81
3  Abhay   24  Rajasthan     76
4  Ajjet   21      Delhi     74 

List of indexes:
RangeIndex(start=0, stop=5, step=1) 

Indexes or rows names are:
0 1 2 3 4 
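
In the examples above the row names are just the default positions 0 to 4. As a minimal sketch of what the same calls return when the rows have meaningful labels, the example below uses set_index() to turn the Name column into the index. The data is made up for illustration.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"Name": ["Raj", "Rahul", "Aadi"],
                   "Marks": [95, 97, 81]})

# Use the Name column as the index so that rows have meaningful labels
named = df.set_index("Name")

# index.tolist() returns the row labels as a plain Python list
row_names = named.index.tolist()
print(row_names)

# A row label can then be used with loc to look up a value
print(named.loc["Rahul", "Marks"])
```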

So these are the methods to get rows and column names in the dataframe using python.

Python : How to Get all Keys with Maximum Value in a Dictionary

Methods to get all the keys with the maximum value in a dictionary in Python

In this article, we will discuss different methods to get all the keys with the maximum value in a dictionary. Let us see these methods one by one.

  • Method 1: Using the max() function and d.get

As the name suggests, the max() function finds the maximum value. Let us see what the max() function is and how it works.

max() function

The max() function returns the largest item in an iterable, or the largest of two or more arguments. Iterating over a dictionary yields only its keys, so max(d) alone would compare the keys themselves. To compare the keys by their values instead, we pass an extra key argument to max().

Syntax: max(d, key=d.get), where d is the dictionary.

It returns the key whose value is the largest.

So with the help of the max() function and the d.get argument, we can easily find the key with the maximum value in a dictionary.

d = {"a": 1, "b": 2, "c": 3,"d":4}
max_key = max(d, key=d.get)
print(max_key)

Output

d

Here the maximum value, 4, corresponds to the key "d", so the function returns d as the output.

There is a small problem with this method. Let us look at the problem and then at a way to solve it.

Problem

If multiple keys share the maximum value, this method returns only the key that occurs first in the dictionary. Let us see this with the help of an example.

d = {"a": 1, "b": 2, "c":4,"d":4}
max_key = max(d, key=d.get)
print(max_key)

Output

c

The maximum value 4 corresponds to both key c and key d, but the function returns only c because c comes first in the dictionary. This is the main problem with this method.

The same problem occurs with the next method too, so we will discuss the solution after studying both methods.

  • Method 2: Using the max() function and the operator module

We have already discussed the max() function. Now we will discuss the operator module and how to use it to get the key with the maximum value.

operator module

The operator module exports a set of efficient functions corresponding to the intrinsic operators of Python. For example, operator.add(x,y) is equivalent to the expression x+y.
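
As a small sketch of the two operator functions that matter here, the example below calls operator.add and operator.itemgetter on their own before they are combined with max(). The sample pairs are made up for illustration.

```python
import operator

# operator.add(x, y) behaves like x + y
total = operator.add(2, 3)

# itemgetter(1) builds a callable that returns element 1 of its argument
get_value = operator.itemgetter(1)
pair = ("d", 4)
value = get_value(pair)

# Used as a key function, it makes max() compare (key, value) pairs by value
best_pair = max([("a", 1), ("d", 4), ("b", 2)], key=get_value)

print(total, value, best_pair)
```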

Let us see with an example how we can achieve our objective with the help of max() and the operator module.

import operator
d={"a":1,"b":2,"c":3,"d":4}
max_key = max(d.items(), key = operator.itemgetter(1))[0]
print(max_key)

Output

d

In this example, d.items() yields (key, value) pairs, operator.itemgetter(1) tells max() to compare the pairs by their values, and [0] extracts the key from the winning pair. A similar problem arises here: if multiple keys have the maximum value, the function returns only the first of them.

The solution to the above problem

In both methods, we get only one key even if multiple keys share the maximum value. We can solve this with a single extra pass over the dictionary. Each method gives us one key that corresponds to the maximum value, so we can use that key to look up the maximum value itself. Then we iterate over the dictionary, check which keys have that value, store those keys in a list, and print the list. Let us see this with the help of an example.

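d = {"a": 1, "b": 2, "c":4,"d":4}

# Find one key with the maximum value, then look up that value
max_key = max(d, key=d.get)
val = d[max_key]
# Collect every key whose value equals the maximum
l = []
for key in d:
    if d[key] == val:
        l.append(key)
print(l)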

Output

['c', 'd']

So here we see how we can easily print a list of keys having max value.
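
The same idea can be wrapped in a small reusable function. This is a sketch rather than a standard library feature; the function name keys_with_max_value is hypothetical, and it takes the maximum directly from d.values() instead of going through a key first.

```python
def keys_with_max_value(d):
    """Return a list of all keys whose value equals the maximum value in d."""
    if not d:
        # max() raises ValueError on an empty iterable, so handle it explicitly
        return []
    max_value = max(d.values())
    return [key for key, value in d.items() if value == max_value]


d = {"a": 1, "b": 2, "c": 4, "d": 4}
print(keys_with_max_value(d))
```

The keys come back in the order in which they appear in the dictionary.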

So these are the methods to get key with max value in a python dictionary.

Python: Count Nan and Missing Values in Dataframe Using Pandas

Methods to count NaN and missing values in dataframes using pandas

In this article, we will discuss missing values in dataframes and count them per column, per row, and in total. Let us first discuss NaN or missing values in a dataframe.

NaN or Missing values

NaN stands for Not a Number. It is used to represent missing values in a dataframe. Let us see this with an example.

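import numpy as np
import pandas as pd

students= {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
      'Marks':[90,np.nan,87,np.nan,19]}
# Build the dataframe from the dictionary defined above
df = pd.DataFrame(students, columns=['students','Marks'])
print(df)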

Output

  students  Marks
0      Raj   90.0
1    Rahul    NaN
2   Mayank   87.0
3     Ajay    NaN
4     Amar   19.0

Here we see that the Marks column contains NaN, which represents the missing values.

Reason to count missing values or NaN values in a dataframe

One of the main reasons to count missing values is that they affect the accuracy of any prediction or analysis based on the dataframe. The more missing values there are, the more strongly the result is affected. Hence we count the missing values first. If a row or column has a high count of missing values we can drop it; otherwise we can leave it in the dataframe as it is.

Method to count NaN or missing values

To count missing values, we first use the isnull() function. It returns a dataframe of the same shape in which every NaN value is replaced with True and every non-NaN value with False, which helps us count the NaN or missing values. Let us see this with the help of an example.

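import numpy as np
import pandas as pd

students= {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
      'Marks':[90,np.nan,87,np.nan,19]}
df = pd.DataFrame(students, columns=['students','Marks'])
# True marks a missing value, False marks a present one
print(df.isnull())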

Output

   students  Marks
0     False  False
1     False   True
2     False  False
3     False   True
4     False  False

We get a dataframe of booleans like this. Now we can easily count the NaN or missing values in the dataframe.

Count NaN or missing values in columns

With the help of .isnull().sum(), we can easily count the NaN or missing values. The isnull() method converts NaN values to True and non-NaN values to False, and the sum() method then counts the True values in each column, because True counts as 1 and False as 0. Let us see this with an example.

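import numpy as np
import pandas as pd

students= {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
      'Marks':[90,np.nan,87,np.nan,19]}
df = pd.DataFrame(students, columns=['students','Marks'])
# Count the True values (missing entries) in each column
print(df.isnull().sum())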

Output

students    0
Marks       2
dtype: int64

As we can also see in the dataframe, there are no NaN or missing values in the students column, but there are 2 in the Marks column.
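
A count is easier to judge next to the column length, so a natural variation is the share of missing values per column. This is a minimal sketch that relies on the mean of a boolean column being the fraction of True values; the names missing_fraction and missing_percent are illustrative.

```python
import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
            'Marks': [90, np.nan, 87, np.nan, 19]}
df = pd.DataFrame(students, columns=['students', 'Marks'])

# The mean of a boolean column is the fraction of True values in it
missing_fraction = df.isnull().mean()

# Multiply by 100 to express the share of missing values as a percentage
missing_percent = missing_fraction * 100
print(missing_percent)
```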

Count NaN or missing values in Rows

For this, we can iterate through each row with a for loop and use isnull().sum() to count the NaN or missing values in that row. Let us see this with an example.

import numpy as np
import pandas as pd

students= {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
      'Marks':[90,np.nan,87,np.nan,19]}
df = pd.DataFrame(students, columns=['students','Marks'])
# df.iloc[i] selects row i by position; count its missing values
for i in range(len(df.index)) :
    print("Nan in row ", i , " : " ,  df.iloc[i].isnull().sum())

Output

Nan in row  0  :  0
Nan in row  1  :  1
Nan in row  2  :  0
Nan in row  3  :  1
Nan in row  4  :  0
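
The loop above can also be written without explicit iteration by summing across the columns with axis=1. A minimal sketch on the same data; the names nan_per_row and rows_with_nan are illustrative.

```python
import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
            'Marks': [90, np.nan, 87, np.nan, 19]}
df = pd.DataFrame(students, columns=['students', 'Marks'])

# axis=1 sums across the columns, giving one count per row
nan_per_row = df.isnull().sum(axis=1)
print(nan_per_row)

# Select only the rows that contain at least one missing value
rows_with_nan = df[df.isnull().any(axis=1)]
print(rows_with_nan)
```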

Count total NaN or missing values in dataframe

In the above two examples, we saw how to count missing values or NaN in rows and columns. Now we see how to count the total number of missing values in the dataframe. For this we simply use isnull().sum().sum(): the first sum() gives the count per column, and the second adds those counts together. Let us see this with the help of an example.

import numpy as np
import pandas as pd

students= {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
      'Marks':[90,np.nan,87,np.nan,19]}
df = pd.DataFrame(students, columns=['students','Marks'])
print("Total NaN values: ",df.isnull().sum().sum())

Output

Total NaN values:  2

So these are the methods to count NaN or missing values in dataframes.

Pandas: How to create an empty DataFrame and append rows & columns to it in python

Methods to create an empty dataframe and append rows and columns to it

In this article, we discuss how to create an empty dataframe and how to append rows and columns to it afterwards.

Before understanding this concept let us understand some basic concepts and terminologies.

Dataframe

A dataframe is a 2D data structure in pandas that stores data in tabular form. The tabular form consists of rows, columns, and the actual data. To create or use a dataframe, we have to import the pandas package in our program.

As we cannot use a dataframe without pandas, let us see what pandas is.

Pandas

Pandas is a Python package for analyzing data. One reason pandas is so popular is that it is very easy to use. To use this package, we first have to import it.

DataFrame()

This is the method used most widely in this article, so let us look at it briefly. DataFrame() is a constructor that is used to create dataframes in pandas.

Syntax: pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Note: As we import pandas as pd in our program, we use pd.DataFrame() instead of pandas.DataFrame().

Now that we have seen some theory and definitions related to pandas and dataframes, let us see how to put them into practice.

In our definition, a dataframe consists of rows (the index), columns, and data. An empty dataframe is therefore possible in three cases: first, when the dataframe has no rows and no columns; second, when it has only columns; and third, when it has both rows and columns but every data value is NaN. Let us see these cases one by one.

  • Method 1-Create an empty dataframe without any column and rows and then append them one by one

Let us see this method with the help of an example

import pandas as pd
df=pd.DataFrame()
print(df)

Output

Empty DataFrame
Columns: []
Index: []

Here we see that with the DataFrame() constructor we can easily create our dataframe. It is empty because we did not pass any argument to the constructor. Now that we have an empty dataframe, we can easily add columns and data to it. Let us see how with the help of an example.

df['Name']=['Raj','Rahul','Aman']
df['Marks']=[100,98,77]
print(df)

Output

     Name  Marks
0    Raj    100
1  Rahul     98
2   Aman     77

Here Name and Marks are the columns of the dataframe. Just as we assign a value to a key in a dictionary, we assign a list of values to a column name, and pandas creates the column if it does not exist yet.

  • Method 2: Create a dataframe with only columns and then append rows to it

Let us discuss this method with the help of an example.

df=pd.DataFrame(columns=['Name','Marks'])
print(df)

Output

Empty DataFrame
Columns: [Name, Marks]
Index: []

Here we see that we can easily create an empty dataframe by passing only columns to the DataFrame() constructor. Now that we have our columns, we can add rows to the dataframe. Older tutorials use the append() method for this, but DataFrame.append() was removed in pandas 2.0, so this example adds each row with loc, using len(df) as the next index label.

# Each assignment adds one row at the next free index label
df.loc[len(df)] = ['Raj', 100]
df.loc[len(df)] = ['Rahul', 98]
df.loc[len(df)] = ['Aman', 77]
print(df)

Output

     Name Marks
0    Raj   100
1  Rahul    98
2   Aman    77

Here we see that if we know the columns of the dataframe, we can easily add rows of data to it one at a time.

Note: In pandas versions before 2.0, df = df.append({'Name': 'Raj', 'Marks': 100}, ignore_index=True) did the same job. The append() method returned a new dataframe object and left the original unchanged, so its result had to be assigned back to df.

  • Method 3: Create an empty dataframe with column names and index/rows but no data

Let us see this method with the help of an example.

df=pd.DataFrame(columns=['Name','Marks'],index = [1,2,3])
print(df)

Output

  Name Marks
1  NaN   NaN
2  NaN   NaN
3  NaN   NaN

Here we see that we have created an empty dataframe that has both rows and columns by simply passing columns and index to the DataFrame() constructor. Now let us see how to add data to it.

df.loc[1] = ['Raj', 100]
df.loc[2] = ['Rahul', 98]
df.loc[3] = ['Aman', 77]
print(df)

Output

      Name Marks
1    Raj   100
2  Rahul    98
3   Aman    77

If the dataframe already has rows and columns, we can add data to it using loc. loc is used to access groups of rows and columns by their labels.
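
When all the rows are known in advance, a common alternative to filling an empty dataframe step by step is to collect the rows first and build the dataframe once. This is a minimal sketch of that approach, not one of the three methods above; the list name rows is illustrative.

```python
import pandas as pd

# Collect the rows as a list of dictionaries, one dictionary per row
rows = [
    {"Name": "Raj", "Marks": 100},
    {"Name": "Rahul", "Marks": 98},
    {"Name": "Aman", "Marks": 77},
]

# Build the dataframe in one step; columns fixes the column order
df = pd.DataFrame(rows, columns=["Name", "Marks"])
print(df)
```

Building the dataframe once avoids growing it row by row.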

So these are the methods to create an empty dataframe and add rows and columns to it.

Python : Join / Merge Two or More Dictionaries

Methods to merge two or more dictionaries in python

In this article, we discuss how to merge two or more dictionaries in python.

  • Method 1- Using update() method

Before understanding this let us understand how the update() method works in python.

update() method in Python

The update() method updates one dictionary with the keys and values of another. If a key is not present in the first dictionary, that key-value pair is added to it; if the key is already present, its value is overwritten with the value from the other dictionary.

Syntax: d1.update(d2)

Let us understand this with the help of an example.

d1={"a":1,"b":2,"c":3,"e":5}
d2={"c":4,"f":6}
d1.update(d2)
print(d1)

Output

{'a': 1, 'b': 2, 'c': 4, 'e': 5, 'f': 6}

Explanation

In this example, key "c" is present in both d1 and d2, so its value in d1 is overwritten, while the other key-value pairs of d2 are simply added to d1. The second thing to notice is that update() works in place: it returns no new dictionary, and the changes are made in d1 itself.

We see that the update() method easily merges two dictionaries. This is how the update() method works.
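
Because update() changes the dictionary it is called on, one way to keep the original intact is to update a copy. A minimal sketch, where merged is an illustrative name:

```python
d1 = {"a": 1, "b": 2, "c": 3, "e": 5}
d2 = {"c": 4, "f": 6}

# Work on a shallow copy so that d1 itself is left unchanged
merged = d1.copy()
merged.update(d2)

print(merged)
print(d1)
```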

  • Method 2: Using **kwargs

Before understanding this method, let us see how **kwargs works in Python.

**kwargs

In a function definition, **kwargs collects any number of keyword arguments into a dictionary. The same ** operator also works in the other direction: placed before a dictionary, it unpacks that dictionary into individual key-value pairs. Inside a dictionary literal, this lets us build a new dictionary from the contents of existing ones. Let us understand this with an example.

d1={"a":1,"b":2,"c":3,"e":5}
d2={"c":4,"f":6}
d3={**d1,**d2}
print(d3)

Output

{'a': 1, 'b': 2, 'c': 4, 'e': 5, 'f': 6}

Explanation

**d1 and **d2 expand the contents of both dictionaries into a sequence of key-value pairs, as if we had written:

d3={"a":1,"b":2,"c":3,"e":5,"c":4,"f":6}

When a key appears more than once, the last value wins. Key "c" is present in both dictionaries, so its value from d1 is overwritten by the value from d2. Unlike update(), this method creates a new dictionary and leaves d1 and d2 unchanged.

Note: We can unpack as many dictionaries as we like in this way.

d1={"a":1,"b":2,"c":3,"e":5}
d2={"c":4,"f":6}
d3={"g":7,"h":8}
d4={"i":9,"c":10,"k":11}
d5={**d1,**d2,**d3,**d4}
print(d5)

Output

{'a': 1, 'b': 2, 'c': 10, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'k': 11}

Here we merge four dictionaries in a single expression, which is one of the main advantages of this method. Key "c" appears in d1, d2, and d4, and the value from d4, the last one, is kept.
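
On Python 3.9 and later, the | operator for dictionaries gives the same result as the ** form. The sketch below assumes such a Python version; it is an addition to the two methods discussed in this article.

```python
d1 = {"a": 1, "b": 2, "c": 3, "e": 5}
d2 = {"c": 4, "f": 6}
d3 = {"g": 7, "h": 8}

# d1 | d2 builds a new dictionary; for a shared key the right-hand value wins
merged = d1 | d2 | d3
print(merged)

# The result is the same as unpacking with **
same_as_unpacking = {**d1, **d2, **d3}
print(merged == same_as_unpacking)
```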

So these are the methods to merge two or more dictionaries in python.

Problem with these methods and their solution

In all the methods discussed so far, we face one issue: if the same key is present in two or more dictionaries, its value gets overwritten. This can be a major issue if we want to keep all the key-value pairs. There is no built-in method that solves this problem, but with our knowledge of Python we can solve it ourselves and even write a user-defined function for it.

d1={"a":1,"b":2,"c":3,"e":5}
d2={"c":4,"f":6}
d3 = {**d1, **d2}
for key, value in d3.items():
    if key in d1 and key in d2:
        d3[key] = [value , d1[key]]
print(d3)

Output

{'a': 1, 'b': 2, 'c': [4, 3], 'e': 5, 'f': 6}

Explanation:

First, we unpack the contents of both dictionaries into d3, as seen before. Then we traverse the items of d3 and check whether each key is present in both d1 and d2. If it is, we store both values in a list under that key, so no value is lost.
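
The same idea can be generalized to any number of dictionaries with a user-defined function. This is a minimal sketch; the function name merge_collect is hypothetical. It collects the values in the order in which the dictionaries are passed, so the shared key here yields [3, 4] rather than [4, 3].

```python
from collections import defaultdict


def merge_collect(*dicts):
    """Merge dictionaries, keeping every value of a key that occurs more than once."""
    collected = defaultdict(list)
    for d in dicts:
        for key, value in d.items():
            collected[key].append(value)
    # Keys seen once keep their single value; repeated keys get a list of values
    return {key: values[0] if len(values) == 1 else values
            for key, values in collected.items()}


d1 = {"a": 1, "b": 2, "c": 3, "e": 5}
d2 = {"c": 4, "f": 6}
print(merge_collect(d1, d2))
```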