Author name: Satyabrata Jena

Create Numpy Array of different shapes & initialize with identical values using numpy.full() in Python

Creating Numpy Array of different shapes & initialize with identical values using numpy.full()

In this article we will see how we can create numpy arrays of different shapes, all initialized with identical values. So, let’s explore the concept to understand it well.

numpy.full() :

Numpy module provides a function numpy.full() to create a numpy array of given shape and initialized with a given value.

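Syntax : numpy.full(shape, fill_value, dtype=None, order='C')

Where,

  • shape : Represents the shape of the array (an int for a 1D array, or a tuple of ints).
  • fill_value : Represents the initialization value.
  • dtype : Represents the data type of the elements (optional). If not provided, it is inferred from fill_value.
  • order : Represents the memory layout, 'C' (row-major, the default) or 'F' (column-major) (optional).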

To use Numpy, we first have to import the module, i.e.

import numpy as np
Let’s see the examples below to understand the concept.

Example-1 : Create a 1D Numpy Array of length 8 and all elements initialized with value 2

Here the array length is 8 and all array elements are to be initialized with 2.

Let’s see the program below.

import numpy as np
# 1D Numpy Array created of length 8 & all elements initialized with value 2
sample_arr = np.full(8,2)
print(sample_arr)
Output :
[2 2 2 2 2 2 2 2]

Example-2 : Create a 2D Numpy Array of 3 rows | 4 columns and all elements initialized with value 5

Here the 2D array has 3 rows and 4 columns, and all array elements are to be initialized with 5.

Let’s see the program below.

import numpy as np
# Create a 2D Numpy Array of 3 rows & 4 columns. All initialized with value 5
sample_arr = np.full((3,4), 5)
print(sample_arr)
Output :
[[5 5 5 5]
 [5 5 5 5]
 [5 5 5 5]]

Example-3 : Create a 3D Numpy Array of shape (3,3,4) & all elements initialized with value 1

Here the shape is (3,3,4) and the initialization value is 1.

Let’s see the program below.

import numpy as np
# Create a 3D Numpy array & all elements initialized with value 1
sample_arr = np.full((3,3,4), 1)
print(sample_arr)
Output :

[[[1 1 1 1]
  [1 1 1 1]
  [1 1 1 1]]

 [[1 1 1 1]
  [1 1 1 1]
  [1 1 1 1]]

 [[1 1 1 1]
  [1 1 1 1]
  [1 1 1 1]]]

Example-4 : Create initialized Numpy array of specified data type

Here, the array length is 8, the value is 4 and the data type is float.

import numpy as np
# 1D Numpy array created & all float elements initialized with value 4
sample_arr = np.full(8, 4, dtype=float)
print(sample_arr)
Output :
[4. 4. 4. 4. 4. 4. 4. 4.]
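
When dtype is not passed, numpy.full() picks the data type from the fill value. The short sketch below illustrates this with a few fill values; the variable names are only illustrative.

```python
import numpy as np

# dtype is inferred from the fill value when it is not given explicitly
int_arr = np.full((2, 3), 7)       # integer fill value -> integer dtype
float_arr = np.full((2, 3), 7.5)   # float fill value -> float64
bool_arr = np.full(4, True)        # boolean fill value -> bool

print(int_arr.dtype.kind, float_arr.dtype, bool_arr.dtype)
print(float_arr)
```

Passing dtype explicitly, as in Example-4, overrides this inference.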


numpy.arange() : Create a Numpy Array of evenly spaced numbers in Python

Creating a Numpy Array of evenly spaced numbers in Python

This article is all about creating a Numpy array of evenly spaced numbers over a given interval using the function numpy.arange().

numpy.arange() :

The Numpy module in Python provides the function numpy.arange() to create a Numpy array of evenly spaced elements within a given interval. It returns the numbers from the start point up to, but not including, the stop point, separated by equal steps.

Syntax- numpy.arange([start, ]stop, [step, ]dtype=None)

Where 

  • start : Represents the start value of the range. (Optional; if not provided, the default value 0 is used)
  • stop : Represents the end value of the range. (This value itself is not included; it only acts as an end point indicator)
  • step : Represents the spacing between two adjacent values. (Optional; if not provided, the default value 1 is used)
  • dtype : Represents the data type of the elements. (Optional; if not provided, it is inferred from the other arguments)

To use Numpy, we first have to import the module, i.e.

import numpy as np
Let’s see the examples below to understand the concept well.

Example-1 : Create a Numpy Array containing numbers from 4 to 20 but at equal interval of 2

Here start is 4, stop is 20 and step is 2.
So let’s see the below program.
import numpy as np
# Start = 4, Stop = 20, Step Size = 2
# achieving the result using arange()
sample_arr = np.arange(4, 20, 2)
#printing the output
print(sample_arr)
Output :
[ 4  6  8 10 12 14 16 18]

Example-2 : Create a Numpy Array containing numbers from 1 to 15 but at default interval of 1

Here start is 1, stop is 15 and step is default i.e 1.
So let’s see the below program.
import numpy as np
# Start = 1, Stop = 15, Step Size = 1(default)
# achieving the result using arange()
sample_arr = np.arange(1, 15)
#printing the output
print(sample_arr)
Output :
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14]

Example-3 : Create a Numpy Array containing numbers up to 15 with default start and step

Here start is default i.e 0, stop is 15 and step is default i.e 1.
So let’s see the below program.
import numpy as np
# Start = 0(default), Stop = 15, Step Size = 1(default)
# achieving the result using arange()
sample_arr = np.arange(15)
#printing the output
print(sample_arr)
Output :
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
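
The examples above use positive integer steps. As a small sketch of the remaining parameter combinations, the snippet below uses a negative step, an explicit dtype and a fractional step; the variable names are only illustrative.

```python
import numpy as np

# A negative step counts downwards; the stop value (0) is still excluded
descending = np.arange(10, 0, -2)
print(descending.tolist())   # [10, 8, 6, 4, 2]

# dtype forces the element type of the result
as_float = np.arange(3, dtype=float)
print(as_float.dtype)        # float64

# A fractional step also works
quarters = np.arange(0, 1, 0.25)
print(quarters.tolist())     # [0.0, 0.25, 0.5, 0.75]
```

With non-integer steps, floating point rounding can make the length of the result hard to predict; in such cases numpy.linspace(), which takes the number of points instead of a step, is usually the safer choice.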


Find max value & its index in Numpy Array | numpy.amax()

Finding max value and its index in Numpy Array

In this article we will discuss how we can find the max value and the index of that max value in a Numpy array using numpy.amax().

numpy.amax( ) :

Syntax- numpy.amax(arr, axis=None, out=None, keepdims=<no value>, initial=<no value>)

Parameters :

  1. arr: Numpy array
  2. axis: Optional. The axis along which the maximum is searched. If not provided, the array is flattened and a single max value is returned.
     • axis = 0: Returns an array containing the max value of each column
     • axis = 1: Returns an array containing the max value of each row

Let’s see one by one how to find it in 1D and 2D Numpy array.

Maximum value & its index in a 1D Numpy Array:

Let’s create a 1D numpy array from the list given below and find the maximum value and its index.

[10,5,19,56,87,96,74,15,50,12,98]

Find maximum value:

To find the maximum value in the array, we can use the numpy.amax( ) function and pass the array as an argument to it.
import numpy as np

# Finding the maximum value inside an array using amax( )
arr = np.array([10, 5, 19, 56, 87, 96, 74, 15, 50, 12, 98])
maxElem = np.amax(arr)
print("Max element : ", maxElem)
Output :
Max element :  98

Find index of maximum value :

To get the index of the max value in the array, we have to use the where( ) function from the numpy library.

CODE:

import numpy as np

# Index of the maximum element
arr = np.array([10, 5, 19, 56, 87, 96, 74, 15, 50, 12, 98])
maxElem = np.amax(arr)
print("Max element : ", maxElem)
res = np.where(arr == np.amax(arr))
print("Returned result  :", res)
print("List of Indices of maximum element :", res[0])
Output :
Max element :  98
Returned result  : (array([10], dtype=int32),)
List of Indices of maximum element : [10]

Here, the where( ) function returned a tuple of arrays containing the indices of all the values that satisfy our condition. For a 1D array the tuple holds a single array, so res[0] gives the list of indices.
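
When only the position of the first maximum is needed, numpy.argmax() is a shorter alternative to where( ). This is not the approach used in this article, just a sketch for comparison; the ties array is a made-up example.

```python
import numpy as np

arr = np.array([10, 5, 19, 56, 87, 96, 74, 15, 50, 12, 98])

# argmax() returns the index of the first occurrence of the maximum
first_idx = np.argmax(arr)
print(first_idx)   # 10

# With repeated maxima, argmax() reports only the first one,
# while where() reports every matching index
ties = np.array([3, 9, 9, 1])
first_tie = np.argmax(ties)
all_ties = np.where(ties == np.amax(ties))[0]
print(first_tie, all_ties.tolist())   # 1 [1, 2]
```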

Find maximum value & its index in a 2D Numpy Array

We will use the following 2D array to demonstrate.

[[50, 59, 54],
 [45, 46, 78],
 [98, 20, 24]]

Find max value in complete 2D numpy array :

To find the max value in a 2D numpy array, we can either find a single maximum over the whole array, or find the maximum of each column or each row.

CODE:

import numpy as np

arr = np.array([[50, 59, 54], [45, 46, 78], [98, 20, 24]])
# Get the maximum value from the 2D array

maxValue = np.amax(arr)
print("The maximum value inside the array is", maxValue)
Output :
The maximum value inside the array is 98

Column or Row wise value

To find the max value of each row, we can pass axis=1, and for each column we can pass axis=0.

CODE:

import numpy as np

arr = np.array([[50, 59, 54], [45, 46, 78], [98, 20, 24]])
# Get the maximum value in each row
maxRows = np.amax(arr, axis=1)
# Get the maximum value in each column
maxColumns = np.amax(arr, axis=0)

print(
    "The maximum values in rows are : ",
    maxRows,
    " and the maximum value in columns are : ",
    maxColumns,
)
Output :
The maximum values in rows are :  [59 78 98]  and the maximum value in columns are :  [98 59 78]

Find index of maximum value from 2D numpy array:

CODE:

import numpy as np

arr = np.array([[50, 59, 54], [45, 46, 78], [98, 20, 24]])
# Get the index of max value inside the 2D array
res = np.where(arr == np.amax(arr))
print("Tuple :", res)
print("Now Coordinates of max value in 2D array :")
# zipping both the arrays to find the coordinates
Coordinates = list(zip(res[0], res[1]))

for elem in Coordinates:
    print(elem)

Output :
Tuple : (array([2], dtype=int32), array([0], dtype=int32))
Now Coordinates of max value in 2D array :
(2, 0)
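
As an alternative sketch (not the method used above), the coordinates of the first maximum can also be obtained by combining numpy.argmax(), which works on the flattened array, with numpy.unravel_index(), which converts the flat position back to row and column.

```python
import numpy as np

arr = np.array([[50, 59, 54], [45, 46, 78], [98, 20, 24]])

# Position of the maximum in the flattened array
flat_idx = np.argmax(arr)

# Convert the flat position into (row, column) for this shape
row, col = np.unravel_index(flat_idx, arr.shape)
print(int(flat_idx), (int(row), int(col)))   # 6 (2, 0)
print(arr[row, col])                         # 98
```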

numpy.amax() & NaN :

amax( ) also propagates NaN values, which means that if there is a NaN value present in the numpy array, the max value returned by the function will be NaN.

import numpy as np

arr = np.array([[50, 59, np.nan], [45, 46, 78], [98, 20, 24]])
# amax( ) propagating the NaN values

print("The max element in the numpy array is :", np.amax(arr))
Output :
The max element in the numpy array is : nan
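
If the NaN entries should be ignored instead of propagated, numpy.nanmax() can be used. The sketch below compares the two functions on the same array.

```python
import numpy as np

arr = np.array([[50, 59, np.nan], [45, 46, 78], [98, 20, 24]])

propagated = np.amax(arr)   # NaN wins as soon as one is present
ignored = np.nanmax(arr)    # NaN entries are skipped

print(propagated)   # nan
print(ignored)      # 98.0
```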

 

 

 


Pandas : How to Merge Dataframes using Dataframe.merge() in Python – Part 1

Merging Dataframes using Dataframe.merge() in Python

In this article, we will learn to merge two different DataFrames into a single one using the function DataFrame.merge().

DataFrame.merge() :

The DataFrame class of Python’s Pandas library provides a function, merge(), which helps in merging two DataFrames.

Syntax:- DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

Arguments:-

  • right : A dataframe that is to be merged with the calling dataframe.
  • how : (Merge type). Some values are : left, right, outer, inner. Its default value is ‘inner’. It decides which rows are kept when a key is present in only one of the two dataframes.
  • on : The column name (or list of column names) on which the merge will be done. If not provided, and the merge is not done on indexes, the columns common to both dataframes are used as join keys.
  • left_on : Column in the left dataframe to use as join key.
  • right_on : Column in the right dataframe to use as join key.
  • left_index : (bool), default is False (If True, the index of the left dataframe is used as join key)
  • right_index : (bool), default is False (If True, the index of the right dataframe is used as join key)
  • suffixes : tuple of (str, str), default (‘_x’, ‘_y’). Suffixes to be applied to overlapping column names in the left and right dataframes respectively.

Let’s see them one by one.

Merge DataFrames on common columns (Default Inner Join) :

If two DataFrames have columns in common, calling merge() directly will merge them using the common columns as join keys, and the remaining columns of both DataFrames are copied into the result.

Let’s see the below program to understand it clearly.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
           (99, 2, 195200, 2000) ,
           (51, 7, 15499, 25640) ,
           (31, 17, 654000, 85410) ,
           (12, 5, 201000, 63180) ,
           (35, 14, 741000, 62790)
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common column by default INNER JOIN
mergedDataf = playDFObj.merge(moreinfoObj)
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000      12000
II       99             2  195200       2000
III      51             7   15499      25640
IV       31            17  654000      85410
V        12             5  201000      63180
VI       35            14  741000      62790
   JersyN     Name       Team  Age  Sponsered  PLayingSince  Salary
0      15    Smith       Pune   17      12000            13  180000
1      99     Rana     Mumbai   20       2000             2  195200
2      51   Jaydev    Kolkata   22      25640             7   15499
3      31  Shikhar  Hyderabad   28      85410            17  654000
4      12    Sanju  Rajasthan   21      63180             5  201000
5      35    Raina    Gujarat   18      62790            14  741000

What is Inner Join ?

In the above case, an inner join occurred on the key columns, i.e. ‘JersyN’ & ‘Sponsered’. In an inner join, only the rows whose key values are present in both dataframes are kept and merged. We can also request an inner join explicitly by passing the how argument with the value 'inner'. Both cases give the same result.
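
The equivalence of the default merge and an explicit inner join can be checked on a small pair of frames. This is only a sketch: the frames left and right are hypothetical, trimmed-down versions of the data above, and pandas is imported under its conventional alias pd rather than sc.

```python
import pandas as pd  # conventional alias; the article's programs use sc

# Hypothetical, trimmed-down frames sharing the key column JersyN
left = pd.DataFrame({'JersyN': [15, 99, 51], 'Name': ['Smith', 'Rana', 'Jaydev']})
right = pd.DataFrame({'JersyN': [15, 51, 31], 'Salary': [180000, 15499, 654000]})

default_merge = left.merge(right)                 # how defaults to 'inner'
explicit_inner = left.merge(right, how='inner')   # same join, stated explicitly

print(default_merge)
print(default_merge.equals(explicit_inner))       # True
```

Only the keys 15 and 51 appear in both frames, so only those two rows survive the inner join.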

Merge Dataframes using Left Join :

What is left join ?

In a left join, all rows from the left DataFrame are included, and NaN is filled in for the values that are missing in the right DataFrame.

Let’s see the below program to understand it clearly.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
           (99, 2, 2000) ,
           (51, 7, 15499, 25640) ,
           (31, 17, 654000) ,
           (12, 5, 201000, 63180) ,
           (35, 14, 741000, 62790)
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common columns using LEFT JOIN
mergedDataf = playDFObj.merge(moreinfoObj, how='left')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000    12000.0
II       99             2    2000        NaN
III      51             7   15499    25640.0
IV       31            17  654000        NaN
V        12             5  201000    63180.0
VI       35            14  741000    62790.0
After merging:
   JersyN     Name       Team  Age Sponsered  PLayingSince    Salary
0      15    Smith       Pune   17     12000          13.0  180000.0
1      99     Rana     Mumbai   20      2000           NaN       NaN
2      51   Jaydev    Kolkata   22     25640           7.0   15499.0
3      31  Shikhar  Hyderabad   28     85410           NaN       NaN
4      12    Sanju  Rajasthan   21     63180           5.0  201000.0
5      35    Raina    Gujarat   18     62790          14.0  741000.0

Merge DataFrames using Right Join :

What is Right join ?

In a right join, all rows from the right DataFrame are included, and NaN is filled in for the values that are missing in the left DataFrame.

Let’s see the below program to understand it clearly.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
           (99, 2, 2000) ,
           (51, 7, 15499, 25640) ,
           (31, 17, 654000) ,
           (12, 5, 201000, 63180) ,
           (35, 14, 741000, 62790)
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common columns using RIGHT JOIN
mergedDataf = playDFObj.merge(moreinfoObj, how='right')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000    12000.0
II       99             2    2000        NaN
III      51             7   15499    25640.0
IV       31            17  654000        NaN
V        12             5  201000    63180.0
VI       35            14  741000    62790.0
After merging:
   JersyN    Name       Team   Age  Sponsered  PLayingSince  Salary
0      15   Smith       Pune  17.0    12000.0            13  180000
1      51  Jaydev    Kolkata  22.0    25640.0             7   15499
2      12   Sanju  Rajasthan  21.0    63180.0             5  201000
3      35   Raina    Gujarat  18.0    62790.0            14  741000
4      99     NaN        NaN   NaN        NaN             2    2000
5      31     NaN        NaN   NaN        NaN            17  654000

Merge DataFrames using Outer Join :

What is Outer join ?

In an outer join, all rows of both DataFrames are included, and NaN is filled in for the values that are missing in the left or right DataFrame.

Let’s see the below program to understand it clearly.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
moreInfo = [(15, 13, 180000, 12000) ,
           (99, 2, 2000) ,
           (51, 7, 15499, 25640) ,
           (31, 17, 654000) ,
           (12, 5, 201000, 63180) ,
           (35, 14, 741000, 62790)
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common columns using OUTER JOIN
mergedDataf = playDFObj.merge(moreinfoObj, how='outer')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000    12000.0
II       99             2    2000        NaN
III      51             7   15499    25640.0
IV       31            17  654000        NaN
V        12             5  201000    63180.0
VI       35            14  741000    62790.0
After merging:
   JersyN     Name       Team   Age  Sponsered  PLayingSince    Salary
0      15    Smith       Pune  17.0    12000.0          13.0  180000.0
1      99     Rana     Mumbai  20.0     2000.0           NaN       NaN
2      51   Jaydev    Kolkata  22.0    25640.0           7.0   15499.0
3      31  Shikhar  Hyderabad  28.0    85410.0           NaN       NaN
4      12    Sanju  Rajasthan  21.0    63180.0           5.0  201000.0
5      35    Raina    Gujarat  18.0    62790.0          14.0  741000.0
6      99      NaN        NaN   NaN        NaN           2.0    2000.0
7      31      NaN        NaN   NaN        NaN          17.0  654000.0
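
To see which side each row of an outer join came from, merge() accepts indicator=True, which adds a _merge column. The sketch below uses small hypothetical frames and the conventional pd alias; row order in the result can differ between pandas versions, so the check is done per key.

```python
import pandas as pd

# Hypothetical, trimmed-down frames sharing the key column JersyN
left = pd.DataFrame({'JersyN': [15, 99, 51], 'Name': ['Smith', 'Rana', 'Jaydev']})
right = pd.DataFrame({'JersyN': [15, 51, 31], 'Salary': [180000, 15499, 654000]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
outer = left.merge(right, how='outer', indicator=True)
print(outer)

# Map every key to its origin, independent of row order
origin = dict(zip(outer['JersyN'], outer['_merge'].astype(str)))
print(origin[99], origin[31], origin[15])   # left_only right_only both
```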


Pandas : Merge Dataframes on specific columns or on index in Python – Part 2

Merge Dataframes on specific columns or on index in Python

In this article, we will learn to merge dataframes on basis of given columns or index.

Dataframe.merge() :

The DataFrame class of Python’s Pandas library provides a function, merge(), which helps in merging two DataFrames.

Syntax: DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

Arguments:-

  • right : A dataframe that is to be merged with the calling dataframe.
  • how : (Merge type). Some values are : left, right, outer, inner. Its default value is ‘inner’. It decides which rows are kept when a key is present in only one of the two dataframes.
  • on : The column name (or list of column names) on which the merge will be done. If not provided, and the merge is not done on indexes, the columns common to both dataframes are used as join keys.
  • left_on : Column in the left dataframe to use as join key.
  • right_on : Column in the right dataframe to use as join key.
  • left_index : (bool), default is False (If True, the index of the left dataframe is used as join key)
  • right_index : (bool), default is False (If True, the index of the right dataframe is used as join key)
  • suffixes : tuple of (str, str), default (‘_x’, ‘_y’) (Suffixes to be applied to overlapping column names in the left and right dataframes respectively.)
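
The left_index and right_index arguments listed above can be sketched as follows. The two frames and their index labels are hypothetical; pandas is imported under the conventional alias pd.

```python
import pandas as pd

# Hypothetical frames that share no column, only index labels
left = pd.DataFrame({'Name': ['Smith', 'Rana', 'Jaydev']}, index=['I', 'II', 'III'])
right = pd.DataFrame({'Salary': [180000, 195200, 15499]}, index=['I', 'II', 'IV'])

# Use the index of both frames as the join key (inner join by default)
by_index = left.merge(right, left_index=True, right_index=True)
print(by_index)
```

Only the labels I and II exist in both indexes, so the default inner join keeps just those two rows.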

Merging Dataframe on a given column name as join key :

Let’s take a scenario where two dataframes share a column name but its contents differ, i.e. both have a Salary column holding different values. If we apply merge() on them without passing any argument, every common column is used as a join key, so no rows match and the result is empty. Here, we can merge the dataframes on a single column by passing the on argument to the merge() function.

As both dataframes also have the common column Salary, the two copies are renamed after merging. By default a suffix is added, i.e. Salary_x and Salary_y for the left and right dataframe respectively.

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Salary'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
# List of Tuples
moreInfo = [(15, 13, 180000, 'Nissin') ,
           (99, 2, 195200, 'Jio') ,
           (51, 7, 15499, 'Lays') ,
           (31, 17, 654000, 'AmbujaC') ,
           (12, 5, 201000, 'AsianP') ,
           (35, 14, 741000, 'Airtel')
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common column by default INNER JOIN
mergedDataf = playDFObj.merge(moreinfoObj, on='JersyN')
print('After merging: ')
print(mergedDataf)
Output :

DataFrame 1 :
     JersyN     Name       Team  Age  Salary
I        15    Smith       Pune   17   12000
II       99     Rana     Mumbai   20    2000
III      51   Jaydev    Kolkata   22   25640
IV       31  Shikhar  Hyderabad   28   85410
V        12    Sanju  Rajasthan   21   63180
VI       35    Raina    Gujarat   18   62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000     Nissin
II       99             2  195200        Jio
III      51             7   15499       Lays
IV       31            17  654000    AmbujaC
V        12             5  201000     AsianP
VI       35            14  741000     Airtel
After merging:
   JersyN     Name       Team  Age  Salary_x  PLayingSince  Salary_y  Sponsered
0      15    Smith       Pune   17     12000            13    180000     Nissin
1      99     Rana     Mumbai   20      2000             2    195200        Jio
2      51   Jaydev    Kolkata   22     25640             7     15499       Lays
3      31  Shikhar  Hyderabad   28     85410            17    654000    AmbujaC
4      12    Sanju  Rajasthan   21     63180             5    201000     AsianP
5      35    Raina    Gujarat   18     62790            14    741000     Airtel

Merging Dataframe on a given column with suffix for similar column names :

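In the previous example, the default suffixes _x & _y were added to the common column. We can also pass our own custom suffixes. Here the common column is Sponsered, which holds amounts in the left dataframe and company names in the right one.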

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
# List of Tuples
moreInfo = [(15, 13, 180000, 'Nissin') ,
           (99, 2, 195200, 'Jio') ,
           (51, 7, 15499, 'Lays') ,
           (31, 17, 654000, 'AmbujaC') ,
           (12, 5, 201000, 'AsianP') ,
           (35, 14, 741000, 'Airtel')
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Merge two Dataframes on basis of common column by default INNER JOIN
mergedDataf = playDFObj.merge(moreinfoObj, on='JersyN',suffixes=('_Price', '_Companies'))
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Sponsered
I        15    Smith       Pune   17      12000
II       99     Rana     Mumbai   20       2000
III      51   Jaydev    Kolkata   22      25640
IV       31  Shikhar  Hyderabad   28      85410
V        12    Sanju  Rajasthan   21      63180
VI       35    Raina    Gujarat   18      62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000     Nissin
II       99             2  195200        Jio
III      51             7   15499       Lays
IV       31            17  654000    AmbujaC
V        12             5  201000     AsianP
VI       35            14  741000     Airtel
After merging:
   JersyN     Name       Team  ...  PLayingSince  Salary  Sponsered_Companies
0      15    Smith       Pune  ...            13  180000               Nissin
1      99     Rana     Mumbai  ...             2  195200                  Jio
2      51   Jaydev    Kolkata  ...             7   15499                 Lays
3      31  Shikhar  Hyderabad  ...            17  654000              AmbujaC
4      12    Sanju  Rajasthan  ...             5  201000               AsianP
5      35    Raina    Gujarat  ...            14  741000               Airtel
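
Because the printed output hides the middle columns behind "...", it can help to list the column names of the result directly. The sketch below does this on two small hypothetical frames, using the conventional pd alias.

```python
import pandas as pd

# Hypothetical two-row frames; both have a 'Sponsered' column with different content
left = pd.DataFrame({'JersyN': [15, 99], 'Sponsered': [12000, 2000]})
right = pd.DataFrame({'JersyN': [15, 99], 'Sponsered': ['Nissin', 'Jio']})

merged = left.merge(right, on='JersyN', suffixes=('_Price', '_Companies'))

# The key column keeps its name; the overlapping column gets one suffix per side
print(list(merged.columns))
# ['JersyN', 'Sponsered_Price', 'Sponsered_Companies']
```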

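Merging Dataframes on different columns :

Now let’s take a scenario where the JersyN column of one dataframe is renamed, and try to merge it with another dataframe. Since the key columns now have different names, we pass them through the left_on and right_on arguments.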

import pandas as sc
# List of Tuples
players = [(15,'Smith','Pune', 17,12000),
            (99,'Rana', 'Mumbai', 20,2000),
            (51,'Jaydev','Kolkata', 22,25640),
            (31,'Shikhar','Hyderabad', 28,85410),
            (12,'Sanju','Rajasthan', 21,63180),
            (35,'Raina','Gujarat', 18,62790)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['JersyN','Name', 'Team', 'Age','Salary'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 1 : ')
print(playDFObj)
# List of Tuples
moreInfo = [(15, 13, 180000, 'Nissin') ,
           (99, 2, 195200, 'Jio') ,
           (51, 7, 15499, 'Lays') ,
           (31, 17, 654000, 'AmbujaC') ,
           (12, 5, 201000, 'AsianP') ,
           (35, 14, 741000, 'Airtel')
            ]
# Creation of DataFrame object
moreinfoObj = sc.DataFrame(moreInfo, columns=['JersyN', 'PLayingSince' , 'Salary', 'Sponsered'], index=['I', 'II', 'III', 'IV', 'V', 'VI'])
print('DataFrame 2 : ')
print(moreinfoObj)
# Rename column JersyN to ShirtN
moreinfoObj.rename(columns={'JersyN': 'ShirtN'}, inplace=True)
# Merge two Dataframes on basis of common column by default INNER JOIN
mergedDataf = playDFObj.merge(moreinfoObj, left_on='JersyN', right_on='ShirtN')
print('After merging: ')
print(mergedDataf)
Output :
DataFrame 1 :
     JersyN     Name       Team  Age  Salary
I        15    Smith       Pune   17   12000
II       99     Rana     Mumbai   20    2000
III      51   Jaydev    Kolkata   22   25640
IV       31  Shikhar  Hyderabad   28   85410
V        12    Sanju  Rajasthan   21   63180
VI       35    Raina    Gujarat   18   62790
DataFrame 2 :
     JersyN  PLayingSince  Salary  Sponsered
I        15            13  180000     Nissin
II       99             2  195200        Jio
III      51             7   15499       Lays
IV       31            17  654000    AmbujaC
V        12             5  201000     AsianP
VI       35            14  741000     Airtel
After merging:
   JersyN     Name       Team  Age  ...  ShirtN  PLayingSince  Salary_y  Sponsered
0      15    Smith       Pune   17  ...      15            13    180000     Nissin
1      99     Rana     Mumbai   20  ...      99             2    195200        Jio
2      51   Jaydev    Kolkata   22  ...      51             7     15499       Lays
3      31  Shikhar  Hyderabad   28  ...      31            17    654000    AmbujaC
4      12    Sanju  Rajasthan   21  ...      12             5    201000     AsianP
5      35    Raina    Gujarat   18  ...      35            14    741000     Airtel

[6 rows x 9 columns]
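
With left_on and right_on, both key columns are kept in the result, which is why the output above has 9 columns. A minimal sketch on hypothetical two-row frames shows this, along with one way to drop the redundant key afterwards.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names
left = pd.DataFrame({'JersyN': [15, 99], 'Name': ['Smith', 'Rana']})
right = pd.DataFrame({'ShirtN': [15, 99], 'Salary': [180000, 195200]})

merged = left.merge(right, left_on='JersyN', right_on='ShirtN')
print(list(merged.columns))    # ['JersyN', 'Name', 'ShirtN', 'Salary']

# Both key columns carry the same values, so one of them can be dropped
deduped = merged.drop(columns='ShirtN')
print(list(deduped.columns))   # ['JersyN', 'Name', 'Salary']
```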


Pandas : Convert Dataframe column into an index using set_index() in Python

Converting Dataframe column into an index using set_index() in Python

In this article we will learn how to convert an existing column of a Dataframe into its index, covering several cases. We can do this with the set_index() function of the Pandas Dataframe class.

DataFrame.set_index() :

Syntax:- DataFrame.set_index(self, keys, drop=True, append=False, inplace=False, verify_integrity=False)

Arguments:

  • keys: Column name (or list of column names) to set as the index of the dataframe.
  • drop: (bool), default is True
  1. If True, the column is removed from the dataframe after it becomes the index.
  2. If False, the column is kept.
  • append: (bool), default is False (If True, the given column is added to the existing index; if False, the current index is replaced by it.)
  • inplace: (bool), default is False (If True, the calling dataframe object is modified; if False, a modified copy is returned.)
  • verify_integrity: (bool), default is False
  1. If True, checks the new index for duplicate entries.

A dataframe has a default index, and we can give that index a name, e.g. SL:

import pandas as sc
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
            ('Rana', 99, 'Mumbai', 118560),
            ('Jaydev', 51, 'Kolkata', 258741),
            ('Shikhar', 31, 'Hyderabad', 485169),
            ('Sanju', 12, 'Rajasthan', 150000),
            ('Raina', 35, 'Gujarat', 250000)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
# Renaming index of dataframe as 'SL'
playDFObj.index.rename('SL', inplace=True)
print('Original Dataframe: ')
print(playDFObj)
Output :
Original Dataframe: 
Name JerseyN Team Salary
SL 
0 Smith 15 Pune 170000
1 Rana 99 Mumbai 118560
2 Jaydev 51 Kolkata 258741
3 Shikhar 31 Hyderabad 485169
4 Sanju 12 Rajasthan 150000
5 Raina 35 Gujarat 250000

Converting a column of Dataframe into an index of the Dataframe :

Let’s convert the column Name into the index of the dataframe by passing the column name to set_index(). The column ‘Name’ becomes the index and the old index ‘SL’ is dropped.

By default the change is made in the copy that set_index() returns; the original dataframe is not modified.

import pandas as sc
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
            ('Rana', 99, 'Mumbai', 118560),
            ('Jaydev', 51, 'Kolkata', 258741),
            ('Shikhar', 31, 'Hyderabad', 485169),
            ('Sanju', 12, 'Rajasthan', 150000),
            ('Raina', 35, 'Gujarat', 250000)
            ]
# Creation DataFrame object
playDFObj = sc.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
# Renaming index of dataframe as 'SL'
playDFObj.index.rename('SL', inplace=True)
print('Original Dataframe: ')
print(playDFObj)
# set column 'Name' as the index of the Dataframe
modifplayDF = playDFObj.set_index('Name')
print('Modified Dataframe of players:')
print(modifplayDF)
Output :
Original Dataframe: 
Name JerseyN Team Salary
SL 
0 Smith   15  Pune            170000
1 Rana     99  Mumbai      118560
2 Jaydev  51  Kolkata        258741
3 Shikhar 31  Hyderabad 485169
4 Sanju    12  Rajasthan   150000
5 Raina    35  Gujarat       250000

Modified Dataframe of players:
JerseyN Team Salary
Name 
Smith    15 Pune           170000
Rana     99 Mumbai      118560
Jaydev  51 Kolkata        258741
Shikhar 31 Hyderabad  485169
Sanju    12 Rajasthan    150000
Raina    35 Gujarat        250000

Converting a column of Dataframe into index without deleting the column :

In this case we keep ‘Name’ both as a column and as the index by passing the argument drop=False.

import pandas as sc
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
            ('Rana', 99, 'Mumbai', 118560),
            ('Jaydev', 51, 'Kolkata', 258741),
            ('Shikhar', 31, 'Hyderabad', 485169),
            ('Sanju', 12, 'Rajasthan', 150000),
            ('Raina', 35, 'Gujarat', 250000)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
playDFObj.index.rename('ID', inplace=True)
# keep column name and index as 'Name'
modifplayDF = playDFObj.set_index('Name', drop=False)
print('Modified Dataframe of players:')
print(modifplayDF)
Output :
Modified Dataframe of players:
Name  JerseyN       Team  Salary
Name                                       
Smith      Smith       15       Pune  170000
Rana        Rana       99     Mumbai  118560
Jaydev    Jaydev       51    Kolkata  258741
Shikhar  Shikhar       31  Hyderabad  485169
Sanju      Sanju       12  Rajasthan  150000
Raina      Raina       35    Gujarat  250000

Appending a Dataframe column to the index to make it a Multi-Index Dataframe :

In above cases the index ‘SL’ is replaced. If we want to keep it we have to pass append argument as True.

import pandas as sc
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
            ('Rana', 99, 'Mumbai', 118560),
            ('Jaydev', 51, 'Kolkata', 258741),
            ('Shikhar', 31, 'Hyderabad', 485169),
            ('Sanju', 12, 'Rajasthan', 150000),
            ('Raina', 35, 'Gujarat', 250000)
            ]
# Creation DataFrame object
playDFObj = sc.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
playDFObj.index.rename('SL', inplace=True)
# Making a multi-index dataframe
modifplayDF = playDFObj.set_index('Name', append=True)
print('Modified Dataframe of players:')
print(modifplayDF)
Output :
Modified Dataframe of players:
JerseyN       Team  Salary
SL Name                              
0  Smith         15       Pune  170000
1  Rana          99     Mumbai  118560
2  Jaydev        51    Kolkata  258741
3  Shikhar       31  Hyderabad  485169
4  Sanju         12  Rajasthan  150000
5  Raina         35    Gujarat  250000
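
As a small illustration of what the multi-index gives us, the sketch below looks up a row by one label per index level. It uses a shortened version of the players data, and the variable names df, multi and row are only illustrative.

```python
import pandas as pd

# Shortened sample data, same layout as the players list above
players = [('Smith', 15, 'Pune', 170000),
           ('Rana', 99, 'Mumbai', 118560),
           ('Jaydev', 51, 'Kolkata', 258741)]
df = pd.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
df.index.rename('SL', inplace=True)

# append=True keeps 'SL' and adds 'Name' as a second index level
multi = df.set_index('Name', append=True)
print(list(multi.index.names))

# A row is addressed with a tuple holding one label per level
row = multi.loc[(1, 'Rana')]
print(row['Team'])
```

Each row now needs both the ‘SL’ value and the ‘Name’ value to be identified.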

Checking for duplicates in the new index :

To make sure the new index contains no duplicate values, pass verify_integrity=True to set_index(). If a duplicate value is found, a ValueError is raised. In the data below the team ‘Mumbai’ appears twice, so using Team as the index fails.

import pandas as sc
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
            ('Rana', 99, 'Mumbai', 118560),
            ('Jaydev', 51, 'Kolkata', 258741),
            ('Shikhar', 31, 'Mumbai', 485169),
            ('Sanju', 12, 'Rajasthan', 150000),
            ('Raina', 35, 'Gujarat', 250000)
            ]
# Creation of DataFrame object
playDFObj = sc.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
# Rename index of dataframe as 'SL'
playDFObj.index.rename('SL', inplace=True)
modifplayDF = playDFObj.set_index('Team', verify_integrity=True)
print(modifplayDF)
Output :
ValueError: Index has duplicate keys
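
If we do not want the program to stop on this error, one option is to catch it. The following is a minimal sketch on a shortened dataset; the flag has_duplicates is an illustrative name, and the is_unique check is an alternative way to test a column before using it as the index.

```python
import pandas as pd

# 'Mumbai' appears twice, so 'Team' cannot become a unique index
players = [('Smith', 'Pune'), ('Rana', 'Mumbai'), ('Shikhar', 'Mumbai')]
df = pd.DataFrame(players, columns=['Name', 'Team'])

try:
    df.set_index('Team', verify_integrity=True)
    has_duplicates = False
except ValueError as err:
    has_duplicates = True
    print('Cannot use Team as index:', err)

# Checking the column up front avoids the exception entirely
print(df['Team'].is_unique)
print(df['Name'].is_unique)
```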

Modifying existing Dataframe by converting into index :

We can also modify the existing dataframe. There are two ways to do this:

  1. Assign the dataframe returned by set_index() back to the original variable, so that the variable points to the updated dataframe.
  2. Pass the argument inplace=True.

The example below uses the second way.

import pandas as sc
# List of Tuples
players = [('Smith', 15, 'Pune', 170000),
            ('Rana', 99, 'Mumbai', 118560),
            ('Jaydev', 51, 'Kolkata', 258741),
            ('Shikhar', 31, 'Hyderabad', 485169),
            ('Sanju', 12, 'Rajasthan', 150000),
            ('Raina', 35, 'Gujarat', 250000)
            ]
# Creation DataFrame object
playDFObj = sc.DataFrame(players, columns=['Name', 'JerseyN', 'Team', 'Salary'])
playDFObj.index.rename('SL', inplace=True)
playDFObj.set_index('Name', inplace=True)
print('Contents of original dataframe :')
print(playDFObj)
Output :
Contents of original dataframe :
JerseyN Team Salary
Name 
Smith 15 Pune 170000
Rana 99 Mumbai 118560
Jaydev 51 Kolkata 258741
Shikhar 31 Hyderabad 485169
Sanju 12 Rajasthan 150000
Raina 35 Gujarat 250000
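
The first way, assigning the returned dataframe back to the same variable, could look like the sketch below. It uses a shortened dataset and the illustrative variable name df.

```python
import pandas as pd

players = [('Smith', 15), ('Rana', 99), ('Jaydev', 51)]
df = pd.DataFrame(players, columns=['Name', 'JerseyN'])

# set_index() returns a new dataframe; rebinding df makes it the "current" one
df = df.set_index('Name')
print(df)
```

Both ways leave the variable pointing at a dataframe indexed by ‘Name’.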


Pandas Dataframe: Get minimum values in rows or columns & their index position

Get minimum values in rows or columns & their index position

In this article we will learn to find minimum values in the rows & columns of a Dataframe and also get index position of minimum values.

DataFrame.min() :

The Pandas library provides the member function DataFrame.min(), which finds the minimum values in a dataframe.

Syntax:- DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Some Arguments:

  1. axis- The axis along which the minimum is searched. 0 (the index axis) gives the minimum of each column, and 1 (the columns axis) gives the minimum of each row.
  2. skipna- (bool) Whether to skip NaN or null values. Its default is True, i.e. NaN values are skipped if the argument is not provided.

Now, we will see the implementation of these one by one.

Get minimum values in every row & column of the Dataframe :

Get minimum values of every column :

To find the minimum element in every column of the dataframe, we only have to call the min() member function on the DataFrame object without any argument. It returns a Series with the column names as index labels and the minimum value of each column as values.

import pandas as sc
import numpy as dc
# List of Tuples
matrix = [(10, 20, 15),
          (35, dc.nan, 21),
          (18, 58, 65),
          (11, 52, dc.nan),
          (98, 34, 99)
          ]
# Creation of DataFrame object
datafObj = sc.DataFrame(matrix, index=list('12345'), columns=list('abc'))
print('Original Dataframe :')
print(datafObj)
# Get a series that contains minimum values in each column of dataframe
minValues = datafObj.min()
print('Minimum value in each column of dataframe are : ')
print(minValues)
Output :
Original Dataframe :
a     b     c
1  10  20.0  15.0
2  35   NaN  21.0
3  18  58.0  65.0
4  11  52.0   NaN
5  98  34.0  99.0
Minimum value in each column of dataframe are :
a    10.0
b    20.0
c    15.0
dtype: float64

Get minimum values of every row :

To find the minimum values in each row in DataFrame we have to call min() member function and pass argument axis=1 with DataFrame object.

import pandas as sc
import numpy as dc
# List of Tuples
matrix = [(10, 20, 15),
          (35, dc.nan, 21),
          (18, 58, 65),
          (11, 52, dc.nan),
          (98, 34, 99)
          ]
# Creation of DataFrame object
datafObj = sc.DataFrame(matrix, index=list('12345'), columns=list('abc'))
print('Original Dataframe :')
print(datafObj)
# Get a series that contains minimum element in each rows of dataframe
minValues = datafObj.min(axis=1)
print('Minimum value in each row of dataframe are : ')
print(minValues)
Output :
Original Dataframe :
a     b     c
1  10  20.0  15.0
2  35   NaN  21.0
3  18  58.0  65.0
4  11  52.0   NaN
5  98  34.0  99.0
Minimum value in each row of dataframe are :
1    10.0
2    21.0
3    18.0
4    11.0
5    34.0
dtype: float64

In the above cases NaN values were skipped. If we want, we can also include them.

Get minimum values of every column without skipping NaN :

To get minimum value of every column without skipping NaN we have to pass skipna=False.

import pandas as sc
import numpy as dc
# List of Tuples
matrix = [(10, 20, 15),
          (35, dc.nan, 21),
          (18, 58, 65),
          (11, 52, dc.nan),
          (98, 34, 99)
          ]
# Creation of DataFrame object
datafObj = sc.DataFrame(matrix, index=list('12345'), columns=list('abc'))
# Get a series that contains minimum elements in each column including NaN
minValues = datafObj.min(skipna=False)
print('Minimum value in each column including NaN of dataframe are : ')
print(minValues)
Output :
Minimum value in each column including NaN of dataframe are :
a    10.0
b     NaN
c     NaN
dtype: float64
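
The same flag can be combined with axis=1. The sketch below, on a shortened version of the matrix, shows that any row containing a NaN then reports NaN as its minimum; the names df and row_min_strict are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(10, 20, 15),
                   (35, np.nan, 21),
                   (18, 58, 65)],
                  index=list('123'), columns=list('abc'))

# Row-wise minimum without skipping NaN: row '2' contains a NaN
row_min_strict = df.min(axis=1, skipna=False)
print(row_min_strict)
```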

Get minimum values of a single column or selected columns :

We can get minimum value of single column by calling min() member function by selecting that single column from given dataframe.

import pandas as sc
import numpy as dc
# List of Tuples
matrix = [(10, 20, 15),
          (35, dc.nan, 21),
          (18, 58, 65),
          (11, 52, dc.nan),
          (98, 34, 99)
          ]
# Creation of DataFrame object
datafObj = sc.DataFrame(matrix, index=list('12345'), columns=list('abc'))
# Get minimum element of a single column 'c'
minValues = datafObj['c'].min()
print("minimum value in column 'c' is : " , minValues)
Output :
minimum value in column 'c' is :  15.0

We can also get minimum value of selected columns by passing list of those columns.

import pandas as sc
import numpy as dc
# List of Tuples
matrix = [(10, 20, 15),
          (35, dc.nan, 21),
          (18, 58, 65),
          (11, 52, dc.nan),
          (98, 34, 99)
          ]
# Creation of DataFrame object
datafObj = sc.DataFrame(matrix, index=list('12345'), columns=list('abc'))
# Get minimum value of a 'a' & 'b' columns of dataframe
minValues = datafObj[['a', 'b']].min()
print("minimum value in column 'a' & 'b' are : ")
print(minValues)
Output :
minimum value in column 'a' & 'b' are :
a    10.0
b    20.0
dtype: float64

Get row index label or position of minimum values of every column :

DataFrame.idxmin() :

We can also find where the minimum values of a DataFrame are located, using the pandas function idxmin(). It returns the index label of the minimum value along the given axis.

Syntax:- DataFrame.idxmin(axis=0, skipna=True)

Get row index label of minimum value in every column :

We can create a Series that has the column names as its index and, as its values, the row index labels where the minimum element of each column is found.

import pandas as sc
import numpy as dc
# List of Tuples
matrix = [(10, 20, 15),
          (35, dc.nan, 21),
          (18, 58, 65),
          (11, 52, dc.nan),
          (98, 34, 99)
          ]
# Creation of DataFrame object
datafObj = sc.DataFrame(matrix, index=list('12345'), columns=list('abc'))
# Get the index position of minimum values in every column of dataframe
minValuesIndex = datafObj.idxmin()
print("min values of columns are at row index position :")
print(minValuesIndex)
Output :
min values of columns are at row index position :
a    1
b    1
c    1
dtype: object
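
The label returned by idxmin() can be used to fetch the value itself, or be converted to an integer position. The following is a minimal sketch for the single column ‘b’; the variable names label and position are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(10, 20, 15),
                   (35, np.nan, 21),
                   (18, 58, 65),
                   (11, 52, np.nan),
                   (98, 34, 99)],
                  index=list('12345'), columns=list('abc'))

# Row label of the smallest value in column 'b' (NaN is skipped)
label = df['b'].idxmin()
print(label, df.loc[label, 'b'])

# Integer position of that label in the index
position = df.index.get_loc(label)
print(position)
```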

Get Column names of minimum value in every row

We can also create a series which contains row index labels as index and column names as values where each row has minimum value.

import pandas as sc
import numpy as dc
# List of Tuples
matrix = [(10, 20, 15),
          (35, dc.nan, 21),
          (18, 58, 65),
          (11, 52, dc.nan),
          (98, 34, 99)
          ]
# Creation of DataFrame object
datafObj = sc.DataFrame(matrix, index=list('12345'), columns=list('abc'))
# Get the column name holding the minimum value of each row
minValuesIndex = datafObj.idxmin(axis=1)
print(" Minimum value in row at respective column of dataframe :")
print(minValuesIndex)
Output :
Minimum value in row at respective column of dataframe :
1    a
2    c
3    a
4    a
5    b
dtype: object
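
To see the column name and the minimum value side by side, the two Series can be combined. This is only a sketch; the dataframe name summary and its column names ‘column’ and ‘value’ are chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(10, 20, 15),
                   (35, np.nan, 21),
                   (18, 58, 65),
                   (11, 52, np.nan),
                   (98, 34, 99)],
                  index=list('12345'), columns=list('abc'))

# One row per original row: which column holds the minimum, and its value
summary = pd.DataFrame({'column': df.idxmin(axis=1),
                        'value': df.min(axis=1)})
print(summary)
```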


Python : How to make a class Iterable & create Iterator Class for it ?

How to make a class Iterable & create Iterator Class for it ?

We are going to see how to make a class iterable and how to create an iterator class for it.

What’s the need to make a Custom class Iterable ?

User defined classes are not iterable by default. To make the class objects iterable we have to make the class iterable and also create an iterator class for it.

Let’s try to iterate the class with a for loop

class School:
    """
    Contains List of Junior and senior school students
    """

    def __init__(self):
        self._juniorStudents = list()
        self._seniorStudents = list()

    def addJuniorStudents(self, members):
        self._juniorStudents += members

    def addSeniorStudents(self, members):
        self._seniorStudents += members


def main():
    # Create school class object
    school = School()
    # Add name of junior school students
    school.addJuniorStudents(["Seema", "Jenny", "Morris"])
    # Add name of senior school students
    school.addSeniorStudents(["Ritika", "Ronnie", "Aaditya"])
    # iter() raises TypeError here because School does not define __iter__()
    iter(school)
    for member in school:
        print(member)


main()

The above code will throw an error as the School class is not iterable yet.
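
The error can be observed without stopping the program by catching it. This is a minimal sketch with a cut-down School class; the variable error_message is only illustrative.

```python
class School:
    """Cut-down class without __iter__(), so it is not iterable."""

    def __init__(self):
        self._juniorStudents = ["Seema", "Jenny"]


school = School()
try:
    iter(school)
    error_message = None
except TypeError as err:
    # Python raises TypeError for objects that do not support iteration
    error_message = str(err)

print(error_message)
```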

How to make your Custom Class Iterable | The Iterator Protocol :

In order to make the class iterable, we need to define the __iter__( ) function inside the class so that it returns an object of the iterator class associated with the iterable class.

class SchoolIterator:
    """Iterator class"""

    def __init__(self, school):
        # School object reference
        self._school = school
        # index variable to keep track
        self._index = 0

    def __next__(self):
        """Returns the next value from school object's lists"""
        if self._index < (
            len(self._school._juniorStudents) + len(self._school._seniorStudents)
        ):
            if self._index < len(
                self._school._juniorStudents
            ):  # Check if junior student members are fully iterated or not
                result = (self._school._juniorStudents[self._index], "junior")
            else:
                result = (
                    self._school._seniorStudents[
                        self._index - len(self._school._juniorStudents)
                    ],
                    "senior",
                )
            self._index += 1
            return result
        # Iteration ends
        raise StopIteration


class School:
    """
    Contains List of Junior and senior school students
    """

    def __init__(self):
        self._juniorStudents = list()
        self._seniorStudents = list()

    def addJuniorStudents(self, members):
        self._juniorStudents += members

    def addSeniorStudents(self, members):
        self._seniorStudents += members

    def __iter__(self):
        """Returns Iterator object"""
        return SchoolIterator(self)


def main():
    # Create school class object
    school = School()
    # Add name of junior school students
    school.addJuniorStudents(["Seema", "Jenny", "Morris"])
    # Add name of senior school students
    school.addSeniorStudents(["Ritika", "Ronnie", "Aaditya"])
    iterator = iter(school)
    print(iterator)

main()
Output :
<__main__.SchoolIterator object at 0x01BC6B10>

The __iter__( ) function is now defined in the School class and returns an object of the SchoolIterator class. When we call the iter( ) function on a School object, it calls the __iter__( ) function of that object.

How to create an Iterator Class :

To create an iterator class, we have to define the __next__( ) function so that each call returns the next element of the iterable until no elements remain. When there are no more elements, it should raise StopIteration.

Here the iterator object returns the next element from the School object’s data members on each call.

CODE:

class SchoolIterator:
    """Iterator class"""

    def __init__(self, school):
        # School object reference
        self._school = school
        # index variable to keep track
        self._index = 0

    def __next__(self):
        """Returns the next value from school object's lists"""
        if self._index < (
            len(self._school._juniorStudents) + len(self._school._seniorStudents)
        ):
            if self._index < len(
                self._school._juniorStudents
            ):  # Check if junior student members are fully iterated or not
                result = (self._school._juniorStudents[self._index], "junior")
            else:
                result = (
                    self._school._seniorStudents[
                        self._index - len(self._school._juniorStudents)
                    ],
                    "senior",
                )
            self._index += 1
            return result
        # Iteration ends
        raise StopIteration


class School:
    """
    Contains List of Junior and senior school students
    """

    def __init__(self):
        self._juniorStudents = list()
        self._seniorStudents = list()

    def addJuniorStudents(self, members):
        self._juniorStudents += members

    def addSeniorStudents(self, members):
        self._seniorStudents += members

    def __iter__(self):
        """Returns Iterator object"""
        return SchoolIterator(self)


def main():
    # Create school class object
    school = School()
    # Add name of junior school students
    school.addJuniorStudents(["Seema", "Jenny", "Morris"])
    # Add name of senior school students
    school.addSeniorStudents(["Ritika", "Ronnie", "Aaditya"])
    iterator = iter(school)
    print(iterator)
    while True:
        try:
            # Get next element from SchoolIterator object using iterator object
            elem = next(iterator)
            # Print the element
            print(elem)
        except StopIteration:
            break


main()
Output :
<__main__.SchoolIterator object at 0x01A96B10>
('Seema', 'junior')
('Jenny', 'junior')
('Morris', 'junior')
('Ritika', 'senior')
('Ronnie', 'senior')
('Aaditya', 'senior')

The Working

The iter( ) function calls the overridden __iter__( ) function on the school object, which returns a SchoolIterator object. Calling the next( ) function on that object calls our overridden __next__( ) function internally. The _index variable keeps track of how many elements have been iterated. Every call returns the next element, and at the end it raises StopIteration. A for loop performs these same calls automatically, as the following version of main() shows.

class SchoolIterator:
    """Iterator class"""

    def __init__(self, school):
        # School object reference
        self._school = school
        # index variable to keep track
        self._index = 0

    def __next__(self):
        """Returns the next value from school object's lists"""
        if self._index < (
            len(self._school._juniorStudents) + len(self._school._seniorStudents)
        ):
            if self._index < len(
                self._school._juniorStudents
            ):  # Check if junior student members are fully iterated or not
                result = (self._school._juniorStudents[self._index], "junior")
            else:
                result = (
                    self._school._seniorStudents[
                        self._index - len(self._school._juniorStudents)
                    ],
                    "senior",
                )
            self._index += 1
            return result
        # Iteration ends
        raise StopIteration


class School:
    """
    Contains List of Junior and senior school students
    """

    def __init__(self):
        self._juniorStudents = list()
        self._seniorStudents = list()

    def addJuniorStudents(self, members):
        self._juniorStudents += members

    def addSeniorStudents(self, members):
        self._seniorStudents += members

    def __iter__(self):
        """Returns Iterator object"""
        return SchoolIterator(self)


def main():
    # Create school class object
    school = School()
    # Add name of junior school students
    school.addJuniorStudents(["Seema", "Jenny", "Morris"])
    # Add name of senior school students
    school.addSeniorStudents(["Ritika", "Ronnie", "Aaditya"])
    iterator = iter(school)
    print(iterator)
    # Using for loop
    for member in school:
        print(member)


main()
Output :

<__main__.SchoolIterator object at 0x7fefbe6f34f0>
('Seema', 'junior')
('Jenny', 'junior')
('Morris', 'junior')
('Ritika', 'senior')
('Ronnie', 'senior')
('Aaditya', 'senior')
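
As an aside, a separate iterator class is not the only option. The sketch below is an alternative technique, not the approach used above: it writes __iter__( ) as a generator function, so Python builds the iterator object for us. The method names follow the School class from this article.

```python
class School:
    """Iterable School whose __iter__() is a generator function."""

    def __init__(self):
        self._juniorStudents = []
        self._seniorStudents = []

    def addJuniorStudents(self, members):
        self._juniorStudents += members

    def addSeniorStudents(self, members):
        self._seniorStudents += members

    def __iter__(self):
        # Each yield hands one (name, group) tuple to the caller
        for name in self._juniorStudents:
            yield (name, "junior")
        for name in self._seniorStudents:
            yield (name, "senior")


school = School()
school.addJuniorStudents(["Seema", "Jenny"])
school.addSeniorStudents(["Ritika"])
members = list(school)
print(members)
```

Every call to iter( ) on the object starts a fresh generator, so the object can be looped over more than once.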

numpy.zeros() & numpy.ones() | Create a numpy array of zeros or ones

Create a 1D / 2D Numpy Arrays of zeros or ones

We are going to see how we can create various types of numpy arrays filled with zeros or ones.

numpy.zeros( ) :

The numpy module provides numpy.zeros() to create a numpy array of a given shape with every element initialized to 0.

Syntax: numpy.zeros(shape, dtype=float, order='C')

Parameters :

  1. shape : The shape of the numpy array (a single int or a sequence of ints).
  2. dtype : An optional parameter giving the data type of the elements (the default is float, i.e. float64).
  3. order : An optional parameter defining how the array is stored in memory (‘C’ for row-major, which is the default, and ‘F’ for column-major).

Creating a 1D numpy array filled with all zeros :

Below code is the implementation for that.

import numpy as np

#Creating a numpy array containing all 0's
zeroArray = np.zeros(5)
print("The array contents are : ", zeroArray)
Output : 
The array contents are :  [0. 0. 0. 0. 0.]

Creating a 2D numpy array with 5 rows & 6 columns, filled with 0’s :

To create an array with 5 rows and 6 columns filled with all 0’s, we need to pass the shape as the tuple (5, 6) into the function.

Below code is the implementation for that.

import numpy as np

# Creating a 5X6 numpy array containing all 0's
zeroArray = np.zeros((5, 6))
print("The array contents are : ", zeroArray)
Output :
The array contents are :  [[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]

It created a zero numpy array of 5X6 size for us.
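
The optional dtype and order parameters can be passed in the same call. The following is a small sketch; the variable name zero_ints is illustrative.

```python
import numpy as np

# Integer zeros stored in column-major ('F') order
zero_ints = np.zeros((2, 3), dtype=int, order='F')
print(zero_ints)
print(zero_ints.dtype)
print(zero_ints.flags['F_CONTIGUOUS'])

# Without dtype, the elements are 64-bit floats
print(np.zeros(5).dtype)
```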

numpy.ones( ) :

Just like the numpy.zeros( ), numpy.ones( ) is used to initialize the array elements to 1. It has same syntax.

Syntax - numpy.ones(shape, dtype=float, order='C')

Creating a 1D numpy array filled with all Ones :

Below code is the implementation for that.

import numpy as np

# Creating a numpy array containing all 1's
oneArray = np.ones(5)
print("The array contents are : ", oneArray)
Output :
The array contents are :  [1. 1. 1. 1. 1.]

Creating a 2D numpy array with 3 rows & 4 columns, filled with 1’s :

To create a 2D numpy array with 3 rows and 4 columns filled with 1’s, we have to pass (3,4) into the function.

Below code is the implementation for that.

import numpy as np

# Creating a 3X4 numpy array containing all 1's
oneArray = np.ones((3, 4))
print("The array contents are : ", oneArray)
print("Data Type of elements in  Array : ", oneArray.dtype)
Output :
The array contents are :  [[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
Data Type of elements in  Array :  float64

Let’s see how we can set the datatype to integer.

import numpy as np

# Creating a 3X4 numpy array containing all 1's int64 datatype
oneArray = np.ones((3, 4), dtype=np.int64)
print("The array contents are : ", oneArray)
print("Data Type of elements in  Array : ", oneArray.dtype)
Output :
The array contents are :  [[1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]]
Data Type of elements in  Array :  int64
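
When an array of the right shape and datatype already exists, NumPy also offers numpy.zeros_like() and numpy.ones_like(), which copy both from the given array. These two functions are not covered above; the sketch below is only a brief pointer, and the name template is illustrative.

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

# Same shape and dtype as template, filled with 0's and 1's
zeros_copy = np.zeros_like(template)
ones_copy = np.ones_like(template)
print(zeros_copy)
print(ones_copy)
```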

 


Pandas: Sum rows in Dataframe ( all or certain rows)

Sum rows in Dataframe ( all or certain rows) in Python

In this article we will discuss how we can sum the rows of a dataframe and add the resulting values as a new row to the same dataframe.

So, let’s start exploring the topic.

First, we will build a Dataframe,

import pandas as pd
import numpy as np
# The List of Tuples
salary_of_employees = [('Amit', 2000, 2050, 1099, 2134, 2111),
                    ('Rabi', 2122, 3022, 3456, 3111, 2109),
                    ('Abhi', np.nan, 2334, 2077, np.nan, 3122),
                    ('Naresh', 3050, 3050, 2010, 2122, 1111),
                    ('Suman', 2023, 2232, 3050, 2123, 1099),
                    ('Viroj', 2050, 2510, np.nan, 3012, 2122),
                    ('Nabin', 4000, 2000, 2050, np.nan, 2111)]
# By Creating a DataFrame object from list of tuples
test = pd.DataFrame(salary_of_employees,
                  columns=['Name',  'Jan', 'Feb', 'March', 'April', 'May'])
# To Set column Name as the index of dataframe
test.set_index('Name', inplace=True)
print(test)
Output :
             Jan           Feb        March    April         May
Name 
Amit     2000.0     2050       1099.0    2134.0     2111
Rabi      2122.0    3022       3456.0     3111.0    2109
Abhi      NaN       2334       2077.0     NaN       3122
Naresh  3050.0     3050      2010.0     2122.0    1111
Suman  2023.0     2232      3050.0     2123.0    1099
Viroj     2050.0     2510       NaN        3012.0    2122
Nabin   4000.0    2000       2050.0     NaN        2111

This Dataframe contains employee salaries from January to May. We have set the column ‘Name’ as the index of the dataframe. Each row of this dataframe contains one employee’s salary from January to May.

Get the sum of all rows in a Pandas Dataframe :

Let’s say in the above dataframe, we want to get the total salary paid each month. Basically, we want a Series that contains the sum of all rows for each column, i.e. each item in the Series should contain the total of one column.

Let’s see how we can find that series,

import pandas as pd
import numpy as np
# The List of Tuples
salary_of_employees = [('Amit', 2000, 2050, 1099, 2134, 2111),
                    ('Rabi', 2122, 3022, 3456, 3111, 2109),
                    ('Abhi', np.nan, 2334, 2077, np.nan, 3122),
                    ('Naresh', 3050, 3050, 2010, 2122, 1111),
                    ('Suman', 2023, 2232, 3050, 2123, 1099),
                    ('Viroj', 2050, 2510, np.nan, 3012, 2122),
                    ('Nabin', 4000, 2000, 2050, np.nan, 2111)]
# By Creating a DataFrame object from list of tuples
test = pd.DataFrame(salary_of_employees,
                  columns=['Name',  'Jan', 'Feb', 'March', 'April', 'May'])
# To Set column Name as the index of dataframe
test.set_index('Name', inplace=True)


#By getting sum of all rows in the Dataframe as a Series
total = test.sum()
print('Total salary paid in each month:')
print(total)
Output :
Total salary paid in each month:
Jan 15245.0
Feb 17198.0
March 13742.0
April 12502.0
May 13785.0
dtype: float64

We called the sum() function on the dataframe without any parameter, so it used the default axis 0 and summed column-wise, i.e. it added up all the values in each column and returned a Series containing those totals. NaN values are skipped by default. Each item in this Series contains the total amount paid in one month, with the name of the month as the index label of that entry.
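
To make the role of the axis clearer, here is a small sketch on a cut-down version of the salary table that compares the default axis=0 with axis=1. The names per_month and per_employee are illustrative.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'Jan': [2000, 2122, np.nan],
                     'Feb': [2050, 3022, 2334]},
                    index=['Amit', 'Rabi', 'Abhi'])

# axis=0 (default): add up the rows, giving one total per column
per_month = test.sum()
print(per_month)

# axis=1: add up the columns, giving one total per row
per_employee = test.sum(axis=1)
print(per_employee)
```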

We can add this Series as a new row to the dataframe i.e.

import pandas as pd
import numpy as np
# The List of Tuples
salary_of_employees = [('Amit', 2000, 2050, 1099, 2134, 2111),
                    ('Rabi', 2122, 3022, 3456, 3111, 2109),
                    ('Abhi', np.nan, 2334, 2077, np.nan, 3122),
                    ('Naresh', 3050, 3050, 2010, 2122, 1111),
                    ('Suman', 2023, 2232, 3050, 2123, 1099),
                    ('Viroj', 2050, 2510, np.nan, 3012, 2122),
                    ('Nabin', 4000, 2000, 2050, np.nan, 2111)]
# By Creating a DataFrame object from list of tuples
test = pd.DataFrame(salary_of_employees,
                  columns=['Name',  'Jan', 'Feb', 'March', 'April', 'May'])
# To Set column Name as the index of dataframe
test.set_index('Name', inplace=True)




# Get the sum of all rows as a Series
total = test.sum()
# Add the Series as a new row labeled 'Total'
# (DataFrame.append() was removed in pandas 2.0, so assign through .loc instead)
test.loc['Total'] = total
print(test)
Output :
            Jan      Feb    March    April      May
Name
Amit     2000.0   2050.0   1099.0   2134.0   2111.0
Rabi     2122.0   3022.0   3456.0   3111.0   2109.0
Abhi        NaN   2334.0   2077.0      NaN   3122.0
Naresh   3050.0   3050.0   2010.0   2122.0   1111.0
Suman    2023.0   2232.0   3050.0   2123.0   1099.0
Viroj    2050.0   2510.0      NaN   3012.0   2122.0
Nabin    4000.0   2000.0   2050.0      NaN   2111.0
Total   15245.0  17198.0  13742.0  12502.0  13785.0

This added a new row with the index label ‘Total’ to the dataframe. Each entry in this row contains the total salary paid in that month.

How did it work?

The Series was converted into a one-row dataframe, so every index label of the Series (the month names) became a column of that row. This row was then appended to the original dataframe, which gave us a new ‘Total’ row at the bottom.
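
The conversion described above can be sketched on its own. The snippet below is an illustration with a hypothetical two-row dataframe (demo, total_row and with_total are made-up names): it turns the Series of totals into a one-row dataframe with to_frame().T, and also shows an alternative that assigns the Series to a new label with loc.

```python
import pandas as pd

# Hypothetical mini dataframe, used only for illustration
demo = pd.DataFrame({'Jan': [2000.0, 2122.0],
                     'Feb': [2050.0, 3022.0]},
                    index=pd.Index(['Amit', 'Rabi'], name='Name'))

total = demo.sum()
total.name = 'Total'

# Series -> one-column DataFrame -> transposed into a single row labelled 'Total'
total_row = total.to_frame().T
print(total_row)

# Alternative sketch: assign the Series to a new row label on a copy
with_total = demo.copy()
with_total.loc['Total'] = total
print(with_total)
```

Both routes end with a row labelled ‘Total’; assigning through loc is shorter, while the one-row dataframe is handy when several rows have to be combined with pd.concat().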

Get Sum of certain rows in Dataframe by row numbers :

In the previous example we summed all the rows of the dataframe, but what if we want the total of only a few rows? For example, to get the total monthly salary of only the top 3 employees in the dataframe above,

import pandas as pd
import numpy as np
# The List of Tuples
salary_of_employees = [('Amit', 2000, 2050, 1099, 2134, 2111),
                    ('Rabi', 2122, 3022, 3456, 3111, 2109),
                    ('Abhi', np.nan, 2334, 2077, np.nan, 3122),
                    ('Naresh', 3050, 3050, 2010, 2122, 1111),
                    ('Suman', 2023, 2232, 3050, 2123, 1099),
                    ('Viroj', 2050, 2510, np.nan, 3012, 2122),
                    ('Nabin', 4000, 2000, 2050, np.nan, 2111)]
# By Creating a DataFrame object from list of tuples
test = pd.DataFrame(salary_of_employees,
                  columns=['Name',  'Jan', 'Feb', 'March', 'April', 'May'])
# To Set column Name as the index of dataframe
test.set_index('Name', inplace=True)


#By getting sum of values of top 3 DataFrame rows,
sumtabOf = test.iloc[0:3].sum()
print(sumtabOf)
Output :
Jan      4122.0
Feb      7406.0
March    6632.0
April    5245.0
May      7342.0
dtype: float64

We selected the first 3 rows of the dataframe with iloc[0:3] and called sum() on them. It returns a Series containing the total monthly salary paid to the selected employees only, i.e. the first three rows of the original dataframe.
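
Position-based selection is not limited to a slice from the top. As a small sketch with a hypothetical four-row dataframe (all names and values here are made up for illustration), iloc also accepts a list of row positions or a negative slice before sum() is called.

```python
import pandas as pd

# Hypothetical mini dataframe, used only for illustration
demo = pd.DataFrame({'Jan': [2000.0, 2122.0, 3050.0, 2023.0],
                     'Feb': [2050.0, 3022.0, 3050.0, 2232.0]},
                    index=['Amit', 'Rabi', 'Naresh', 'Suman'])

first_two = demo.iloc[0:2].sum()   # slice end is exclusive: positions 0 and 1
picked = demo.iloc[[0, 3]].sum()   # explicit list of positions: 0 and 3
last_two = demo.iloc[-2:].sum()    # last two rows

print(first_two)
print(picked)
print(last_two)
```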

Get the sum of specific rows in Pandas Dataframe by index/row label :

Unlike the previous example, we can also select specific rows by their index labels and get the sum of the values in those selected rows only i.e.

import pandas as pd
import numpy as np
# The List of Tuples
salary_of_employees = [('Amit', 2000, 2050, 1099, 2134, 2111),
                    ('Rabi', 2122, 3022, 3456, 3111, 2109),
                    ('Abhi', np.nan, 2334, 2077, np.nan, 3122),
                    ('Naresh', 3050, 3050, 2010, 2122, 1111),
                    ('Suman', 2023, 2232, 3050, 2123, 1099),
                    ('Viroj', 2050, 2510, np.nan, 3012, 2122),
                    ('Nabin', 4000, 2000, 2050, np.nan, 2111)]
# By Creating a DataFrame object from list of tuples
test = pd.DataFrame(salary_of_employees,
                  columns=['Name',  'Jan', 'Feb', 'March', 'April', 'May'])
# To Set column Name as the index of dataframe
test.set_index('Name', inplace=True)


# By getting sum of 3 DataFrame rows (selected by index labels)
sumtabOf = test.loc[['Amit', 'Naresh', 'Viroj']].sum()
print(sumtabOf)
Output :
Jan      7100.0
Feb      7610.0
March    3109.0
April    7268.0
May      5344.0
dtype: float64

We selected 3 rows of the dataframe by the index labels ‘Amit’, ‘Naresh’ and ‘Viroj’, and then summed the values of these selected rows only. It returns a Series with the total salary paid to those selected employees in each month.
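
One detail worth keeping in mind with label-based selection: a list of labels picks exactly those rows, while a label slice in loc includes both of its endpoints. The sketch below shows the difference on a hypothetical three-row dataframe (names and values are made up for illustration).

```python
import pandas as pd

# Hypothetical mini dataframe, used only for illustration
demo = pd.DataFrame({'Jan': [2000.0, 2122.0, 3050.0],
                     'Feb': [2050.0, 3022.0, 3050.0]},
                    index=['Amit', 'Rabi', 'Naresh'])

by_list = demo.loc[['Amit', 'Naresh']].sum()   # only the two listed rows
by_slice = demo.loc['Amit':'Naresh'].sum()     # label slice: Amit, Rabi and Naresh

print(by_list)
print(by_slice)
```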

Conclusion:

So in the above examples we saw how to sum all rows or only certain rows of a dataframe.
