Author name: Satyabrata Jena

Python Pandas : Replace or change Column & Row index names in DataFrame

Replacing or changing Column & Row index names in DataFrame

In this article we will discuss:

  • How to change column names in a DataFrame object.
  • How to change row index names in a DataFrame object.

First, create a DataFrame object from the student records i.e.

import pandas as pd
students_record = [ ('Amit', 27, 'Kolkata') ,
                    ('Mini', 24, 'Chennai' ) ,
                    ('Nira', 34, 'Mumbai') ]
# Create a DataFrame object
do = pd.DataFrame(students_record, columns = ['Name' , 'Age', 'City'], index=['x', 'y', 'z']) 
print(do)
Output :
   Name  Age     City
x  Amit   27  Kolkata
y  Mini   24  Chennai
z  Nira   34   Mumbai

Change Column Names in DataFrame :

The DataFrame object has an attribute columns, which is an Index object holding the column labels of the DataFrame. We can get the column names from this Index object i.e.

import pandas as pd
students_record = [ ('Amit', 27, 'Kolkata') ,
                    ('Mini', 24, 'Chennai' ) ,
                    ('Nira', 34, 'Mumbai') ]
# Create a DataFrame object
do = pd.DataFrame(students_record, columns = ['Name' , 'Age', 'City'], index=['x', 'y', 'z']) 


# Get the ndarray of all column names
column_Name_Arr = do.columns.values
print(column_Name_Arr)
Output :
['Name' 'Age' 'City']

Any modification to this ndarray (do.columns.values) changes the actual DataFrame. For example, let's change the column name at index 0 i.e.

import pandas as pd
students_record = [ ('Amit', 27, 'Kolkata') ,
                    ('Mini', 24, 'Chennai' ) ,
                    ('Nira', 34, 'Mumbai') ]
# Create a DataFrame object
do = pd.DataFrame(students_record, columns = ['Name' , 'Age', 'City'], index=['x', 'y', 'z']) 


# Get the ndarray of all column names
column_Name_Arr = do.columns.values
# Modify a column name
column_Name_Arr[0] = 'Name_Vr'
print(column_Name_Arr)
Output :
['Name_Vr' 'Age' 'City']
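Because column_Name_Arr is a view into the DataFrame's column index, this rename is reflected in the linked DataFrame object as well. Continuing from the snippet above, printing the DataFrame again should show the new column label (output sketched approximately):

# Continuing from the previous snippet: the DataFrame itself now shows the renamed column
print(do)
# Expected output (approximately):
#    Name_Vr  Age     City
# x     Amit   27  Kolkata
# y     Mini   24  Chennai
# z     Nira   34   Mumbai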

Change Row Index in DataFrame

To get all the row index names from the DataFrame object, use the attribute index instead of columns i.e. do.index.values

It returns the ndarray of all row index labels of the DataFrame. Any modification to this ndarray (do.index.values) will modify the actual DataFrame. For example, let's change the row index label at position 0 i.e. replace it with 'j'.

This change will again be reflected in the linked DataFrame object.

However, if we first copy the index values into a list, changes made to that list will not be visible in the original DataFrame object. For example, we can create a list copy of the row index names of the DataFrame.

The whole program is given below.

import pandas as pd
students_record = [ ('Amit', 27, 'Kolkata') ,
                    ('Mini', 24, 'Chennai' ) ,
                    ('Nira', 34, 'Mumbai') ]
# Create a DataFrame object
do = pd.DataFrame(students_record, columns = ['Name' , 'Age', 'City'], index=['x', 'y', 'z']) 


# Get the ndarray of all row index names
index_Name_Arr = do.index.values
print(index_Name_Arr)


# Modify a row index name
index_Name_Arr[0] = 'j'
print(index_Name_Arr)


# Get a copied list of all row index names
index_Names = list(do.index.values)
print(index_Names)
print(do)
Output :
['x' 'y' 'z']
['j' 'y' 'z']
['j', 'y', 'z']
  Name  Age     City
j  Amit   27  Kolkata
y  Mini   24  Chennai
z  Nira   34   Mumbai
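Note that pandas also provides a dedicated method for this task. As an additional sketch (not part of the example above), DataFrame.rename() returns a renamed copy by default instead of mutating the underlying index arrays:

import pandas as pd

students_record = [ ('Amit', 27, 'Kolkata') ,
                    ('Mini', 24, 'Chennai' ) ,
                    ('Nira', 34, 'Mumbai') ]
do = pd.DataFrame(students_record, columns = ['Name' , 'Age', 'City'], index=['x', 'y', 'z'])

# rename() maps old labels to new ones and returns a modified copy by default
renamed = do.rename(columns={'Name': 'Name_Vr'}, index={'x': 'j'})
print(renamed)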


6 Ways to check if all values in Numpy Array are zero (in both 1D & 2D arrays) – Python

Check if all values in Numpy Array are zero (in both 1D & 2D arrays) in Python

In this article we will discuss different ways to check if all values in a numpy array are 0, in both 1D and 2D arrays.

So let’s start exploring the topic.

Method 1: Using numpy.all() to check if a 1D Numpy array contains only 0 :

In this method each element of the array is compared with the given value i.e. zero, which produces a bool array containing True or False. numpy.all() then returns True only if all of those comparisons are True.

# program :

import numpy as np

# 1D numpy array created from a list
arr = np.array([0, 0, 0, 0, 0, 0])

# Checking if all elements in array are zero
check_zero = np.all((arr == 0))
if check_zero:
    print('All the elements of array are zero')
else:
    print('All the elements of the array are not zero')
Output :
All the elements of array are zero

Method 2: Using numpy.any() to check if a 1D Numpy array contains only 0 :

We can use the numpy.any() function to check if the array contains only zeros by looking for any non-zero value. All the elements received by numpy.any() are typecast to bool values, i.e. 0 becomes False and any other value becomes True. So if all the values in the array are zero it returns False, and by applying not to the result we can confirm whether the array contains only zeros.

# program :

import numpy as np

# 1D numpy array created from a list
arr = np.array([0, 0, 0, 0, 0, 0])

# Checking if all elements in array are zero
check_zero = not np.any(arr)
if check_zero:
    print('All the elements of array are zero')
else:
    print('All the elements of the array are not zero')
Output : 
All the elements of array are zero

Method 3: Using numpy.count_nonzero() to check if a 1D Numpy array contains only 0 :

The numpy.count_nonzero() function returns the count of non-zero values in the array, so we can use it to check whether the array contains any non-zero value. If any non-zero element is found then not all elements of the array are zero, and if the count is 0 then all the elements in the array are zero.

# program :

import numpy as np

# 1D numpy array created from a list
arr = np.array([0, 0, 0, 0, 0, 0])

# Checking if all elements in array are zero
# Count non zero items in array
count_non_zeros = np.count_nonzero(arr)
if count_non_zeros==0:
    print('All the elements of array are zero')
else:
    print('All the elements of the array are not zero')
Output :
All the elements of array are zero

Method 4: Using for loop to check if a 1D Numpy array contains only 0 :

By iterating over all the elements of the array we can also check whether the array contains only zeros.

# program :

import numpy as np

# 1D numpy array created from a list
arr = np.array([0, 0, 0, 0, 0, 0])

# Checking if all elements in array are zero
def check_zero(arr):
    # iterating of the array
    # and checking if any element is not equal to zero
    for elem in arr:
        if elem != 0:
            return False
    return True
result = check_zero(arr)

if result:
    print('All the elements of array are zero')
else:
    print('All the elements of the array are not zero')
Output :
All the elements of array are zero

Method 5: Using List Comprehension to check if a 1D Numpy array contains only 0 :

Using a list comprehension we can iterate over each element of the numpy array and build a list of the values that are non-zero. If that list is empty we can confirm that all the values of the numpy array are zero.

# program :

import numpy as np

# 1D numpy array created from a list
arr = np.array([0, 0, 0, 0, 0, 0])

# Iterating over each element of array 
# And create a list of non zero items from array
result = len([elem for elem in arr if elem != 0])
# If the list contains no elements then the array contains only zero values

if result==0:
    print('All the elements of array are zero')
else:
    print('All the elements of the array are not zero')
Output :
All the elements of array are zero

Method 6: Using min() and max() to check if a 1D Numpy array contains only 0 :

If the minimum and maximum values in the array are the same, i.e. both zero, then we can confirm the array contains only zeros.

# program :

import numpy as np

# 1D numpy array created from a list
arr = np.array([0, 0, 0, 0, 0, 0])

if arr.min() == 0 and arr.max() == 0:
    print('All the elements of array are zero')
else:
    print('All the elements of the array are not zero')
Output :
All the elements of array are zero

Check if all elements in a 2D numpy array or matrix are zero :

Using the first technique, i.e. the numpy.all() function, we can check whether a 2D array contains only zeros.

# program :

import numpy as np

# 2D numpy array created 
arr_2d = np.array([[0, 0, 0],
                   [0, 0, 0],
                   [0, 0, 0]])

# Checking if all 2D numpy array contains only 0
result = np.all((arr_2d == 0))

if result:
    print('All the elements of the 2D array are zero')
else:
    print('All the elements of the 2D array are not zero')
Output : 
All the elements of the 2D array are zero
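The other 1D approaches shown earlier also work unchanged on 2D arrays, because functions such as numpy.any() and numpy.count_nonzero() look at every element regardless of the array's shape. A minimal sketch (not part of the original examples):

import numpy as np

# 2D numpy array created
arr_2d = np.array([[0, 0, 0],
                   [0, 0, 0],
                   [0, 0, 0]])

# np.any() returns False when every element is zero,
# so negating it tells us whether the 2D array contains only zeros
if not np.any(arr_2d):
    print('All the elements of the 2D array are zero')
else:
    print('All the elements of the 2D array are not zero')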


Pandas: Create Series from dictionary in python

Creating Series from dictionary in python

In this article we will discuss different ways to convert a python dictionary to a Pandas Series object.

The Series class in Pandas provides a constructor i.e.

Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

Where,

  • data : It represents array-like, Iterable sequence where all items in this iterable sequence will be added as values in the Series.
  • index : It represents array-like, Iterable sequence where all values in this iterable sequence will be added as indices in the Series.
  • dtype : It represents datatype of the output series.

Create a Pandas Series from dict in python :

By passing the dictionary to the Series class constructor i.e. Series(), we get a new Series object where all the keys of the dictionary become the indices of the Series object, and all the values from the key-value pairs of the dictionary become the values of the Series object.

So, let’s see the example.

# Program :

import pandas as pd
# Dictionary 
dict = {
    'C': 56,
    "A": 23,
    'D': 43,
    'E': 78,
    'B': 11
}
# Converting a dictionary to a Pandas Series object.
# Where dictionary keys will be converted into index of Series &
# values of dictionary will become values in Series.
series_object = pd.Series(dict)
print('Contents of Pandas Series: ')
print(series_object)
Output :
Contents of Pandas Series: 
C  56
A  23
D  43
E  78
B  11
dtype: int64

Here the index of the Series object contains the keys of the dictionary, and the values of the Series object contain the values of the dictionary.

Create Pandas series object from a dictionary with index in a specific order :

In the above example we observed that the indices of the Series object are in the same order as the keys of the dictionary. In this example we will see how to convert the dictionary into a Series object with the indices in a different order, by passing the index list separately.

So, let’s see the example.

# Program :

import pandas as pd
# Dictionary 
dict = {
    'C': 6,
    "A": 3,
    'D': 4,
    'E': 8,
    'B': 1
}
# Creating Series from dict, but pass the index list separately
# Where dictionary keys will be converted into index of Series &
# values of dictionary will become values in Series.
# But the indices will follow the order of the passed index list.
series_object = pd.Series(dict,
                       index=['E', 'D', 'C', 'B', 'A'])
print('Contents of Pandas Series: ')
print(series_object)
Output :
Contents of Pandas Series: 
E  8
D  4
C  6
B  1
A  3
dtype: int64

Create a Pandas Series object from specific key-value pairs in a dictionary :

In the above examples the Series object was created from all the items in the dictionary, because we passed the dictionary as the only argument to the Series constructor. Now we will see how to convert only specific key-value pairs from the dictionary to the Series object.

So, let’s see the example.

# Program :

import pandas as pd
# Dictionary 
dict = {
    'C': 6,
    "A": 3,
    'D': 4,
    'E': 8,
    'B': 1
}
# Creating Series from dict, but pass the index list separately
# Where dictionary keys will be converted into index of Series &
# values of dictionary will become values in Series.
# But here we have passed some specific key-value pairs of dictionary
series_object = pd.Series(dict,
                       index=['E', 'D', 'C'])
print('Contents of Pandas Series: ')
print(series_object)
Output :
Contents of Pandas Series: 
E 8
D 4
C 6
dtype: int64
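If the index list contains a label that is not a key of the dictionary, the Series will hold NaN for that label (and the dtype becomes float64). A minimal sketch of this behaviour, using a hypothetical label 'Z' that is not in the dictionary:

import pandas as pd

d = {'C': 6, 'A': 3, 'D': 4, 'E': 8, 'B': 1}

# 'Z' is not a key of the dictionary, so its value in the Series is NaN
series_object = pd.Series(d, index=['E', 'D', 'Z'])
print(series_object)
# E    8.0
# D    4.0
# Z    NaN
# dtype: float64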

 


Pandas : Sort a DataFrame based on column names or row index labels using Dataframe.sort_index()

Sorting a DataFrame based on column names or row index labels using Dataframe.sort_index() in Python

In this article we will discuss how to sort the contents of a DataFrame based on column names or row index labels using Dataframe.sort_index().

Dataframe.sort_index():

In the Python Pandas library, the DataFrame class provides a member function sort_index() to sort a DataFrame based on label names along an axis i.e.

DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, by=None)

Where,

  • axis : If axis is 0, the DataFrame is sorted based on the row index labels; if axis is 1, it is sorted based on the column names. The default is 0.
  • ascending : If True, sort in ascending order, otherwise sort in descending order. The default is True.
  • inplace : If True, sort the DataFrame in place instead of returning a sorted copy. The default is False.
  • na_position : Decides where NaN labels are placed after sorting i.e. 'first' puts NaNs at the beginning, 'last' puts NaNs at the end. The default is 'last'.

It returns a sorted DataFrame object. If the inplace argument is False, it returns a sorted copy of the given DataFrame instead of modifying the original one; if inplace is True, the calling DataFrame itself is sorted.

Let’s understand some examples,

# Program :

import pandas as pd
# List of Tuples
students = [ ('Rama', 31, 'canada') ,
             ('Symon', 23, 'Chennai' ) ,
             ('Arati', 16, 'Maharastra') ,
             ('Bhabani', 32, 'Kolkata' ) ,
             ('Modi', 33, 'Uttarpradesh' ) ,
             ('Heeron', 39, 'Hyderabad' )
              ]
# Create a DataFrame object
dfObj = pd.DataFrame(students, columns=['Name', 'Marks', 'City'], index=['b', 'a', 'f', 'e', 'd', 'c'])
print(dfObj)
Output :
      Name  Marks          City
b     Rama     31        canada
a    Symon     23       Chennai
f    Arati     16    Maharastra
e  Bhabani     32       Kolkata
d     Modi     33  Uttarpradesh
c   Heeron     39     Hyderabad

Now let’s see how to sort this DataFrame based on labels i.e. column names or row index labels.

Sort rows of a Dataframe based on Row index labels :

To sort by row index labels, we can call sort_index() on the DataFrame object.

import pandas as pd
# The List of Tuples
students = [ ('Rama', 31, 'canada') ,
             ('Symon', 23, 'Chennai' ) ,
             ('Arati', 16, 'Maharastra') ,
             ('Bhabani', 32, 'Kolkata' ) ,
             ('Modi', 33, 'Uttarpradesh' ) ,
             ('Heeron', 39, 'Hyderabad' )
              ]
# To create DataFrame object 
dfObj = pd.DataFrame(students, columns=['Name', 'Marks', 'City'], index=['b', 'a', 'f', 'e', 'd', 'c'])
# Sort the rows of the dataframe based on row index label names
modDFObj = dfObj.sort_index()
print('Dataframe sorted in order of row index labels:')
print(modDFObj)
Output :
Dataframe sorted in order of row index labels:
      Name  Marks          City
a    Symon     23       Chennai
b     Rama     31        canada
c   Heeron     39     Hyderabad
d     Modi     33  Uttarpradesh
e  Bhabani     32       Kolkata
f    Arati     16    Maharastra

As we can see in the output, the rows are now sorted based on the row index labels. Instead of modifying the original DataFrame, it returned a sorted copy.

Sort rows of a Dataframe in Descending Order based on Row index labels :

To sort based on row index labels in descending order, we need to pass ascending=False to the sort_index() function of the DataFrame object.

import pandas as pd
# The List of Tuples
students = [ ('Rama', 31, 'canada') ,
             ('Symon', 23, 'Chennai' ) ,
             ('Arati', 16, 'Maharastra') ,
             ('Bhabani', 32, 'Kolkata' ) ,
             ('Modi', 33, 'Uttarpradesh' ) ,
             ('Heeron', 39, 'Hyderabad' )
              ]
# To create DataFrame object 
dfObj = pd.DataFrame(students, columns=['Name', 'Marks', 'City'], index=['b', 'a', 'f', 'e', 'd', 'c'])
# Sort the rows of the dataframe in descending order based on row index label names
conObj = dfObj.sort_index(ascending=False)
print('Contents of Dataframe sorted in descending order based on row index labels:')
print(conObj)
Output :
Contents of Dataframe sorted in descending order based on row index labels:
      Name  Marks          City
f    Arati     16    Maharastra
e  Bhabani     32       Kolkata
d     Modi     33  Uttarpradesh
c   Heeron     39     Hyderabad
b     Rama     31        canada
a    Symon     23       Chennai

As we can see in the output, the rows are now sorted in descending order based on the row index labels. Again, instead of modifying the original DataFrame, it returned a sorted copy.

Sort rows of a Dataframe based on Row index labels in Place :

To sort the DataFrame in place instead of getting a sorted copy, pass inplace=True to the sort_index() function of the DataFrame object; this sorts the DataFrame itself by its row index labels i.e.

import pandas as pd
# The List of Tuples
students = [ ('Rama', 31, 'canada') ,
             ('Symon', 23, 'Chennai' ) ,
             ('Arati', 16, 'Maharastra') ,
             ('Bhabani', 32, 'Kolkata' ) ,
             ('Modi', 33, 'Uttarpradesh' ) ,
             ('Heeron', 39, 'Hyderabad' )
              ]
# To create DataFrame object 
dfObj = pd.DataFrame(students, columns=['Name', 'Marks', 'City'], index=['b', 'a', 'f', 'e', 'd', 'c'])
# Sort the rows of the dataframe in place based on row index label names
dfObj.sort_index(inplace=True)
print('Contents of Dataframe sorted in place based on row index labels:')
print(dfObj)
Output :
Contents of Dataframe sorted in place based on row index labels:
      Name  Marks          City
a    Symon     23       Chennai
b     Rama     31        canada
c   Heeron     39     Hyderabad
d     Modi     33  Uttarpradesh
e  Bhabani     32       Kolkata
f    Arati     16    Maharastra

Sort Columns of a Dataframe based on Column Names :

To sort a DataFrame based on column names, we can call sort_index() on the DataFrame object with axis=1 i.e.

import pandas as pd
# The List of Tuples
students = [ ('Rama', 31, 'canada') ,
             ('Symon', 23, 'Chennai' ) ,
             ('Arati', 16, 'Maharastra') ,
             ('Bhabani', 32, 'Kolkata' ) ,
             ('Modi', 33, 'Uttarpradesh' ) ,
             ('Heeron', 39, 'Hyderabad' )
              ]
# To create DataFrame object 
dfObj = pd.DataFrame(students, columns=['Name', 'Marks', 'City'], index=['b', 'a', 'f', 'e', 'd', 'c'])
# Sort the dataframe based on column names
conObj = dfObj.sort_index(axis=1)
print('Contents of Dataframe sorted based on column names:')
print(conObj)

Output :
Contents of Dataframe sorted based on column names:
           City  Marks     Name
b        canada     31     Rama
a       Chennai     23    Symon
f    Maharastra     16    Arati
e       Kolkata     32  Bhabani
d  Uttarpradesh     33     Modi
c     Hyderabad     39   Heeron

As we can see, instead of modifying the original DataFrame, it returned a copy with the columns sorted based on the column names.

Sort Columns of a Dataframe in Descending Order based on Column Names :

To sort a DataFrame based on column names in descending order, we can call sort_index() on the DataFrame object with axis=1 and ascending=False i.e.

import pandas as pd
# The List of Tuples
students = [ ('Rama', 31, 'canada') ,
             ('Symon', 23, 'Chennai' ) ,
             ('Arati', 16, 'Maharastra') ,
             ('Bhabani', 32, 'Kolkata' ) ,
             ('Modi', 33, 'Uttarpradesh' ) ,
             ('Heeron', 39, 'Hyderabad' )
              ]
# To create DataFrame object 
dfObj = pd.DataFrame(students, columns=['Name', 'Marks', 'City'], index=['b', 'a', 'f', 'e', 'd', 'c'])
# Sort the dataframe in descending order based on column names
conObj = dfObj.sort_index(ascending=False, axis=1)
print('Contents of Dataframe sorted in descending order based on column names:')
print(conObj)
Output :
Contents of Dataframe sorted in descending order based on column names:
      Name  Marks          City
b     Rama     31        canada
a    Symon     23       Chennai
f    Arati     16    Maharastra
e  Bhabani     32       Kolkata
d     Modi     33  Uttarpradesh
c   Heeron     39     Hyderabad

Instead of modifying the original DataFrame, it returned a copy with the columns sorted in descending order of the column names.

Sort Columns of a Dataframe in Place based on Column Names :

To sort the DataFrame in place instead of getting a sorted copy, pass inplace=True and axis=1 to the sort_index() function of the DataFrame object; this sorts the DataFrame itself by its column names i.e.

import pandas as pd
# The List of Tuples
students = [ ('Rama', 31, 'canada') ,
             ('Symon', 23, 'Chennai' ) ,
             ('Arati', 16, 'Maharastra') ,
             ('Bhabani', 32, 'Kolkata' ) ,
             ('Modi', 33, 'Uttarpradesh' ) ,
             ('Heeron', 39, 'Hyderabad' )
              ]
# To create DataFrame object 
dfObj = pd.DataFrame(students, columns=['Name', 'Marks', 'City'], index=['b', 'a', 'f', 'e', 'd', 'c'])
# Sort the dataframe in place based on column names
dfObj.sort_index(inplace=True, axis=1)
print('Contents of Dataframe sorted in place based on column names:')
print(dfObj)

Output :
Contents of Dataframe sorted in place based on column names:
           City  Marks     Name
b        canada     31     Rama
a       Chennai     23    Symon
f    Maharastra     16    Arati
e       Kolkata     32  Bhabani
d  Uttarpradesh     33     Modi
c     Hyderabad     39   Heeron


Python : Get Last Modification date & time of a file. | os.stat() | os.path.getmtime()

Getting the Last Modification date & time of a file. | os.stat() | os.path.getmtime() in Python.

In this article we will see how we can get the last modification date & time of a file, in our desired formats.

Get last modification time of a file using os.stat( ) :

Syntax - os.stat(filePath)

The above function returns an os.stat_result object that contains stats about the file.

To get the modification time, we use the stat.ST_MTIME index, which gives the modification time as seconds since the epoch. We then pass it to time.ctime(), which returns the time in a readable format.

import os
import time
import stat

# Get the stats of the file
fileStats = os.stat('file.txt')
# Convert the modification time (in seconds) into a readable format
modTime = time.ctime(fileStats[stat.ST_MTIME])
print("Modified Time : ", modTime)
Output :
Modified Time :  Thu May 13 19:02:47 2021

Get last modification time of a file using os.path.getmtime() :

We can also use another function of python’s os module, i.e. os.path.getmtime().

Syntax - os.path.getmtime(filePath)

The function returns the modification time of the file as the number of seconds since the epoch. We have to convert it into a proper format.

  • Get last modification time using os.path.getmtime() & time.localtime( ) :

import os
import time
# Get the modification time of the file as seconds since the epoch
modTimeInSeconds = os.path.getmtime('file.txt')
# Convert it to a readable timestamp using localtime() and strftime()
modTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(modTimeInSeconds))
print("Modified Time : ", modTime )
Output:
Modified Time :  2021-05-13 19:02:47

The function time.localtime() converts the seconds to a struct_time which, when passed into strftime(), returns the timestamp in a readable format.

Also we can set the format in strftime( ) to get only the modification date.

import os
import time
# Get the modification time of the file as seconds since the epoch
modTimeInSeconds = os.path.getmtime('file.txt')
# Convert it to a readable date using localtime() and strftime()
modTime = time.strftime('%d/%m/%Y', time.localtime(modTimeInSeconds))
print("Modified Time : ", modTime )
Output :
Modified Time :  13/05/2021

  • Get last modification time using os.path.getmtime() & datetime.fromtimestamp() :

We can also find the modification time of the file without using time.localtime() using datetime.fromtimestamp().

import os
import time
import datetime
# Get the modification time of the file as seconds since the epoch
modTimeInSeconds = os.path.getmtime('file.txt')
# Convert it to a readable timestamp using datetime.fromtimestamp() and strftime()
modTime = datetime.datetime.fromtimestamp(modTimeInSeconds).strftime('%Y-%m-%d %H:%M:%S')
print("Modified Time : ", modTime )
Output :
Modified Time :  2021-05-13 19:02:47

  • Get last modification time of a file in UTC Timezone :

To obtain the last modification time in the UTC timezone, we can use datetime.utcfromtimestamp().

import os
import time
import datetime
# Get the modification time of the file as seconds since the epoch
modTimeInSeconds = os.path.getmtime('file.txt')
# Convert it to a readable UTC timestamp using utcfromtimestamp() and strftime()
modTime = datetime.datetime.utcfromtimestamp(modTimeInSeconds).strftime('%Y-%m-%d %H:%M:%S')
print("Modified Time : ", modTime )
Output :
Modified Time :  2021-05-13 13:32:47
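As a related sketch (not part of the original examples), the standard-library pathlib module exposes the same information through Path.stat(), which can read more naturally in path-oriented code:

import datetime
from pathlib import Path

# st_mtime is the modification time as seconds since the epoch
mtime_seconds = Path('file.txt').stat().st_mtime
modTime = datetime.datetime.fromtimestamp(mtime_seconds).strftime('%Y-%m-%d %H:%M:%S')
print("Modified Time : ", modTime)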

 

 


deque vs vector : What to choose ?

Deque vs Vector

In this article, we are going to see the difference between the STL sequential containers std::deque and std::vector with their appropriate usage.

VECTOR :

  • Vectors are dynamic arrays that can grow and shrink automatically as elements are added or removed.
  • The container handles its memory automatically.
  • Data can be inserted in the middle or at the end.
  • The elements are stored in contiguous storage.

DEQUE :

  • Deques, or double-ended queues, are sequence containers that can grow and shrink at both ends.
  • Data can be inserted at the start, in the middle, or at the end.
  • The data is not always stored in contiguous storage locations.

What’s the difference ?

  1. A vector is essentially a dynamic array, while a deque is the data structure implementation of a double-ended queue.
  2. In a vector, insertion and deletion at the end perform well, but insertion and deletion at the front or in the middle perform badly because the remaining elements must be shifted. A deque matches the vector's performance for insertion and deletion at the back and additionally offers fast insertion and deletion at the front.
  3. Vectors store their elements contiguously, which makes random access and operations at the end very fast. Deques store their elements in several separate chunks, so pushing or popping at the front does not require shifting all the elements, which makes front operations much faster than with a vector.

Appropriate place to use :

When additions and deletions happen only at the end, as in typical list-like usage, a vector is the suitable choice. If we also need to insert and remove elements at the front, a deque is the better choice.

 


Pandas : 4 Ways to check if a DataFrame is empty in Python

How to check if a dataframe is empty in Python ?

In this article we will see different ways to check if a dataframe is empty in python.

Approach-1 : Check whether dataframe is empty using Dataframe.empty :

The DataFrame class in pandas provides an empty attribute.

Syntax - Dataframe.empty

If it returns True then the dataframe is empty.

# Program :

import pandas as pd

# empty Dataframe created
dfObj = pd.DataFrame(columns=['Date', 'UserName', 'Action'])

# Checking if Dataframe is empty or not
# using empty attribute
if dfObj.empty:
    print('DataFrame is empty')
else:
    print('DataFrame is not empty')
Output :
DataFrame is empty

Note that a DataFrame containing only NaN values is not considered empty. To treat such a DataFrame as empty, drop the NaN rows first (e.g. with dropna()) and then check the empty attribute, as shown below.

# Program :

import pandas as pd
import numpy as np

# List of Tuples
students = [(np.NaN, np.NaN, np.NaN),
            (np.NaN, np.NaN, np.NaN),
            (np.NaN, np.NaN, np.NaN)
           ]

# Dataframe object created from the NaN records
dfObj = pd.DataFrame(students, columns=['Your Name', 'Your Age', 'Your City'])

# dfObj.empty is False here because the NaN rows still count as data,
# so drop the NaN rows first and then check the empty attribute
if dfObj.dropna().empty:
    print('DataFrame is empty')
else:
    print('DataFrame is not empty')
Output :
DataFrame is empty

Approach-2 : Check if dataframe is empty using Dataframe.shape :

The DataFrame class also provides a shape attribute.

Syntax- Dataframe.shape

The shape attribute returns a tuple containing the dimensions of the dataframe. For example, if the dataframe has 3 rows and 4 columns it returns (3, 4). If the dataframe is empty it returns 0 at index 0.

# Program :

import pandas as pd

# Create an empty Dataframe
dfObj = pd.DataFrame(columns=['Date', 'UserName', 'Action'])
# Check if Dataframe is empty using dataframe's shape attribute
if dfObj.shape[0] == 0:
    print('DataFrame is empty')
else:
    print('DataFrame is not empty')
Output :
DataFrame is empty

Approach-3 : Check if dataframe is empty by checking length of index :

Dataframe.index holds the row index labels of the Dataframe. If the dataframe is empty then the length of the index will be 0.

# Program :

import pandas as pd
import numpy as np

# empty Dataframe object created
dfObj = pd.DataFrame(columns=['Date', 'UserName', 'Action'])
# checking if length of index is 0 or not
if len(dfObj.index.values) == 0:
    print('DataFrame is empty')
else:
    print('DataFrame is not empty')
Output :
DataFrame is empty

Approach-4 : Check if dataframe is empty by using len() on the Dataframe :

We can also check whether the dataframe is empty by calling the len() function on it directly. If the length of the dataframe is 0 then the dataframe is empty.

# Program :

import pandas as pd
import numpy as np

# empty Dataframe object created
dfObj = pd.DataFrame(columns=['Date', 'UserName', 'Action'])

# checking if length of dataframe is 0 or not
# by calling len()
if len(dfObj) == 0:
    print('DataFrame is empty')
else:
    print('DataFrame is not empty')
Output :
DataFrame is empty


Python : How to delete a directory recursively using shutil.rmtree()

How to delete a directory recursively using shutil.rmtree() in Python ?

In this article, we will discuss how to delete an empty directory and also how to delete all contents of a directory, including sub-directories, using shutil.rmtree().

Delete an empty directory using os.rmdir() :

The os module of python provides a function i.e. os.rmdir(pathOfDir) which deletes an empty directory. It raises errors in scenarios like:

  • When the directory is not empty : OSError: [WinError 145] The directory is not empty
  • When the given path does not point to a directory : NotADirectoryError: [WinError 267] The directory name is invalid
  • When the given path does not exist : FileNotFoundError: [WinError 2] The system cannot find the file specified
#Program :

import os

# Delete an empty directory using os.rmdir() and handle the exceptions
try:
    os.rmdir('/somedirec/log2')
except:
    print('Directory not found')
Output :
Directory not found

Delete all files in a directory & sub-directories recursively using shutil.rmtree() :

The shutil module of python provides a function i.e. shutil.rmtree() to delete all the contents present in a directory.

 Syntax : shutil.rmtree(path, ignore_errors=False, onerror=None)
import shutil

dirPath = '/somedirec/logs5/'

# Try to delete all contents of the directory and handle exceptions
try:
    shutil.rmtree(dirPath)
except:
    print('Directory not found')
Output :
Directory not found

In this case, if the directory is found, all contents of '/somedirec/logs5/' will be deleted. However, if any file in the directory has the read-only attribute, rmtree() cannot delete it; a PermissionError is raised and the removal stops, so the remaining files are not deleted.

shutil.rmtree() & ignore_errors :

In the previous scenario, if deleting one file fails, the remaining files in the directory are not deleted either. We can ignore such errors by passing ignore_errors=True to shutil.rmtree().

It will skip the file which raised the exception and continue deleting the remaining files.

For example, if a file in the logs directory cannot be deleted, we can use the following call so that the exception is ignored:

shutil.rmtree(dirPath, ignore_errors=True)
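For completeness, a minimal self-contained sketch of this call (the directory path here is only an example):

import shutil

dirPath = '/somedirec/logs5/'

# Delete whatever can be deleted and silently skip files that raise errors
shutil.rmtree(dirPath, ignore_errors=True)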

Passing callbacks in shutil.rmtree() with onerror :

Instead of ignoring errors, there may be a case where we have to handle the errors.

Syntax :  shutil.rmtree(path, ignore_errors=False, onerror=None)

A callback function can be passed in the onerror parameter to handle the errors:

shutil.rmtree(dirPath, onerror=handleError)

The function passed must be a callable of the form:

def handleError(func, path, exc_info):
   pass

where,

  • func : The function which raised the exception (such as os.remove() or os.rmdir()).
  • path : The path that caused the exception during removal.
  • exc_info : The exception information, as returned by sys.exc_info().

Now, if any exception is raised while deleting a file, the callback will handle the error, and shutil.rmtree() can continue deleting the remaining files.

Now consider a case where we want to delete all the contents of the directory '/somedirec/logs', and somewhere in the logs directory there is a file with permission issues, so not everything can be deleted. We therefore pass a callback in the onerror parameter to deal with that file.

In the callback, if the error is an access (permission) issue, we change the file's permission and then call the failing function again with the same path. The file then gets deleted, and rmtree() continues deleting the remaining contents of the directory.

import os
import shutil
import stat

# Error handling callback: try to change the file permission and call the failing function again
def handleError(func, path, exc_info):
    print('Handling Error for file ', path)
    print(exc_info)
    # Check if the error is a write-permission issue
    if not os.access(path, os.W_OK):
        print('Changing file permission for', path)
        # Try to make the file writable
        os.chmod(path, stat.S_IWUSR)
        # Call the failing function (e.g. os.remove) again on the same path
        func(path)

# Delete all contents of the directory, handling permission errors via the callback
shutil.rmtree('/somedirec/logs', onerror=handleError)

 

 

 


Pandas : Check if a value exists in a DataFrame using in & not in operator | isin()

How to check if a value exists in a DataFrame using in and not in operator | isin() in Python ?

In this article we are going to discuss different ways to check if a value is present in the dataframe.

We will be using the following dataset as an example:

      Name  Age      City  Marks
0     Jill   16     Tokyo    154
1   Rachel   38     Texas    170
2    Kirti   39  New York     88
3    Veena   40     Texas    190
4  Lucifer   35     Texas     59
5    Pablo   30  New York    123
6   Lionel   45  Colombia    189

Check if a single element exists in DataFrame using in & not in operators :

The Dataframe class has a member Dataframe.values that gives us all the values as a numpy representation. We will use it with the in and not in operators to check whether a value is present in the dataframe.

Using in operator to check if an element exists in dataframe :

We will be checking if the value 190 exists in the dataset using in operator.

#Program :

import pandas as pd
#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35, 'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]

# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
if 190 in dfObj.values:
    print('Element exists in Dataframe')
Output :
Element exists in Dataframe

Using not in operator to check if an element doesn’t exists in dataframe :

We will check that ‘Leo’ is not present in the dataframe using the not in operator.

# Program :

import pandas as pd
#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35, 'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]
# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
if 'Leo' not in dfObj.values:
    print('Element does not exists in Dataframe')

Output :
Element does not exists in Dataframe

Checking if multiple elements exists in DataFrame or not using in operator :

To check for multiple elements, we have to write a function.

# Program :

import pandas as pd

def checkForValues(_dfObj, listOfValues):
    #The function would check for the list of values in our dataset
    result = {}
    # Iterate through the elements
    for elem in listOfValues:
        # Check if the element exists in the dataframe values
        if elem in _dfObj.values:
            result[elem] = True
        else:
            result[elem] = False
    # Return a dictionary mapping each value to whether it exists (True/False)
    return result

#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35,     'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]
# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])

#Check for the existence of values
Tresult = checkForValues(dfObj, [30, 'leo', 190])
print('The values existence inside the dataframe are ')
print(Tresult)
Output :
The values existence inside the dataframe are
{30: True, 'leo': False, 190: True}

Rather than writing a whole function, we can also achieve this more compactly using a dictionary comprehension.

# Program :

import pandas as pd

#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35,     'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]
# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
listOfValues = [30, 'leo', 190]
#using dictionary comprehension check for given values
result = {elem: True if elem in dfObj.values else False for elem in listOfValues}
print('The values existence inside the dataframe are ')
print(result)
Output :
The values existence inside the dataframe are
{30: True, 'leo': False, 190: True}

Checking if elements exists in DataFrame using isin() function :

We can also check if a value exists inside a dataframe or not using the isin( ) function.

Syntax : DataFrame.isin(self, values)

Where,

  • Values : It takes the values to check for inside the dataframe.

Checking if a single element exist in Dataframe using isin() :

# Program :

import pandas as pd

#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35,     'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]
# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
# Check for a single element using the isin() function
boolDf = dfObj.isin([38])
print(boolDf)
Output :
   Name  Age   City  Marks
0  False  False  False  False
1  False   True  False  False
2  False  False  False  False
3  False  False  False  False
4  False  False  False  False
5  False  False  False  False
6  False  False  False  False

Here the isin() function returned a boolean dataframe of the same shape, where the positions that matched our value are True and the rest are False.

We can chain this with any() twice: the first any() reduces the boolean dataframe to a Series indicating which columns contain a match, and the second any() reduces that Series to a single boolean telling us whether the element exists anywhere in the dataframe.

# Program :

import pandas as pd

#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35,     'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]
# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
# Check if the value is inside the dataframe using both isin() and any()
result = dfObj.isin(['Lionel']).any().any()
if result:
    print('The value exists inside the dataframe')
Output :
The value exists inside the dataframe

Checking if any of the given values exists in the Dataframe :

# Program :

import pandas as pd

#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35,     'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]
# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
# Check if any of the given values is inside the dataframe using both isin() and any()
result = dfObj.isin([81, 'Lionel', 190]).any().any()
if result:
    print('Any of the Element exists in Dataframe')
Output :
Any of the Element exists in Dataframe

This program prints the message only if at least one of the given values exists inside the dataframe.
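If instead we want to confirm that all of the given values exist somewhere in the dataframe, one simple sketch (reusing the membership test from the earlier examples; the variable names are illustrative) is to combine the checks with all():

import pandas as pd

employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35,     'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])

listOfValues = [30, 'Lionel', 190]
# True only when every value in the list appears somewhere in the dataframe
all_exist = all(elem in dfObj.values for elem in listOfValues)
print('All of the given values exist in the dataframe:', all_exist)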


Python : *args | How to pass multiple arguments to function ?

Passing multiple arguments to function in Python.

In this article we will discuss how we can pass multiple arguments to a function using *args in Python.

Let’s say we want to calculate the average of four numbers with a function, e.g. avg(n1, n2, n3, n4); then we can calculate the average of any four numbers by passing them as arguments.

Like

# Program :

def avg(n1,n2,n3,n4):
    # function to calculate average of 4 numbers
    return (n1+n2+n3+n4)/4
average = avg(10,20,30,40)
print(average)
Output :
25.0

But what if we want to calculate the average of 10 numbers? The above function won’t work. So, in this article we shall define a function in python which can accept any number of arguments. Let’s see how we can achieve this.

Defining a function that can accept variable length arguments :

We can let a function accept any number of arguments by prefixing a parameter name with the symbol ‘*‘.

# Program :

def calcAvg(*args):
    '''Accepts variable length arguments and calculate average of n numbers'''
    # to get the count of total arguments passed
    argNums = len(args)
    if argNums > 0 :
        sum_Nums = 0
        # to calculate average from arguments passed
        for ele in args :
            sum_Nums += ele
        return sum_Nums / argNums
    else:
        return 
if __name__ == '__main__':
    avg_Num = calcAvg(10,20,30,40,50)
    print("Average is " , avg_Num)
Output :
Average is 30.0
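We can also pass an existing list or tuple of numbers to such a function by unpacking it with * at the call site. A small sketch reusing the calcAvg() function defined above:

# Unpack a list into the variable-length parameter at the call site
marks = [10, 20, 30, 40, 50]
avg_Num = calcAvg(*marks)
print("Average is ", avg_Num)   # prints: Average is  30.0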

Important points about *args :

Positioning of parameter *args :

Along with *args we can also have other parameters, but we must make sure that *args comes after the formal (positional) parameters.

Let’s see the representation of that.

# Program :

def publishError(startPar, endPar, *args):
    # Accepts variable length arguments and publishes errors
    print(startPar)
    for el in args :
        print("Error : " , el)
    print(endPar)    
publishError("[Begin]" , "[Ends]" , "Unknown error")
Output :
[Begin]                                                             
Error :  Unknown error                                               
[Ends]

Variable length arguments can be of any type :

Through *args we can not only pass a variable number of arguments, but those arguments can also be of any data type.

# Program :

def publishError(startPar, endPar, *args):
    # Accepts variable length arguments and publishes errors
    print(startPar)
    for el in args :
        print("TypeError : " , el)
    print(endPar)    
publishError("[Begin]" , "[Ends]" , [10, 6.5, 8], ('Holla','Hello'), "")
Output :
[Begin]                                                             
TypeError :  [10, 6.5, 8]                                           
TypeError :  ('Holla', 'Hello')                                     
TypeError :                                                         
[Ends]

 
