Python : Get Last Modification date & time of a file. | os.stat() | os.path.getmtime()

This article shows how to get the last modification date and time of a file in Python and how to display it in the format we want.

Get last modification time of a file using os.stat( ) :

Syntax - os.stat(filePath)

The function returns an os.stat_result object that holds status information about the file.

To get the modification time, we read the ST_MTIME field, which gives the time of the last modification in seconds since the epoch. We then pass that value to time.ctime( ), which returns it as a readable string.

import os
import time
import stat
# Path of the file
fileStats = os.stat('file.txt')
# Convert the modification time into a readable string
modTime = time.ctime(fileStats[stat.ST_MTIME])
print("Modified Time : ", modTime)

Output :
Modified Time :  Thu May 13 19:02:47 2021

Get last modification time of a file using os.path.getmtime() :

We can also use another function from Python's os module, os.path.getmtime( ).

Syntax - os.path.getmtime(filePath)

The function returns the time of the last modification of the file as the number of seconds since the epoch. We have to convert it into a readable format.

Get last modification time using os.path.getmtime() & time.localtime( ) :

import os
import time
# Pass the file path; getmtime() returns the last modification time in seconds since the epoch
modTimeInSeconds = os.path.getmtime('file.txt')
# Convert the seconds to local time and format them in a readable manner
modTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(modTimeInSeconds))
print("Modified Time : ", modTime)

Output:
Modified Time :  2021-05-13 19:02:47

The function time.localtime( ) converts the seconds to a struct_time in local time, which strftime( ) then turns into a timestamp in a readable format.

We can also change the format passed to strftime( ) to get only the modification date.

import os
import time
# Pass the file path; getmtime() returns the last modification time in seconds since the epoch
modTimeInSeconds = os.path.getmtime('file.txt')
# Format only the date part of the modification time
modTime = time.strftime('%d/%m/%Y', time.localtime(modTimeInSeconds))
print("Modified Time : ", modTime)

Output :
Modified Time :  13/05/2021

Get last modification time using os.path.getmtime() & datetime.fromtimestamp() :

Instead of time.localtime(), we can also convert the seconds with datetime.fromtimestamp(), which returns a datetime object in local time.

import os
import datetime
# Pass the file path; getmtime() returns the last modification time in seconds since the epoch
modTimeInSeconds = os.path.getmtime('file.txt')
# Convert the seconds to a datetime object and format it in a readable manner
modTime = datetime.datetime.fromtimestamp(modTimeInSeconds).strftime('%Y-%m-%d %H:%M:%S')
print("Modified Time : ", modTime )

Output :
Modified Time :  2021-05-13 19:02:47

Get last modification time of a file in UTC Timezone :

To obtain the last modification time in UTC instead of local time, we can use datetime.utcfromtimestamp( ).

import os
import datetime
# Pass the file path; getmtime() returns the last modification time in seconds since the epoch
modTimeInSeconds = os.path.getmtime('file.txt')
# Convert the seconds to a UTC datetime object and format it in a readable manner
modTime = datetime.datetime.utcfromtimestamp(modTimeInSeconds).strftime('%Y-%m-%d %H:%M:%S')
print("Modified Time : ", modTime )

Output :
Modified Time :  2021-05-13 13:32:47
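Note that datetime.utcfromtimestamp() returns a naive datetime (no timezone attached) and is deprecated since Python 3.12. A timezone-aware alternative is to pass tz=datetime.timezone.utc to datetime.fromtimestamp(). The sketch below assumes a temporary file created only for the demonstration; it is not part of the original example.

```python
import datetime
import os
import tempfile

# Create a throwaway file for the demonstration (illustration only).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    file_path = tmp.name

mod_time_seconds = os.path.getmtime(file_path)
# Build a timezone-aware datetime in UTC from the epoch seconds.
utc_time = datetime.datetime.fromtimestamp(mod_time_seconds, tz=datetime.timezone.utc)
print("Modified Time (UTC) : ", utc_time.strftime('%Y-%m-%d %H:%M:%S'))

# Clean up the temporary file.
os.remove(file_path)
```

Because the resulting object carries its timezone, it can later be converted to another zone with astimezone() without guessing what the original offset was.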

deque vs vector : What to choose ?

Python / By Satyabrata Jena

Deque vs Vector

In this article, we are going to see the difference between the STL sequential containers std::deque and std::vector with their appropriate usage.

VECTOR :

Vectors are dynamic arrays that grow and shrink automatically as elements are added or removed.
The container handles the memory automatically.
Data can be inserted efficiently at the end, and also at the start or in the middle at the cost of shifting the elements that follow.
The elements are stored in contiguous storage.

DEQUE :

Deques, or double-ended queues, are sequence containers that can grow and shrink at both ends.
Data can be inserted at the start, in the middle and at the end.
The elements are not guaranteed to be stored in contiguous storage locations.

What’s the difference ?

A vector behaves like a dynamic array, while a deque is an implementation of the double-ended queue.
In a vector, insertion and deletion at the end perform well (amortized constant time), but insertion and deletion at the start or in the middle are slow because the following elements must be shifted. A deque offers fast insertion and deletion at both the start and the end. Insertion and deletion in the middle still take linear time, just as in a vector.
A vector stores its elements contiguously, which makes element access and iteration cache-friendly and generally faster than in a deque. A deque typically stores its elements in several separate blocks of memory, so it can grow at the front without shifting the existing elements.

Appropriate place to use :

When elements are only added or removed at the end, as in list-like or stack-like use, a vector is the suitable choice. If we also need to add or remove elements at the front, a deque is the better fit.
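The article is about the C++ containers, but a similar trade-off can be observed in Python, which is used here only as a rough analogy: the built-in list is backed by a contiguous array (comparable to std::vector), while collections.deque supports fast operations at both ends (comparable to std::deque). A minimal sketch under that assumption:

```python
from collections import deque

items_list = [1, 2, 3]
items_deque = deque([1, 2, 3])

# Adding at the end is cheap for both containers.
items_list.append(4)
items_deque.append(4)

# Adding at the front: the list has to shift every existing element,
# while the deque adds the element without moving the others.
items_list.insert(0, 0)
items_deque.appendleft(0)

print(items_list)
print(list(items_deque))
```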

Pandas : 4 Ways to check if a DataFrame is empty in Python

Python / By Satyabrata Jena

How to check if a dataframe is empty in Python ?

In this article we will see different ways to check if a dataframe is empty in python.

Approach-1 : Check whether dataframe is empty using Dataframe.empty :

The DataFrame class in pandas provides an empty attribute.

Syntax - DataFrame.empty

It is True when the dataframe is empty, i.e. when any of its axes has length 0.

# Program :

import pandas as pd

# empty Dataframe created
dfObj = pd.DataFrame(columns=['Date', 'UserName', 'Action'])

# Checking if Dataframe is empty or not
# using empty attribute
if dfObj.empty == True:
    print('DataFrame is empty')
else:
    print('DataFrame is not empty')

Output :
DataFrame is empty

A dataframe that contains only NaN values is not considered empty, because it still has rows.

# Program :

import pandas as pd
import numpy as np

# List of Tuples
students = [(np.nan, np.nan, np.nan),
            (np.nan, np.nan, np.nan),
            (np.nan, np.nan, np.nan)
           ]

# Dataframe object created from the NaN-only rows
dfObj = pd.DataFrame(students, columns=['Your Name', 'Your Age', 'Your City'])

# Checking if Dataframe is empty or not
# using empty attribute
if dfObj.empty:
    print('DataFrame is empty')
else:
    print('DataFrame is not empty')

Output :
DataFrame is not empty

Approach-2 : Check if dataframe is empty using Dataframe.shape :

The DataFrame class in pandas provides a shape attribute.

Syntax - DataFrame.shape

The shape attribute returns a tuple containing the dimensions of the dataframe. For example, if the dataframe has 3 rows and 4 columns, it returns (3, 4). If the dataframe has no rows, the value at index 0 is 0.

import pandas as pd

# Create an empty Dataframe
dfObj = pd.DataFrame(columns=['Date', 'UserName', 'Action'])
# Check if Dataframe is empty using dataframe's shape attribute
if dfObj.shape[0] == 0:
    print('DataFrame is empty')
else:
    print('DataFrame is not empty')

Output :
DataFrame is empty

Approach-3 : Check if dataframe is empty by checking length of index :

Dataframe.index represents the row labels of the dataframe. If the dataframe is empty, the length of the index is 0.

# Program :

import pandas as pd
import numpy as np

# empty Dataframe object created
dfObj = pd.DataFrame(columns=['Date', 'UserName', 'Action'])
# checking if length of index is 0 or not
if len(dfObj.index.values) == 0:
    print('DataFrame is empty')
else:
    print('DataFrame is not empty')

Output :
DataFrame is empty

Approach-4 : Check if dataframe is empty by using len on Dataframe :

We can also check whether the dataframe is empty by calling the len() function on it directly. If the length of the dataframe is 0, the dataframe is empty.

# Program :

import pandas as pd
import numpy as np

# empty Dataframe object created
dfObj = pd.DataFrame(columns=['Date', 'UserName', 'Action'])

# checking if length of dataframe is 0 or not
# by calling len()
if len(dfObj) == 0:
    print('DataFrame is empty')
else:
    print('DataFrame is not empty')

Output :
DataFrame is empty
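The four checks can be compared side by side. The helper name is_empty_all_ways below is hypothetical and only serves this illustration; the sketch also shows that a dataframe holding only NaN values still has rows, so none of the checks report it as empty.

```python
import numpy as np
import pandas as pd

def is_empty_all_ways(df):
    """Return the result of the four emptiness checks as a tuple."""
    return (df.empty, df.shape[0] == 0, len(df.index) == 0, len(df) == 0)

# A dataframe with column names but no rows.
no_rows = pd.DataFrame(columns=['Date', 'UserName', 'Action'])
# A dataframe with one row that contains only NaN values.
only_nan = pd.DataFrame({'Date': [np.nan], 'UserName': [np.nan], 'Action': [np.nan]})

print(is_empty_all_ways(no_rows))
print(is_empty_all_ways(only_nan))

# To treat NaN-only rows as missing, drop them first and then check.
empty_after_dropna = only_nan.dropna(how='all').empty
print(empty_after_dropna)
```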

numpy.insert() – Python | Definition, Syntax, Parameters, Example of Python Numpy.insert() Function

Python / By Shikha Mishra

In this tutorial, we discuss what Python's numpy.insert() is and how to use it. The examples below show how to insert elements into a NumPy array. The tutorial covers the following topics:

Python numpy.insert()
Syntax
Parameters
Return Values
numpy.insert() function Example
Insert an element into a NumPy array at a given index position
Insert multiple elements into a NumPy array at the given index
Insert multiple elements at multiple indices in a NumPy array
Insert a row into a 2D Numpy array
Insert a column into a 2D Numpy array

Python numpy.insert()

The NumPy library provides a function to insert elements into an array. The insertion is not done in place; the function returns a new array. Moreover, if the axis is not given, the input array is flattened first.

Syntax:

numpy.insert(arr, index, values, axis=None)

Parameters:

arr: array_like object
- The array which we give as an input.
index: int, slice or sequence of ints
- The index before which insertion is to be made
values: array_like object
- The array of values to be inserted
axis: int, optional
- The axis along which to insert. If not given, the input array is flattened

Return Values:

out: ndarray
- A copy of arr with the given values inserted at the given indices.
  - If axis is None, then it returns a flattened array.
  - If axis is 1, then insert column-wise.
  - If axis is 0, then insert row-wise.
- It doesn’t modify the actual array, rather it returns a copy of the given array with inserted values.

Let’s understand with some of the below-given examples:

numpy.insert() function Example

import numpy as np
a = np.array([[1, 2], [3, 4], [5, 6]])

print('First array:')
print(a)
print()

print('Axis parameter not passed. The input array is flattened before insertion.')
print(np.insert(a, 3, [11, 12]))
print()
print('Axis parameter passed. The values array is broadcast to match input array.')

print('Broadcast along axis 0:')
print(np.insert(a, 1, [11], axis=0))
print()

print('Broadcast along axis 1:')
print(np.insert(a, 1, 11, axis=1))

Output:

First array:
[[1 2]
[3 4]
[5 6]]

Axis parameter not passed. The input array is flattened before insertion.
[ 1 2 3 11 12 4 5 6]

Axis parameter passed. The values array is broadcast to match input array.
Broadcast along axis 0:
[[ 1 2]
[11 11]
[ 3 4]
[ 5 6]]

Broadcast along axis 1:
[[ 1 11 2]
[ 3 11 4]
[ 5 11 6]]

Insert an element into a NumPy array at a given index position

Let’s take an array of integers and insert the element 14 at index position 3. For that, we call insert() with the array, the index position, and the element to be inserted.

import numpy as np
# Create a Numpy Array of integers
arr = np.array([8, 12, 5, 9, 13])
# Insert an element 14 at index position 3
new_arr = np.insert(arr, 3, 14)
print('New Array: ', new_arr)
print('Original Array: ', arr)

Output:

New Array: [ 8 12 5 14 9 13]
Original Array: [ 8 12 5 9 13]

Insert multiple elements into a NumPy array at the given index

Here we insert multiple elements at once by passing them as a sequence along with the index position.

import numpy as np
# Create a Numpy Array of integers
arr = np.array([8, 12, 5, 9, 13])
# Insert three elements at index position 3
new_arr = np.insert(arr, 3, (10, 10, 10))
print('New Array: ', new_arr)

Output:

New Array: [ 8 12 5 10 10 10 9 13]

Insert multiple elements at multiple indices in a NumPy array

In this, we are going to insert multiple elements at multiple indices.

import numpy as np
# Create a Numpy Array of integers
arr = np.array([8, 12, 5, 9, 13])
# Insert three elements at index positions 0, 1 and 2
new_arr = np.insert(arr, (0,1,2), (21, 31, 41))
print('New Array: ', new_arr)

Output:

New Array: [21 8 31 12 41 5 9 13]

So in the above example, you can see that (21, 31, 41) were inserted before the elements at positions (0, 1, 2) of the original array.
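When a sequence of indices is passed, each index refers to a position in the original array, not in the partially built result. The short sketch below repeats the example and adds a second call with a repeated index (the values 100 and 200 are chosen only for illustration) to make this visible.

```python
import numpy as np

arr = np.array([8, 12, 5, 9, 13])

# Each value is placed before the original element at the matching index.
spread = np.insert(arr, (0, 1, 2), (21, 31, 41))
print(spread)

# Repeating an index inserts several values before the same original element.
same_spot = np.insert(arr, [1, 1], [100, 200])
print(same_spot)

# The input array itself is never modified.
print(arr)
```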

Insert a row into a 2D Numpy array

In this, we are going to insert a row in the array, so we have to pass the axis as 0 and the values as a sequence.

import numpy as np
# Create 2D Numpy array of hard coded numbers
arr = np.array([[2, 3, 4],
                [7, 5, 7],
                [6, 3, 9]])
# Insert a row at index 1
new_arr = np.insert(arr, 1, (4, 4, 4), axis=0)
print(new_arr)

Output:

[[2 3 4]
[4 4 4]
[7 5 7]
[6 3 9]]

Insert a column into a 2D Numpy array

In this, we are going to insert a column in the array, so we have to pass the axis as 1 and the values as a sequence.

import numpy as np
# Create 2D Numpy array of hard coded numbers
arr = np.array([[2, 3, 4],
                [7, 5, 7],
                [6, 3, 9]])
# Insert a column at index 1
new_arr = np.insert(arr, 1, (5, 5, 5), axis=1)
print(new_arr)

Output:

[[2 5 3 4]
[7 5 5 7]
[6 5 3 9]]

So you can see that it inserted a column at index 1.

Here is another way to do the same, passing a single value that is broadcast to the whole column:

import numpy as np
# Create 2D Numpy array of hard coded numbers
arr = np.array([[2, 3, 4],
                [7, 5, 7],
                [6, 3, 9]])
# Insert a column at index 1
new_arr = np.insert(arr, 1, 5, axis=1)
print(new_arr)

Output:

[[2 5 3 4]
[7 5 5 7]
[6 5 3 9]]

Conclusion

In this article, you have seen different uses of numpy.insert(). Thank you!

Pandas: 6 Different ways to iterate over rows in a Dataframe & Update while iterating row by row

Python / By Shikha Mishra

In this tutorial, we review six different techniques to iterate over the rows of a dataframe. Later, we also explain how to update the contents of a dataframe while iterating over it row by row.

Iterate over rows of a dataframe using DataFrame.iterrows()
Iterate over rows of a dataframe using DataFrame.itertuples()
Named Tuples without index
Named Tuples with custom names
Iterate over rows in dataframe as Dictionary
Iterate over rows in dataframe using index position and iloc
Iterate over rows in dataframe in reverse using index position and iloc
Iterate over rows in dataframe using index labels and loc[]
Update contents of a dataframe while iterating row by row

Let’s first create a dataframe which we will use in our example,

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])
print(empDfObj)

Output:

      Name  Age    City  Experience
a   Shikha   34  Mumbai           5
b    Rekha   31   Delhi           7
c  Shishir   16  Punjab          11

Iterate over rows of a dataframe using DataFrame.iterrows()

The DataFrame class implements a member function iterrows(), i.e. DataFrame.iterrows(). We will use this function to iterate over the rows of a dataframe.

DataFrame.iterrows()

DataFrame.iterrows() returns an iterator that iterates over all the rows of a dataframe.

For each row, it yields a tuple containing the index label and the row contents as a Series.

Let’s use it in an example,

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

for (index_label, row_series) in empDfObj.iterrows():
   print('Row Index label : ', index_label)
   print('Row Content as Series : ', row_series.values)

Output:

Row Index label : a
Row Content as Series : ['Shikha' 34 'Mumbai' 5]
Row Index label : b
Row Content as Series : ['Rekha' 31 'Delhi' 7]
Row Index label : c
Row Content as Series : ['Shishir' 16 'Punjab' 11]

Note:

iterrows() does not preserve data types: it returns each row as a Series, and a Series has a single dtype, so values may be converted (for example, integers to floats) when the columns have different types.
We should not modify the row we get from iterrows(). The row Series is generally a copy, so changes made to it are usually not reflected in the original dataframe.
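The dtype note can be checked with a small dataframe. The column names Age and Score below are made up for this illustration; the sketch compares the row returned by iterrows() with the named tuple returned by itertuples().

```python
import pandas as pd

# One integer column and one float column (hypothetical sample data).
df = pd.DataFrame({'Age': [34, 31], 'Score': [7.5, 8.0]})

# iterrows() packs each row into a single Series, so the integer is upcast to float.
_, first_row = next(df.iterrows())
print(first_row.dtype)
print(first_row['Age'])

# itertuples() keeps each value in its own field, so the integer stays an integer.
first_tuple = next(df.itertuples(index=False))
print(first_tuple.Age)
```

When the original types matter, itertuples() is therefore usually the safer choice of the two.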

Iterate over rows of a dataframe using DataFrame.itertuples()

DataFrame.itertuples()

DataFrame.itertuples() yields a named tuple for each row containing all the column names and their value for that row.

Let’s use it,

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# Iterate over the Dataframe rows as named tuples
for namedTuple in empDfObj.itertuples():
   #Print row contents inside the named tuple
   print(namedTuple)

Output:

Pandas(Index='a', Name='Shikha', Age=34, City='Mumbai', Experience=5)
Pandas(Index='b', Name='Rekha', Age=31, City='Delhi', Experience=7)
Pandas(Index='c', Name='Shishir', Age=16, City='Punjab', Experience=11)

So we can see that for every row it returned a named tuple. We can access the individual values by position, like:

For the first value (the index label),

namedTuple[0]

For the second value (the first column, Name),

namedTuple[1]

Named Tuples without index

If we pass the argument index=False, the named tuple does not include the index as its first element.

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# Iterate over the Dataframe rows as named tuples without index
for namedTuple in empDfObj.itertuples(index=False):
   # Print row contents inside the named tuple
   print(namedTuple)

Output:

Pandas(Name='Shikha', Age=34, City='Mumbai', Experience=5)
Pandas(Name='Rekha', Age=31, City='Delhi', Experience=7)
Pandas(Name='Shishir', Age=16, City='Punjab', Experience=11)

Named Tuples with custom names

By default the named tuples are called Pandas. If we don’t want that name, we can pass a custom name through the name argument:

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# Give Custom Name to the tuple while Iterating over the Dataframe rows
for row in empDfObj.itertuples(name='Employee'):
   # Print row contents inside the named tuple
   print(row)

Output:

Employee(Index='a', Name='Shikha', Age=34, City='Mumbai', Experience=5)
Employee(Index='b', Name='Rekha', Age=31, City='Delhi', Experience=7)
Employee(Index='c', Name='Shishir', Age=16, City='Punjab', Experience=11)

Iterate over rows in dataframe as Dictionary

Here we iterate over the rows with the same itertuples() and convert each named tuple to a dictionary with _asdict(), so that the values can be accessed by column label.

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# itertuples() yields a named tuple for each row
for row in empDfObj.itertuples(name='Employee'):
   # Convert named tuple to dictionary
   dictRow = row._asdict()
   # Print dictionary
   print(dictRow)
   # Access elements from dict i.e. row contents
   print(dictRow['Name'] , ' is from ' , dictRow['City'])

Output:

{'Index': 'a', 'Name': 'Shikha', 'Age': 34, 'City': 'Mumbai', 'Experience': 5}
Shikha is from Mumbai
{'Index': 'b', 'Name': 'Rekha', 'Age': 31, 'City': 'Delhi', 'Experience': 7}
Rekha is from Delhi
{'Index': 'c', 'Name': 'Shishir', 'Age': 16, 'City': 'Punjab', 'Experience': 11}
Shishir is from Punjab
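If all rows are needed as dictionaries at once, pandas also offers DataFrame.to_dict(orient='records'), which returns a list with one dictionary per row. This is an alternative to the loop above rather than the approach the example uses; note that the index labels are not included in the dictionaries.

```python
import pandas as pd

employees = [('Shikha', 34, 'Mumbai', 5),
             ('Rekha', 31, 'Delhi', 7),
             ('Shishir', 16, 'Punjab', 11)]
emp_df = pd.DataFrame(employees,
                      columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c'])

# One dictionary per row, keyed by column label (the index is left out).
records = emp_df.to_dict(orient='records')
for record in records:
    print(record['Name'], 'is from', record['City'])
```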

Iterate over rows in dataframe using index position and iloc

We will loop through the 0th index to the last row and access each row by index position using iloc[].

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# Loop through rows of dataframe by index i.e. from 0 to number of rows
for i in range(0, empDfObj.shape[0]):
   # get row contents as series using iloc[] and index position of row
   rowSeries = empDfObj.iloc[i]
   # print row contents
   print(rowSeries.values)

Output:

['Shikha' 34 'Mumbai' 5]
['Rekha' 31 'Delhi' 7]
['Shishir' 16 'Punjab' 11]

Iterate over rows in dataframe in reverse using index position and iloc

Using this we will loop through the last index to the 0th index and access each row by index position using iloc[].

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# Loop through rows of dataframe by index in reverse i.e. from last row to row at 0th index.
for i in range(empDfObj.shape[0] - 1, -1, -1):
   # get row contents as series using iloc[] and index position of row
   rowSeries = empDfObj.iloc[i]
   # print row contents
   print(rowSeries.values)

Output:

['Shishir' 16 'Punjab' 11]
['Rekha' 31 'Delhi' 7]
['Shikha' 34 'Mumbai' 5]

Iterate over rows in dataframe using index labels and loc[]

import pandas as pd
employees = [('Shikha', 34, 'Mumbai', 5) ,
           ('Rekha', 31, 'Delhi' , 7) ,
           ('Shishir', 16, 'Punjab', 11)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c'])

# loop through all the names in index label sequence of dataframe
for index in empDfObj.index:
   # For each index label, access the row contents as series
   rowSeries = empDfObj.loc[index]
   # print row contents
   print(rowSeries.values)

Output:

['Shikha' 34 'Mumbai' 5]
['Rekha' 31 'Delhi' 7]
['Shishir' 16 'Punjab' 11]

Update contents of a dataframe while iterating row by row

Dataframe.iterrows() yields each row as a Series that is a copy of the row contents, so updating that Series has no effect on the actual dataframe. To update the contents of the dataframe, we iterate over the rows using iterrows() and write each new value back into the dataframe through the at[] accessor.

Let’s see an example,

Suppose we have the following dataframe:

import pandas as pd


# List of Tuples
salaries = [(11, 5, 70000, 1000) ,
           (12, 7, 72200, 1100) ,
           (13, 11, 84999, 1000)
           ]
# Create a DataFrame object
salaryDfObj = pd.DataFrame(salaries, columns=['ID', 'Experience' , 'Salary', 'Bonus'])
print(salaryDfObj)

Output:

   ID  Experience  Salary  Bonus
0  11           5   70000   1000
1  12           7   72200   1100
2  13          11   84999   1000

Now we will update each value in column ‘Bonus’ by multiplying it with 2 while iterating over the dataframe row by row.

import pandas as pd


# List of Tuples
salaries = [(11, 5, 70000, 1000) ,
           (12, 7, 72200, 1100) ,
           (13, 11, 84999, 1000)
           ]
# Create a DataFrame object
salaryDfObj = pd.DataFrame(salaries, columns=['ID', 'Experience' , 'Salary', 'Bonus'])
# Iterate over the dataframe row by row
for index_label, row_series in salaryDfObj.iterrows():
   # For each row, update the 'Bonus' value to its double
   salaryDfObj.at[index_label , 'Bonus'] = row_series['Bonus'] * 2
print(salaryDfObj)

Output:

   ID  Experience  Salary  Bonus
0  11           5   70000   2000
1  12           7   72200   2200
2  13          11   84999   2000
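For a simple update like this one, the same result can usually be obtained without a loop by operating on the whole column at once. The sketch below is an alternative to the row-by-row update, not the method the example demonstrates; the variable name salary_df is chosen only for this illustration.

```python
import pandas as pd

salaries = [(11, 5, 70000, 1000),
            (12, 7, 72200, 1100),
            (13, 11, 84999, 1000)]
salary_df = pd.DataFrame(salaries, columns=['ID', 'Experience', 'Salary', 'Bonus'])

# Double every value in the 'Bonus' column in a single column-wise operation.
salary_df['Bonus'] = salary_df['Bonus'] * 2
print(salary_df)
```

Row-by-row iteration remains useful when the new value depends on logic that is hard to express column-wise.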

Conclusion:

So in this article, you have seen different ways to iterate over rows in a dataframe and how to update it while iterating row by row.

Python : How to Get all Keys with Maximum Value in a Dictionary

Python / By Mayank Gupta

Methods to get all the keys with maximum value in a dictionary in Python

In this article we discuss different methods to get all the keys with the maximum value in a dictionary. Let us see these methods one by one.

Method 1-Using max() function and d.get

As the name suggests, the max() function is used to find the maximum value. Let us see what the max() function in Python is and how it works.

max() function

The max() function returns the largest of the arguments passed to it, or the largest item in an iterable. Iterating over a dictionary yields its keys, so to compare the keys by their values we pass an extra key argument to max().

Syntax: max(iterable, key=d.get) where d denotes the name of the dictionary.

With a dictionary as the iterable, it returns the key whose value is the largest.

So with the help of the max() function and the d.get argument we can easily find a key with the max value in a dictionary.

d = {"a": 1, "b": 2, "c": 3,"d":4}
max_key = max(d, key=d.get)
print(max_key)

Output

d

Here we see that the key “d” has the maximum value, so the function returns d as the output.

There is a small problem with this method. Let us look at the problem first.

Problem

If multiple keys in the dictionary share the maximum value, this method returns only the key that occurs first in the dictionary. Let us see this with the help of an example.

d = {"a": 1, "b": 2, "c":4,"d":4}
max_key = max(d, key=d.get)
print(max_key)

Output

c

In the dictionary, the maximum value 4 corresponds to both keys c and d, yet the function returns only c because c comes first in the dictionary. This is the main problem with this method.

The same problem occurs with the next method as well, so we will discuss the solution after going through both methods.

Method 2-Using max() function and operator

We have already discussed the max() function. Now we discuss the operator module in Python and how to use it in our program to get a key with the maximum value.

operator module

The operator module exports a set of efficient functions corresponding to the intrinsic operators of Python. For example, operator.add(x,y) is equivalent to the expression x+y. Here we use operator.itemgetter(1), which picks the value out of each (key, value) pair so that max() compares the pairs by value.

Let us see with an example how we can achieve our objective with the help of max() and the operator module.

import operator
d={"a":1,"b":2,"c":3,"d":4}
max_key = max(d.items(), key = operator.itemgetter(1))[0]
print(max_key)

Output

d

This example shows how to get a key with the maximum value using the max() function and the operator module. A similar problem arises here: if multiple keys have the maximum value, the function returns only the first such key.

The solution to the above problem

In both methods we get only one key even if multiple keys share the maximum value. We can solve this with one additional iteration over the dictionary. The max() call gives us one key that corresponds to the maximum value, and we use that key to look up the maximum value itself. We then iterate over the dictionary, collect every key whose value equals the maximum in a list, and print the list. Let us see this with the help of an example.

d = {"a": 1, "b": 2, "c":4,"d":4}

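# Find one key with the maximum value, then look up that value
max_key = max(d, key=d.get)
val = d[max_key]
l = []
for key in d:
    # Collect every key whose value equals the maximum
    if d[key] == val:
        l.append(key)
print(l)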

Output

['c', 'd']

So here we see how we can easily print a list of keys having max value.
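The same idea can be written more compactly by taking the maximum of the values directly and filtering the keys with a list comprehension. The function name keys_with_max_value is hypothetical and used only for this sketch.

```python
def keys_with_max_value(d):
    """Return a list of all keys whose value equals the maximum value in d."""
    if not d:
        # max() raises ValueError on an empty sequence, so handle that case first.
        return []
    max_value = max(d.values())
    return [key for key, value in d.items() if value == max_value]

scores = {"a": 1, "b": 2, "c": 4, "d": 4}
print(keys_with_max_value(scores))  # ['c', 'd']
```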

So these are the methods to get key with max value in a python dictionary.

Python: Count Nan and Missing Values in Dataframe Using Pandas

Python / By Mayank Gupta

Methods to count NaN and missing values in dataframes using pandas

In this article, we discuss null values in dataframes and count them per row, per column, and in total. Let us first discuss NaN or missing values in a dataframe.

NaN or Missing values

NaN stands for Not a Number. It is used to represent missing values in the dataframe. Let us see this with an example.

import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
      'Marks':[90,np.nan,87,np.nan,19]}
df = pd.DataFrame(students, columns=['students','Marks'])
print(df)

Output

       students  Marks
0      Raj   90.0
1    Rahul    NaN
2   Mayank   87.0
3     Ajay    NaN
4     Amar   19.0

Here we see that there are NaN inside the Marks column that is used to represent missing values.

Reason to count Missing values or NaN values in Dataframe

One of the main reasons to count missing values is that they affect the accuracy of any prediction or analysis built on the dataframe. The more missing values there are, the more our results are affected. Hence we count them. If the count of missing values is high we can drop them; otherwise we can leave them as they are in the dataframe.

Method to count NaN or missing values

To count NaN or missing values, we first use the function isnull(). It returns a dataframe of the same shape in which every NaN value is replaced with True and every non-NaN value with False, which helps us count the NaN or missing values. Let us see this with the help of an example.

import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
      'Marks':[90,np.nan,87,np.nan,19]}
df = pd.DataFrame(students, columns=['students','Marks'])
print(df.isnull())

Output

   students  Marks
0     False  False
1     False   True
2     False  False
3     False   True
4     False  False

We get our dataframe something like this. Now we can easily calculate the count of NaN or missing values in the dataframe.

Count NaN or missing values in columns

With the help of isnull().sum(), we can easily count the NaN or missing values in each column. isnull() converts NaN values to True and non-NaN values to False, and sum() then counts the True values in each column. Let us see this with an example.

import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
      'Marks':[90,np.nan,87,np.nan,19]}
df = pd.DataFrame(students, columns=['students','Marks'])
print(df.isnull().sum())

Output

students    0
Marks       2
dtype: int64

As we also see in the dataframe that we have no NaN or missing values in the students column but we have 2 in the Marks column.

Count NaN or missing values in Rows

For this, we can iterate through each row using a for loop and then use the isnull().sum() method to count the NaN or missing values in each row. Let us see this with an example.

import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
      'Marks':[90,np.nan,87,np.nan,19]}
df = pd.DataFrame(students, columns=['students','Marks'])
for i in range(len(df.index)) :
    print("Nan in row ", i , " : " ,  df.iloc[i].isnull().sum())

Output

Nan in row  0  :  0
Nan in row  1  :  1
Nan in row  2  :  0
Nan in row  3  :  1
Nan in row  4  :  0
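
The same per-row counts can also be obtained without an explicit loop. The sketch below, which assumes the same example data, passes axis=1 to sum() so that the True values are added up across each row instead of down each column.

```python
import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
            'Marks': [90, np.nan, 87, np.nan, 19]}
df = pd.DataFrame(students, columns=['students', 'Marks'])

# axis=1 sums the boolean mask across the columns of each row
nan_per_row = df.isnull().sum(axis=1)
print(nan_per_row)
```

The result is a series indexed like the dataframe, so it can be used directly to filter rows, for instance to keep only the rows with no missing values.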

Count total NaN or missing values in dataframe

In the above two examples, we saw how to count missing values or NaN in rows or columns. Now we see how to count the total missing values in the dataframe. For this we simply use the isnull().sum().sum() method: the first sum() gives the count per column and the second sum() adds those counts together. Let us see this with the help of an example.

import numpy as np
import pandas as pd

students = {'students': ['Raj', 'Rahul', 'Mayank', 'Ajay', 'Amar'],
      'Marks':[90,np.nan,87,np.nan,19]}
df = pd.DataFrame(students, columns=['students','Marks'])
print("Total NaN values: ",df.isnull().sum().sum())

Output

Total NaN values:  2

So these are the methods to count NaN or missing values in dataframes.


Python : How to delete a directory recursively using shutil.rmtree()

Python / By Satyabrata Jena

How to delete a directory recursively using shutil.rmtree() in Python ?

In this article, we will discuss how we can delete an empty directory, and also how to delete all contents of a directory including sub-directories using shutil.rmtree().

Delete an empty directory using os.rmdir() :

The os module of Python provides a function, os.rmdir(pathOfDir), which can delete an empty directory. It can also raise errors in some scenarios, for example:

When the directory is not empty : OSError: [WinError 145] The directory is not empty:
When the given path does not point to a directory : NotADirectoryError: [WinError 267] The directory name is invalid:
When there is no directory at the given path : FileNotFoundError: [WinError 2] The system cannot find the file specified:

#Program :

import os

# Delete an empty directory using os.rmdir() and handle the case where it does not exist
try:
    os.rmdir('/somedirec/log2')
except FileNotFoundError:
    print('Not found directory')

Output :
Not found directory
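
To see the "directory is not empty" case from the list above, here is a minimal sketch that works on a throwaway directory created with tempfile. The names tmp_dir, file_path and sample.txt are made up for the illustration, and the exact error text differs between operating systems.

```python
import os
import tempfile

# Create a throwaway directory that contains one file, so it is not empty
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, 'sample.txt')
with open(file_path, 'w') as f:
    f.write('data')

try:
    os.rmdir(tmp_dir)
    removed = True
except FileNotFoundError:
    # More specific than OSError, so it has to be listed first
    print('Directory does not exist')
    removed = False
except OSError as err:
    # Raised, among other cases, when the directory is not empty
    print('Could not remove directory :', err)
    removed = False

# Clean up: remove the file first, then the now-empty directory
os.remove(file_path)
os.rmdir(tmp_dir)
```

Catching the specific exception classes instead of using a bare except keeps unrelated problems from being reported as a missing directory.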

Delete all files in a directory & sub-directories recursively using shutil.rmtree() :

The shutil module of Python provides a function, shutil.rmtree(), which deletes a directory together with all the contents present in it.

 Syntax : shutil.rmtree(path, ignore_errors=False, onerror=None)

import shutil

dirPath = '/somedirec/logs5/'

# Try to delete the directory with all of its contents and handle a missing path
try:
    shutil.rmtree(dirPath)
except FileNotFoundError:
    print('Not found directory')

Output :
Not found directory

Here, if the directory '/somedirec/logs5/' is found, it will be deleted with all of its contents. If any of the files in the directory is read-only, that file can't be deleted and an exception i.e. PermissionError will be raised; rmtree() then stops, so the remaining files are not deleted either.

shutil.rmtree() & ignore_errors :

In the previous scenario, if we failed to delete any file, we couldn't delete the remaining files in the directory. We can ignore such errors by passing ignore_errors=True to shutil.rmtree().

It will skip the file which raised the exception and go on to delete all of the remaining files.

Suppose there is a file in the log directory that we can't delete. We can then use the following syntax so that the exception is ignored:

shutil.rmtree(dirPath, ignore_errors=True)
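
As a small, self-contained illustration of ignore_errors=True, the sketch below builds a throwaway tree with tempfile and removes it twice. The names base_dir, log_dir and app.log are made up for the example. The second call targets a path that no longer exists and still raises nothing.

```python
import os
import shutil
import tempfile

# Build a small throwaway tree: <tmp>/logs/app.log
base_dir = tempfile.mkdtemp()
log_dir = os.path.join(base_dir, 'logs')
os.makedirs(log_dir)
with open(os.path.join(log_dir, 'app.log'), 'w') as f:
    f.write('sample')

# Removes the directory and everything below it
shutil.rmtree(base_dir, ignore_errors=True)
print(os.path.exists(base_dir))

# The path is gone now; with ignore_errors=True a second call raises nothing
shutil.rmtree(base_dir, ignore_errors=True)
```

Because failures are silent with this option, it can be worth checking with os.path.exists() afterwards whether the directory is really gone.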

Passing callbacks in shutil.rmtree() with onerror :

Instead of ignoring errors, there may be cases where we have to handle them.

Syntax :  shutil.rmtree(path, ignore_errors=False, onerror=None)

Through the onerror parameter we can pass a callback function that rmtree() calls whenever an error occurs. (Since Python 3.12, onerror is deprecated in favour of the onexc parameter.)

shutil.rmtree(dirPath, onerror=handleError)

The function passed as the callback must accept three parameters, like:

def handleError(func, path, exc_info):
   pass

where,

func : The function which raised the exception.
path : The path that was passed to func when the exception was raised during removal.
exc_info : The exception information, as returned by sys.exc_info().

Now, if any exception is raised while deleting a file, then callback will handle the error. Then, shutil.rmtree() can continue deleting remaining files.

Now consider a case where we want to delete all the contents of a directory '/somedirec/logs', and somewhere in the logs directory there is a file with permission issues, so we can't delete everything. For that file, rmtree() will call the callback passed in the onerror parameter.

In the callback, if the problem is an access issue, we change the file permission and then call the calling function i.e. func again with the path of the file. The file will then be deleted and rmtree() will go on to delete the remaining contents of the directory.

import os
import shutil
import stat

# Error handler: tries to change the permission and calls the failed function again
def handleError(func, path, exc_info):
    print('Handling Error for file ' , path)
    print(exc_info)
    # Check whether the error is a file access issue
    if not os.access(path, os.W_OK):
        # Try to change the permission of the file
        os.chmod(path, stat.S_IWUSR)
        # Call the function that failed again
        func(path)
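
To try an error callback end to end without touching real data, here is a hedged sketch on a throwaway tree. Everything named in it (make_writable_and_retry, top_dir, sub_dir, locked.log) is made up for the illustration. It differs from the handler above in that it also makes the parent directory writable, and it picks onexc on Python 3.12 and later, where onerror is deprecated. Whether the callback is actually triggered depends on the platform and on the user's privileges.

```python
import os
import shutil
import stat
import sys
import tempfile

def make_writable_and_retry(func, path, exc):
    # exc is an exc_info tuple (onerror) or an exception instance (onexc)
    # Make both the entry and its parent directory writable, then retry
    os.chmod(os.path.dirname(path), stat.S_IRWXU)
    os.chmod(path, stat.S_IRWXU)
    if func in (os.unlink, os.remove, os.rmdir):
        func(path)

# Throwaway tree with a read-only file inside a read-only sub-directory
top_dir = tempfile.mkdtemp()
sub_dir = os.path.join(top_dir, 'logs')
os.mkdir(sub_dir)
locked_file = os.path.join(sub_dir, 'locked.log')
with open(locked_file, 'w') as f:
    f.write('sample')
os.chmod(locked_file, stat.S_IRUSR)
os.chmod(sub_dir, stat.S_IRUSR | stat.S_IXUSR)

# onerror is deprecated since Python 3.12 in favour of onexc
handler_arg = 'onexc' if sys.version_info >= (3, 12) else 'onerror'
shutil.rmtree(top_dir, **{handler_arg: make_writable_and_retry})
print(os.path.exists(top_dir))
```

Restricting the retry to the removal functions avoids calling something like a directory-open function again with the wrong arguments.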


Pandas : Check if a value exists in a DataFrame using in & not in operator | isin()

Python / By Satyabrata Jena

How to check if a value exists in a DataFrame using in and not in operator | isin() in Python ?

In this article we are going to discuss different ways to check if a value is present in the dataframe.

We will be using the following dataset as example

      Name  Age      City  Marks
0     Jill   16     Tokyo    154
1   Rachel   38     Texas    170
2    Kirti   39  New York     88
3    Veena   40     Texas    190
4  Lucifer   35     Texas     59
5    Pablo   30  New York    123
6   Lionel   45  Colombia    189

Check if a single element exists in DataFrame using in & not in operators :

The DataFrame class has a member DataFrame.values that gives us all the values as a NumPy array. We will use it with the in and not in operators to check if a value is present in the dataframe.

Using in operator to check if an element exists in dataframe :

We will be checking if the value 190 exists in the dataset using in operator.

#Program :

import pandas as pd
#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35, 'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]

# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
if 190 in dfObj.values:
    print('Element exists in Dataframe')

Output :
Element exists in Dataframe
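
The check above looks at every column. If the value should be searched for in one column only, the same idea can be applied to that column's values. The sketch below uses a shortened, three-row version of the example data and also shows a common pitfall: without .values, the in operator on a single column (a Series) tests the index labels, not the stored values.

```python
import pandas as pd

employees = [
    ('Jill', 16, 'Tokyo', 154),
    ('Veena', 40, 'Texas', 190),
    ('Lionel', 45, 'Colombia', 189)]
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])

# Look only at the values of the Marks column
in_marks = 190 in dfObj['Marks'].values
# 190 appears nowhere in the Age column
in_age = 190 in dfObj['Age'].values
# Without .values, "in" on a Series tests the index labels (0, 1, 2 here)
in_index = 190 in dfObj['Marks']

print(in_marks, in_age, in_index)
```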

Using not in operator to check if an element doesn’t exist in dataframe :

We will be checking that ‘Leo’ is not present in the dataframe using the not in operator.

# Program :

import pandas as pd
#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35, 'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]
# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
if 'Leo' not in dfObj.values:
    print('Element does not exist in Dataframe')

Output :
Element does not exist in Dataframe

Checking if multiple elements exist in DataFrame or not using in operator :

To check for multiple elements, we have to write a function.

# Program :

import pandas as pd

def checkForValues(_dfObj, listOfValues):
    # Check for each value of the list in our dataset
    result = {}
    # Iterate through the elements
    for elem in listOfValues:
        # Check if the element exists in the dataframe values
        if elem in _dfObj.values:
            result[elem] = True
        else:
            result[elem] = False
    # Return a dictionary containing the values and their existence as booleans
    return result

#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35,     'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]
# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])

#Check for the existence of values
Tresult = checkForValues(dfObj, [30, 'leo', 190])
print('The values existence inside the dataframe are ')
print(Tresult)

Output :
The values existence inside the dataframe are
{30: True, 'leo': False, 190: True}

Rather than writing a whole function, we can also achieve this more compactly using a dictionary comprehension.

# Program :

import pandas as pd

#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35,     'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]
# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
listOfValues = [30, 'leo', 190]
# Use a dictionary comprehension to check for the given values
result = {elem: elem in dfObj.values for elem in listOfValues}
print('The values existence inside the dataframe are ')
print(result)

Output :
The values existence inside the dataframe are
{30: True, 'leo': False, 190: True}

Checking if elements exists in DataFrame using isin() function :

We can also check if a value exists inside a dataframe or not using the isin( ) function.

Syntax : DataFrame.isin(self, values)

Where,

Values : It takes the values to check for inside the dataframe.

Checking if a single element exist in Dataframe using isin() :

# Program :

import pandas as pd

#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35,     'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]
# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
# Check for a single element using the isin( ) function
boolDf = dfObj.isin([38])
print(boolDf)

Output :
    Name    Age   City  Marks
0  False  False  False  False
1  False   True  False  False
2  False  False  False  False
3  False  False  False  False
4  False  False  False  False
5  False  False  False  False
6  False  False  False  False

Here the isin( ) function returned a boolean dataframe of the same shape, where the elements that matched our values are True and all the rest are False.

We can chain the any( ) function onto this result. The first any( ) reduces the boolean dataframe to a series that tells, for each column, whether it contains a match, and the second any( ) reduces that series to a single boolean that tells whether our element exists anywhere in the dataframe.

# Program :

import pandas as pd

#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35,     'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]
# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
# Check if the value is inside the dataframe using both isin() and any( ) function
result = dfObj.isin(['Lionel']).any().any()
if result:
    print('The value exists inside the dataframe')

Output :
The value exists inside the dataframe
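
To make the two any( ) steps visible, the sketch below splits the chain into separate variables on a shortened, three-row version of the example data. The variable names mask, per_column and found are chosen only for this illustration.

```python
import pandas as pd

employees = [
    ('Jill', 16, 'Tokyo', 154),
    ('Rachel', 38, 'Texas', 170),
    ('Lionel', 45, 'Colombia', 189)]
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])

# Boolean dataframe with the same shape as dfObj
mask = dfObj.isin(['Lionel'])
# Series: does each column contain a match?
per_column = mask.any()
# Single True/False for the whole dataframe
found = bool(per_column.any())

print(per_column)
print(found)
```

The intermediate series is useful on its own when we want to know in which column the value was found.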

Checking if any of the given values exists in the Dataframe :

# Program :

import pandas as pd

#Example data
employees = [
('Jill',    16,     'Tokyo',    154),
('Rachel',  38,     'Texas',    170),
('Kirti',   39,     'New York',  88),
('Veena',   40,     'Texas',    190),
('Lucifer', 35,     'Texas',     59),
('Pablo',   30,     'New York', 123),
('Lionel',  45,     'Colombia', 189)]
# DataFrame object was created
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
# Check if any of the values is inside the dataframe using both isin() and any( ) function
result = dfObj.isin([81, 'Lionel', 190]).any().any()
if result:
    print('Any of the Element exists in Dataframe')

Output :
Any of the Element exists in Dataframe

This program prints the message if at least one of the given values exists inside the dataframe.
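
If we instead need to know whether every one of the given values exists, one possible approach (a sketch, not the only way) is to test each value separately with isin( ) and combine the results with Python's built-in all( ). The names values_to_find, exists and all_found are made up for this example.

```python
import pandas as pd

employees = [
    ('Jill', 16, 'Tokyo', 154),
    ('Rachel', 38, 'Texas', 170),
    ('Kirti', 39, 'New York', 88),
    ('Veena', 40, 'Texas', 190),
    ('Lucifer', 35, 'Texas', 59),
    ('Pablo', 30, 'New York', 123),
    ('Lionel', 45, 'Colombia', 189)]
dfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])

values_to_find = [81, 'Lionel', 190]

# One membership test per value, stored as a plain Python bool
exists = {value: bool(dfObj.isin([value]).any().any()) for value in values_to_find}
# True only if every single value was found
all_found = all(exists.values())

print(exists)
print(all_found)
```

Here 81 is not in the data, so all_found is False even though the any( ) based check above reports a match.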


Python : *args | How to pass multiple arguments to function ?

Python / By Satyabrata Jena

Passing multiple arguments to function in Python.

In this article we will discuss how we can pass multiple arguments to a function using *args in Python.

Let’s say we want to calculate the average of four numbers with a function, e.g. avg(n1,n2,n3,n4). We can then calculate the average of any four numbers by passing them as arguments.

# Program :

def avg(n1,n2,n3,n4):
    # function to calculate average of 4 numbers
    return (n1+n2+n3+n4)/4
average = avg(10,20,30,40)
print(average)

Output :
25.0

But what if we want to calculate the average of 10 numbers? Then we can’t use the above function. In this article we shall define a function in Python which can accept any number of arguments. So, let’s see how we can achieve this.

Defining a function that can accept variable length arguments :

We can let a function accept any number of arguments by prefixing a parameter name with the symbol ‘*‘. Inside the function, the arguments passed are available as a tuple.

# Program :

def calcAvg(*args):
    '''Accepts variable length arguments and calculates the average of n numbers'''
    # to get the count of total arguments passed
    argNums = len(args)
    if argNums > 0 :
        sum_Nums = 0
        # to calculate average from arguments passed
        for ele in args :
            sum_Nums += ele
        return sum_Nums / argNums
    else:
        # no arguments were passed, so there is no average to return
        return None
if __name__ == '__main__':
    avg_Num = calcAvg(10,20,30,40,50)
    print("Average is" , avg_Num)

Output :
Average is 30.0
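
When the numbers are already stored in a list or tuple, they can be passed to such a function by unpacking the collection with * at the call site. The sketch below uses a shortened calcAvg built on the built-in sum() and len(); the list name marks is made up for the example.

```python
def calcAvg(*args):
    '''Return the average of the arguments, or None if none are passed'''
    if not args:
        return None
    return sum(args) / len(args)

marks = [10, 20, 30, 40, 50]

# The * at the call site unpacks the list into separate positional arguments
avg_from_list = calcAvg(*marks)
print("Average is", avg_from_list)
```

Without the *, the whole list would arrive as a single argument and the division would fail.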

Important points about *args :

Positioning of parameter *args :

Along with *args we can also add other parameters. But we must make sure that *args comes after the formal parameters.

Let’s see the representation of that.

# Program :

def publishError(startPar, endPar, *args):
    # Accepts variable length arguments and publishes the errors
    print(startPar)
    for el in args :
        print("Error : " , el)
    print(endPar)
publishError("[Begin]" , "[Ends]" , "Unknown error")

Output :
[Begin]                                                             
Error :  Unknown error                                               
[Ends]
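
Python does allow a parameter to be placed after *args, but such a parameter can then only be set by name. The sketch below is a hypothetical variant of publishError that returns the lines as a list instead of printing them, with endPar moved after *args and given a default value.

```python
def publishError(startPar, *args, endPar="[Ends]"):
    # endPar comes after *args, so it can only be passed by keyword
    lines = [startPar]
    for el in args:
        lines.append("Error : " + str(el))
    lines.append(endPar)
    return lines

# Every extra positional value lands in args, including "[Ends]"
print(publishError("[Begin]", "Unknown error", "[Ends]"))
# endPar has to be named explicitly to be set
print(publishError("[Begin]", "Unknown error", endPar="[Done]"))
```

This is why the formal parameters that should be filled positionally are placed before *args.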

Variable length arguments can be of any type :

With *args we can pass not only a variable number of arguments; the arguments can also be of any data type.

# Program :

def publishError(startPar, endPar, *args):
    # Accepts variable length arguments of any type and prints each one
    print(startPar)
    for el in args :
        print("TypeError : " , el)
    print(endPar)
publishError("[Begin]" , "[Ends]" , [10, 6.5, 8], ('Holla','Hello'), "")

Output :
[Begin]                                                             
TypeError :  [10, 6.5, 8]                                           
TypeError :  ('Holla', 'Hello')                                     
TypeError :                                                         
[Ends]
