Satyabrata Jena

Append/ Add an element to Numpy Array in Python (3 Ways)

Adding an element to Numpy Array in Python

In this article we discuss three different ways to add elements to a NumPy array.

Let's look at each method in turn.

Method-1 : By using append() method :

The numpy module provides the function numpy.append(), which can be used to add elements. We pass the array and the values to add as arguments, and it returns a new array; the original array is left unchanged.

Let's take an example where an array is declared first and then append() is used to add more elements to it.

import numpy as np
arr = np.array([12, 24, 36])
# append() returns a new array with the extra values at the end
new_arr = np.append(arr, [48, 50, 64])
print('array : ', arr)
print('result = : ', new_arr)
Output :
array :  [12 24 36]
result = :  [12 24 36 48 50 64]
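
Two behaviours of numpy.append() are worth checking: it returns a new array rather than changing the original, and without an axis argument it flattens its inputs before joining them. The following is a minimal sketch of both; the variable names are only illustrative.

```python
import numpy as np

original = np.array([12, 24, 36])
# np.append builds and returns a new array; `original` is not modified
extended = np.append(original, 48)

matrix = np.array([[1, 2], [3, 4]])
# With no axis given, both inputs are flattened before being joined
flattened = np.append(matrix, [[5, 6]])

print(original)   # [12 24 36]
print(extended)   # [12 24 36 48]
print(flattened)  # [1 2 3 4 5 6]
```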

We can also add a column by using append() with axis=1.

In the example below we create a 2-D array and append one column at its right-hand end.

import numpy as np
arr1= np.array([[12, 24, 36], [48, 50, 64]])
arr2 = np.array([[7], [8]])
new_Array = np.append(arr1,arr2, axis = 1)
print('array : ', arr1)
print('result = : ', new_Array)
Output :
array :  [[12 24 36]
 [48 50 64]]
result = :  [[12 24 36  7]
 [48 50 64  8]]

We can also add a row by using append() with axis=0.

In the example below we create a 2-D array and append a row to it.

import numpy as np 
arr= np.array([[12, 24, 36], [48, 50, 62]])
new_Array = np.append(arr, [[70 ,80 ,90 ]], axis = 0)
print('array = ', arr )
print('result = ' ,new_Array)
Output :
array =  [[12 24 36]
 [48 50 62]]
result =  [[12 24 36]
 [48 50 62]
 [70 80 90]]
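
When an axis is given, the appended values must have the same number of dimensions as the array. The sketch below, with illustrative variable names, shows the error raised for a flat list and the form that works.

```python
import numpy as np

arr = np.array([[12, 24, 36], [48, 50, 62]])

try:
    # A 1-D list does not match the 2-D array when an axis is given
    np.append(arr, [70, 80, 90], axis=0)
    shape_error = None
except ValueError as exc:
    shape_error = exc

# Wrapping the row in another list makes it 2-D with shape (1, 3)
with_row = np.append(arr, [[70, 80, 90]], axis=0)

print(type(shape_error).__name__)  # ValueError
print(with_row.shape)              # (3, 3)
```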

Method-2 : By using concatenate() method :

The numpy module provides the function numpy.concatenate() to join two or more arrays. To add a single element, we wrap the value in a sequence such as a list and pass it to the function together with the array.

import numpy as np
# Create a NumPy array of integers
arr = np.array([1, 2, 6, 8, 7])
# Add an element at the end of the array (a new array is returned)
new_arr = np.concatenate((arr, [20]))
print('array: ', arr)
print('result: ', new_arr)
Output :
array:  [1 2 6 8 7]
result:  [ 1  2  6  8  7 20]

To add another array, see the program below.

import numpy as np 
arr1 = np.array([[10, 20], [30, 40]])
arr2 = np.array([[50, 60]])
new_Array= np.concatenate((arr1,arr2 ), axis=0)
print('First Array : ', arr1)
print('Second Array : ', arr2)
print('Concatenated Array : ', new_Array)
Output :
First Array :  [[10 20]
 [30 40]]
Second Array :  [[50 60]]
Concatenated Array :  [[10 20]
 [30 40]
 [50 60]]
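
Since concatenate() takes a sequence of arrays, more than two arrays can be joined in a single call. A minimal sketch with made-up arrays:

```python
import numpy as np

first = np.array([1, 2])
second = np.array([3, 4])
third = np.array([5])

# All arrays go into one sequence; 1-D inputs are joined end to end
joined = np.concatenate((first, second, third))
print(joined)  # [1 2 3 4 5]
```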

Method-3 : By using insert() method :

The numpy module provides the function numpy.insert() to add elements at a given index of a NumPy array. It returns a new array and leaves the original unchanged.

The program below inserts an element into the array.

import numpy as np 
arr = np.array([16, 33, 47, 59, 63, 79])
# Index 1 is specified, so the new element is inserted before the element currently at index 1
new_Array = np.insert(arr, 1, 20)
print('array : ', arr)
print('result : ', new_Array)
Output :
array :  [16 33 47 59 63 79]
result :  [16 20 33 47 59 63 79]
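
The same function can insert several values at once, add at the end, or work along an axis of a 2-D array. The following is a small sketch of these cases; the arrays and names are only illustrative.

```python
import numpy as np

arr = np.array([16, 33, 47])

# Insert two values before index 1; existing elements shift right
two_values = np.insert(arr, 1, [20, 21])

# Using len(arr) as the index places the value at the end
at_end = np.insert(arr, len(arr), 99)

grid = np.array([[1, 2], [3, 4]])
# Insert a row of zeros before row index 1 (the scalar 0 is broadcast)
with_row = np.insert(grid, 1, 0, axis=0)

print(two_values)  # [16 20 21 33 47]
print(at_end)      # [16 33 47 99]
print(with_row.tolist())  # [[1, 2], [0, 0], [3, 4]]
```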

np.ones() – Create 1D / 2D Numpy Array filled with ones (1’s)

Create a 1D / 2D Numpy Arrays of zeros or ones

In this article, we will learn to create arrays of various shapes in which all values are initialized to 0 or to 1.

numpy.zeros() :

Python's numpy module provides the function numpy.zeros(), which creates a NumPy array of the given shape with all values initialized to 0.

i.e.

numpy.zeros(shape, dtype=float, order='C')

Arguments:-

  • shape : The shape of the NumPy array. It may be a single int or a sequence of ints.
  • dtype : The data type of the elements. The default is float64.
  • order : The order in which data is stored in memory for a multi-dimensional array. There are two options:
  1. ‘C’: Data is stored in row-major (C-style) order. This is the default.
  2. ‘F’: Data is stored in column-major (Fortran-style) order.
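
The two storage orders can be checked on a small array. The sketch below is only an illustration that inspects the flags and strides attributes of the returned arrays; the variable names are made up for the example.

```python
import numpy as np

c_order = np.zeros((2, 3), order='C')  # row-major (C-style), the default
f_order = np.zeros((2, 3), order='F')  # column-major (Fortran-style)

print(c_order.flags['C_CONTIGUOUS'])  # True
print(f_order.flags['F_CONTIGUOUS'])  # True

# Strides are the bytes to step per axis; float64 items are 8 bytes each
print(c_order.strides)  # (24, 8): moving along a row is the short step
print(f_order.strides)  # (8, 16): moving down a column is the short step

# Shape and values are identical; only the memory layout differs
print(np.array_equal(c_order, f_order))  # True
```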

Let's look at a few different cases with code.

Creating a flattened 1D numpy array filled with all zeros :

We can create a flattened numpy array with all values as ‘0’.

import numpy as sc
# create a 1D numpy array with values as 0
numarr = sc.zeros(5)
print('Contents of the Flattened Numpy Array : ')
print(numarr)
Output :
Contents of the Flattened Numpy Array :
[0. 0. 0. 0. 0.]

Creating a 2D numpy array with 4 rows & 3 columns, filled with 0’s :

We can create a 2D numpy array by passing (4,3) as argument in numpy.zeros() which will return all values as ‘0’.

import numpy as sc
# create a 2D numpy array with values as 0 
numarr = sc.zeros((4,3))
print('Contents of the 2D Numpy Array : ')
print(numarr)
Output :
Contents of the 2D Numpy Array :
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

The default dtype is float64; let's change the dtype to int64.

import numpy as sc
# create a 2D numpy array with values as 0 which are of int data type
numarr = sc.zeros((4,3), dtype=sc.int64)
print('Contents of the 2D Numpy Array : ')
print(numarr)
Output :
Contents of the 2D Numpy Array :
[[0 0 0]
 [0 0 0]
 [0 0 0]
 [0 0 0]]

numpy.ones() :

Python's numpy module provides the function numpy.ones(), which creates a NumPy array of the given shape with all values initialized to 1.

i.e.

numpy.ones(shape, dtype=float, order='C')

Arguments:-

  • shape : The shape of the NumPy array. It may be a single int or a sequence of ints.
  • dtype : The data type of the elements. The default is float64.
  • order : The order in which data is stored in memory for a multi-dimensional array. There are two options:
  1. ‘C’: Data is stored in row-major (C-style) order. This is the default.
  2. ‘F’: Data is stored in column-major (Fortran-style) order.

Creating a flattened 1D numpy array filled with all Ones :

We can make all values as 1 in a flattened array.

import numpy as sc
# create a flattened 1D numpy array of size 5 where all values are 1
numarr = sc.ones(5)
print('Contents of the Flattened Numpy Array : ')
print(numarr)
Output :
Contents of the Flattened Numpy Array :
[1. 1. 1. 1. 1.]

Creating a 2D numpy array with 3 rows & 3 columns, filled with 1’s  :

We can create a 2D numpy array by passing the number of rows and columns as a tuple to numpy.ones(), where all the values are 1.

import numpy as sc
# create a 2D numpy array with 3 rows & 3 columns with all values as 1
numarr = sc.ones((3,3))
print('Contents of the 2D Numpy Array : ')
print(numarr)
print('Data Type of the contents in  given Array : ',numarr.dtype)
Output :
Contents of the 2D Numpy Array :
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
Data Type of the contents in  given Array :  float64

The default dtype is float64; let's change it to int64.

import numpy as sc
# create a 2D numpy array with values as 1 which are of int data type
numarr = sc.ones((4,3), dtype=sc.int64)
print('Contents of the 2D Numpy Array : ')
print(numarr)
Output :
Contents of the 2D Numpy Array :
[[1 1 1]
 [1 1 1]
 [1 1 1]
 [1 1 1]]

 

Python: How to insert lines at the top of a file?

Inserting lines at the top of a file

In this article, we will learn to insert a single line or multiple lines at the beginning of a text file in Python.

Let’s see inserting a line at the top of a file :

We cannot insert text directly at an arbitrary position in a file. To insert a line of text at the top, we create a new file that starts with the new line followed by the original contents, and then give it the name of the original file. We can implement this as a function.

How does the function work?

  • First, it accepts the file path and the new line to insert as arguments.
  • It then creates and opens a new temporary file in write mode.
  • It writes the new line to the temporary file.
  • Next, it opens the original file in read mode and reads its contents line by line.
  • – Each line is appended to the temporary file.
  • After the new line and the original contents have been written to the temporary file, it deletes the original file.
  • Finally, it renames the temporary file to the name of the original file.
import os
def prepend_singline(filename, line):
    # Code to insert new line at the beginning of a file
    # define a name for the temporary file
    tempfile = filename + '.abc'
    # Open the original file in read mode and the temporary file in write mode
    with open(filename, 'r') as read_objc, open(tempfile, 'w') as write_objc:
        # Write new line to the temporary file
        write_objc.write(line + '\n')
        # Read lines from the original file and append them to the temporary file
        for existing_line in read_objc:
            write_objc.write(existing_line)
    # delete original file
    os.remove(filename)
    # Rename temporary file as that of the original file
    os.rename(tempfile, filename)
def main():
    # Insert new line before the first line of original file
    prepend_singline("document.txt", "This is a new line")
if __name__ == '__main__':
   main() 
Output :
This is a new line
Hi, welcome to document file
 India is my country
Content Line I
Content Line II
This is the end of document file
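
A variation on the same idea is sketched here, under the assumption that we want the temporary file to have a unique name and the final swap to happen in one step. It uses tempfile.mkstemp() and os.replace() from the standard library; the function name prepend_line_safely is hypothetical and not part of the original program.

```python
import os
import tempfile


def prepend_line_safely(filename, new_line):
    """Hypothetical helper: prepend new_line to filename via a temp file."""
    directory = os.path.dirname(os.path.abspath(filename))
    # Create the temp file next to the original so the final replace
    # stays on the same filesystem
    fd, temp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, 'w') as writer, open(filename, 'r') as reader:
            writer.write(new_line + '\n')
            for existing_line in reader:
                writer.write(existing_line)
        # os.replace renames the temp file over the original in one call
        os.replace(temp_path, filename)
    except BaseException:
        # Remove the temp file if something failed before the swap
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

Calling prepend_line_safely("document.txt", "This is a new line") would be expected to give the same file contents as the function above.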

Insert multiple lines at the top of a file :

Suppose we have a list of strings that we want to add to the beginning of a file. Rather than calling the function above once per line, we can create another function that accepts the file name and a list of strings as arguments.

This function writes the strings from the list to a temporary file first, then appends the original contents, and finally renames the temporary file to the original name.

import os
def prepend_multlines(filename, list_asmult_lines):
    """Code to insert new list of string at beginning of a file"""
    # define a name to temporary file
    dummyfile = filename + '.abc'
    # Open the available original file in read mode and temporary file in write mode
    with open(filename, 'r') as read_objc, open(dummyfile, 'w') as write_objc:
        # Iterate over the given list of strings and write them to the temporary file as lines
        for line in list_asmult_lines:
            write_objc.write(line + '\n')
        # Read lines one by one from the original file and append them to the temporary file
        for line in read_objc:
            write_objc.write(line)
    # delete original file
    os.remove(filename)
    # Rename temporary file as that of the original file
    os.rename(dummyfile, filename)
def main():
    list_asmult_lines = ['New Line A', 'New Line B',  'New Line C']
    # Insert string list as new line before the first line of original file
    prepend_multlines("document.txt", list_asmult_lines)
if __name__ == '__main__':
   main()    
Output :
New Line A
New Line B
New Line C
Hi, welcome to document file
 India is my country
Content Line I
Content Line II
This is the end of document file
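
For small files, a simpler alternative is to read the whole file into memory and write it back after the new lines. This is a sketch of that different technique, not the temporary-file approach used above; the function name prepend_lines_in_memory is hypothetical.

```python
def prepend_lines_in_memory(filename, lines):
    """Hypothetical helper: prepend lines by rewriting the whole file."""
    # Read the existing contents first, since opening in 'w' mode truncates
    with open(filename, 'r') as reader:
        original = reader.read()
    with open(filename, 'w') as writer:
        # Write each new line with a trailing newline, then the old contents
        writer.writelines(line + '\n' for line in lines)
        writer.write(original)
```

The trade-off is that the file is briefly truncated while it is rewritten, so this form suits small files where an interruption is not a concern.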

How to check if a file or directory or link exists in Python ?

Checking if a file or directory or link exists in Python

In this article, we will see how to check whether a file, a directory or a link exists using Python.

Let's look at each case in turn.

Python – Check if a path exists :

Syntax- os.path.exists(path)

The above function returns True if the path exists and False if it does not. It accepts either a relative or an absolute path as its parameter.

import os

# A raw string keeps the backslashes in a Windows path from being read as escapes
pathString = r"E:\Python"

# Check for path existence
if os.path.exists(pathString):
    print("Path Exists!")
else:
    print("Path could not be reached")
Output :
Path Exists!

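However, we should make sure that we have sufficient user rights to access the path; on some platforms exists() may return False when permission to query the path is denied, even if the path is there.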

Python – Check if a file exists :

If we try to open a file for reading and it does not exist, Python raises a FileNotFoundError. To avoid this error we can first check whether the file exists, using the following function

Syntax- os.path.isfile(path)
import os

filePathString = r"E:\Python\data.csv"

# Check for file existence
if os.path.isfile(filePathString):
    # If the file exists, display its contents
    print("File Exists!")
    # The with statement closes the file automatically
    with open(filePathString, "r") as fileHandler:
        fileData = fileHandler.read()
    print(fileData)
else:
    print("File could not be found")
Output :
File Exists!
Id,Name,Course,City,Session
21,Jill,DSA,Texas,Night
22,Rachel,DSA,Tokyo,Day
23,Kirti,ML,Paris,Day
32,Veena,DSA,New York,Night

Python – Check if a Directory exists :

To check if a given directory exists or not, we can use the following function

Syntax- os.path.isdir(path)
import os

filePathString = r"E:\Python\.vscode"

# Check for directories existence
if os.path.isdir(filePathString):
    # If the directory exists display its name
    print(filePathString, "does exist")
else:
    print("Directory could not be found")
Output :
E:\Python\.vscode does exist

Python – Check if given path is a link :

isdir() and isfile() follow symbolic links, so they report on the link's target and cannot tell us whether the path itself is a link. For that, the os module has a dedicated function

Syntax- os.path.islink(path)

We will use os.path.exists() and os.path.islink() together so that we can check both that the link is present and that it is intact; exists() returns False for a broken link.

import os

linkPathString = r"E:\Python\data.csv"

# Check for link
if os.path.exists(linkPathString) and os.path.islink(linkPathString):
    print("The link is present and not broken")
else:
    print("Link could not be found or is broken")
Output :
Link could not be found or is broken

As the path we provided is not a link, the program executes the else branch.
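
The checks from this article can be combined into one function that also tells a broken link apart from a missing path. This is only a sketch; the function name describe_path and the returned labels are made up for the example.

```python
import os


def describe_path(path):
    """Hypothetical helper: classify a path as link, file, directory or missing."""
    if os.path.islink(path):
        # exists() follows the link, so False here means the target is gone
        return 'link' if os.path.exists(path) else 'broken link'
    if os.path.isfile(path):
        return 'file'
    if os.path.isdir(path):
        return 'directory'
    return 'missing'
```

The link check comes first because isfile() and isdir() would otherwise report on the link's target.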

 

Drop last row of pandas dataframe in python (3 ways)

Dropping last row of pandas dataframe  (3 ways) in python

In this article, we will look at three different ways to drop (delete) the last row of a pandas dataframe.

Use iloc to drop last row of pandas dataframe :

A pandas DataFrame provides the attribute iloc, which selects a portion of the dataframe by position. We can use it to select all rows except the last one and assign the result back to the original variable; as a result, the last row is dropped.

Syntax of dataframe.iloc[]:- df.iloc[row_start:row_end , col_start:col_end]

where,

Arguments:

  • row_start: The row index at which the selection starts. The default is 0.
  • row_end: The row index at which the selection ends, i.e. rows up to row_end-1 are selected. By default the selection runs to the last row of the dataframe.
  • col_start: The column index at which the selection starts. The default is 0.
  • col_end: The column index at which the selection ends, i.e. columns up to col_end-1 are selected. By default the selection runs to the last column of the dataframe.
import pandas as sc
# List of Tuples
players = [('Messi',35, 'Barcelona',   175) ,
            ('James',25, 'Tonga' ,   187) ,
            ('Hardik', 30, 'Mumbai',   169) ,
            ('Harsh',32, 'Mumabi',   201)]
# Creation of DataFrame object
dataf = sc.DataFrame(  players,
                    columns=['Name', 'Age', 'Team', 'Height'],
                    index = ['a', 'b', 'c', 'd'])
print("Original dataframe is : ")
print(dataf)
# Select all rows except the last one, i.e. drop the last row
dataf = dataf.iloc[:-1 , :]
print("Modified Dataframe is : ")
print(dataf)
Output :
Original dataframe is :
     Name  Age       Team  Height
a   Messi   35  Barcelona     175
b   James   25      Tonga     187
c  Hardik   30     Mumbai     169
d   Harsh   32     Mumabi     201
Modified Dataframe is :
     Name  Age       Team  Height
a   Messi   35  Barcelona     175
b   James   25      Tonga     187
c  Hardik   30     Mumbai     169

Use drop() to remove last row of pandas dataframe :

The drop() function deletes rows by index label. We pass the label of the last row (dataf.index[-1]) as index, with axis=0 for rows, and inplace=True so that the dataframe itself is modified.

import pandas as sc
# List of Tuples
players = [('Messi',35, 'Barcelona',   175) ,
            ('James',25, 'Tonga' ,   187) ,
            ('Hardik', 30, 'Mumbai',   169) ,
            ('Harsh',32, 'Mumabi',   201)]
# Creation of DataFrame object
dataf = sc.DataFrame(  players,
                    columns=['Name', 'Age', 'Team', 'Height'],
                    index = ['a', 'b', 'c', 'd'])
print("Original dataframe is : ")
print(dataf)
# Drop the last row
dataf.drop(index=dataf.index[-1], 
        axis=0, 
        inplace=True)
print("Modified Dataframe is: ")
print(dataf)
Output :
Original dataframe is :
     Name  Age       Team  Height
a   Messi   35  Barcelona     175
b   James   25      Tonga     187
c  Hardik   30     Mumbai     169
d   Harsh   32     Mumabi     201
Modified Dataframe is:
     Name  Age       Team  Height
a   Messi   35  Barcelona     175
b   James   25      Tonga     187
c  Hardik   30     Mumbai     169

Use head() function to drop last row of pandas dataframe :

A pandas DataFrame provides the head(n) function, which returns the first n rows of the dataframe. So to delete the last row of a dataframe with n rows, we only have to select the first (n-1) rows using head().

import pandas as sc
# List of Tuples
players = [('Messi',35, 'Barcelona',   175) ,
            ('James',25, 'Tonga' ,   187) ,
            ('Hardik', 30, 'Mumbai',   169) ,
            ('Harsh',32, 'Mumabi',   201)]
# Creation of DataFrame object
dataf = sc.DataFrame(  players,
                    columns=['Name', 'Age', 'Team', 'Height'],
                    index = ['a', 'b', 'c', 'd'])
print("Original dataframe is : ")
print(dataf)
# To delete the last row, keep only the first n-1 rows
dataf = dataf.head(dataf.shape[0] -1)
print("Modified Dataframe is : ")
print(dataf)
Output :
Original dataframe is :
     Name  Age       Team  Height
a   Messi   35  Barcelona     175
b   James   25      Tonga     187
c  Hardik   30     Mumbai     169
d   Harsh   32     Mumabi     201
Modified Dataframe is :
     Name  Age       Team  Height
a   Messi   35  Barcelona     175
b   James   25      Tonga     187
c  Hardik   30     Mumbai     169

 

How to Find and Drop duplicate columns in a DataFrame | Python Pandas

Find & Drop duplicate columns in a DataFrame | Python Pandas

In this article we will learn to find duplicate columns in a Pandas dataframe and drop them.

The pandas library provides direct APIs to find duplicate rows, but no direct API for finding duplicate columns. Hence, we have to build one ourselves. First, let's create a dataframe with duplicate columns.

import pandas as sc
# List of Tuples
players = [('Nathan', 35, 'Australia', 35, 'Australia', 35),
            ('Vishal', 24, 'India', 24, 'India', 24),
            ('Abraham', 34, 'South Africa', 34, 'South Africa', 34),
            ('Trevor', 28, 'England', 28, 'England', 28),
            ('Kumar', 42, 'SriLanka', 42, 'SriLanka', 42),
            ]
# Create a DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'Age', 'Country', 'Address', 'Citizen', 'Jersey'])
print("Original Dataframe is:")
print(PlayerObj)
Output :
Original Dataframe is:
      Name  Age       Country  Address       Citizen  Jersey
0   Nathan   35     Australia       35     Australia      35
1   Vishal   24         India       24         India      24
2  Abraham   34  South Africa       34  South Africa      34
3   Trevor   28       England       28       England      28
4    Kumar   42      SriLanka       42      SriLanka      42

Find duplicate columns in a DataFrame :

To find the duplicate columns in a dataframe, we iterate over each column and check whether any later column has the same contents. If so, that column's name is stored in the set of duplicate columns, and at the end our function returns the list of duplicate column names.

import pandas as sc
def getDuplicateColumns(df):
    '''
    Get a list of duplicate columns.
    It iterates over all the columns and finds the duplicate columns in the dataframe
    :param df: Dataframe object
    :return: List of columns whose contents are the same as an earlier column
    '''
    duplicateColumnNames = set()
    # Iterate over all the columns 
    for x in range(df.shape[1]):
        # Select column at xth index of dataframe.
        col = df.iloc[:, x]
        # Iterate over all the columns from (x+1)th index till end
        for y in range(x + 1, df.shape[1]):
            # Select column at yth index of dataframe.
            otherCol = df.iloc[:, y]
            # Check if two columns x & y are equal
            if col.equals(otherCol):
                duplicateColumnNames.add(df.columns.values[y])
    return list(duplicateColumnNames)
    
def main():
    # List of Tuples
    players = [('Nathan', 35, 'Australia', 35, 'Australia', 35),
            ('Vishal', 24, 'India', 24, 'India', 24),
            ('Abraham', 34, 'South Africa', 34, 'South Africa', 34),
            ('Trevor', 28, 'England', 28, 'England', 28),
            ('Kumar', 42, 'SriLanka', 42, 'SriLanka', 42),
            ]
    # Creation of DataFrame object
    PlayerObj = sc.DataFrame(players, columns=['Name', 'Age', 'Country', 'Address', 'Citizen', 'Jersey'])
    print("Original Dataframe is:")
    print(PlayerObj)
    # To get list of duplicate columns
    duplicateColumnNames = getDuplicateColumns(PlayerObj)
    print('Duplicate Columns are: ')
    for ele in duplicateColumnNames:
        print('Column name is : ', ele)

if __name__ == '__main__':
    main()
Output :
Original Dataframe is:
      Name  Age       Country  Address       Citizen  Jersey
0   Nathan   35     Australia       35     Australia      35
1   Vishal   24         India       24         India      24
2  Abraham   34  South Africa       34  South Africa      34
3   Trevor   28       England       28       England      28
4    Kumar   42      SriLanka       42      SriLanka      42
Duplicate Columns are:
Column name is :  Citizen
Column name is :  Jersey
Column name is :  Address

Drop duplicate columns in a DataFrame :

To drop the duplicate columns, we pass the list of duplicate column names returned by our function to dataframe.drop().

import pandas as sc
def getDuplicateColumns(df):
    '''
    Get a list of duplicate columns.
    It iterates over all the columns and finds the duplicate columns in the dataframe
    :param df: Dataframe object
    :return: List of columns whose contents are the same as an earlier column
    '''
    duplicateColumnNames = set()
    # Iterate over all the columns 
    for x in range(df.shape[1]):
        # Select column at xth index of dataframe.
        col = df.iloc[:, x]
        # Iterate over all the columns from (x+1)th index till end
        for y in range(x + 1, df.shape[1]):
            # Select column at yth index of dataframe.
            otherCol = df.iloc[:, y]
            # Check if two columns x & y are equal
            if col.equals(otherCol):
                duplicateColumnNames.add(df.columns.values[y])
    return list(duplicateColumnNames)
    
def main():
    # List of Tuples
    players = [('Nathan', 35, 'Australia', 35, 'Australia', 35),
            ('Vishal', 24, 'India', 24, 'India', 24),
            ('Abraham', 34, 'South Africa', 34, 'South Africa', 34),
            ('Trevor', 28, 'England', 28, 'England', 28),
            ('Kumar', 42, 'SriLanka', 42, 'SriLanka', 42),
            ]
    # Creation of DataFrame object
    PlayerObj = sc.DataFrame(players, columns=['Name', 'Age', 'Country', 'Address', 'Citizen', 'Jersey'])
    print("Original Dataframe is:")
    print(PlayerObj)
    # To get list of duplicate columns
    duplicateColumnNames = getDuplicateColumns(PlayerObj)
    print('Duplicate Columns are: ')
    for ele in duplicateColumnNames:
        print('Column name is : ', ele)
    
    # Delete duplicate columns
    print('After removing duplicate columns new data frame becomes: ')
    newDf = PlayerObj.drop(columns=getDuplicateColumns(PlayerObj))
    print("Modified Dataframe is: ", newDf)

if __name__ == '__main__':
    main()
Output :
Original Dataframe is:
      Name  Age       Country  Address       Citizen  Jersey
0   Nathan   35     Australia       35     Australia      35
1   Vishal   24         India       24         India      24
2  Abraham   34  South Africa       34  South Africa      34
3   Trevor   28       England       28       England      28
4    Kumar   42      SriLanka       42      SriLanka      42
Duplicate Columns are:
Column name is :  Jersey
Column name is :  Citizen
Column name is :  Address
After removing duplicate columns new data frame becomes:
Modified Dataframe is:        Name  Age       Country
0   Nathan   35     Australia
1   Vishal   24         India
2  Abraham   34  South Africa
3   Trevor   28       England
4    Kumar   42      SriLanka
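
A shorter alternative that is often used for small frames is to transpose the dataframe and reuse the row-based duplicated() method. This is a different technique from the loop above, sketched here on made-up data; transposing copies the data and mixes column types, so it may be slow on large frames.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Nathan', 'Vishal'],
                   'Age': [35, 24],
                   'Country': ['Australia', 'India'],
                   'Address': [35, 24],
                   'Citizen': ['Australia', 'India']})

# Transposing turns columns into rows, so duplicated() can flag repeats;
# the first occurrence of each column is kept
is_duplicate = df.T.duplicated()
deduplicated = df.loc[:, ~is_duplicate.values]

print(list(deduplicated.columns))
```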

 

How to convert Dataframe column type from string to date time

Converting Dataframe column type from string to date time

In this article we will learn to convert the data type of a dataframe column from string to datetime, where the dates may be in custom string formats or embedded in longer text. We will also learn how to handle errors during the conversion.

The pandas module provides the function to_datetime(), which converts a given argument to datetime.

Syntax : pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, box=True, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True) (this is the signature from older pandas releases; the box parameter was removed in pandas 1.0)

where,

Parameters:

  • arg : The value to convert to datetime. It can be an int, float, string, datetime, list, 1-d array or Series.
  • errors : How to handle parsing errors; one of ‘ignore’, ‘raise’ or ‘coerce’. The default is ‘raise’ (‘raise’: raise an exception on invalid parsing, ‘coerce’: set invalid values to NaT, ‘ignore’: return the input unchanged if invalid parsing is found).
  • format : string, default None. The date & time format of the input, e.g. “%d/%m/%Y”.

 Returns:

  • The converted value; its type depends on the input.
  1. If a Series of strings is passed, a Series of datetime64 type is returned.
  2. If a scalar is passed, a single Timestamp object is returned.
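
The return types and the effect of errors='coerce' can be checked with a few lines. The following is a minimal sketch with made-up values; the variable names are only illustrative.

```python
import pandas as pd

# Scalar string in, single Timestamp out
single = pd.to_datetime('1978-01-31')

# Series of strings in, Series with a datetime64 dtype out
series = pd.to_datetime(pd.Series(['1978-01-31', '1980-05-26']))

# errors='coerce' turns values that cannot be parsed into NaT
coerced = pd.to_datetime(pd.Series(['1978-01-31', 'not a date']),
                         errors='coerce')

print(type(single).__name__)                            # Timestamp
print(pd.api.types.is_datetime64_any_dtype(series))     # True
print(coerced.isna().tolist())                          # [False, True]
```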

Convert the Data type of a column from string to datetime64 :

Let's create a dataframe where the column ‘DOB’ holds dates as strings in the format ‘DD/MM/YYYY’.

import pandas as sc
# List of Tuples
players = [('Jason', '31/01/1978', 'Delhi', 155) ,
            ('Johny', '26/05/1980', 'Hyderabad' , 15) ,
            ('Darren', '03/01/1992', 'Jamaica',222) ,
            ('Finch', '22/12/1994','Pune' , 12) ,
            ('Krunal', '16/08/1979', 'Mumbai' , 58) ,
            ('Ravindra', '04/06/1985', 'Chennai', 99 ),
            ('Dinesh', '23/02/1985', 'Kolkata', 10)
           ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print(PlayerObj)
print('Datatype of players dataframe is:')
print(PlayerObj.dtypes)
Output :
       Name         DOB      Teams  Jersey
0     Jason  31/01/1978      Delhi     155
1     Johny  26/05/1980  Hyderabad      15
2    Darren  03/01/1992    Jamaica     222
3     Finch  22/12/1994       Pune      12
4    Krunal  16/08/1979     Mumbai      58
5  Ravindra  04/06/1985    Chennai      99
6    Dinesh  23/02/1985    Kolkata      10
Datatype of players dataframe is:
Name      object
DOB       object
Teams     object
Jersey     int64
dtype: object

Now let’s try to convert data type of column ‘DOB’ to datetime64.

import pandas as sc
# List of Tuples
players = [('Jason', '31/01/1978', 'Delhi', 155) ,
            ('Johny', '26/05/1980', 'Hyderabad' , 15) ,
            ('Darren', '03/01/1992', 'Jamaica',222) ,
            ('Finch', '22/12/1994','Pune' , 12) ,
            ('Krunal', '16/08/1979', 'Mumbai' , 58) ,
            ('Ravindra', '04/06/1985', 'Chennai', 99 ),
            ('Dinesh', '23/02/1985', 'Kolkata', 10)
           ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print(PlayerObj)
print('Datatype of players dataframe is:')
# Convert the column 'DOB' to datetime64 data type; dayfirst=True because the strings are DD/MM/YYYY
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], dayfirst=True)
print(PlayerObj.dtypes)

Output :

       Name         DOB      Teams  Jersey
0     Jason  31/01/1978      Delhi     155
1     Johny  26/05/1980  Hyderabad      15
2    Darren  03/01/1992    Jamaica     222
3     Finch  22/12/1994       Pune      12
4    Krunal  16/08/1979     Mumbai      58
5  Ravindra  04/06/1985    Chennai      99
6    Dinesh  23/02/1985    Kolkata      10
Datatype of players dataframe is:
Name              object
DOB       datetime64[ns]
Teams             object
Jersey             int64
dtype: object
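
Dates such as 03/01/1992 are ambiguous, because they are valid both as day/month and as month/day. A small sketch of how an explicit format removes the ambiguity; the variable names are only illustrative.

```python
import pandas as pd

ambiguous = '03/01/1992'

# The same string read with two different explicit formats
day_first = pd.to_datetime(ambiguous, format='%d/%m/%Y')    # 3 January 1992
month_first = pd.to_datetime(ambiguous, format='%m/%d/%Y')  # 1 March 1992

print(day_first.month, month_first.month)  # 1 3
```

Passing the format explicitly, or dayfirst=True for day-first data, keeps every row in a column interpreted the same way.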

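to_datetime() can also parse other common date-time layouts, including ISO 8601 strings, for example:

‘DD-MM-YYYY HH:MM AM/PM’

‘YYYY-MM-DDTHH:MM:SS’

‘YYYY-MM-DDTHH:MM:SS.ssssss’, etc.

Let's convert a column whose strings use several such layouts to datetime64. Note that this relies on the format being inferred separately for each value; pandas 2.0 and later infer a single format from the first value, so a mixed column like this one may need format='mixed' there.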

import pandas as sc
# List of Tuples
players = [('Jason', '31/01/1978 12:00 AM', 'Delhi', 155) ,
            ('Johny', '26/05/1980 02:00:55', 'Hyderabad' , 15) ,
            ('Darren', '03/01/1992', 'Jamaica',222) ,
            ('Finch', '22/12/1994 T23:11:25Z','Pune' , 12)
           ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print(PlayerObj)
print('Datatype of players dataframe is:')
# Convert the column 'DOB' to datetime64 datatype
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'])
print(PlayerObj.dtypes)
Output :
     Name                    DOB      Teams  Jersey
0   Jason    31/01/1978 12:00 AM      Delhi     155
1   Johny    26/05/1980 02:00:55  Hyderabad      15
2  Darren             03/01/1992    Jamaica     222
3   Finch  22/12/1994 T23:11:25Z       Pune      12
Datatype of players dataframe is:
Name              object
DOB       datetime64[ns]
Teams             object
Jersey             int64
dtype: object

Convert the Data type of a column from custom format string to datetime64 :

We may also have a dataframe whose columns hold dates in a custom format such as MMDDYYYY or DD-MM-YY. In that case we pass the matching format string to to_datetime() to convert the column to datetime64.

import pandas as sc
# List of Tuples
players = [('Jason', '08091986', 'Delhi', 155),
            ('Johny', '11101988', 'Hyderabad', 15)
            ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print(PlayerObj)
print('Datatype of players dataframe is:')
# Convert the column 'DOB' to datetime64 datatype
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y')
print(PlayerObj.dtypes)
Output :
    Name       DOB      Teams  Jersey
0  Jason  08091986      Delhi     155
1  Johny  11101988  Hyderabad      15
Datatype of players dataframe is:
Name              object
DOB       datetime64[ns]
Teams             object
Jersey             int64
dtype: object

Convert the Data type of a column from string to datetime by extracting date & time strings from big string :

 There may be a case where the date is embedded in a longer string, e.g. ‘date of birth is 08091986’ or ‘11101988 is DOB’. Here we pass the exact argument to to_datetime(); if it is passed as False, the format is matched anywhere in the string. Let's convert the data type of column DOB from string to datetime64 in this way.

import pandas as sc
# List of Tuples
players = [('Jason', 'date of birth is 08091986', 'Delhi', 155),
            ('Johny', '11101988 is DOB', 'Hyderabad', 15)
            ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print('Datatype of players dataframe is:')
# Convert the column 'DOB' to datetime64 data type
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y', exact=False)
print(PlayerObj)
print(PlayerObj.dtypes)
Output :
Datatype of players dataframe is:
    Name        DOB      Teams  Jersey
0  Jason 1986-08-09      Delhi     155
1  Johny 1988-11-10  Hyderabad      15
Name              object
DOB       datetime64[ns]
Teams             object
Jersey             int64
dtype: object
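To see what exact=False changes, the sketch below tries the same conversion with and without it. The Series dob and the flag strict_ok are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical value with the date embedded in surrounding text
dob = pd.Series(['date of birth is 08091986'])

try:
    # exact=True is the default: the whole string must match the format
    pd.to_datetime(dob, format='%m%d%Y')
    strict_ok = True
except ValueError:
    strict_ok = False

# exact=False lets the format match anywhere inside the string
loose = pd.to_datetime(dob, format='%m%d%Y', exact=False)

print(strict_ok)   # False
print(loose[0])    # 1986-08-09 00:00:00
```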

Another Example : Extract date & time from big string in a column and add new columns of datetime64 format :

import pandas as sc
# List of Tuples
players = [('Jason', '12:00 AM on the date 08091986', 'Delhi', 155),
            ('Johny', '11101988 and evening 07:00 PM', 'Hyderabad', 15)
            ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print('Datatype of players dataframe is:')
# Extract the time into a new column; %I (12-hour clock) is needed for %p (AM/PM) to take effect
PlayerObj['DOB_time'] = sc.to_datetime(PlayerObj['DOB'], format='%I:%M %p', exact=False)
# Extract the date into another new column
PlayerObj['DOB_date'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y', exact=False)
print('New dataframe is:')
print(PlayerObj)
Output :
Datatype of players dataframe is:
New dataframe is:
Name DOB ... DOB_time DOB_date
0 Jason 12:00 AM on the date 08091986 ... 1900-01-01 00:00:00 1986-08-09
1 Johny 11101988 and evening 07:00 PM ... 1900-01-01 19:00:00 1988-11-10

In the DOB_time column we provided only the time, so the date defaults to 1900-01-01, whereas DOB_date contains only the date. Both columns, DOB_time and DOB_date, have the same data type, datetime64.

Handle error while Converting the Data type of a column from string to datetime :

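To handle errors while converting the data type of a column, we can pass the errors argument of to_datetime() with one of the values ‘raise’, ‘coerce’ or ‘ignore’. With ‘raise’ (the default) invalid input raises an exception, with ‘coerce’ invalid values are set to NaT, and with ‘ignore’ the input is returned unchanged. In the example below no format is given, the strings cannot be parsed, and errors='ignore' leaves the DOB column as object.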

import pandas as sc
# List of Tuples
players = [('Jason', '08091986', 'Delhi', 155),
            ('Johny', '11101988', 'Hyderabad', 15)
            ]

# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Team', 'Jersey'])
print("Contents of the original Dataframe : ")
print(PlayerObj)
print('Data types of columns in original dataframe')
print(PlayerObj.dtypes)
# Ignores errors while converting the type
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], errors='ignore')
print("Contents of the Dataframe : ")
print(PlayerObj)
print('Data types of columns in modified dataframe')
print(PlayerObj.dtypes)
Output :
Contents of the original Dataframe : 
Name DOB Team Jersey
0 Jason 08091986 Delhi 155
1 Johny 11101988 Hyderabad 15
Data types of columns in original dataframe
Name object
DOB object
Team object
Jersey int64
dtype: object
Contents of the Dataframe : 
Name DOB Team Jersey
0 Jason 08091986 Delhi 155
1 Johny 11101988 Hyderabad 15
Data types of columns in modified dataframe
Name object
DOB object
Team object
Jersey int64
dtype: object
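When the column should still end up as datetime64 and only the bad rows should be marked, ‘coerce’ is often the more useful choice. Below is a minimal sketch, assuming a hypothetical Series dob that contains one value which is not a date.

```python
import pandas as pd

# Hypothetical column with one value that is not a date
dob = pd.Series(['08091986', 'not a date', '11101988'])

# Invalid entries become NaT instead of raising an exception
converted = pd.to_datetime(dob, format='%m%d%Y', errors='coerce')

print(converted)
print(converted.isna().tolist())   # [False, True, False]
```

The result has the datetime64 data type, and the rows that failed to parse can be found afterwards with isna().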


What is a Structured Numpy Array and how to create and sort it in Python?

Structured Numpy Array and how to create and sort it in Python

In this article we will learn what a structured numpy array is, how to create it and how to sort it on different fields.

What is a Structured Numpy Array ?

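A Structured Numpy array is an array of structures: every element has the same structured data type, which is made up of named fields, and each field may have a different type (for example a string, a float and an integer).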

Creating a Structured Numpy Array

To create a structured numpy array we first describe the structure as a list of tuples, one (field name, field type) tuple per field, and then pass this list as the dtype parameter when creating the numpy array.

import numpy as sc
# Creation of structure
dtype = [('Name', (sc.str_, 10)), ('CGPA', sc.float64), ('Age', sc.int32)]
# Creation of a structured Numpy array
structured_arr = sc.array([('Ben', 8.8, 18), ('Rani', 9.4, 15), ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)
print(structured_arr.dtype)
Output :
[('Name', '<U10'), ('CGPA', '<f8'), ('Age', '<i4')]
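Once the array exists, each field can be read by its name. The sketch below uses a shortened, hypothetical array called students with the same structure to show field access for the whole array and for a single record.

```python
import numpy as np

dtype = [('Name', (np.str_, 10)), ('CGPA', np.float64), ('Age', np.int32)]
# Hypothetical two-record array with the same structure as above
students = np.array([('Ben', 8.8, 18), ('Rani', 9.4, 15)], dtype=dtype)

names = students['Name']          # all values of one field
first_cgpa = students[0]['CGPA']  # one field of one record
mean_age = students['Age'].mean()

print(names.tolist())   # ['Ben', 'Rani']
print(first_cgpa)       # 8.8
print(mean_age)         # 16.5
```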

How to Sort a Structured Numpy Array ?

We can sort a structured numpy array by passing the ‘order’ parameter, which names the field (or list of fields) to compare, to numpy.sort() or numpy.ndarray.sort().

Sort the Structured Numpy array by field ‘Name’ of the structure

Let’s sort the structured numpy array on the basis of field ‘Name‘.

import numpy as sc
# Creation of structure
dtype = [('Name', (sc.str_, 10)), ('CGPA', sc.float64), ('Age', sc.int32)]
# Creation of a structured Numpy array
structured_arr = sc.array([('Ben', 8.8, 18), ('Rani', 9.4, 15), ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)
# Sort the records by the 'Name' field
sor_arr = sc.sort(structured_arr, order='Name')
print('Sorted Array on the basis on name : ')
print(sor_arr)
Output :
Sorted Array on the basis on name :
[('Ben', 8.8, 18) ('Rani', 9.4, 15) ('Saswat', 7.6, 16)
('Tanmay', 9.8, 17)]

Sort the Structured Numpy array by field ‘Age’ of the structure

We can also sort the structured numpy array on the basis of field ‘Age‘.

import numpy as sc
# Creation of structure
dtype = [('Name', (sc.str_, 10)), ('CGPA', sc.float64), ('Age', sc.int32)]
# Creation of a structured Numpy array
structured_arr = sc.array([('Ben', 8.8, 18), ('Rani', 9.4, 15), ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)
# Sort the records by the 'Age' field
sor_arr = sc.sort(structured_arr, order='Age')
print('Sorted Array on the basis on Age : ')
print(sor_arr)
Output :
Sorted Array on the basis on Age :
[('Rani', 9.4, 15) ('Saswat', 7.6, 16) ('Tanmay', 9.8, 17)
('Ben', 8.8, 18)]

Sort the Structured Numpy array by ‘Name’ & ‘Age’ fields of the structure :

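We can also sort a structured numpy array on multiple fields by passing a list of field names to ‘order’. Here the records are compared by ‘Name‘ first, and ‘Age‘ is used only to break ties between equal names.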

import numpy as sc
# Creation of structure
dtype = [('Name', (sc.str_, 10)), ('CGPA', sc.float64), ('Age', sc.int32)]
# Creation of a structured Numpy array
structured_arr = sc.array([('Ben', 8.8, 18), ('Rani', 9.4, 15), ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)
# Sort the records by 'Name', then by 'Age'
sor_arr = sc.sort(structured_arr, order=['Name', 'Age'])
print('Sorted Array on the basis on name & age : ')
print(sor_arr)
Output :
Sorted Array on the basis on name & age:
[('Ben', 8.8, 18) ('Rani', 9.4, 15) ('Saswat', 7.6, 16)
('Tanmay', 9.8, 17)]
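Since all names in the array above are different, the second field never comes into play. The following sketch, assuming hypothetical records with a repeated name, shows the tie-breaking and also the in-place variant numpy.ndarray.sort().

```python
import numpy as np

dtype = [('Name', (np.str_, 10)), ('CGPA', np.float64), ('Age', np.int32)]
# Hypothetical records with a repeated name so that the second key matters
students = np.array([('Rani', 9.4, 17), ('Ben', 8.8, 18), ('Rani', 7.9, 15)], dtype=dtype)

# numpy.sort() returns a sorted copy; equal names are ordered by Age
by_name_age = np.sort(students, order=['Name', 'Age'])
print(by_name_age['Age'].tolist())   # [18, 15, 17]

# ndarray.sort() sorts the array itself and returns None
students.sort(order='CGPA')
print(students['CGPA'].tolist())     # [7.9, 8.8, 9.4]
```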

Python : How to get the list of all files in a zip archive

How to get the list of all files in a zip archive in Python

In this article we will learn various ways to get details, such as name and size, of all the files in a zip archive.

How to find name of all the files in the ZIP archive using ZipFile.namelist() :

The ZipFile class from the zipfile module provides a member function, ZipFile.namelist(), that returns the names of all files present in the archive.

from zipfile import ZipFile
def main():
    # Create a ZipFile object and load DocumentDir.zip in it
    with ZipFile('DocumentDir.zip', 'r') as zip_Obj:
        # Get the list of file names in the zip
        fileslist = zip_Obj.namelist()
        # Iterate over the names in the list & print them
        for ele in fileslist:
            print(ele)
if __name__ == '__main__':
    main()
Output :
DocumentDir/doc1.csv
DocumentDir/doc2.csv
DocumentDir/test.csv
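Because namelist() returns a plain list of strings, it can be filtered like any other list. The sketch below builds a small archive in memory (the file names and contents are made up for illustration) so that it runs without DocumentDir.zip on disk, and keeps only the .csv entries.

```python
import io
from zipfile import ZipFile

# Build a small archive in memory so the example needs no file on disk
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as zip_obj:
    zip_obj.writestr('DocumentDir/doc1.csv', 'a,b\n1,2\n')
    zip_obj.writestr('DocumentDir/doc2.csv', 'c,d\n3,4\n')
    zip_obj.writestr('DocumentDir/notes.txt', 'hello')

# Re-open the archive for reading and keep only the names ending in .csv
with ZipFile(buffer, 'r') as zip_obj:
    csv_names = [name for name in zip_obj.namelist() if name.endswith('.csv')]

print(csv_names)   # ['DocumentDir/doc1.csv', 'DocumentDir/doc2.csv']
```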

Find detail info like name, size etc of the files in a Zip file using ZipFile.infolist() :

The ZipFile class from the zipfile module also provides a member function, ZipFile.infolist(), to get the details of each entry present in the zip file.

It returns a list of ZipInfo objects, where each object holds information about one entry such as its name, size, compressed size and modification time.

from zipfile import ZipFile
def main():
    # Create a ZipFile object and load DocumentDir.zip in it
    with ZipFile('DocumentDir.zip', 'r') as zip_Obj:
        # Get list of ZipInfo objects
        fileslist = zip_Obj.infolist()
        # Iterate over the list and access the members of each object
        for ele in fileslist:
            print(ele.filename, ' : ', ele.file_size, ' : ', ele.date_time, ' : ', ele.compress_size)
if __name__ == '__main__':
    main()
Output :
DocumentDir/doc1.csv  :  2759  :  (2021, 1, 3, 21, 0, 2)  :  2759
DocumentDir/doc2.csv  :  2856  :  (2021, 1, 25, 13, 45, 58)  :  2856
DocumentDir/test.csv  :  3458  :  (2021, 2, 20, 20, 20, 41)  :  3458
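The ZipInfo attributes can also be combined, for instance to compare the original and compressed size of a whole archive. This is a minimal sketch on a hypothetical in-memory archive written with ZIP_DEFLATED compression; the file names and contents are made up.

```python
import io
from zipfile import ZipFile, ZIP_DEFLATED

# Hypothetical in-memory archive with repetitive, well-compressible content
buffer = io.BytesIO()
with ZipFile(buffer, 'w', compression=ZIP_DEFLATED) as zip_obj:
    zip_obj.writestr('DocumentDir/doc1.csv', 'value\n' * 500)
    zip_obj.writestr('DocumentDir/doc2.csv', 'other\n' * 200)

with ZipFile(buffer, 'r') as zip_obj:
    infos = zip_obj.infolist()
    # Sum the uncompressed and compressed sizes over all entries
    total_size = sum(info.file_size for info in infos)
    total_compressed = sum(info.compress_size for info in infos)

print(total_size)                     # 4200
print(total_compressed < total_size)  # True
```

In the output shown earlier, file_size and compress_size are equal for every entry, which suggests that archive was stored without compression.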

Print details of the ZIP archive to stdout using ZipFile.printdir() :

The ZipFile class from the zipfile module also provides a member function, ZipFile.printdir(), which prints the contents of the zip file as a table.

from zipfile import ZipFile
def main():
    # Create a ZipFile object and load DocumentDir.zip in it
    with ZipFile('DocumentDir.zip', 'r') as zip_Obj:
        zip_Obj.printdir()
if __name__ == '__main__':
    main()
Output :
File Name                                             Modified             Size
DocumentDir/doc1.csv                           2021-01-03 21:00:02         2759
DocumentDir/doc2.csv                           2021-01-25 13:45:58         2856
DocumentDir/test.csv                           2021-02-20 20:20:41         3458

Create a Thread using Class in Python

Creating a Thread using Class in Python

In this article, we will discuss how to create a thread in Python either by extending the Thread class or by running a member function of a class in a thread.

Python provides a threading module to create and manage threads.

Extend Thread class to create Threads :

Let’s create a FileLoaderTh class by extending the Thread class provided by the threading module. It mimics the functionality of a file loader, and its run() method sleeps for some time between log messages.

When we start the thread by calling start(), a new thread is created that invokes run(). Because FileLoaderTh overrides run(), the overridden method, which may contain any custom implementation, is executed in that thread.

Before returning from the main thread, it is better to wait for the other thread to finish by calling the join() method on the FileLoaderTh object.

from threading import Thread
import time
# FileLoaderTh extending Thread class
class FileLoaderTh(Thread):
   def __init__(self, fileName, encryptionType):
       # Calling Thread class's init() function
       Thread.__init__(self)
       self.fileName = fileName
       self.encryptionType = encryptionType
   # Overriding run() of Thread class
   def run(self):
       print('Loading started from file : ', self.fileName)
       print('Encryption Type : ', self.encryptionType)
       for i in range(5):
           print('Please wait loading... ')
           time.sleep(3)
       print('Loading finished from file : ', self.fileName)
def main():
   # Create an object of Thread
   th = FileLoaderTh('threadDemo.csv','DOC')
   # calling Thread class start() to start the thread
   th.start()
   for i in range(5):
       print('Inside main function-------')
       time.sleep(3)
   # Wait for thread to finish
   th.join()
if __name__ == '__main__':
   main()
Output :
Loading started from file :  threadDemo.csv
Inside main function-------
Encryption Type :  DOC
Please wait loading...
Inside main function-------
Please wait loading...
Inside main function-------
Please wait loading...
Please wait loading...
Inside main function-------
Please wait loading...
Inside main function-------
Loading finished from file :  threadDemo.csv
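A thread subclass can also keep its result in an attribute, which the main thread reads after join(). The following is a minimal sketch with a hypothetical SumTh class (not part of the example above) that finishes instantly instead of sleeping.

```python
from threading import Thread

# Hypothetical worker that sums a list of numbers in a separate thread
class SumTh(Thread):
    def __init__(self, numbers):
        # Initialise the Thread base class before adding our own attributes
        super().__init__()
        self.numbers = numbers
        self.result = None

    # Executed in the new thread once start() is called
    def run(self):
        self.result = sum(self.numbers)

th = SumTh([1, 2, 3, 4])
th.start()
th.join()          # wait for run() to finish so the result is safe to read
print(th.result)   # 10
```

Reading th.result before join() returns could still give None, which is why the wait comes first.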

Create a Thread from a member function of a class :

Now let’s create a thread that executes the loadcont() member function of the FileLoaderTh class, which this time does not extend Thread. For that we create an object of the class and pass its bound method, fileLoader.loadcont, as the target argument of the Thread class constructor, with the method's arguments supplied through args.

So the main() function and the loadcont() member function run concurrently, and at the end main() waits for the other thread to finish by calling join() on the thread object.

import threading
import time
class FileLoaderTh():
   def __init__(self):
       pass
   
   def loadcont(self, fileName, encryptionType):
       print('Loading started from file : ', fileName)
       print('Encryption Type : ', encryptionType)
       for i in range(5):
           print('Loading ... ')
           time.sleep(3)
       print('Loading finished from file : ', fileName)
def main():
   # Creating object of FileLoaderTh class
   fileLoader = FileLoaderTh()
   # Create a thread using a member function of the FileLoaderTh class
   th = threading.Thread(target=fileLoader.loadcont, args=('threadDemo.csv','DOC', ))
   # Start a thread
   th.start()
   # Print some logs in main thread
   for i in range(5):
       print('Inside main Function')
       time.sleep(3)
   # Wait for thread to finish
   th.join()
if __name__ == '__main__':
   main()
Output :
Loading started from file :  threadDemo.csv
Encryption Type :  DOC
Inside main Function
Loading ...
Inside main Function
Loading ...
Inside main Function
Loading ...
Inside main Function
Loading ...
Loading ...
Inside main Function
Loading finished from file :  threadDemo.csv
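The same object can serve as the target of several threads at once, in which case shared state needs protection. The sketch below uses a hypothetical Counter class whose add() method is run by four threads; the lock is an assumption added here to keep the shared total consistent, and args and kwargs carry the method's arguments.

```python
import threading

# Hypothetical class whose member function is used as the thread target
class Counter:
    def __init__(self):
        self.total = 0
        self.lock = threading.Lock()

    def add(self, amount, times=1):
        for _ in range(times):
            # Guard the shared total so concurrent updates are not lost
            with self.lock:
                self.total += amount

counter = Counter()
# Each thread calls counter.add(2, times=100)
threads = [
    threading.Thread(target=counter.add, args=(2,), kwargs={'times': 100})
    for _ in range(4)
]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(counter.total)   # 800
```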