Python

Python: Find Unique Values in a Numpy Array With Frequency and Indices

Methods to find unique values in a numpy array with frequency and indices

In this article, we will discuss how to find unique values, rows, and columns in 1D and 2D NumPy arrays. Before turning to the individual methods, we first look at numpy.unique(), because every method below relies on it.

numpy.unique() method

The numpy.unique() method returns the sorted unique values of a given array.

Syntax: numpy.unique(array, return_index=False, return_inverse=False, return_counts=False, axis=None)

Parameters

  1. array - The array from which we want to get the unique values.
  2. return_index - If True, an array with the index of the first occurrence of each unique value is also returned. The default is False.
  3. return_inverse - If True, an array of indices that rebuilds the original array from the unique values is also returned. The default is False.
  4. return_counts - If True, an array with the number of occurrences of each unique value is also returned. The default is False.
  5. axis - The axis to operate along. With the default None, the array is flattened first. For a 2D array, axis=0 compares whole rows and axis=1 compares whole columns.
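The return_inverse parameter appears in the syntax but is not used in the methods that follow, so here is a minimal sketch of what it returns. The variable names are illustrative only.

```python
import numpy as np

arr = np.array([1, 1, 2, 3, 4, 5, 6, 7, 2, 3, 1, 4, 7])

# inverse[i] is the position of arr[i] inside unique_values
unique_values, inverse = np.unique(arr, return_inverse=True)

# Indexing the unique values with the inverse array rebuilds the input
rebuilt = unique_values[inverse]
print(np.array_equal(rebuilt, arr))
```

This can be handy when the unique values are used as category labels and each element needs its category number.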

Now we will see different methods to find unique value with their indices and frequencies in a numpy array.

Case 1: When our array is 1-D

  • Method 1-Find unique value from the array

Since we only need the unique values and not their frequencies or indices, we simply pass our numpy array to unique(). The other parameters default to False, so we do not need to change them. Let us see this with the help of an example.

import numpy as np
arr = np.array([1, 1, 2, 3, 4, 5, 6, 7, 2, 3, 1, 4, 7])
unique_values=np.unique(arr)
print("Original array is")
print(arr)
print("------------------")
print("Unique values are")
print(unique_values)

Output

Original array is
[1 1 2 3 4 5 6 7 2 3 1 4 7]
------------------
Unique values are
[1 2 3 4 5 6 7]
  • Method 2-Find unique value from the array along with their indices

In this method we want the unique values along with the index of their first occurrence, so we set the return_index parameter to True and pass our array. Let us see this with the help of an example.

import numpy as np
arr = np.array([1, 1, 2, 3, 4, 5, 6, 7, 2, 3, 1, 4, 7])
unique_values,index=np.unique(arr,return_index=True)
print("Original array is")
print(arr)
print("------------------")
print("Unique values are")
print(unique_values)
print("First index of unique values are:")
print(index)

Output

Original array is
[1 1 2 3 4 5 6 7 2 3 1 4 7]
------------------
Unique values are
[1 2 3 4 5 6 7]
First index of unique values are:
[0 2 3 4 5 6 7]
  • Method 3-Find unique value from the array along with their frequencies

In this method we want the unique values along with their frequencies, so we set the return_counts parameter to True and pass our array. Let us see this with the help of an example.

import numpy as np
arr = np.array([1, 1, 2, 3, 4, 5, 6, 7, 2, 3, 1, 4, 7])
unique_values,count=np.unique(arr,return_counts=True)
print("Original array is")
print(arr)
print("------------------")
print("Unique values are")
print(unique_values)
print("Count of unique values are:")
for i in range(0,len(unique_values)):
  print("count of ",unique_values[i]," is ",count[i])

Output

Original array is
[1 1 2 3 4 5 6 7 2 3 1 4 7]
------------------
Unique values are
[1 2 3 4 5 6 7]
Count of unique values are:
count of  1  is  3
count of  2  is  2
count of  3  is  2
count of  4  is  2
count of  5  is  1
count of  6  is  1
count of  7  is  2
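The index and frequency options can also be combined in a single call. The following is a small sketch, assuming we want plain dictionaries as the result; the names frequency and first_seen are illustrative.

```python
import numpy as np

arr = np.array([1, 1, 2, 3, 4, 5, 6, 7, 2, 3, 1, 4, 7])

# With both flags set, unique() returns (values, first indices, counts)
values, first_index, counts = np.unique(arr, return_index=True, return_counts=True)

# Map each unique value to its frequency and to its first position
frequency = {int(v): int(c) for v, c in zip(values, counts)}
first_seen = {int(v): int(i) for v, i in zip(values, first_index)}

print(frequency)
print(first_seen)
```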

Case 2: When our array is 2-D

  • Method 1-Find unique value from the array

Here we simply pass our array and leave all the other parameters at their defaults. With axis=None the 2-D array is flattened first, so the result contains the unique values across all rows and columns. Let us see this with the help of an example.

import numpy as np
arr = np.array([[1, 1, 2,1] ,[ 3, 1, 2,1] , [ 6, 1, 2, 1],  [1, 1, 2, 1]])
unique_values=np.unique(arr)
print("Original array is")
print(arr)
print("------------------")
print("Unique values are")
print(unique_values)

Output

Original array is
[[1 1 2 1]
 [3 1 2 1]
 [6 1 2 1]
 [1 1 2 1]]
------------------
Unique values are
[1 2 3 6]

  • Method 2-Get unique rows

Since we want to work only on rows, we set axis=0 and pass our array. Let us see this with the help of an example.

import numpy as np
arr = np.array([[1, 1, 2,1] ,[ 3, 1, 2,1] , [ 6, 1, 2, 1],  [1, 1, 2, 1]])
unique_values=np.unique(arr,axis=0)
print("Original array is")
print(arr)
print("------------------")
print("Unique rows are")
print(unique_values)

Output

Original array is
[[1 1 2 1]
 [3 1 2 1]
 [6 1 2 1]
 [1 1 2 1]]
------------------
Unique rows are
[[1 1 2 1]
 [3 1 2 1]
 [6 1 2 1]]

  • Method 3-Get unique columns

Since we want to work only on columns, we set axis=1 and pass our array. Let us see this with the help of an example.

import numpy as np
arr = np.array([[1, 1, 2,1] ,[ 3, 1, 2,1] , [ 6, 1, 2, 1],  [1, 1, 2, 1]])
unique_values=np.unique(arr,axis=1)
print("Original array is")
print(arr)
print("------------------")
print("Unique columns are")
print(unique_values)

Output

Original array is
[[1 1 2 1]
 [3 1 2 1]
 [6 1 2 1]
 [1 1 2 1]]
------------------
Unique columns are
[[1 1 2]
 [1 3 2]
 [1 6 2]
 [1 1 2]]
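The axis parameter can be combined with return_counts to see how often each unique row occurs. A minimal sketch using the same array:

```python
import numpy as np

arr = np.array([[1, 1, 2, 1], [3, 1, 2, 1], [6, 1, 2, 1], [1, 1, 2, 1]])

# axis=0 treats every row as one item; return_counts counts repeated rows
unique_rows, row_counts = np.unique(arr, axis=0, return_counts=True)

for row, count in zip(unique_rows, row_counts):
    print(row, "appears", count, "time(s)")
```

Here the row [1 1 2 1] occurs twice in the input, so its count is 2 while the other rows have a count of 1.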

So these are the methods to find unique values in a numpy array along with their frequencies and indices.

 


np.ones() – Create 1D / 2D Numpy Array filled with ones (1’s)

Create a 1D / 2D Numpy Arrays of zeros or ones

In this article, we will learn to create arrays of various shapes where all the values are initialized with 0 or 1.

numpy.zeros() :

Python's numpy module provides the function numpy.zeros(), which creates a numpy array of a given shape with all values initialized to 0.

i.e.

numpy.zeros(shape, dtype=float, order='C')

Arguments:-

  • shape : The shape of the numpy array. It may be a single int or a sequence of ints.
  • dtype : The data type of the elements. The default is float64.
  • order : The order in which the data of a multi-dimensional array is stored in memory. There are two options:
  1. ‘C’: Data is stored in row-major (C-style) order. This is the default.
  2. ‘F’: Data is stored in column-major (Fortran-style) order.
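The effect of the order argument is not visible when an array is printed, but it can be inspected through the flags and strides attributes. A small sketch, assuming the default float64 dtype with 8-byte elements:

```python
import numpy as np

# Same shape and values, different memory layout
c_arr = np.zeros((2, 3), order='C')   # row-major: a row is contiguous
f_arr = np.zeros((2, 3), order='F')   # column-major: a column is contiguous

# strides = bytes to step to the next element along (rows, columns)
print(c_arr.flags['C_CONTIGUOUS'], c_arr.strides)
print(f_arr.flags['F_CONTIGUOUS'], f_arr.strides)
```

In the row-major array, moving to the next column is a step of one element, while in the column-major array moving to the next row is a step of one element.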

Let’s look at a few examples with code.

Creating a flattened 1D numpy array filled with all zeros :

We can create a flattened numpy array with all values as ‘0’.

import numpy as sc
# create a 1D numpy array with values as 0
numarr = sc.zeros(5)
print('Contents of the Flattened Numpy Array : ')
print(numarr)
Output :
Contents of the Flattened Numpy Array :
[0. 0. 0. 0. 0.]

Creating a 2D numpy array with 4 rows & 3 columns, filled with 0’s :

We can create a 2D numpy array by passing (4,3) as argument in numpy.zeros() which will return all values as ‘0’.

import numpy as sc
# create a 2D numpy array with values as 0 
numarr = sc.zeros((4,3))
print('Contents of the 2D Numpy Array : ')
print(numarr)
Output :
Contents of the 2D Numpy Array :
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

We know that the default dtype is float64; let’s change the dtype to int64.

import numpy as sc
# create a 2D numpy array with values as 0 which are of int data type
numarr = sc.zeros((4,3), dtype=sc.int64)
print('Contents of the 2D Numpy Array : ')
print(numarr)
Output :
Contents of the 2D Numpy Array :
[[0 0 0]
 [0 0 0]
 [0 0 0]
 [0 0 0]]

numpy.ones() :

Python's numpy module provides the function numpy.ones(), which creates a numpy array of a given shape with all values initialized to 1.

i.e.

numpy.ones(shape, dtype=float, order='C')

Arguments:-

  • shape : The shape of the numpy array. It may be a single int or a sequence of ints.
  • dtype : The data type of the elements. The default is float64.
  • order : The order in which the data of a multi-dimensional array is stored in memory. There are two options:
  1. ‘C’: Data is stored in row-major (C-style) order. This is the default.
  2. ‘F’: Data is stored in column-major (Fortran-style) order.

Creating a flattened 1D numpy array filled with all Ones :

We can make all values as 1 in a flattened array.

import numpy as sc
# create a flattened 1D numpy array of size 5 where all values are 1
numarr = sc.ones(5)
print('Contents of the Flattened Numpy Array : ')
print(numarr)
Output :
Contents of the Flattened Numpy Array :
[1. 1. 1. 1. 1.]

Creating a 2D numpy array with 3 rows & 3 columns, filled with 1’s  :

We can create a 2D numpy array by passing the number of rows and columns as a tuple to numpy.ones(); all the values will be 1.

import numpy as sc
# create a 2D numpy array with 3 rows & 3 columns with all values as 1
numarr = sc.ones((3,3))
print('Contents of the 2D Numpy Array : ')
print(numarr)
print('Data Type of the contents in  given Array : ',numarr.dtype)
Output :
Contents of the 2D Numpy Array :
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
Data Type of the contents in  given Array :  float64

We know that the default dtype is float64; let’s change it to int64.

import numpy as sc
# create a 2D numpy array with values as 1 which are of int data type
numarr = sc.ones((4,3), dtype=sc.int64)
print('Contents of the 2D Numpy Array : ')
print(numarr)
Output :
Contents of the 2D Numpy Array :
[[1 1 1]
 [1 1 1]
 [1 1 1]
 [1 1 1]]

 


Python: How to insert lines at the top of a file?

Inserting lines at the top of a file


In this article, we will learn to insert a single line or multiple lines at the beginning of a text file in Python.

Let’s see inserting a line at the top of a file :

We cannot insert text directly at an arbitrary position in a file. To insert a line of text at the top, we create a new file that starts with the new line followed by the original contents, and then give it the name of the original file. We can implement this as a function.

How does the function work?

  • First, it accepts the file path and the new line to insert as arguments.
  • Then it creates and opens a new temporary file in write mode.
  • It writes the new line to the temporary file first.
  • Next, it opens the original file in read mode and reads its contents line by line.
  • Each line is appended to the temporary file.
  • After the new line and the original contents have been written to the temporary file, the original file is deleted.
  • Finally, the temporary file is renamed to the name of the original file.
import os
def prepend_singline(filename, line):
    # Code to insert new line at the beginning of a file
    # define a name to temporary file
    tempfile = filename + '.abc'
    # Open the available original file in read mode and temporary file in write mode
    with open(filename, 'r') as read_objc, open(tempfile, 'w') as write_objc:
        # Write new line to the temporary file
        write_objc.write(line + '\n')
        # Read lines from the original file and append them to the temporary file
        for existing_line in read_objc:
            write_objc.write(existing_line)
    # delete original file
    os.remove(filename)
    # Rename temporary file as that of the original file
    os.rename(tempfile, filename)
def main():
    # Insert new line before the first line of original file
    prepend_singline("document.txt", "This is a new line")
if __name__ == '__main__':
   main() 
Output :
This is a new line
Hi, welcome to document file
 India is my country
Content Line I
Content Line II
This is the end of document file
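The delete-then-rename step leaves a short window in which the file does not exist. One possible variation, shown here only as a sketch, swaps the files with os.replace() and creates the temporary file with the tempfile module. The function name prepend_line_atomic is hypothetical, and the demo runs on a scratch file rather than on document.txt.

```python
import os
import tempfile

def prepend_line_atomic(filename, new_line):
    """Hypothetical variant: write to a temp file, then swap it in with os.replace()."""
    directory = os.path.dirname(os.path.abspath(filename))
    # Create the temporary file next to the original so the swap stays on one filesystem
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as write_obj, open(filename, 'r') as read_obj:
            write_obj.write(new_line + '\n')
            for existing_line in read_obj:
                write_obj.write(existing_line)
        # os.replace() overwrites the destination in a single step
        os.replace(temp_path, filename)
    except Exception:
        # Remove the temporary file if anything went wrong
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise

# Demo on a scratch file so that no real document is touched
with tempfile.TemporaryDirectory() as scratch:
    demo_path = os.path.join(scratch, 'document.txt')
    with open(demo_path, 'w') as f:
        f.write('Content Line I\nContent Line II\n')
    prepend_line_atomic(demo_path, 'This is a new line')
    with open(demo_path) as f:
        demo_lines = f.read().splitlines()

print(demo_lines)
```

Note that a file created by mkstemp() is readable and writable only by its owner, so the permissions of the resulting file may differ from those of the original.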

Insert multiple lines at the top of a file :

Suppose we have a list of strings that we want to add to the beginning of a file. Rather than calling the function above once per line, we can create another function that accepts the file name and the list of strings as arguments.

This function writes the strings from the list to the temporary file first, then appends the original contents, and finally renames the temporary file to the name of the original one.

import os
def prepend_multlines(filename, list_asmult_lines):
    """Code to insert new list of string at beginning of a file"""
    # define a name to temporary file
    dummyfile = filename + '.abc'
    # Open the available original file in read mode and temporary file in write mode
    with open(filename, 'r') as read_objc, open(dummyfile, 'w') as write_objc:
        # Iterate over the given list of strings and write them to dummy file as lines
        for line in list_asmult_lines:
            write_objc.write(line + '\n')
        # Read lines one by one from the original file and append them to the temporary file
        for line in read_objc:
            write_objc.write(line)
    # delete original file
    os.remove(filename)
    # Rename temporary file as that of the original file
    os.rename(dummyfile, filename)
def main():
    list_asmult_lines = ['New Line A', 'New Line B',  'New Line C']
    # Insert string list as new line before the first line of original file
    prepend_multlines("document.txt", list_asmult_lines)
if __name__ == '__main__':
   main()    
Output :
New Line A
New Line B
New Line C
Hi, welcome to document file
 India is my country
Content Line I
Content Line II
This is the end of document file
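For small files, a simpler alternative is to read the whole file into memory and write it back with the new lines in front. This is only a sketch under the assumption that the file fits comfortably in memory; the helper name prepend_lines_in_memory is hypothetical.

```python
import tempfile
from pathlib import Path

def prepend_lines_in_memory(filename, new_lines):
    """Hypothetical helper: only suitable when the file fits in memory."""
    path = Path(filename)
    original = path.read_text()
    # Build the block of new lines, each terminated by a newline
    header = ''.join(line + '\n' for line in new_lines)
    path.write_text(header + original)

# Demo on a scratch file
with tempfile.TemporaryDirectory() as scratch:
    demo_file = Path(scratch) / 'document.txt'
    demo_file.write_text('Content Line I\nContent Line II\n')
    prepend_lines_in_memory(demo_file, ['New Line A', 'New Line B'])
    result_lines = demo_file.read_text().splitlines()

print(result_lines)
```

The temporary-file approach from the article remains the better choice for large files, because it never holds more than one line in memory.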

 

 

 

 


How to check if a file or directory or link exists in Python ?

Checking if a file or directory or link exists in Python

In this article, we are going to see how to check if a file, directory or a link exists using python.

Let’s see one by one

Python – Check if a path exists :

Syntax- os.path.exists(path)

The above function returns True if the path exists and False if it does not. It accepts either a relative or an absolute path.

import os

# Raw string so that the backslash is not treated as an escape character
pathString = r"E:\Python"

# Check for path existence
if os.path.exists(pathString):
    print("Path Exists!")
else:
    print("Path could not be reached")
Output :
Path Exists!

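However, we should make sure that we have sufficient permissions to access the path: on some platforms os.path.exists() may return False for an existing path if the user is not allowed to access it.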

Python – Check if a file exists :

If we try to open a file for reading and it does not exist, Python raises a FileNotFoundError. To avoid this error, we can check whether the file exists first, using the following function.

Syntax- os.path.isfile(path)
import os

filePathString = r"E:\Python\data.csv"

# Check for file existence
if os.path.isfile(filePathString):
    #If the file exists display its contents
    print("File Exists!")
    # The with statement closes the file even if reading fails
    with open(filePathString, "r") as fileHandler:
        fileData = fileHandler.read()
    print(fileData)
else:
    print("File could not be found")
Output :
File Exists!
Id,Name,Course,City,Session
21,Jill,DSA,Texas,Night
22,Rachel,DSA,Tokyo,Day
23,Kirti,ML,Paris,Day
32,Veena,DSA,New York,Night

Python – Check if a Directory exists :

To check if a given directory exists or not, we can use the following function

Syntax- os.path.isdir(path)
import os

filePathString = r"E:\Python\.vscode"

# Check for directories existence
if os.path.isdir(filePathString):
    # If the directory exists display its name
    print(filePathString, "does exist")
else:
    print("Directory could not be found")
Output :
E:\Python\.vscode does exist

Python – Check if given path is a link :

isdir() and isfile() follow symbolic links, so they report on the target of a link rather than on the link itself. To check whether a path is itself a symbolic link, the os module has a dedicated function.

Syntax- os.path.islink(path)

We will use os.path.exists() and os.path.islink() together so that we can check both that the path is a link and that the link is intact. os.path.exists() returns False for a broken link.

import os

linkPathString = r"E:\Python\data.csv"

# Check for link
if os.path.exists(linkPathString) and os.path.islink(linkPathString):
    print("The link is present and not broken")
else:
    print("Link could not be found or is broken")
Output :
Link could not be found or is broken

As the path we provided is not a link, the program executes the else branch.
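The same checks are also available as methods on pathlib.Path objects. The sketch below runs them inside a temporary directory, so it does not depend on the E:\Python paths used above; the file names are illustrative, and the symlink part is skipped when the platform refuses to create links.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    folder = Path(scratch)
    data_file = folder / "data.csv"
    data_file.write_text("Id,Name\n21,Jill\n")
    link = folder / "data_link.csv"

    checks = {
        "folder_is_dir": folder.is_dir(),
        "file_is_file": data_file.is_file(),
        "file_is_link": data_file.is_symlink(),
        "missing_exists": (folder / "missing.txt").exists(),
    }

    try:
        os.symlink(data_file, link)
    except (OSError, NotImplementedError):
        pass  # creating symlinks may need extra privileges on some systems
    else:
        checks["link_is_link"] = link.is_symlink()
        data_file.unlink()  # remove the target, which breaks the link
        checks["broken_link_exists"] = link.exists()
        checks["broken_link_is_link"] = link.is_symlink()

print(checks)
```

A broken link still counts as a link, but exists() reports False for it because the target is gone.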

 


Drop last row of pandas dataframe in python (3 ways)

Dropping last row of pandas dataframe  (3 ways) in python

In this article, we will look at three different ways to drop (delete) the last row of a pandas dataframe.

Use iloc to drop last row of pandas dataframe :

A pandas dataframe provides the attribute iloc, which selects a portion of the dataframe by position. We can use it to select all rows except the last one and assign the result back to the original variable; as a result, the last row is dropped.

Syntax of dataframe.iloc[]:- df.iloc[row_start:row_end, col_start:col_end]

where,

Arguments:

  • row_start: The row index at which the selection starts. The default is 0.
  • row_end: The row index at which the selection ends, i.e. rows are selected up to row_end-1. By default the selection runs to the last row of the dataframe.
  • col_start: The column index at which the selection starts. The default is 0.
  • col_end: The column index at which the selection ends, i.e. columns are selected up to col_end-1. By default the selection runs to the last column of the dataframe.
import pandas as sc
# List of Tuples
players = [('Messi',35, 'Barcelona',   175) ,
            ('James',25, 'Tonga' ,   187) ,
            ('Hardik', 30, 'Mumbai',   169) ,
            ('Harsh',32, 'Mumabi',   201)]
# Creation of DataFrame object
dataf = sc.DataFrame(  players,
                    columns=['Name', 'Age', 'Team', 'Height'],
                    index = ['a', 'b', 'c', 'd'])
print("Original dataframe is : ")
print(dataf)
# Select all rows except the last row, i.e. drop it
dataf = dataf.iloc[:-1 , :]
print("Modified Dataframe is : ")
print(dataf)
Output :
Original dataframe is :
     Name  Age       Team  Height
a   Messi   35  Barcelona     175
b   James   25      Tonga     187
c  Hardik   30     Mumbai     169
d   Harsh   32     Mumabi     201
Modified Dataframe is :
     Name  Age       Team  Height
a   Messi   35  Barcelona     175
b   James   25      Tonga     187
c  Hardik   30     Mumbai     169

Use drop() to remove last row of pandas dataframe :

The drop() function deletes rows by their index label. We pass the label of the last row, use axis=0, and set inplace=True so that the dataframe itself is modified.

import pandas as sc
# List of Tuples
players = [('Messi',35, 'Barcelona',   175) ,
            ('James',25, 'Tonga' ,   187) ,
            ('Hardik', 30, 'Mumbai',   169) ,
            ('Harsh',32, 'Mumabi',   201)]
# Creation of DataFrame object
dataf = sc.DataFrame(  players,
                    columns=['Name', 'Age', 'Team', 'Height'],
                    index = ['a', 'b', 'c', 'd'])
print("Original dataframe is : ")
print(dataf)
# Drop the last row
dataf.drop(index=dataf.index[-1], 
        axis=0, 
        inplace=True)
print("Modified Dataframe is: ")
print(dataf)
Output :
Original dataframe is :
     Name  Age       Team  Height
a   Messi   35  Barcelona     175
b   James   25      Tonga     187
c  Hardik   30     Mumbai     169
d   Harsh   32     Mumabi     201
Modified Dataframe is:
     Name  Age       Team  Height
a   Messi   35  Barcelona     175
b   James   25      Tonga     187
c  Hardik   30     Mumbai     169

Use head() function to drop last row of pandas dataframe :

A pandas dataframe provides the head(n) function, which returns the first n rows of the dataframe. So to delete the last row, we select only the first (n-1) rows using head(), where n is the total number of rows.

import pandas as sc
# List of Tuples
players = [('Messi',35, 'Barcelona',   175) ,
            ('James',25, 'Tonga' ,   187) ,
            ('Hardik', 30, 'Mumbai',   169) ,
            ('Harsh',32, 'Mumabi',   201)]
# Creation of DataFrame object
dataf = sc.DataFrame(  players,
                    columns=['Name', 'Age', 'Team', 'Height'],
                    index = ['a', 'b', 'c', 'd'])
print("Original dataframe is : ")
print(dataf)
# To delete the last row, keep only the first n-1 rows
dataf = dataf.head(dataf.shape[0] -1)
print("Modified Dataframe is : ")
print(dataf)
Output :
Original dataframe is :
     Name  Age       Team  Height
a   Messi   35  Barcelona     175
b   James   25      Tonga     187
c  Hardik   30     Mumbai     169
d   Harsh   32     Mumabi     201
Modified Dataframe is :
     Name  Age       Team  Height
a   Messi   35  Barcelona     175
b   James   25      Tonga     187
c  Hardik   30     Mumbai     169
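head() also accepts a negative n, in which case it returns all rows except the last |n| rows. As a short sketch, the row count does not need to be computed at all:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Messi', 'James', 'Hardik', 'Harsh'], 'Age': [35, 25, 30, 32]},
    index=['a', 'b', 'c', 'd'],
)

# head(-1) keeps everything except the last row; df itself is unchanged
trimmed = df.head(-1)
print(trimmed)
```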

 


How to Find and Drop duplicate columns in a DataFrame | Python Pandas

Find & Drop duplicate columns in a DataFrame | Python Pandas

In this article we will learn to find duplicate columns in a Pandas dataframe and drop them.

The pandas library contains direct APIs to find duplicate rows, but there is no direct API for duplicate columns, so we have to build a function for that ourselves. First, let’s create a dataframe with duplicate columns.

import pandas as sc
# List of Tuples
players = [('Nathan', 35, 'Australia', 35, 'Australia', 35),
            ('Vishal', 24, 'India', 24, 'India', 24),
            ('Abraham', 34, 'South Africa', 34, 'South Africa', 34),
            ('Trevor', 28, 'England', 28, 'England', 28),
            ('Kumar', 42, 'SriLanka', 42, 'SriLanka', 42),
            ]
# Create a DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'Age', 'Country', 'Address', 'Citizen', 'Jersey'])
print("Original Dataframe is:")
print(PlayerObj)
Output :
Original Dataframe is:
      Name  Age       Country  Address       Citizen  Jersey
0   Nathan   35     Australia       35     Australia      35
1   Vishal   24         India       24         India      24
2  Abraham   34  South Africa       34  South Africa      34
3   Trevor   28       England       28       England      28
4    Kumar   42      SriLanka       42      SriLanka      42

Find duplicate columns in a DataFrame :

To find the duplicate columns in a dataframe, we iterate over each column and check whether any later column has the same contents. If so, the name of that later column is stored in a set of duplicate column names, and at the end our function returns these names as a list.

import pandas as sc
def getDuplicateColumns(df):
    '''
    Get a list of duplicate columns.
    It will iterate over all the columns and find the duplicate columns in the dataframe
    :param df: Dataframe object
    :return: List of column names whose contents duplicate an earlier column
    '''
    duplicateColumnNames = set()
    # Iterate over all the columns 
    for x in range(df.shape[1]):
        # Select column at xth index of dataframe.
        col = df.iloc[:, x]
        # Iterate over all the columns from (x+1)th index till end
        for y in range(x + 1, df.shape[1]):
            # Select column at yth index of dataframe.
            otherCol = df.iloc[:, y]
            # Check if two columns x & y are equal
            if col.equals(otherCol):
                duplicateColumnNames.add(df.columns.values[y])
    return list(duplicateColumnNames)
    
def main():
# List of Tuples
    players = [('Nathan', 35, 'Australia', 35, 'Australia', 35),
            ('Vishal', 24, 'India', 24, 'India', 24),
            ('Abraham', 34, 'South Africa', 34, 'South Africa', 34),
            ('Trevor', 28, 'England', 28, 'England', 28),
            ('Kumar', 42, 'SriLanka', 42, 'SriLanka', 42),
            ]
# Creation of DataFrame object
    PlayerObj = sc.DataFrame(players, columns=['Name', 'Age', 'Country', 'Address', 'Citizen', 'Jersey'])
    print("Original Dataframe is:")
    print(PlayerObj)
# To get list of duplicate columns
    duplicateColumnNames = getDuplicateColumns(PlayerObj)
    print('Duplicate Columns are: ')
    for ele in duplicateColumnNames:
        print('Column name is : ', ele)

if __name__ == '__main__':
    main()
Output :
Original Dataframe is:
      Name  Age       Country  Address       Citizen  Jersey
0   Nathan   35     Australia       35     Australia      35
1   Vishal   24         India       24         India      24
2  Abraham   34  South Africa       34  South Africa      34
3   Trevor   28       England       28       England      28
4    Kumar   42      SriLanka       42      SriLanka      42
Duplicate Columns are:
Column name is :  Citizen
Column name is :  Jersey
Column name is :  Address

Drop duplicate columns in a DataFrame :

To drop the duplicate columns, we pass the list of duplicate column names returned by our function to dataframe.drop().

import pandas as sc
def getDuplicateColumns(df):
    '''
    Get a list of duplicate columns.
    It will iterate over all the columns and find the duplicate columns in the dataframe
    :param df: Dataframe object
    :return: List of column names whose contents duplicate an earlier column
    '''
    duplicateColumnNames = set()
    # Iterate over all the columns 
    for x in range(df.shape[1]):
        # Select column at xth index of dataframe.
        col = df.iloc[:, x]
        # Iterate over all the columns from (x+1)th index till end
        for y in range(x + 1, df.shape[1]):
            # Select column at yth index of dataframe.
            otherCol = df.iloc[:, y]
            # Check if two columns x & y are equal
            if col.equals(otherCol):
                duplicateColumnNames.add(df.columns.values[y])
    return list(duplicateColumnNames)
    
def main():
# List of Tuples
    players = [('Nathan', 35, 'Australia', 35, 'Australia', 35),
            ('Vishal', 24, 'India', 24, 'India', 24),
            ('Abraham', 34, 'South Africa', 34, 'South Africa', 34),
            ('Trevor', 28, 'England', 28, 'England', 28),
            ('Kumar', 42, 'SriLanka', 42, 'SriLanka', 42),
            ]
# Creation of DataFrame object
    PlayerObj = sc.DataFrame(players, columns=['Name', 'Age', 'Country', 'Address', 'Citizen', 'Jersey'])
    print("Original Dataframe is:")
    print(PlayerObj)
# To get list of duplicate columns
    duplicateColumnNames = getDuplicateColumns(PlayerObj)
    print('Duplicate Columns are: ')
    for ele in duplicateColumnNames:
        print('Column name is : ', ele)
    
 # Delete duplicate columns
    print('After removing duplicate columns new data frame becomes: ')
    newDf = PlayerObj.drop(columns=getDuplicateColumns(PlayerObj))
    print("Modified Dataframe is: ", newDf)

if __name__ == '__main__':
    main()
Output :
Original Dataframe is:
      Name  Age       Country  Address       Citizen  Jersey
0   Nathan   35     Australia       35     Australia      35
1   Vishal   24         India       24         India      24
2  Abraham   34  South Africa       34  South Africa      34
3   Trevor   28       England       28       England      28
4    Kumar   42      SriLanka       42      SriLanka      42
Duplicate Columns are:
Column name is :  Jersey
Column name is :  Citizen
Column name is :  Address
After removing duplicate columns new data frame becomes:
Modified Dataframe is:        Name  Age       Country
0   Nathan   35     Australia
1   Vishal   24         India
2  Abraham   34  South Africa
3   Trevor   28       England
4    Kumar   42      SriLanka
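A commonly used shortcut, shown here as a sketch rather than as a replacement for the function above, is to transpose the dataframe so that columns become rows and then reuse the row-oriented duplicated() method. Transposing copies the data, so this may be slow on large dataframes.

```python
import pandas as pd

players = [('Nathan', 35, 'Australia', 35, 'Australia', 35),
           ('Vishal', 24, 'India', 24, 'India', 24),
           ('Abraham', 34, 'South Africa', 34, 'South Africa', 34),
           ('Trevor', 28, 'England', 28, 'England', 28),
           ('Kumar', 42, 'SriLanka', 42, 'SriLanka', 42)]
df = pd.DataFrame(players, columns=['Name', 'Age', 'Country', 'Address', 'Citizen', 'Jersey'])

# After transposing, each original column is a row, so duplicated()
# flags every column that repeats an earlier one
is_duplicate = df.T.duplicated()

# Keep only the columns that are not flagged
deduplicated = df.loc[:, ~is_duplicate]
print(deduplicated)
```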

 


How to convert Dataframe column type from string to date time

Converting Dataframe column type from string to date time

In this article we will learn to convert the data type of a dataframe column from string to datetime, where the dates can be in custom string formats or embedded in longer text. We will also learn how to handle errors while converting data types.

The pandas module provides the function to_datetime(), which converts a given argument to datetime.

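Syntax : pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, box=True, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)

Note that this signature comes from an older pandas release; for example, the box parameter was removed in pandas 1.0.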

where,

Parameters:

  • arg : The object to convert to a datetime. It can be an int, float, string, datetime, list, 1-d array or Series.
  • errors : How invalid values are handled. It can be ‘ignore’, ‘raise’ or ‘coerce’, and the default is ‘raise’ (‘raise’: raise an exception on invalid parsing, ‘coerce’: set invalid values to NaT, ‘ignore’: return the input unchanged on invalid parsing).
  • format : string, default None. The date and time format of the input, e.g. “%d/%m/%Y”.

 Returns:

  • It returns the converted value; the return type depends on the input:
  1. If a Series of strings is passed, a Series of datetime64 dtype is returned.
  2. If a scalar is passed, a single Timestamp object is returned.
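The two return types can be checked quickly. This is a small sketch with made-up date strings; the format is passed explicitly so that the result does not depend on format inference.

```python
import pandas as pd

# A scalar string gives back a single Timestamp
single = pd.to_datetime('31/01/1978', format='%d/%m/%Y')

# A Series of strings gives back a Series with a datetime64 dtype
column = pd.to_datetime(pd.Series(['31/01/1978', '26/05/1980']), format='%d/%m/%Y')

print(type(single).__name__)
print(column.dtype)
```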

Convert the Data type of a column from string to datetime64 :

Let’s create a dataframe where the column ‘DOB’ holds dates as strings in the format ‘DD/MM/YYYY’.

import pandas as sc
# List of Tuples
players = [('Jason', '31/01/1978', 'Delhi', 155) ,
            ('Johny', '26/05/1980', 'Hyderabad' , 15) ,
            ('Darren', '03/01/1992', 'Jamaica',222) ,
            ('Finch', '22/12/1994','Pune' , 12) ,
            ('Krunal', '16/08/1979', 'Mumbai' , 58) ,
            ('Ravindra', '04/06/1985', 'Chennai', 99 ),
            ('Dinesh', '23/02/1985', 'Kolkata', 10)
           ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print(PlayerObj)
print('Datatype of players dataframe is:')
print(PlayerObj.dtypes)
Output :
       Name         DOB      Teams  Jersey
0     Jason  31/01/1978      Delhi     155
1     Johny  26/05/1980  Hyderabad      15
2    Darren  03/01/1992    Jamaica     222
3     Finch  22/12/1994       Pune      12
4    Krunal  16/08/1979     Mumbai      58
5  Ravindra  04/06/1985    Chennai      99
6    Dinesh  23/02/1985    Kolkata      10
Datatype of players dataframe is:
Name      object
DOB       object
Teams     object
Jersey     int64
dtype: object

Now let’s try to convert data type of column ‘DOB’ to datetime64.

import pandas as sc
# List of Tuples
players = [('Jason', '31/01/1978', 'Delhi', 155) ,
            ('Johny', '26/05/1980', 'Hyderabad' , 15) ,
            ('Darren', '03/01/1992', 'Jamaica',222) ,
            ('Finch', '22/12/1994','Pune' , 12) ,
            ('Krunal', '16/08/1979', 'Mumbai' , 58) ,
            ('Ravindra', '04/06/1985', 'Chennai', 99 ),
            ('Dinesh', '23/02/1985', 'Kolkata', 10)
           ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print(PlayerObj)
print('Datatype of players dataframe is:')
# Convert the column 'DOB' to datetime64 data type
# dayfirst=True makes ambiguous dates such as 03/01/1992 parse as DD/MM/YYYY
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], dayfirst=True)
print(PlayerObj.dtypes)

Output :

       Name         DOB      Teams  Jersey
0     Jason  31/01/1978      Delhi     155
1     Johny  26/05/1980  Hyderabad      15
2    Darren  03/01/1992    Jamaica     222
3     Finch  22/12/1994       Pune      12
4    Krunal  16/08/1979     Mumbai      58
5  Ravindra  04/06/1985    Chennai      99
6    Dinesh  23/02/1985    Kolkata      10
Datatype of players dataframe is:
Name              object
DOB       datetime64[ns]
Teams             object
Jersey             int64
dtype: object
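When a column may contain values that are not valid dates, the errors parameter decides what happens. A minimal sketch with a made-up series, using errors='coerce' so that invalid entries become NaT instead of raising an exception:

```python
import pandas as pd

raw = pd.Series(['31/01/1978', 'not a date', '23/02/1985'])

# Invalid strings are turned into NaT (the datetime counterpart of NaN)
parsed = pd.to_datetime(raw, format='%d/%m/%Y', errors='coerce')

print(parsed)
print(parsed.isna().tolist())
```

The NaT entries can afterwards be located with isna() and dropped or corrected as needed.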

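to_datetime() can also parse other common date and time layouts automatically, including ISO 8601 strings. Examples of string layouts it can handle:

‘DD-MM-YYYY HH:MM AM/PM’

‘YYYY-MM-DDTHH:MM:SS’

‘YYYY-MM-DDTHH:MM:SS.ssssss’, etc.

Let’s convert a column that mixes such layouts to datetime64. Note that pandas 2.0 and later infer a single format from the first value, so a column with mixed layouts like the one below may need format='mixed' there.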

import pandas as sc
# List of Tuples
players = [('Jason', '31/01/1978 12:00 AM', 'Delhi', 155) ,
            ('Johny', '26/05/1980 02:00:55', 'Hyderabad' , 15) ,
            ('Darren', '03/01/1992', 'Jamaica',222) ,
            ('Finch', '22/12/1994 T23:11:25Z','Pune' , 12)
           ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print(PlayerObj)
print('Datatype of players dataframe is:')
# Convert the column 'DOB' to datetime64 datatype
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'])
print(PlayerObj.dtypes)
Output :
Name                    DOB      Teams  Jersey
0   Jason    31/01/1978 12:00 AM      Delhi     155
1   Johny    26/05/1980 02:00:55  Hyderabad      15
2  Darren             03/01/1992    Jamaica     222
3   Finch  22/12/1994 T23:11:25Z       Pune      12
Datatype of players dataframe is:
Name              object
DOB       datetime64[ns]
Teams             object
Jersey             int64
dtype: object
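As a smaller illustration of the ISO 8601 case, here is a minimal sketch. The two timestamps are made up for this example, and pandas is imported under its usual alias pd rather than sc.

```python
import pandas as pd

# Hypothetical ISO 8601 timestamps, chosen only for illustration
iso_strings = pd.Series(['1994-12-22T23:11:25', '1980-05-26T02:00:55'])

# No format argument is needed for ISO 8601 strings
parsed = pd.to_datetime(iso_strings)

print(parsed.dtype)
print(parsed.dt.year.tolist())
print(parsed.dt.hour.tolist())
```

The .dt accessor then gives access to the individual components (year, hour, and so on) of the converted column.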

Convert the Data type of a column from custom format string to datetime64 :

A dataframe may also hold dates in a custom format such as MMDDYYYY or DD-MM-YY. In that case, pass the matching format string to to_datetime() through the format argument. The example below treats the strings as MMDDYYYY, so it uses format='%m%d%Y'.

import pandas as sc
# List of Tuples
players = [('Jason', '08091986', 'Delhi', 155),
            ('Johny', '11101988', 'Hyderabad', 15)
            ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print(PlayerObj)
print('Datatype of players dataframe is:')
# Convert the column 'DOB' to datetime64 datatype
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y')
print(PlayerObj.dtypes)
Output :
Name       DOB      Teams  Jersey
0  Jason  08091986      Delhi     155
1  Johny  11101988  Hyderabad      15
Datatype of players dataframe is:
Name              object
DOB       datetime64[ns]
Teams             object
Jersey             int64
dtype: object
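The format string decides how the digits are read, so the same text can produce different dates. A minimal sketch, assuming the same two DOB strings as above and the usual pd alias:

```python
import pandas as pd

raw = pd.Series(['08091986', '11101988'])

# Read the digits as month, day, year
month_first = pd.to_datetime(raw, format='%m%d%Y')
# Read the same digits as day, month, year
day_first = pd.to_datetime(raw, format='%d%m%Y')

print(month_first.dt.strftime('%Y-%m-%d').tolist())
print(day_first.dt.strftime('%Y-%m-%d').tolist())
```

With '%m%d%Y' the first value becomes 9 August 1986, while with '%d%m%Y' it becomes 8 September 1986, so the format must match how the data was actually written.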

Convert the Data type of a column from string to datetime by extracting date & time strings from big string :

A column may contain the date embedded in a longer string, for example ‘date of birth is 08091986’ or ‘11101988 is DOB’. By default, to_datetime() requires the whole string to match the format. If we pass the argument exact=False, it will instead try to match the format anywhere in the string. Let’s convert the data type of column DOB from string to datetime64 this way.

import pandas as sc
# List of Tuples
players = [('Jason', 'date of birth is 08091986', 'Delhi', 155),
            ('Johny', '11101988 is DOB', 'Hyderabad', 15)
            ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print('Datatype of players dataframe is:')
# Convert the column 'DOB' to datetime64 data type
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y', exact=False)
print(PlayerObj)
print(PlayerObj.dtypes)
Output :
Datatype of players dataframe is:
    Name        DOB      Teams  Jersey
0  Jason 1986-08-09      Delhi     155
1  Johny 1988-11-10  Hyderabad      15
Name              object
DOB       datetime64[ns]
Teams             object
Jersey             int64
dtype: object

Another Example : Extract date & time from big string in a column and add new columns of datetime64 format :

import pandas as sc
# List of Tuples
players = [('Jason', '12:00 AM on the date 08091986', 'Delhi', 155),
            ('Johny', '11101988 and evening 07:00 PM', 'Hyderabad', 15)
            ]
# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Teams', 'Jersey'])
print('Datatype of players dataframe is:')
# Extract the time part; %I (12-hour clock) is needed for %p (AM/PM) to take effect
PlayerObj['DOB_time'] = sc.to_datetime(PlayerObj['DOB'], format='%I:%M %p', exact=False)
# Extract the date part
PlayerObj['DOB_date'] = sc.to_datetime(PlayerObj['DOB'], format='%m%d%Y', exact=False)
print('New dataframe is:')
print(PlayerObj)
Output :
Datatype of players dataframe is:
New dataframe is:
Name DOB ... DOB_time DOB_date
0 Jason 12:00 AM on the date 08091986 ... 1900-01-01 00:00:00 1986-08-09
1 Johny 11101988 and evening 07:00 PM ... 1900-01-01 19:00:00 1988-11-10

In the DOB_time column we provided the time only, so the date defaulted to 1900-01-01, whereas DOB_date contains the date only. Both columns, DOB_time and DOB_date, have the same data type, datetime64.
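If only the time or only the date is wanted afterwards, the .dt accessor can split them. A minimal sketch, assuming two made-up time strings and the usual pd alias:

```python
import datetime
import pandas as pd

# Time-only strings; the parsed values get the default date 1900-01-01
times = pd.to_datetime(pd.Series(['07:00 PM', '12:00 AM']), format='%I:%M %p')

# .dt.time drops the placeholder date, .dt.date drops the time
only_times = times.dt.time.tolist()
only_dates = times.dt.date.tolist()

print(only_times)
print(only_dates)
```

The extracted values are plain datetime.time and datetime.date objects, so the resulting columns have object dtype rather than datetime64.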

Handle error while Converting the Data type of a column from string to datetime :

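To handle errors while converting the data type of a column, we can pass the errors argument of to_datetime() as ‘raise’, ‘coerce’ or ‘ignore’. With ‘raise’ (the default) invalid input raises an exception, with ‘coerce’ invalid values become NaT, and with ‘ignore’ the input is returned unchanged. In the example below no format is given, and as the output shows, errors='ignore' leaves the DOB column as object. Note that pandas 2.2 and later deprecate errors='ignore'.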

import pandas as sc
# List of Tuples
players = [('Jason', '08091986', 'Delhi', 155),
            ('Johny', '11101988', 'Hyderabad', 15)
            ]

# Creation of DataFrame object
PlayerObj = sc.DataFrame(players, columns=['Name', 'DOB', 'Team', 'Jersey'])
print("Contents of the original Dataframe : ")
print(PlayerObj)
print('Data types of columns in original dataframe')
print(PlayerObj.dtypes)
# Ignores errors while converting the type
PlayerObj['DOB'] = sc.to_datetime(PlayerObj['DOB'], errors='ignore')
print("Contents of the Dataframe : ")
print(PlayerObj)
print('Data types of columns in modified dataframe')
print(PlayerObj.dtypes)
Output :
Contents of the original Dataframe : 
Name DOB Team Jersey
0 Jason 08091986 Delhi 155
1 Johny 11101988 Hyderabad 15
Data types of columns in original dataframe
Name object
DOB object
Team object
Jersey int64
dtype: object
Contents of the Dataframe : 
Name DOB Team Jersey
0 Jason 08091986 Delhi 155
1 Johny 11101988 Hyderabad 15
Data types of columns in modified dataframe
Name object
DOB object
Team object
Jersey int64
dtype: object
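For comparison, here is a minimal sketch of errors='coerce'. The middle value is deliberately invalid and made up for this example, and pandas is imported as pd.

```python
import pandas as pd

# The second entry cannot be parsed with the given format
raw = pd.Series(['08091986', 'not a date', '11101988'])

# Invalid entries become NaT instead of raising an exception
parsed = pd.to_datetime(raw, format='%m%d%Y', errors='coerce')

print(parsed)
print(parsed.isna().tolist())
```

The column still becomes a datetime column, and the rows that failed to parse can be found afterwards with isna().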


What is a Structured Numpy Array and how to create and sort it in Python?


In this article we will learn what a structured numpy array is, how to create it, and how to sort it on one or more of its fields.

What is a Structured Numpy Array ?

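A structured numpy array is an array of structures (records). Every element has the same set of named fields, and each field can have its own data type, for example a string name, a float CGPA and an integer age.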

Creating a Structured Numpy Array

To create a structured numpy array, we describe the structure as a list of tuples, one (field name, data type) tuple per field, pass that list as the dtype parameter, and create the array based on this dtype.

import numpy as sc
# Creation of structure
dtype = [('Name', (sc.str_, 10)), ('CGPA', sc.float64), ('Age', sc.int32)]
# Create a structured Numpy array
structured_arr = sc.array([('Ben', 8.8, 18), ('Rani', 9.4, 15), ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)
print(structured_arr.dtype)
Output :
[('Name', '<U10'), ('CGPA', '<f8'), ('Age', '<i4')]
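Once the array exists, each field can be read by its name. A minimal sketch, assuming the same student records as above and numpy imported under its usual alias np:

```python
import numpy as np

dtype = [('Name', (np.str_, 10)), ('CGPA', np.float64), ('Age', np.int32)]
students = np.array([('Ben', 8.8, 18), ('Rani', 9.4, 15),
                     ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)

# Indexing with a field name returns that field for every record
names = students['Name']

# A single record, picked by the position of the largest Age
oldest = students[students['Age'].argmax()]

# A boolean mask built from one field filters whole records
high_cgpa = students[students['CGPA'] > 9.0]['Name']

print(names)
print(oldest)
print(high_cgpa)
```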

How to Sort a Structured Numpy Array ?

We can sort a structured numpy array by passing the ‘order’ parameter to numpy.sort() or numpy.ndarray.sort(). The ‘order’ parameter names the field, or list of fields, to compare on.

Sort the Structured Numpy array by field ‘Name’ of the structure

Let’s sort the structured numpy array on the basis of field ‘Name‘.

import numpy as sc
# Creation of structure
dtype = [('Name', (sc.str_, 10)), ('CGPA', sc.float64), ('Age', sc.int32)]
# Create a structured Numpy array
structured_arr = sc.array([('Ben', 8.8, 18), ('Rani', 9.4, 15), ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)
# numpy.sort() returns a sorted copy and leaves the original array unchanged
sor_arr = sc.sort(structured_arr, order='Name')
print('Sorted Array on the basis on name : ')
print(sor_arr)
Output :
Sorted Array on the basis on name :
[('Ben', 8.8, 18) ('Rani', 9.4, 15) ('Saswat', 7.6, 16)
('Tanmay', 9.8, 17)]
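numpy.sort() returns a new array, while numpy.ndarray.sort() reorders the array itself. A minimal sketch of the in-place variant, assuming the same records as above, sorting on the ‘CGPA’ field, with numpy imported as np:

```python
import numpy as np

dtype = [('Name', (np.str_, 10)), ('CGPA', np.float64), ('Age', np.int32)]
students = np.array([('Ben', 8.8, 18), ('Rani', 9.4, 15),
                     ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)

# ndarray.sort() sorts in place and returns None
result = students.sort(order='CGPA')

print(result)
print(students['Name'])
```

The in-place form avoids creating a copy, which can matter for large arrays, but the original order is lost.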

Sort the Structured Numpy array by field ‘Age’ of the structure

We can also sort the structured numpy array on the basis of field ‘Age‘.

import numpy as sc
# Creation of structure
dtype = [('Name', (sc.str_, 10)), ('CGPA', sc.float64), ('Age', sc.int32)]
# Create a structured Numpy array
structured_arr = sc.array([('Ben', 8.8, 18), ('Rani', 9.4, 15), ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)
# Sort on the 'Age' field
sor_arr = sc.sort(structured_arr, order='Age')
print('Sorted Array on the basis on Age : ')
print(sor_arr)
Output :
Sorted Array on the basis on Age :
[('Rani', 9.4, 15) ('Saswat', 7.6, 16) ('Tanmay', 9.8, 17)
('Ben', 8.8, 18)]

Sort the Structured Numpy array by ‘Name’ & ‘Age’ fields of the structure :

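We can also sort a structured numpy array based on multiple fields, here ‘Name‘ & ‘Age‘. The array is ordered by the first field, and the second field breaks ties. Since every name in this example is unique, the result is the same as sorting by ‘Name‘ alone.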

import numpy as sc
# Creation of structure
dtype = [('Name', (sc.str_, 10)), ('CGPA', sc.float64), ('Age', sc.int32)]
# Create a structured Numpy array
structured_arr = sc.array([('Ben', 8.8, 18), ('Rani', 9.4, 15), ('Tanmay', 9.8, 17), ('Saswat', 7.6, 16)], dtype=dtype)
# Sort on 'Name' first, then on 'Age' for records with equal names
sor_arr = sc.sort(structured_arr, order=['Name', 'Age'])
print('Sorted Array on the basis on name & age : ')
print(sor_arr)
Output :
Sorted Array on the basis on name & age:
[('Ben', 8.8, 18) ('Rani', 9.4, 15) ('Saswat', 7.6, 16)
('Tanmay', 9.8, 17)]
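The effect of the second field only shows when the first field has duplicates. A minimal sketch, assuming a made-up set of records in which the name ‘Rani’ appears twice, with numpy imported as np:

```python
import numpy as np

dtype = [('Name', (np.str_, 10)), ('CGPA', np.float64), ('Age', np.int32)]
# Hypothetical records: two students share the same name
students = np.array([('Rani', 9.4, 17), ('Ben', 8.8, 18), ('Rani', 7.6, 15)],
                    dtype=dtype)

# Records are ordered by Name; equal names are then ordered by Age
by_name_age = np.sort(students, order=['Name', 'Age'])

print(by_name_age['Name'])
print(by_name_age['Age'])
```

Here ‘Ben’ comes first, and the two ‘Rani’ records follow with the younger one ahead of the older one.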


Program to Reverse a String using a Stack Data Structure in C++ and Python


Strings:

A string data type is used in most computer languages for data values that are made up of ordered sequences of characters, such as “hello world.” A string can include any visible or unseen series of characters, and characters can be repeated. The length of a string is the number of characters in it, and “hello world” has length 11 – made up of 10 letters and 1 space. The maximum length of a string is usually restricted. There is also the concept of an empty string, which includes no characters and has a length of zero.

A string can be both a constant and a variable. If it is a constant, it is commonly expressed as a string of characters surrounded by single or double quotes.

Stack:

In everyday English, stacking objects means putting them on top of one another. The stack data structure organizes its elements in the same manner.

Data structures are essential for organizing storage in computers so that data can be accessed and edited efficiently. Stacks were among the first data structures to be defined in computer science. In layman’s terms, a stack is a linear collection of items that provides fast last-in, first-out (LIFO) insertion and deletion. In modern programming and CPU architecture, a stack is also the array or list structure that holds function calls and their parameters. Elements are added to or removed from the top of the stack in a “last in, first out” order, similar to a stack of dishes at a restaurant.

Unlike lists or arrays, the objects in the stack do not allow for random access.
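The LIFO behaviour described above can be sketched with a plain Python list used as a stack. The plate colours are made up for illustration; append() acts as push and pop() removes the top element.

```python
# A Python list used as a stack: append() pushes, pop() removes the top
plates = []
for plate in ['red', 'green', 'blue']:
    plates.append(plate)  # push onto the top

top = plates[-1]  # peek at the top without removing it

# Popping returns the elements in the reverse of the order they were pushed
removed = [plates.pop() for _ in range(3)]

print(top)
print(removed)
```

The last plate pushed is the first one popped, which is exactly the property that makes a stack useful for reversing a sequence.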

Given a string the task is to reverse the given string using stack data structure in C++ and python.

Examples:

Example1:

Input:

given string ="hellothisisBTechGeeks"

Output:

Printing the given string before reversing : hellothisisBTechGeeks
Printing the given string after reversing : skeeGhceTBsisihtolleh

Example2:

Input:

given string ="skyisbluieIFC"

Output:

Printing the given string before reversing : skyisbluieIFC
Printing the given string after reversing : CFIeiulbsiyks

Example3:

Input:

given string="cirusfinklestein123"

Output:

Printing the given string before reversing : cirusfinklestein123
Printing the given string after reversing : 321nietselknifsuric


1)Reversing the string using the stack<> container in C++

It is important to work out the algorithm before writing code: consider the best approach to the problem first, then move on to the implementation.

The following steps may be useful:

  • Scan the given string or give the string as static input.
  • Start with an empty stack.
  • In C++, we use stack<type>, where type is the data type of the stack elements (like integer, character, string, etc).
  • Using the push() function, add the characters of the string one by one to the stack, so that the last character of the string ends up on top.
  • Run a for loop once for each character of the string.
  • In each iteration, read the top character, write it to the current position of the given string, and remove it with the pop() function.
  • Print the string.

Below is the implementation:

#include <iostream>
#include <stack>
#include <string>
using namespace std;
// function which reverses the given string using stack
void revString(string& givenstr)
{
    // Taking a empty stack of character type
    stack<char> st;

    // Traversing the given string using for loop
    for (char character : givenstr) {
        // adding each element of the given string to stack
        st.push(character);
    }

    // popping all elements from the stack and
    // replacing the elements of string with the popped
    // element
    for (size_t i = 0; i < givenstr.length(); i++) {
        // assigning the top element of the stack to the
        // ith character of the string
        givenstr[i] = st.top();
        // popping the top element from the stack using
        // pop() function
        st.pop();
    }
}

int main()
{
    // given string
    string givenstr = "hellothisisBTechGeeks";
    // printing the string before reversing elements
    cout << "Printing the given string before reversing : "
         << givenstr << endl;
    revString(givenstr);
    cout << "Printing the given string after reversing : "
         << givenstr << endl;

    return 0;
}

Output:

Printing the given string before reversing : hellothisisBTechGeeks
Printing the given string after reversing : skeeGhceTBsisihtolleh

All the characters present in the string, including space characters, get reversed using the above approach.

2)Reversing the string using deque in Python

We copy all characters of the given string into a deque, which behaves like a stack here because we only remove elements from its right end with pop().

We then use the join method to join the popped characters into the reversed string.

Below is the implementation:

from collections import deque
# given string
givenstr = "HellothisisBTechGeeks"
# Printing the given string before reversing
print("Printing the given string before reversing : ")
print(givenstr)
# creating the stack from the given string
st = deque(givenstr)

# pop all characters from the stack and join them back into a string
givenstr = ''.join(st.pop() for _ in range(len(givenstr)))

# Printing the given string after reversing
print("Printing the given string after reversing : ")
print(givenstr)

Output:

Printing the given string before reversing : 
HellothisisBTechGeeks
Printing the given string after reversing : 
skeeGhceTBsisihtolleH
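The same idea can be wrapped in a reusable function. This is a minimal sketch that uses a plain list as the stack instead of deque; the function name reverse_with_stack is made up for this example.

```python
def reverse_with_stack(text):
    """Reverse text by pushing every character and popping them all back."""
    stack = []
    # Push phase: the last character ends up on top
    for ch in text:
        stack.append(ch)
    # Pop phase: characters come off in reverse order
    reversed_chars = []
    while stack:
        reversed_chars.append(stack.pop())
    return ''.join(reversed_chars)


print(reverse_with_stack("skyisbluieIFC"))
print(reverse_with_stack("cirusfinklestein123"))
```

Both the push and the pop phase touch each character once, so the function runs in time proportional to the length of the string.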


How to get the list of all files in a zip archive


In this article we will learn various ways to get details about all files in a zip archive, such as each file’s name, size etc.

How to find name of all the files in the ZIP archive using ZipFile.namelist() :

The ZipFile class from the zipfile module provides a member function, ZipFile.namelist(), to get the names of all files present in the archive.

from zipfile import ZipFile
def main():
    # Create a ZipFile object and load DocumentDir.zip in it
    with ZipFile('DocumentDir.zip', 'r') as zip_Obj:
        # Get the list of file names in the zip
        fileslist = zip_Obj.namelist()
        # Iterate over the file names in the list & print them
        for ele in fileslist:
            print(ele)
if __name__ == '__main__':
    main()
Output :
DocumentDir/doc1.csv
DocumentDir/doc2.csv
DocumentDir/test.csv

Find detail info like name, size etc of the files in a Zip file using ZipFile.infolist() :

The ZipFile class also provides a member function, ZipFile.infolist(), to get the details of each entry present in the zip file.

It returns a list of ZipInfo objects, where each object holds information about one entry, such as its name, size, compressed size and modification time.

from zipfile import ZipFile
def main():
    # Create a ZipFile object and load DocumentDir.zip in it
    with ZipFile('DocumentDir.zip', 'r') as zip_Obj:
        # Get list of ZipInfo objects
        fileslist = zip_Obj.infolist()
        # Iterate over the list and access members of each object
        for ele in fileslist:
            print(ele.filename, ' : ', ele.file_size, ' : ', ele.date_time, ' : ', ele.compress_size)
if __name__ == '__main__':
    main()
Output :
DocumentDir/doc1.csv  :  2759  :  (2021, 1, 3, 21, 0, 2)  :  2759
DocumentDir/doc2.csv  :  2856  :  (2021, 1, 25, 13, 45, 58)  :  2856
DocumentDir/test.csv  :  3458  :  (2021, 2, 20, 20, 20, 41)  :  3458
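To try namelist() and infolist() without an existing archive, the zip file can be built in memory first. This is a minimal sketch: the entry names and contents are made up for illustration, and it assumes the zlib module is available for ZIP_DEFLATED compression.

```python
import io
from zipfile import ZipFile, ZIP_DEFLATED

# Build a small archive in memory so the example needs no file on disk
buffer = io.BytesIO()
with ZipFile(buffer, 'w', compression=ZIP_DEFLATED) as archive:
    archive.writestr('DocumentDir/doc1.csv', 'id,value\n' + '1,10\n' * 200)
    archive.writestr('DocumentDir/doc2.csv', 'id,value\n2,20\n')

# Reopen the same bytes for reading and inspect each entry
with ZipFile(buffer, 'r') as archive:
    names = archive.namelist()
    sizes = {info.filename: info.file_size for info in archive.infolist()}
    packed = {info.filename: info.compress_size for info in archive.infolist()}

print(names)
print(sizes)
print(packed)
```

Here file_size is the uncompressed size and compress_size is the size stored in the archive, so the highly repetitive first file shows a compressed size well below its original size.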

Print details of a ZIP archive to stdout using ZipFile.printdir() :

The ZipFile class also provides a member function, ZipFile.printdir(), which prints the contents of the zip file as a table.

from zipfile import ZipFile
def main():
    # Create a ZipFile object and load DocumentDir.zip in it
    with ZipFile('DocumentDir.zip', 'r') as zip_Obj:
        zip_Obj.printdir()
if __name__ == '__main__':
    main()
Output :
File Name                                             Modified             Size
DocumentDir/doc1.csv                           2021-01-03 21:00:02         2759
DocumentDir/doc2.csv                           2021-01-25 13:45:58         2856
DocumentDir/test.csv                           2021-02-20 20:20:41         3458

 

 
