Satyabrata Jena

Python: numpy.ravel() function Tutorial with examples

Understanding numpy.ravel() function with its examples in Python

In this article we discuss the numpy.ravel() function and the different ways it can be used to flatten a multidimensional NumPy array.

numpy.ravel() is a function provided by Python's NumPy module.

Syntax - numpy.ravel(a, order='C')

where,

  1. a : array_like - A NumPy array or list whose elements are read in the given order.
  2. order - The order in which the elements are read. The default is 'C'.
  • 'C' = Read elements row by row, i.e. using C-like index order.
  • 'F' = Read elements column by column, i.e. using Fortran-like index order.
  • 'A' = Read elements in Fortran-like index order if the array is Fortran-contiguous in memory, otherwise in C-like order.
  • 'K' = Read elements in the order in which they occur in memory.

Flatten a matrix or 2D array to 1D array using numpy.ravel( ) :

When converting a 2-D array to a 1-D array, numpy.ravel() uses the default value 'C' for the order parameter, so the elements of the 2-D array are read row by row.

# program :

import numpy as sc

# To create a 2D Numpy array
twoD_array = sc.array([ [11, 2, 33],
                        [44, 5, 66],
                        [77, 8, 99]])

print('2D Numpy Array:')
print(twoD_array)

# To get a flattened view 1D of 2D Numpy array

flatn_array = sc.ravel(twoD_array)
print('Flattened 1D view:')
print(flatn_array)
Output :
2D Numpy Array:
[[11  2 33]
 [44  5 66]
 [77  8 99]]
Flattened 1D view:
[11  2 33 44  5 66 77  8 99]

numpy.ravel() returns a view :

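Whenever possible, numpy.ravel() returns a view of the original array rather than a copy; a copy is made only when needed. So when we modify the flattened 1-D view, the change is seen not only in the flattened object but also in the original 2-D NumPy array.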

#program :

import numpy as sc

# To create a 2D Numpy array

twoD_array = sc.array([ [11, 2, 33],
[44, 5, 66],
[77, 8, 99]])

flatn_array = sc.ravel(twoD_array)

#Modifying 5th element of the array
flatn_array[4]= 100

print('Modified 2D array:')
print(twoD_array)
print('Modified Flattened 1D view:')
print(flatn_array)
Output :
Modified 2D array:
[[ 11   2  33]
 [ 44 100  66]
 [ 77   8  99]]
Modified Flattened 1D view:
[ 11   2  33  44 100  66  77   8  99]
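
To check whether a flattened array really shares data with the original, np.shares_memory() can be used. The following is a minimal sketch with illustrative variable names (arr, view_1d, copy_1d are not from the article); it contrasts ravel(), which returns a view here, with ndarray.flatten(), which always returns a copy.

```python
import numpy as np

# Illustrative array; the variable names are assumptions for this sketch
arr = np.array([[11, 2, 33],
                [44, 5, 66],
                [77, 8, 99]])

view_1d = np.ravel(arr)    # C-order ravel of a C-contiguous array: a view
copy_1d = arr.flatten()    # flatten() always returns a copy

print(np.shares_memory(arr, view_1d))   # True
print(np.shares_memory(arr, copy_1d))   # False

copy_1d[0] = -1            # changes only the copy
view_1d[0] = 0             # writes through to arr
print(arr[0, 0])           # 0
```

If the original array must stay untouched, flatten() (or an explicit copy of the ravel() result) is the safer choice.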

Accessing original array from the flattened view object  :

The flattened view object has an attribute base, which refers to the original NumPy array.

#Program :

import numpy as sc

# To create a 2D Numpy array
twoD_array = sc.array([ [11, 2, 33],
[44, 5, 66],
[77, 8, 99]])

flatn_array = sc.ravel(twoD_array)

#Modifying 5th element of the array
flatn_array[4]= 100

print('Modified Flattened 1D view:')
print(flatn_array)
print('2D Numpy array')
# ndarray.base refers to the original NumPy array that owns the data
print(flatn_array.base)
Output :
Modified Flattened 1D view:
[ 11   2  33  44 100  66  77   8  99]
2D Numpy array
[[ 11   2  33]
 [ 44 100  66]
 [ 77   8  99]]

Using numpy.ravel() along different axis with order parameter :

numpy.ravel() accepts an order parameter, which can be 'C', 'F', 'A' or 'K'. The default value is 'C'.

Get Flatten view of 2D array Row wise :

If no order parameter is passed to the ravel() function, the default value, i.e. 'C', is used. So the 2-D array is read row-wise.

# Program :

import numpy as sc

# To create a 2D Numpy array
twoD_array = sc.array([ [11, 2, 33],
[44, 5, 66],
[77, 8, 99]])


# getting flattened view of 2D array read row by row
flatn_array = sc.ravel(twoD_array, order='C')

print('Flattened View row by row:')
print(flatn_array)
Output :
Flattened View row by row:
[11  2 33 44  5 66 77  8 99]

Get Flatten view of 2D array Column wise :

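When we pass 'F' to the ravel() function, the elements are read column by column. Note that for an array stored in row-major order, as here, this result cannot be expressed as a view, so NumPy returns a copy.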

#program :

import numpy as sc

# creating a 2D Numpy array
twoD_array = sc.array([ [11, 2, 33],
[44, 5, 66],
[77, 8, 99]])

# getting flattened view of 2D array read column by column
flatn_array = sc.ravel(twoD_array, order='F')
print('Flattened View column by column:')
print(flatn_array)
Output :
Flattened View column by column:
[11 44 77  2  5  8 33 66 99]
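
A column-wise ravel of a row-major array does not share memory with the original. The short sketch below (variable names are illustrative) shows that modifying the column-wise result leaves the original array unchanged.

```python
import numpy as np

arr = np.array([[11, 2, 33],
                [44, 5, 66],
                [77, 8, 99]])

# Column-by-column read of a row-major (C-contiguous) array
col_wise = np.ravel(arr, order='F')
print(col_wise.tolist())                 # [11, 44, 77, 2, 5, 8, 33, 66, 99]

# The result is a copy here, so writing to it does not affect arr
print(np.shares_memory(arr, col_wise))   # False
col_wise[0] = 0
print(arr[0, 0])                         # 11
```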

Get Flatten view of 2D array based on memory layout :

We can also get a transposed view of a 2-D NumPy array. Transposing does not move any data in memory; it only changes how the array is indexed.

# Program :

import numpy as sc

# To create a 2D Numpy array
twoD_array = sc.array([ [11, 2, 33],
[44, 5, 66],
[77, 8, 99]])


# To get transpose view of 2D array
transp_arr = twoD_array.T
print('Transpose View of the 2D Numpy Array')
print(transp_arr)
Output :
Transpose View of the 2D Numpy Array
[[11 44 77]
 [ 2  5  8]
 [33 66 99]]

We can also get a flattened 1-D array from the transposed 2-D NumPy array.

#Program :

import numpy as sc

# To create a 2D Numpy array
twoD_array = sc.array([ [11, 2, 33],
[44, 5, 66],
[77, 8, 99]])


# To get transpose view of 2D array
transp_arr = twoD_array.T

# To get a flattened view of transpose 2D array
flatn_array = sc.ravel(transp_arr, order='C')

print('Flattened View of transpose 2D Numpy array:')
print(flatn_array)
Output :
Flattened View of transpose 2D Numpy array:
[11 44 77  2  5  8 33 66 99]

In the previous example, the memory layout of the data was ignored and the current index order of the transposed view was used. Now we flatten the transposed array based on its memory layout by passing 'A' as the order argument. The transposed view is Fortran-contiguous, so its elements are read in the order in which they are stored in memory, which is the row-by-row order of the original array.

# Program :

import numpy as sc

# creating a 2D Numpy array
twoD_array = sc.array([ [11, 2, 33],
[44, 5, 66],
[77, 8, 99]])

# getting transpose view of 2D array
transp_arr = twoD_array.T

# getting a flattened view of transpose 2D array based on memory layout
flatn_array = sc.ravel(transp_arr, order='A')

print('Flattened View of transposed Numpy array based on memory layout :')
print(flatn_array)
Output :
Flattened View of transposed Numpy array based on memory layout :
[11  2 33 44  5 66 77  8 99]
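
The effect of the order argument on a transposed array can be checked side by side. This is a small sketch with illustrative names; it inspects the contiguity flags of the transposed view and compares order='C' with the layout-based orders 'A' and 'K'.

```python
import numpy as np

arr = np.array([[11, 2, 33],
                [44, 5, 66],
                [77, 8, 99]])
transp = arr.T

# The transposed view is Fortran-contiguous, not C-contiguous
print(transp.flags['C_CONTIGUOUS'], transp.flags['F_CONTIGUOUS'])   # False True

by_index = np.ravel(transp, order='C')      # follows the view's index order
by_layout_a = np.ravel(transp, order='A')   # follows the memory layout
by_layout_k = np.ravel(transp, order='K')   # follows the memory layout

print(by_index.tolist())      # [11, 44, 77, 2, 5, 8, 33, 66, 99]
print(by_layout_a.tolist())   # [11, 2, 33, 44, 5, 66, 77, 8, 99]
print(by_layout_k.tolist())   # [11, 2, 33, 44, 5, 66, 77, 8, 99]
```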

Flatten a list of lists using numpy.ravel() :

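We can also pass a list of lists to numpy.ravel() instead of an array. In that case the result is a new flattened NumPy array, since a list has no array data to share a view with.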

# Program :

import numpy as sc

list_lists = [[1, 2, 3, 4],
              [4, 5, 6, 7],
              [7, 8, 9, 10],
              [10, 11, 12, 13]]

# Creating a flattened numpy array from list of lists

flatn_array = sc.ravel(list_lists)

print('Flattened 1D Numpy Array:')

print(flatn_array)
Output :
Flattened 1D Numpy Array:
[ 1  2  3  4  4  5  6  7  7  8  9 10 10 11 12 13]

We can even convert the flattened 1-D NumPy array to a list. The tolist() method is used here because it converts the elements to plain Python integers.

# Program :

import numpy as sc

list_lists = [[1, 2, 3, 4],
[4, 5, 6, 7],
[7, 8, 9, 10],
[10, 11, 12, 13]]


flatn_array = sc.ravel(list_lists)
print('Flattened List from array:')

# Converting array to list of plain Python integers
print(flatn_array.tolist())
Output :
Flattened List from array:
[1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10, 10, 11, 12, 13]

 

Drop first row of pandas dataframe (3 Ways)

How to delete first row of pandas dataframe in Python ?

In this article we will discuss about different ways to delete first row of pandas dataframe in Python.

Method-1 : By using iloc attribute :

Pandas provides the iloc attribute, which selects a portion of the dataframe (some rows, some columns, or both) by integer position. This is called position-based indexing.

Syntax - df.iloc[start_row:end_row , start_column:end_column]

Where,

  • start_row : The row position where the selection starts. (Default value is 0)
  • end_row : The row position where the selection ends; this position itself is not included. (Default is up to the last row of the dataframe)
  • start_column : The column position where the selection starts. (Default value is 0)
  • end_column : The column position where the selection ends; this position itself is not included. (Default is up to the last column of the dataframe)

So, we can select all the rows of the dataframe except the first one and assign the selection back to the original variable, which has the effect of deleting the first row from the dataframe.

To achieve this, select from the second row onward and select all columns. The second row is at index position 1 (positions start from 0), so we select from position 1 up to the last row. To select all columns, use the default values, i.e. (:)

i.e

df = df.iloc[1: , :]

So let’s see the implementation of it.

# Program :

import pandas as pd
# List of tuples created
empoyees = [('A',1,'a',10),
            ('B',2,'b',20),
            ('C',3,'c',30) ,
            ('D',4,'d',40)]
# DataFrame object created
df = pd.DataFrame(  empoyees, columns=['Upper', 'Smaller', 'Lower', 'Bigger'])
print("Contents of the original Dataframe : ")
print(df)
# Dropping first row 
df = df.iloc[1: , :]
print("Contents of modified Dataframe : ")
print(df)
Output :
Contents of the original Dataframe : 
  Upper  Smaller Lower  Bigger
0     A        1     a      10
1     B        2     b      20
2     C        3     c      30
3     D        4     d      40
Contents of modified Dataframe : 
  Upper  Smaller Lower  Bigger
1     B        2     b      20
2     C        3     c      30
3     D        4     d      40

Method-2 : Using drop() function :

Pandas dataframes have a drop() function which can be used to delete rows from the dataframe. To delete the first row, pass its label, df.index[0], as the index argument. axis=0 means that rows are dropped, and inplace=True modifies the dataframe itself instead of returning a new one.

So let’s see the implementation of it.

# Program :

import pandas as pd
# List of tuples created
empoyees = [('A',1,'a',10),
            ('B',2,'b',20),
            ('C',3,'c',30) ,
            ('D',4,'d',40)]
# DataFrame object created
df = pd.DataFrame(  empoyees, columns=['Upper', 'Smaller', 'Lower', 'Bigger'])
print("Contents of the original Dataframe : ")
print(df)
# Dropping first row 
df.drop(index=df.index[0], 
        axis=0, 
        inplace=True)
        
print("Contents of modified Dataframe : ")
print(df)
Output : 
Contents of the original Dataframe : 
  Upper  Smaller Lower  Bigger
0     A        1     a      10
1     B        2     b      20
2     C        3     c      30
3     D        4     d      40
Contents of modified Dataframe : 
  Upper  Smaller Lower  Bigger
1     B        2     b      20
2     C        3     c      30
3     D        4     d      40

Method-3 : Using tail() function :

In pandas, a dataframe provides a tail(n) function which returns the last n rows. So to select all rows except the first one, we can call tail() with the total number of rows minus one, which leaves out the first row. We then assign the selected rows back to the original variable.

So let’s see the implementation of it.

# Program :

import pandas as pd
# List of tuples created
empoyees = [('A',1,'a',10),
            ('B',2,'b',20),
            ('C',3,'c',30) ,
            ('D',4,'d',40)]
# DataFrame object created
df = pd.DataFrame(  empoyees, columns=['Upper', 'Smaller', 'Lower', 'Bigger'])
print("Contents of the original Dataframe : ")
print(df)

# Deleting first row by selecting last n-1 rows
df = df.tail(df.shape[0] -1)
        
print("Contents of modified Dataframe : ")
print(df)
Output :
Contents of the original Dataframe : 
  Upper  Smaller Lower  Bigger
0     A        1     a      10
1     B        2     b      20
2     C        3     c      30
3     D        4     d      40
Contents of modified Dataframe : 
  Upper  Smaller Lower  Bigger
1     B        2     b      20
2     C        3     c      30
3     D        4     d      40
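
The three methods can be compared directly. The sketch below (variable names such as by_iloc are illustrative) builds the same dataframe, applies each method without inplace=True, and checks that the results match. It also shows reset_index(drop=True), which may be useful when the remaining rows should be renumbered from 0, since all three methods keep the original index labels 1, 2, 3.

```python
import pandas as pd

employees = [('A', 1, 'a', 10),
             ('B', 2, 'b', 20),
             ('C', 3, 'c', 30),
             ('D', 4, 'd', 40)]
df = pd.DataFrame(employees, columns=['Upper', 'Smaller', 'Lower', 'Bigger'])

by_iloc = df.iloc[1:, :]                 # Method-1
by_drop = df.drop(index=df.index[0])     # Method-2, returning a new dataframe
by_tail = df.tail(df.shape[0] - 1)       # Method-3

# All three selections contain the same rows and index labels
print(by_iloc.equals(by_drop) and by_drop.equals(by_tail))   # True

# Optionally renumber the rows from 0
renumbered = by_iloc.reset_index(drop=True)
print(renumbered.index.tolist())         # [0, 1, 2]
```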

 

Python: Dictionary get() function tutorial and examples

Dictionary get() function tutorial with examples in Python.

In this article, we are going to see how we can use the dict.get( ) function with some examples.

Syntax : dict.get(key, defaultValue)

 Where,

  1. key : The key that is to be searched inside the dictionary.
  2. defaultValue : The default value to be returned in case the key is not found inside the dictionary.

The function returns the value associated with the key if the key is found in the dictionary. Otherwise it returns the default value that we provide; if no default value was provided, it returns None.

Get value by key in a dictionary using dict.get() :

Let us take an example where there are 4 elements inside a dictionary. We will try to find the value associated with “B”.

#Program :

dictWords = {
    "A": 65,
    "B": 32,
    "V": 34,
    "U": 87 }
#Function returns the value associated with the key 'B'
dictValue = dictWords.get('B')
#Value associated with the key 'B'
print('Value of the key "B" : ', dictValue)
Output : 
Value of the key "B" :  32

Get the value of a key that does not exist in a dictionary :

 In the previous example we were looking for the value associated with an existing key, here we will try to see what happens when the key doesn’t exist in the dictionary.

#Program :

dictWords = {
    "A": 56,
    "B": 23,
    "V": 43,
    "U": 78
}
#Function returns the value associated with the non-existent key 'D'
dictValue = dictWords.get('D')
#Value associated with the key 'D'
print('Value of the non-existent key "D" : ', dictValue)
Output :
Value of the non-existent key "D" :  None

When the absent key "D" was passed into the function, it returned None, which is the default value (None is used because we did not specify any other default value).

Get the default value for the key that does not exist in a dictionary :

 Here we will again check for the same non-existent key, however this time we will provide the function with a default value.

#Program :

dictWords = {
    "A": 56,
    "B": 23,
    "V": 43,
    "U": 78
}
#Function returns the default value for the non-existent key 'D'
dictValue = dictWords.get('D',"NA")
#Value associated with the key 'D'
print('Value of the non-existent key "D" : ', dictValue)
Output :
Value of the non-existent key "D" :  NA

It returns the default value that we provided when the key we were searching for was non-existent.

dict.get(key) vs dict[key]

 We can also use [ ] to find the value associated with a key inside a dictionary.

So, let’s see the implementation of it.

#Program :

dictWords = {
    "A": 56,
    "B": 23,
    "V": 43,
    "U": 78
}
#Looking up the value associated with the key 'B' using []
dictValue = dictWords['B']
#Value associated with the key 'B'
print('Value of the key "B" : ', dictValue)
Output :
Value of the key "B" :  23

It does the same thing when we check for an existing key. Let’s try with a non-existent one.

#Looking up the non-existent key 'D' using []
dictValue = dictWords['D']
Output :
KeyError: 'D'

So the main difference between these two methods is that looking up a non-existent key with the .get( ) function gives None, or the default value that we passed, while using [ ] raises a KeyError.
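
The difference can be seen in one place with a short sketch. The names dict_words and value are illustrative; the KeyError from [ ] is caught with try/except, and the last lines show that get() does not add the missing key to the dictionary.

```python
dict_words = {"A": 56, "B": 23, "V": 43, "U": 78}

# [] raises KeyError for a missing key, so it has to be handled explicitly
try:
    value = dict_words['D']
except KeyError as err:
    value = None
    print('KeyError raised for key:', err)

# get() returns the supplied default instead of raising
print(dict_words.get('D', 'NA'))   # NA

# get() only reads; the missing key is not inserted
print('D' in dict_words)           # False
```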

Python: Search strings in a file and get line numbers of lines containing the string

Search strings in a file and get line numbers of lines containing the string in Python.

In this article we will discuss how to search for strings in a file and get all the lines, along with their line numbers, that contain the string.

Searching for a string in a file is easy in Python, and so is getting the line numbers of the lines containing it.

 Check if string is present in file or not  :

First take a file named “example.txt”

This is an example for sample file
Programming needs logic.
Languages are of many types.
Computer science is a study of computers & computational systems.
We can write a program in any language.
The end

Let’s see the program for it.

#Program :

# create a function to check whether the string is present in the file or not
def string_in_file(file_name, string_to_search):
    # Checking if any line in the mentioned file contains the given string
    # Open the file in read only mode to read content of the file
    with open(file_name, 'r') as read_obj:
        # Reading all lines in the file one by one by iteration
        for line in read_obj:
            # For each line, checking if the line contains the string or not
            if string_to_search in line:
                return True
    return False


# checking if string 'is' is found in file 'example.txt'
if string_in_file('example.txt', 'is'):
    print('string found')
else:
    print('string not found')
Output:
string found

Here, we loop over the file line by line and check whether each line contains the string. As soon as a line containing the string is found, the function returns True; if no line contains it, the function returns False.
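
The same check can be written more compactly with the built-in any(). The sketch below is one possible variant, not the article's own function; it writes a small temporary file (the path and file contents are made up for the example) so that it can run on its own. Note that the in operator does substring matching, so 'is' also matches inside the word 'This'.

```python
import os
import tempfile

def string_in_file(file_name, string_to_search):
    # any() stops at the first line that contains the string
    with open(file_name, 'r') as read_obj:
        return any(string_to_search in line for line in read_obj)

# Create a throwaway file so the example is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')
    with open(path, 'w') as out_file:
        out_file.write('This is an example for sample file\n'
                       'Programming needs logic.\n'
                       'The end\n')
    found_is = string_in_file(path, 'is')
    found_what = string_in_file(path, 'what')

print(found_is, found_what)   # True False
```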

Search for a string in file & get all lines containing the string along with line numbers :

Suppose we have a file named “example.txt”

This is an example for sample file
Programming needs logic.
Languages are of many types.
Computer science is a study of computers & computational systems.
We can write a program in any language.
The end
#Program :

def string_in_file(file_name, string_to_search):
    #Searching for the given string in file along with its line numbers
    line_number = 0
    list_of_results = []
    # Opening the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Reading all lines in the file one by one by iterating the file
        for line in read_obj:
            # checking each line, if the line contains the string
            line_number += 1
            if string_to_search in line:
                # If it contains the string, then add the line number & line as a tuple in the list
                list_of_results.append((line_number, line.rstrip()))
    # Return list of tuples containing line numbers and lines where string is found
    return list_of_results


lines = string_in_file('example.txt', 'is')
print('Total Matched lines : ', len(lines))
for i in lines:
    print('Line Number = ', i[0], ' :: Line = ', i[1])
Output:
Total Matched lines : 2
Line Number =  1  :: Line =  This is an example for sample file
Line Number =  4  :: Line =  Computer science is a study of computers & computational systems.

Here, the function iterates over the file line by line while keeping count of the line number. Every line that contains the string is added to a list as a (line number, line) tuple, and the list is returned at the end.

We printed the total number of matched lines that contain the string ‘is’. In total there were two such lines, and the function returned them along with their line numbers. Now, instead of searching for a single string, we want to search for multiple strings.
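
The manual line counter can be replaced by enumerate(). The following sketch uses a hypothetical helper, lines_containing(), which works on any iterable of lines; io.StringIO stands in for the opened file so that the example runs without a file on disk.

```python
import io

def lines_containing(lines, string_to_search):
    # enumerate(..., start=1) yields 1-based line numbers
    return [(number, line.rstrip())
            for number, line in enumerate(lines, start=1)
            if string_to_search in line]

sample_text = io.StringIO(
    "This is an example for sample file\n"
    "Programming needs logic.\n"
    "Languages are of many types.\n"
    "Computer science is a study of computers & computational systems.\n"
    "We can write a program in any language.\n"
    "The end\n"
)

result = lines_containing(sample_text, 'is')
for number, line in result:
    print(number, line)
```

An opened file object can be passed in place of sample_text, since it is also an iterable of lines.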

Search for multiple strings in a file and get lines containing string along with the line numbers :

To search for multiple strings in a file, we create a separate function that opens the file once and checks each line against every string in the list. Calling the function created above once per string would also work, but it would open and close the file for each string.

Suppose we have a file named “example.txt”

This is an example for sample file
Programming needs logic.
Languages are of many types.
Computer science is a study of computers & computational systems.
We can write a program in any language.
The end
#Program :

def strings_in_file(file_name, list_of_strings):
    #Here getting line from the file along with line numbers
    #which contains any of the matching string from the list
    line_number = 0
    list_of_results = []
    # Opening the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Reading all lines in the file one by one by iteration
        for line in read_obj:
            line_number += 1
            # Checking each line, if the line contains any string from the list of strings
            for string_to_search in list_of_strings:
                if string_to_search in line:
                    # If any string is matched/found in line
                    # then we will append that line along with line number in the list
                    list_of_results.append((string_to_search, line_number, line.rstrip()))
    # Returning the list of tuples containing matched string, line numbers and lines where string is found
    return list_of_results

#Now, we will use this function

# search for given strings in the file 'example.txt'
matched_lines = strings_in_file('example.txt', ['is', 'what'])
print('Total Matched lines : ', len(matched_lines))
for elem in matched_lines:
    print('Word = ', elem[0], ' :: Line Number = ', elem[1], ' :: Line = ', elem[2])

Output:
Total Matched lines :  2
Word =  is  :: Line Number =  1  :: Line =  This is an example for sample file
Word =  is  :: Line Number =  4  :: Line =  Computer science is a study of computers & computational systems.

Here, for each line the function checks every string in the list. Whenever a string is found in a line, a tuple of the matched string, the line number and the line is added to the result list. The string 'what' does not occur in the file, so only the lines containing 'is' are returned.

Convert 2D NumPy array to list of lists in python

How to convert 2D NumPy array to list of lists in python ?

In this article we will discuss different ways of converting a NumPy array to a list of lists in Python. So let’s start exploring the topic.

Converting a 2D Numpy Array to list of lists using tolist() :

NumPy arrays have a member function tolist() which returns a list containing the elements of the array. If the array is a 2D array, it returns a list of lists.

So, let’s see the implementation of it.

#Program :

import numpy as np
# 2D Numpy array created
arr = np.array([[11, 22, 33, 44],
                [55, 66, 77, 88],
                [12, 13, 23, 43]])

#printing the 2D array                
print(arr)
# Converting 2D Numpy Array to list of lists
list_of_lists = arr.tolist()
#Printing the list of lists
print("The list is")
print(list_of_lists)
Output :
[[11 22 33 44]
 [55 66 77 88]
 [12 13 23 43]]
The list is
[[11, 22, 33, 44], [55, 66, 77, 88], [12, 13, 23, 43]]
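
It may help to see how tolist() differs from simply calling list() on the array. In this small sketch (names are illustrative), tolist() converts all the way down to plain Python integers, while list() only splits the array into its rows, which remain NumPy arrays.

```python
import numpy as np

arr = np.array([[11, 22, 33, 44],
                [55, 66, 77, 88]])

as_python = arr.tolist()   # nested Python lists of Python ints
as_rows = list(arr)        # a Python list whose items are still 1D arrays

print(type(as_python[0]).__name__)      # list
print(type(as_python[0][0]).__name__)   # int
print(type(as_rows[0]).__name__)        # ndarray
```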

Converting a 2D Numpy array to list of lists using iteration :

We can iterate over a 2D array row by row and, during the iteration, convert each row to a list and append it to a result list. At the end we get a list of lists containing all the elements of the 2D NumPy array.

#Program :

import numpy as np
# 2D Numpy array created
arr = np.array([[11, 22, 33, 44],
                [55, 66, 77, 88],
                [12, 13, 23, 43]])

#printing the 2D array                
print(arr)
# Converting a 2D Numpy Array to list of lists
#iterating row by row using for loop
list_of_lists = list()
for row in arr:
    list_of_lists.append(row.tolist())
#Printing the list of lists
print("The list is")
print(list_of_lists)
Output :
[[11 22 33 44]
 [55 66 77 88]
 [12 13 23 43]]
The list is
[[11, 22, 33, 44], [55, 66, 77, 88], [12, 13, 23, 43]]

Converting a 2D Numpy Array to a flat list :

In both examples, the 2D NumPy array was converted into a list of lists. But we can also convert it into a flat list (not a list of lists). For that, we first convert the 2D NumPy array into a 1D NumPy array using the flatten() method, and then call the tolist() function to convert it into a flat list.

#Program :

import numpy as np
# 2D Numpy array created
arr = np.array([[11, 22, 33, 44],
                [55, 66, 77, 88],
                [12, 13, 23, 43]])

#printing the 2D array                
print(arr)

# Converting 2D Numpy array to a flat list
my_list = arr.flatten().tolist()
#Printing the flat list
print("The list is")
print(my_list)
Output :
[[11 22 33 44]
 [55 66 77 88]
 [12 13 23 43]]
The list is
[11, 22, 33, 44, 55, 66, 77, 88, 12, 13, 23, 43]
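
There are other ways to arrive at the same flat list. The sketch below compares flatten() with two alternatives that were not used above: ravel(), which avoids the intermediate copy where possible, and a plain nested list comprehension over the list of lists.

```python
import numpy as np

arr = np.array([[11, 22, 33, 44],
                [55, 66, 77, 88],
                [12, 13, 23, 43]])

via_flatten = arr.flatten().tolist()
via_ravel = arr.ravel().tolist()
via_loop = [value for row in arr.tolist() for value in row]

# All three approaches give the same flat list
print(via_flatten == via_ravel == via_loop)   # True
```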

Python: Three ways to check if a file is empty

Three ways to check if a file is empty in Python.

This article is about checking whether a file is empty in Python. We will look at three different ways to do it.

The three ways are

  1. Using os.stat() method.
  2. Using  os.path.getsize() method.
  3. Reading its first character.

Method-1 : Use os.stat() to check if a file is empty :

In Python, the os module provides a stat() function which can be used to get statistics about a file.

os.stat(path, *, dir_fd=None, follow_symlinks=True)

This function

  • accepts a file path (string type) as an argument.
  • returns a stat_result object containing various attributes (such as st_size, the size of the file in bytes).
#Program :

#importing the os module
import os

#file path as string type
path_of_file = 'sample.txt'

# checking if size of file is 0
if os.stat(path_of_file).st_size == 0:
    print('File is empty')
else:
    print('File is not empty')
Output :
File is empty

As our file is empty, it printed ‘File is empty’.

But we have to be careful while using it, because if the file does not exist at the given path then os.stat() raises FileNotFoundError.

FileNotFoundError: [WinError 2] The system cannot find the file specified: FILE_NAME

So, keeping that in mind, we first check whether the file exists. We create a separate function for that, which returns True only if the file exists and is empty. Note that for a missing file the function returns False, so the program below prints ‘File is not empty’ in that case as well.

So, let’s see the implementation of it.

#Program :

import os

#Checking the file exists or not
def check_file_empty(path_of_file):
    #Checking if file exist and it is empty
    return os.path.exists(path_of_file) and os.stat(path_of_file).st_size == 0

path_of_file = 'sample.txt'

# checking if file exist and it is empty
EmptyOrNot = check_file_empty(path_of_file)
if EmptyOrNot:
    print('File is empty')
else:
    print('File is not empty')
Output :
File is empty

Method-2 : Check if file is empty using os.path.getsize() in Python :

Python's os module has another function which we can use to check whether a file is empty.

os.path.getsize(path)

This function takes a file path as an argument and returns the size of the file in bytes. If the file does not exist, it raises FileNotFoundError.

#program :

import os

#file path
path_of_file = 'sample.txt'

# checking if size of file is 0
if os.path.getsize(path_of_file) == 0:
    print('File is empty')
else:
    print('File is not empty')
Output : 
File is empty
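
To tell a missing file apart from a non-empty one, the size check can be wrapped in try/except. The following is a minimal sketch under that assumption; the helper name is_file_empty and the temporary file names are made up for the example, which creates its own files so that it runs anywhere.

```python
import os
import tempfile

def is_file_empty(path_of_file):
    # True or False for an existing file, None if the file is missing
    try:
        return os.path.getsize(path_of_file) == 0
    except FileNotFoundError:
        return None

with tempfile.TemporaryDirectory() as tmp_dir:
    empty_path = os.path.join(tmp_dir, 'empty.txt')
    full_path = os.path.join(tmp_dir, 'full.txt')
    missing_path = os.path.join(tmp_dir, 'missing.txt')

    open(empty_path, 'w').close()          # create an empty file
    with open(full_path, 'w') as out_file:
        out_file.write('data')             # create a non-empty file

    results = (is_file_empty(empty_path),
               is_file_empty(full_path),
               is_file_empty(missing_path))

print(results)   # (True, False, None)
```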

Method-3 : Check if the file is empty by reading its first character in Python :

We can also check whether a file is empty by reading its first character. For that, the function opens the file in read-only mode and tries to read the first character. If a character is read, the file is not empty; if nothing is read, the file is empty.

#Program :

def file_empty(file_name):
    # open file in read mode
    # Reading first character to check file is empty or not
    with open(file_name, 'r') as read_obj:
        # reading the first character
        one_char = read_obj.read(1)
        # if first character not found then file is empty
        if not one_char:
           return True
    return False

path_of_file = 'mysample.txt'
# check if file is empty
EmptyOrNot = file_empty(path_of_file)
print(EmptyOrNot)
Output : 
True

Pandas : Get unique values in columns of a Dataframe in Python

How to get unique values in columns of a Dataframe in Python ?

To find the unique values in a Dataframe we can use -

  1. Series.unique() - Returns a NumPy array of the unique values in a series.
  2. Series.nunique(dropna=True) - Returns the count of unique values in a series. The dataframe version, DataFrame.nunique(axis=0, dropna=True), counts along an axis. (If axis = 0, i.e. the default value, it counts per column. If axis = 1, it counts per row.)

To test these functions let’s use the following data-

      Name   Age     City  Experience
a     jack  34.0   Sydney           5
b     Riti  31.0    Delhi           7
c     Aadi  16.0      NaN          11
d    Mohit  31.0    Delhi           7
e    Veena   NaN    Delhi           4
f  Shaunak  35.0   Mumbai           5
g    Shaun  35.0  Colombo          11

Finding unique values in a single column :

To get the unique values of a column (here 'Age'), we call the unique( ) function on that column.

CODE:-

#Program :

import numpy as np
import pandas as pd
# Data list
emp = [('jack', 34, 'Sydney', 5) ,
         ('Riti', 31, 'Delhi' , 7) ,
         ('Aadi', 16, np.nan, 11) ,
         ('Mohit', 31,'Delhi' , 7) ,
         ('Veena', np.nan, 'Delhi' , 4) ,
         ('Shaunak', 35, 'Mumbai', 5 ),
         ('Shaun', 35, 'Colombo', 11)
          ]
# Object of Dataframe class created
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
# Obtain the unique values in column 'Age' of the dataframe
uValues = empObj['Age'].unique()
# empObj[‘Age’] returns a series object of the column ‘Age’
print('The unique values in column "Age" are ')
print(uValues)
Output :
The unique values in column "Age" are
[34. 31. 16. nan 35.]

Counting unique values in a single column :

If we want the number of unique values rather than the unique values themselves, we can use the .nunique( ) function.

CODE:-

#Program :

import numpy as np
import pandas as pd
# Data list
emp = [('jack', 34, 'Sydney', 5) ,
('Riti', 31, 'Delhi' , 7) ,
('Aadi', 16, np.nan, 11) ,
('Mohit', 31,'Delhi' , 7) ,
('Veena', np.nan, 'Delhi' , 4) ,
('Shaunak', 35, 'Mumbai', 5 ),
('Shaun', 35, 'Colombo', 11)
]
# Object of Dataframe class created
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
# Counting the  unique values in column 'Age' of the dataframe
uValues = empObj['Age'].nunique()
print("Number of unique values in 'Age' column :")
print(uValues)
Output :
Number of unique values in 'Age' column :
4

Including NaN while counting the Unique values in a column :

NaN values are not counted by default in the .nunique( ) function. To include NaN as well, we have to pass dropna=False.

CODE:-

#Program :

import numpy as np
import pandas as pd
# Data list
emp = [('jack', 34, 'Sydney', 5) ,
('Riti', 31, 'Delhi' , 7) ,
('Aadi', 16, np.nan, 11) ,
('Mohit', 31,'Delhi' , 7) ,
('Veena', np.nan, 'Delhi' , 4) ,
('Shaunak', 35, 'Mumbai', 5 ),
('Shaun', 35, 'Colombo', 11)
]
# Object of Dataframe class created
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
# Counting the unique values in column 'Age' also including NaN
uValues = empObj['Age'].nunique(dropna=False)
print("Number of unique values in 'Age' column including NaN:")
print(uValues)
Output :
Number of unique values in 'Age' column including NaN:
5

Counting unique values in each column of the dataframe :

To count the number of unique values in each column, we call .nunique( ) on the dataframe itself.

CODE:-

#Program :

import numpy as np
import pandas as pd
# Data list
emp = [('jack', 34, 'Sydney', 5) ,
('Riti', 31, 'Delhi' , 7) ,
('Aadi', 16, np.nan, 11) ,
('Mohit', 31,'Delhi' , 7) ,
('Veena', np.nan, 'Delhi' , 4) ,
('Shaunak', 35, 'Mumbai', 5 ),
('Shaun', 35, 'Colombo', 11)
]
# Object of Dataframe class created
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
# Counting the unique values in each column
uValues = empObj.nunique()
print('In each column the number of unique values are')
print(uValues)
Output :
In each column the number of unique values are
Name          7
Age           4
City          4
Experience    4
dtype: int64

To include NaN in the counts, pass dropna=False to the function.
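
As a small illustration of the dropna argument on a whole dataframe, the sketch below uses only the 'Age' and 'City' columns of the sample data (the variable names are illustrative) and compares the per-column counts with and without NaN.

```python
import numpy as np
import pandas as pd

emp_obj = pd.DataFrame({
    'Age': [34, 31, 16, 31, np.nan, 35, 35],
    'City': ['Sydney', 'Delhi', np.nan, 'Delhi', 'Delhi', 'Mumbai', 'Colombo'],
})

without_nan = emp_obj.nunique()              # NaN ignored (default)
with_nan = emp_obj.nunique(dropna=False)     # NaN counted as a value

print(without_nan.to_dict())
print(with_nan.to_dict())
```

Each column contains one NaN, so counting it raises both counts by one.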

Get Unique values in multiple columns :

To get the unique values in multiple columns, we combine the contents of those columns into a single series object and call the .unique( ) function on it.

CODE:-

#program :

import numpy as np
import pandas as pd
# Data list
emp = [('jack', 34, 'Sydney', 5) ,
('Riti', 31, 'Delhi' , 7) ,
('Aadi', 16, np.nan, 11) ,
('Mohit', 31,'Delhi' , 7) ,
('Veena', np.nan, 'Delhi' , 4) ,
('Shaunak', 35, 'Mumbai', 5 ),
('Shaun', 35, 'Colombo', 11)
]
# Object of Dataframe class created
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
# Obtain the Unique values in multiple columns i.e. Name & Age
# (pd.concat is used because Series.append was removed in pandas 2.0)
uValues = pd.concat([empObj['Name'], empObj['Age']]).unique()
print('The unique values in column "Name" & "Age" :')
print(uValues)
Output :
The unique values in column "Name" & "Age" :
['jack' 'Riti' 'Aadi' 'Mohit' 'Veena' 'Shaunak' 'Shaun' 34.0 31.0 16.0 nan
35.0]
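Another way to collect the unique values of several columns, sketched here as an assumption rather than the article's own method, is to flatten the selected columns with ravel() and pass the result to pd.unique(). The values then come out in row-by-row order instead of column-by-column order. The variable name flat is illustrative.

```python
import numpy as np
import pandas as pd

emp = [('jack', 34, 'Sydney', 5),
       ('Riti', 31, 'Delhi', 7),
       ('Aadi', 16, np.nan, 11),
       ('Mohit', 31, 'Delhi', 7),
       ('Veena', np.nan, 'Delhi', 4),
       ('Shaunak', 35, 'Mumbai', 5),
       ('Shaun', 35, 'Colombo', 11)]
empObj = pd.DataFrame(emp, columns=['Name', 'Age', 'City', 'Experience'],
                      index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# Flatten the two columns into one 1-D array (read row by row) ...
flat = empObj[['Name', 'Age']].to_numpy().ravel()
# ... and keep the first occurrence of every value
uValues = pd.unique(flat)

print('The unique values in column "Name" & "Age" :')
print(uValues)
```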

Want to become an expert in the Python programming language? The Python Data Analysis using Pandas tutorial takes your knowledge of Python concepts from basic to advanced level.


How to get & check data types of Dataframe columns in Python Pandas

How to get & check data types of dataframes columns in python pandas ?

In this article we will discuss different ways to get the data type of single or multiple columns.

Use Dataframe.dtypes to get data types of columns in Dataframe :

Python’s pandas module provides the Dataframe class as a container for storing and manipulating two-dimensional data. The class has an attribute, dtypes, that gives the data type of each column.

Dataframe.dtypes returns a Series that holds the data type of each column, indexed by column name.

Let’s try with an example:

#Program :

import pandas as pd
import numpy as np
#list of tuples
game = [('riya',37,'delhi','cat','rose'),
   ('anjali',28,'agra','dog','lily'),
   ('tia',42,'jaipur','elephant','lotus'),
   ('kapil',51,'patna','cow','tulip'),
   ('raj',30,'banglore','lion','orchid')]
#Create a dataframe object
df = pd.DataFrame(game, columns=['Name','Age','Place','Animal','Flower'], index=['a','b','c','d','e'])
print(df)
Output:
     Name  Age     Place    Animal  Flower
a    riya   37     delhi       cat    rose
b  anjali   28      agra       dog    lily
c     tia   42    jaipur  elephant   lotus
d   kapil   51     patna       cow   tulip
e     raj   30  banglore      lion  orchid

These are the contents of the dataframe. Now let’s fetch the data type of each column in the dataframe.

#Program :

import pandas as pd
import numpy as np
#list of tuples
game = [('riya',37,'delhi','cat','rose'),
   ('anjali',28,'agra','dog','lily'),
   ('tia',42,'jaipur','elephant','lotus'),
   ('kapil',51,'patna','cow','tulip'),
   ('raj',30,'banglore','lion','orchid')]
#Create a dataframe object
df = pd.DataFrame(game, columns=['Name','Age','Place','Animal','Flower'], index=['a','b','c','d','e'])
DataType = df.dtypes
print('Data type of each column:')
print(DataType)
Output:
Data type of each column:
Name      object
Age        int64
Place     object
Animal    object
Flower    object
dtype: object

Get Data types of dataframe columns as dictionary :

#Program :

import pandas as pd
import numpy as np
#list of tuples
game = [('riya',37,'delhi','cat','rose'),
   ('anjali',28,'agra','dog','lily'),
   ('tia',42,'jaipur','elephant','lotus'),
   ('kapil',51,'patna','cow','tulip'),
   ('raj',30,'banglore','lion','orchid')]
#Create a dataframe object
df = pd.DataFrame(game, columns=['Name','Age','Place','Animal','Flower'], index=['a','b','c','d','e'])
#get a dictionary containing the pairs of column names and data types object
DataTypeDict = dict(df.dtypes)
print('Data type of each column :')
print(DataTypeDict)
Output:
Data type of each column :
{'Name': dtype('O'), 'Age': dtype('int64'), 'Place': dtype('O'), 'Animal': dtype('O'), 'Flower': dtype('O')}
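The dictionary above holds dtype objects. When plain text is easier to work with, for example for logging, the dtype names can be stored instead. The following is a small sketch on a shortened two-column dataframe. The name dtype_names is illustrative.

```python
import pandas as pd

# Shortened version of the dataframe used above
df = pd.DataFrame({'Name': ['riya', 'anjali'], 'Age': [37, 28]})

# Map each column name to the *name* of its dtype instead of the dtype object
dtype_names = {col: str(dtype) for col, dtype in df.dtypes.items()}

print('Data type name of each column :')
print(dtype_names)
```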

Get the data type of a single column in dataframe :

Dataframe.dtypes returns a Series, so we can get the data type of a single column by indexing it with the column name.

#Program :

import pandas as pd
import numpy as np
#list of tuples
game = [('riya',37,'delhi','cat','rose'),
   ('anjali',28,'agra','dog','lily'),
   ('tia',42,'jaipur','elephant','lotus'),
   ('kapil',51,'patna','cow','tulip'),
   ('raj',30,'banglore','lion','orchid')]
#Create a dataframe object
df = pd.DataFrame(game, columns=['Name','Age','Place','Animal','Flower'], index=['a','b','c','d','e'])
#get the data type of the single column 'Age'
DataTypeObj = df.dtypes['Age']
print('Data type of column Age :')
print(DataTypeObj)
Output :
Data type of column Age :
int64
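The data type of one column can also be read directly from the column through its dtype attribute, and the helpers in pandas.api.types can check which kind of type it is. The snippet below is a minimal sketch on a shortened dataframe. The name age_dtype is illustrative.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype

# Shortened version of the dataframe used above
df = pd.DataFrame({'Name': ['riya', 'anjali', 'tia'], 'Age': [37, 28, 42]})

# A single column is a Series, which exposes its own dtype attribute
age_dtype = df['Age'].dtype
print('Data type of column Age :', age_dtype)

# Ask what kind of data a column holds instead of comparing dtype names
print('Age holds integers :', is_integer_dtype(df['Age']))
print('Name is numeric :', is_numeric_dtype(df['Name']))
```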

Get list of pandas dataframe column names based on data types :

Suppose we want a list of the column names that have a particular data type. Let’s take an example that selects the columns whose data type is object (string).

import pandas as pd
import numpy as np
#list of tuples
game = [('riya',37,'delhi','cat','rose'),
('anjali',28,'agra','dog','lily'),
('tia',42,'jaipur','elephant','lotus'),
('kapil',51,'patna','cow','tulip'),
('raj',30,'banglore','lion','orchid')]
#Create a dataframe object
df = pd.DataFrame(game, columns=['Name','Age','Place','Animal','Flower'], index=['a','b','c','d','e'])

# Get columns whose data type is object, i.e. string
# (np.object was removed in NumPy 1.24; the built-in object works the same way)
filteredColumns = df.dtypes[df.dtypes == object]
# List of columns whose data type is object, i.e. string
listOfColumnNames = list(filteredColumns.index)
print(listOfColumnNames)
Output:
['Name', 'Place', 'Animal', 'Flower']
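pandas also offers DataFrame.select_dtypes() for this kind of filtering. As a hedged sketch, the example below splits the same dataframe into numeric and non-numeric columns. The names numeric_cols and other_cols are illustrative.

```python
import pandas as pd

game = [('riya', 37, 'delhi', 'cat', 'rose'),
        ('anjali', 28, 'agra', 'dog', 'lily'),
        ('tia', 42, 'jaipur', 'elephant', 'lotus'),
        ('kapil', 51, 'patna', 'cow', 'tulip'),
        ('raj', 30, 'banglore', 'lion', 'orchid')]
df = pd.DataFrame(game, columns=['Name', 'Age', 'Place', 'Animal', 'Flower'],
                  index=['a', 'b', 'c', 'd', 'e'])

# Keep only the numeric columns, then only the non-numeric ones
numeric_cols = df.select_dtypes(include='number').columns.tolist()
other_cols = df.select_dtypes(exclude='number').columns.tolist()

print('Numeric columns :', numeric_cols)
print('Other columns :', other_cols)
```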

Get data types of a dataframe using Dataframe.info() :

The Dataframe.info() function gives a simple summary of a dataframe, including the index dtype, the column dtypes, the non-null counts and the memory usage.

#program :

import pandas as pd
import numpy as np
#list of tuples
game = [('riya',37,'delhi','cat','rose'),
('anjali',28,'agra','dog','lily'),
('tia',42,'jaipur','elephant','lotus'),
('kapil',51,'patna','cow','tulip'),
('raj',30,'banglore','lion','orchid')]
#Create a dataframe object
df = pd.DataFrame(game, columns=['Name','Age','Place','Animal','Flower'], index=['a','b','c','d','e'])
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, a to e
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Age     5 non-null      int64
 2   Place   5 non-null      object
 3   Animal  5 non-null      object
 4   Flower  5 non-null      object
dtypes: int64(1), object(4)
memory usage: 240.0+ bytes
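info() prints its summary and returns None. If the text is needed as a string, for example to write it to a log file, the buf parameter can send it to an in-memory buffer. The snippet below is a minimal sketch on a shortened dataframe. The names buffer and summary are illustrative.

```python
import io
import pandas as pd

# Shortened version of the dataframe used above
df = pd.DataFrame({'Name': ['riya', 'anjali'], 'Age': [37, 28]})

# Send the summary to an in-memory text buffer instead of the console
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

print(summary)
```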

Python Pandas : How to convert lists to a dataframe

How to convert lists to a dataframe in python ?

As we know in python, lists are used to store multiple values in an ordered sequence inside a single variable.  Each element inside the list is called an item.

Syntax : my_list = [ element1, element2, element3, .....]

where,

  • elements/items are placed inside square brackets [].
  • items are separated by , symbol.
  • It can contain any number of items.
  • elements can be of different types i.e string, float, integer etc.

The pandas library is one of the most preferred tools for data manipulation and analysis. Its DataFrame constructor has the following signature:

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

Creating DataFrame from list of lists :

Now, let’s take an example.

#program :

import pandas as pd
#List of list
Data=[['apple', 'banana', 'orange'],['dog', 'cat', 'cow'],['potato', 'tomato', 'onion']]
#creating a dataframe object from list of list
df=pd.DataFrame(Data)
print(df)
Output:
        0       1       2
0   apple  banana  orange
1     dog     cat     cow
2  potato  tomato   onion

Creating DataFrame from list of tuples :

Now, let’s take an example.

#Program

import pandas as pd
#List of tuples
Data=[('apple', 'banana', 'orange'),('dog', 'cat', 'cow'),('potato', 'tomato', 'onion')]
#creating a dataframe object from list of tuples
df=pd.DataFrame(Data)
print(df)
Output:
        0       1       2
0   apple  banana  orange
1     dog     cat     cow
2  potato  tomato   onion

Converting list of tuples to dataframe and set column names and indexes :

We can also set the column names and index labels.

#Program 


import pandas as pd
#List of tuples 
Data=[('apple', 'banana', 'orange'),('dog', 'cat', 'cow'),('potato', 'tomato', 'onion')]
#Converting list of tuples to dataframe and set column names and indexes
df=pd.DataFrame(Data, columns=['a', 'b', 'c'], index=['fruits', 'animals', 'vegetables'])
print(df)
Output:
                 a       b       c
fruits       apple  banana  orange
animals        dog     cat     cow
vegetables  potato  tomato   onion

We can also skip one or more columns while building the dataframe, which is useful when a column is not needed. The DataFrame constructor has no parameter for this, but DataFrame.from_records() accepts an exclude argument.

So let’s try leaving out one column, ‘c’.

#Program 

import pandas as pd
#List of tuples 
Data=[('apple', 'banana', 'orange'),('dog', 'cat', 'cow'),('potato', 'tomato', 'onion')]
#Converting list of tuples to dataframe, excluding column 'c'
df=pd.DataFrame.from_records(Data, exclude=['c'], columns=['a', 'b', 'c'], index=['fruits', 'animals', 'vegetables'])
print(df)
Output:
                 a       b
fruits       apple  banana
animals        dog     cat
vegetables  potato  tomato
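A column can also be removed after the dataframe has been built. The following sketch, which is an alternative and not the article's own program, uses DataFrame.drop() on the same data. The name trimmed is illustrative.

```python
import pandas as pd

Data = [('apple', 'banana', 'orange'),
        ('dog', 'cat', 'cow'),
        ('potato', 'tomato', 'onion')]
df = pd.DataFrame(Data, columns=['a', 'b', 'c'],
                  index=['fruits', 'animals', 'vegetables'])

# drop() returns a new dataframe without column 'c'; df itself is unchanged
trimmed = df.drop(columns=['c'])

print(trimmed)
```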

Creating a dataframe from multiple lists :

We can also create a dataframe from multiple lists by zipping them together, so that each list becomes one column.

Let’s try this:

#Program :

import pandas as pd
roll_no = [1, 2, 3]
name = ['Tia', 'Raj', 'Rahul']
state = ['Goa', 'Assam', 'Punjab']
#zip the lists into a list of row tuples
wrapoflist = list(zip(roll_no, name, state))
df = pd.DataFrame(wrapoflist, columns=['roll_no', 'name', 'state'], index=['a', 'b', 'c'])
print(df)
Output:
   roll_no   name   state
a        1    Tia     Goa
b        2    Raj   Assam
c        3  Rahul  Punjab
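When every list is meant to become one column, a dictionary that maps column names to lists gives the same dataframe without zip(). This is a sketch of that alternative. The names df_from_zip and df_from_dict are illustrative.

```python
import pandas as pd

roll_no = [1, 2, 3]
name = ['Tia', 'Raj', 'Rahul']
state = ['Goa', 'Assam', 'Punjab']

# Row-wise: zip the lists into tuples, one tuple per row
df_from_zip = pd.DataFrame(list(zip(roll_no, name, state)),
                           columns=['roll_no', 'name', 'state'],
                           index=['a', 'b', 'c'])

# Column-wise: each dictionary key becomes a column name
df_from_dict = pd.DataFrame({'roll_no': roll_no, 'name': name, 'state': state},
                            index=['a', 'b', 'c'])

print(df_from_dict)
```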


Python : How to convert datetime object to string using datetime.strftime()

How to convert datetime object to string using datetime.strftime() function in python ?

In this article we’ll discuss how to convert a datetime object to a string using the datetime.strftime() function.

strftime() :

In Python, strftime() converts a date, time or datetime object to a string according to a given format. It is the inverse of strptime(), which parses a string representing a time according to a format.

strftime() is an instance method of the datetime class, while strptime() is a class method.

Let’s try with different examples:

#Program :

from datetime import datetime

#current date and time
now = datetime.now()
date_time = now.strftime('%m/%d/%Y , %H:%M:%S')
print('date and time', date_time)
Output :
date and time 04/24/2021 , 06:40:15

Here, ‘date_time’  is the string  and ‘now’ is a ‘datetime’ object.
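Because now() changes on every run, a fixed datetime makes the effect of the format codes easier to check. The sketch below uses a hard-coded value, chosen here only for illustration, and also shows that strptime() with the same format string turns the text back into an equal datetime object. The names moment, fmt, text and parsed are illustrative.

```python
from datetime import datetime

# Fixed value (an assumption for this example) so the result is repeatable
moment = datetime(2021, 4, 24, 6, 40, 15)
fmt = '%m/%d/%Y , %H:%M:%S'

# datetime object -> string
text = moment.strftime(fmt)
print(text)                       # 04/24/2021 , 06:40:15

# string -> datetime object, using the same format
parsed = datetime.strptime(text, fmt)
print(parsed == moment)           # True

# The same object in year-month-day order
print(moment.strftime('%Y-%m-%d'))   # 2021-04-24
```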

Locale’s appropriate date and time :

#Program

from datetime import datetime

timestamp = 1528797322
#fromtimestamp() returns local time, so the output below assumes a system clock set to UTC
date_time = datetime.fromtimestamp(timestamp)

#Converting date and time to String
d = date_time.strftime('%c')
print('1st output:', d)

#Converting Date part to String
d = date_time.strftime('%x')
print('2nd output:', d)

#Converting Time part to String
d = date_time.strftime('%X')
print('3rd output:', d)
Output :
1st output: Tue Jun 12 09:55:22 2018
2nd output: 06/12/18
3rd output: 09:55:22

Here, the format codes %c, %x and %X give the locale’s appropriate date and time representation, so the exact text depends on the locale and time zone of the system.
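To get the same text on every machine, the timestamp can be converted with an explicit time zone and formatted with numeric codes only. This is a minimal sketch that assumes UTC is the desired zone. The name utc_moment is illustrative.

```python
from datetime import datetime, timezone

timestamp = 1528797322

# Passing tz makes the result independent of the system's local time zone
utc_moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)

# Numeric format codes do not depend on the locale
print(utc_moment.strftime('%Y-%m-%d %H:%M:%S %Z'))   # 2018-06-12 09:55:22 UTC
```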