Python

Pandas: Get sum of column values in a Dataframe

How to get the sum of column values in a dataframe in Python ?

In this article, we will discuss how to get the sum of the values in a dataframe column. So, let’s start exploring the topic.

Select the column by name and get the sum of all values in that column :

To find the sum of the values in a single column, select the column by name, either with the [ ] operator or with loc[ ], and call sum( ) on the result.

Using sum() :

Here we select a column from the dataframe by its name with the [ ] operator and call sum( ) on the resulting Series to get the sum of the values in that column. By default, sum( ) skips NaN values.

Syntax- dataFrame_Object['column_name'].sum( )
#Program :

import numpy as np
import pandas as pd
# Example data
students = [('Jill',    16,     'Tokyo',  150),
('Rachel',    38,     'Texas',   177),
('Kirti',    39,     'New York',  97),
('Veena',   40,     'Texas',   np.nan),
('Lucifer',   np.nan, 'Texas',   130),
('Pablo', 30,     'New York',  155),
('Lionel',   45,     'Colombia', 121) ]
dfObj = pd.DataFrame(students, columns=['Name','Age','City','Score'])
#Sum of all values in the 'Score' column of the dataframe
totalSum = dfObj['Score'].sum()
print(totalSum)
Output :
830.0
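The total above ignores the missing score because sum( ) skips NaN values by default. As a minimal sketch of how that can be changed (the `scores` Series is an illustrative stand-in for the 'Score' column, not part of the original program), the `skipna` and `min_count` parameters of `Series.sum()` can be used like this:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the 'Score' column used above
scores = pd.Series([150, 177, 97, np.nan, 130, 155, 121], name='Score')

# Default behaviour: NaN values are skipped
total_default = scores.sum()

# skipna=False: a single NaN makes the whole sum NaN
total_strict = scores.sum(skipna=False)

# min_count: require at least this many non-NaN values, otherwise return NaN
# (only 6 of the 7 values are present here)
total_min_count = scores.sum(min_count=7)

print(total_default)
print(total_strict)
print(total_min_count)
```

Which behaviour is appropriate depends on whether a missing value should be treated as zero or should invalidate the total.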

Using loc[ ] :

Here we use loc[ ] to select all rows of a column by the column name, and then call sum( ) on the resulting Series to get the sum of the values in that column.

Syntax- dataFrame_Object_name.loc[:, 'column_name'].sum( )

So, let’s see the implementation of it by taking an example.

#Program :

import numpy as np
import pandas as pd
# Example data
students = [('Jill',    16,     'Tokyo',  150),
('Rachel',    38,     'Texas',   177),
('Kirti',    39,     'New York',  97),
('Veena',   40,     'Texas',   np.nan),
('Lucifer',   np.nan, 'Texas',   130),
('Pablo', 30,     'New York',  155),
('Lionel',   45,     'Colombia', 121) ]
dfObj = pd.DataFrame(students, columns=['Name','Age','City','Score'])
#Sum of all values in the 'Score' column of the dataframe using loc[ ]
totalSum = dfObj.loc[:, 'Score'].sum()
print(totalSum)
Output :
830.0

Select the column by position and get the sum of all values in that column :

If we don’t know the column name but we know its position, we can find the sum of all values in that column using iloc[ ] and sum( ). In the example below, iloc[ ] is given a one-column slice, so it returns a one-column DataFrame, and sum( ) then returns a Series holding one total per selected column.

So, let’s see the implementation of it by taking an example.

#Program :

import numpy as np
import pandas as pd

# Example data
students = [('Jill',    16,     'Tokyo',  150),
('Rachel',    38,     'Texas',   177),
('Kirti',    39,     'New York',  97),
('Veena',   40,     'Texas',   np.nan),
('Lucifer',   np.nan, 'Texas',   130),
('Pablo', 30,     'New York',  155),
('Lionel',   45,     'Colombia', 121) ]
dfObj = pd.DataFrame(students, columns=['Name','Age','City','Score'])
column_number = 4
# Total sum of values in 4th column i.e. 'Score'
totalSum = dfObj.iloc[:, column_number-1:column_number].sum()
print(totalSum)
Output :
Score    830.0
dtype: float64
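The output above is a Series rather than a single number because a slice of column positions selects a DataFrame. A small sketch of the difference, using a shortened, illustrative version of the example data:

```python
import numpy as np
import pandas as pd

# Shortened, illustrative version of the example data
dfObj = pd.DataFrame({
    'Name': ['Jill', 'Rachel', 'Kirti', 'Veena'],
    'Score': [150, 177, 97, np.nan],
})

# A single integer position selects one column as a Series,
# so sum() returns a plain number
as_scalar = dfObj.iloc[:, 1].sum()

# A slice of positions selects a one-column DataFrame,
# so sum() returns a Series with one entry per column
as_series = dfObj.iloc[:, 1:2].sum()

print(as_scalar)
print(type(as_series))
```

If only the number is needed, passing a single integer position to iloc[ ] avoids having to unpack the Series afterwards.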

Find the sum of columns values for selected rows only in Dataframe :

If we need the sum of only specific entries of a column, we can pass a row slice to iloc[ ] along with the column position and call sum( ) on the selection.

So, let’s see the implementation of it by taking an example.

#Program :

import numpy as np
import pandas as pd

# Example data
students = [('Jill',    16,     'Tokyo',  150),
('Rachel',    38,     'Texas',   177),
('Kirti',    39,     'New York',  97),
('Veena',   40,     'Texas',   np.nan),
('Lucifer',   np.nan, 'Texas',   130),
('Pablo', 30,     'New York',  155),
('Lionel',   45,     'Colombia', 121) ]
dfObj = pd.DataFrame(students, columns=['Name','Age','City','Score'])
column_number = 4
entries = 3
#Sum of the first three values from the 4th column
totalSum = dfObj.iloc[0:entries, column_number-1:column_number].sum()
print(totalSum)
Output :
Score    424.0
dtype: float64

Find the sum of column values in a dataframe based on condition :

If we want the sum of only the values that satisfy a condition, for example the scores of a particular city such as New York, we can pass the condition to loc[ ] together with the column name and call sum( ) on the result.

So, let’s see the implementation of it by taking an example.

#Program :

import numpy as np
import pandas as pd

# Example data
students = [('Jill',    16,     'Tokyo',  150),
('Rachel',    38,     'Texas',   177),
('Kirti',    39,     'New York',  97),
('Veena',   40,     'Texas',   np.nan),
('Lucifer',   np.nan, 'Texas',   130),
('Pablo', 30,     'New York',  155),
('Lionel',   45,     'Colombia', 121) ]
dfObj = pd.DataFrame(students, columns=['Name','Age','City','Score'])
#Sum of all the scores from New York city
totalSum = dfObj.loc[dfObj['City'] == 'New York', 'Score'].sum()
print(totalSum)
Output :
252.0
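When the same conditional sum is needed for every city rather than just one, grouping is a common alternative to writing one condition per city. A minimal sketch, assuming only the 'City' and 'Score' columns of the example data:

```python
import numpy as np
import pandas as pd

# Only the two columns needed for this sketch
dfObj = pd.DataFrame({
    'City': ['Tokyo', 'Texas', 'New York', 'Texas', 'Texas', 'New York', 'Colombia'],
    'Score': [150, 177, 97, np.nan, 130, 155, 121],
})

# One total per city; NaN scores are skipped, as with a plain sum()
scoreByCity = dfObj.groupby('City')['Score'].sum()
print(scoreByCity)
```

The result is a Series indexed by city name, so a single total can be read with, for example, scoreByCity['New York'].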


Python : How to remove characters from a string by Index ?

Ways to remove i’th character from string in Python

Here we are going to discuss how to remove characters from a string in a given range of indices or at specific index position.

So we will discuss several different methods for this.

Naive Method

In this method, we loop over the string and append every character except the one at the unwanted index to a new string.

test_str = "WelcomeBtechGeeks"
  
# Printing original string 
print ("The original string is : " + test_str)
  
# Removing char at pos 3
# using loop
new_str = ""
  
for i in range(len(test_str)):
    if i != 2:
        new_str = new_str + test_str[i]
  
# Printing string after removal  
print ("The string after removal of i'th character : " + new_str)

Output:

The original string is : WelcomeBtechGeeks
The string after removal of i'th character : WecomeBtechGeeks

In the output above, you can see that the character at position three, ‘l’, has been removed.

This method can be slow compared with the other methods, because strings are immutable and each concatenation in the loop may build a new string.

Using str.replace()

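str.replace() works on character values rather than on index positions: it replaces occurrences of a given substring. By default it replaces every occurrence, and the optional third argument limits how many occurrences are replaced, so it removes the i’th character only when that character happens to be the first occurrence of its value.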

test_str = "WelcomeBtechGeeks"
  
# Printing original string 
print ("The original string is : " + test_str)
  
# Trying to remove a char
# using replace
new_str = test_str.replace('e', '1')
  
# Printing string after replacement
# replaces all occurrences of 'e' with '1'
print ("The string after removal of i'th character( doesn't work) : " + new_str)
  
# Removing only the 1st occurrence of 'e', i.e. the char at the 2nd pos.,
# by passing a count of 1.
new_str = test_str.replace('e', '', 1)
  
# Printing string after removal  
# removes the first occurrence of 'e'
print ("The string after removal of i'th character(works) : " + new_str)

Output:

The original string is : WelcomeBtechGeeks
The string after removal of i'th character( doesn't work) : W1lcom1Bt1chG11ks
The string after removal of i'th character(works) : WlcomeBtechGeeks

In the output above, you can see that we first replaced every ‘e’ in the original string, and then removed only the first occurrence of ‘e’. This method is rarely the right choice for removing a character by index, because it depends on the character’s value rather than its position, but it is sometimes used.

Using slice + concatenation

In this method we use string slicing: we take the part of the string before the i’th character and the part after it, and concatenate the two. The result is a new string without the i’th character.

test_str = "WelcomeBtechGeeks"
  
# Printing original string 
print ("The original string is : " + test_str)
  
#Removing char at pos 3
# using slice + concatenation
new_str = test_str[:2] +  test_str[3:]
  
# Printing string after removal  
# removes the element at index 2 (3rd position)
print ("The string after removal of i'th character : " + new_str)

Output:

The original string is : WelcomeBtechGeeks
The string after removal of i'th character : WecomeBtechGeeks

Using str.join() and list comprehension

In this method, a list comprehension collects every character except the one at the unwanted index into a list, and str.join() then joins them into a new string.

test_str = "WelcomeBtechGeeks"
  
# Printing original string 
print ("The original string is : " + test_str)
  
# Removing char at pos 3
# using join() + list comprehension
new_str = ''.join([test_str[i] for i in range(len(test_str)) if i != 2])
  
# Printing string after removal  
# removes the element at index 2 (3rd position)
print ("The string after removal of i'th character : " + new_str)

Output:

The original string is : WelcomeBtechGeeks
The string after removal of i'th character : WecomeBtechGeeks
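The examples above all remove a single character. Removing a range of indices can be sketched with the same slicing idea; `remove_range` below is a hypothetical helper written for this illustration, not a built-in function:

```python
def remove_range(text, start, stop):
    """Return text without the characters at indices start .. stop - 1."""
    # Keep everything before start and everything from stop onwards
    return text[:start] + text[stop:]


test_str = "WelcomeBtechGeeks"

# Removing a single character (index 2) is a range of length one
single_removed = remove_range(test_str, 2, 3)

# Removing indices 7 to 11, i.e. the substring "Btech"
range_removed = remove_range(test_str, 7, 12)

print(single_removed)
print(range_removed)
```

The stop index is exclusive, matching the convention of Python slices.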

Conclusion:

In this article we have seen different methods to remove characters from a string in a given range of indices or at a specific index position. Enjoy learning!


Pandas: Convert a dataframe column into a list using Series.to_list() or numpy.ndarray.tolist() in python

Get a list of a specified column of a Pandas DataFrame

This article is all about how to get a list of a specified column of a Pandas DataFrame using different methods.

Let’s create a dataframe which we will use in this article.

import pandas as pd 
students = [('juli', 34, 'Sydney', 155),
           ('Ravi', 31, 'Delhi', 177.5),
           ('Aaman', 16, 'Mumbai', 81),
           ('Mohit', 31, 'Delhi', 167),
           ('Veena', 12, 'Delhi', 144),
           ('Shan', 35, 'Mumbai', 135),
           ('Sradha', 35, 'Colombo', 111)
           ]

student_df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Score'])
print(student_df)

Output:

     Name  Age     City  Score
0    juli   34   Sydney  155.0
1    Ravi   31    Delhi  177.5
2   Aaman   16   Mumbai   81.0
3   Mohit   31    Delhi  167.0
4   Veena   12    Delhi  144.0
5    Shan   35   Mumbai  135.0
6  Sradha   35  Colombo  111.0

Now we are going to fetch a single column.

There are different ways to do that.

using Series.to_list()

We will use the same example as above. We select the column ‘Name’ with the [] operator, which gives a Series object. The Series class provides the to_list() function, which converts the Series object and returns a list.

import pandas as pd 
students = [('juli', 34, 'Sydney', 155),
           ('Ravi', 31, 'Delhi', 177.5),
           ('Aaman', 16, 'Mumbai', 81),
           ('Mohit', 31, 'Delhi', 167),
           ('Veena', 12, 'Delhi', 144),
           ('Shan', 35, 'Mumbai', 135),
           ('Sradha', 35, 'Colombo', 111)
           ]

student_df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Score'])
list_of_names = student_df['Name'].to_list()
print('List of Names: ', list_of_names)
print('Type of listOfNames: ', type(list_of_names))

Output:

List of Names: ['juli', 'Ravi', 'Aaman', 'Mohit', 'Veena', 'Shan', 'Sradha']
Type of listOfNames: <class 'list'>

The example above works as follows.

We first select the column ‘Name’ from the dataframe using the [] operator, which returns a Series object.

We then call Series.to_list(), a function provided by the Series class, which converts the Series object and returns a list. Printing the type of the result confirms that it is a list.

This is how we converted a dataframe column into a list.
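The same call works for numeric columns, and it can be combined with other Series methods before converting. A short sketch, assuming a cut-down version of the student data (the variable names below are illustrative):

```python
import pandas as pd

# Cut-down version of the student data
student_df = pd.DataFrame({
    'Name': ['juli', 'Ravi', 'Aaman', 'Mohit'],
    'City': ['Sydney', 'Delhi', 'Mumbai', 'Delhi'],
    'Score': [155, 177.5, 81, 167],
})

# A numeric column becomes a list of plain Python floats
list_of_scores = student_df['Score'].to_list()

# unique() returns a NumPy array of distinct values in order of appearance,
# which tolist() then converts to a list
list_of_cities = student_df['City'].unique().tolist()

print(list_of_scores)
print(list_of_cities)
```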

using numpy.ndarray.tolist()

From the given dataframe we will select the column ‘Name’ using the [] operator, which returns a Series object, and use Series.values to get a NumPy array from the Series object. Next, we will use the function tolist() provided by the NumPy array to convert it to a list.

import pandas as pd 
students = [('juli', 34, 'Sydney', 155),
           ('Ravi', 31, 'Delhi', 177.5),
           ('Aaman', 16, 'Mumbai', 81),
           ('Mohit', 31, 'Delhi', 167),
           ('Veena', 12, 'Delhi', 144),
           ('Shan', 35, 'Mumbai', 135),
           ('Sradha', 35, 'Colombo', 111)
           ]

student_df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Score'])
list_of_names = student_df['Name'].values.tolist()
print('List of Names: ', list_of_names)
print('Type of listOfNames: ', type(list_of_names))

Output:

List of Names: ['juli', 'Ravi', 'Aaman', 'Mohit', 'Veena', 'Shan', 'Sradha']
Type of listOfNames: <class 'list'>

Let’s break this down into steps.

Above, we converted the column ‘Name’ into a list in a single line. First, select the column ‘Name’ from the dataframe using the [] operator.

Then get a NumPy array from Series.values:

import pandas as pd 
students = [('juli', 34, 'Sydney', 155),
           ('Ravi', 31, 'Delhi', 177.5),
           ('Aaman', 16, 'Mumbai', 81),
           ('Mohit', 31, 'Delhi', 167),
           ('Veena', 12, 'Delhi', 144),
           ('Shan', 35, 'Mumbai', 135),
           ('Sradha', 35, 'Colombo', 111)
           ]

student_df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Score'])
names = student_df['Name'].values
print('Numpy array: ', names)
print('Type of namesAsNumpy: ', type(names))

Output:

Numpy array: ['juli' 'Ravi' 'Aaman' 'Mohit' 'Veena' 'Shan' 'Sradha']
Type of namesAsNumpy: <class 'numpy.ndarray'>

Numpy array provides a function tolist() to convert its contents to a list.

This is how we selected our column ‘Name’ from Dataframe as a Numpy array and then turned it to a list.

Conclusion:

In this article I have shown you how to get a list of a specified column of a Pandas DataFrame using different methods. Enjoy learning, and thank you!



Python : Check if a String contains a Sub String | Case Insensitive

Strings are one of the most commonly used types in Python. We can easily make them by enclosing characters in quotes. Python considers single quotes to be the same as double quotes. String creation is as easy as assigning a value to a variable.

Let’s look at different ways to solve the general problem of determining whether a specific piece of string is present in a larger string. This is a very common type of problem that every programmer encounters at least once in his or her career. This article discusses various methods for resolving it.

Given a string and a substring, the task is to check if the substring is present in the given string.

Examples:

1)Case sensitive

Input:

string = 'This is BTechGeeks online platform'  substring = 'online'

Output:

Yes

2)Case insensitive

Input:

string = 'This is BTechGeeks Online platform'  substring = 'online'

Output:

Yes

Determine whether a String contains a Sub String

There are several ways to check whether a string contains a given substring. Some of them are:

Method #1:Using in operator

The in operator is the most idiomatic way to check for a substring in Python, and usually also the fastest. It evaluates to True if the substring occurs anywhere in the string and to False otherwise.

1)Case sensitive

Below is the implementation:

# given string
string = 'This is BTechGeeks online platform'
# given substring
substring = 'online'
# checking if the substring is present in string
if substring in string:
    print('Yes')
else:
    print('No')

Output:

Yes

2)Case Insensitive

We can check them by converting both given string and substring to lower case or uppercase.

Below is the implementation:

# given string
string = 'This is BTechGeeks Online platform'
# given substring
substring = 'online'
# checking if the substring is present in string by converting to lowercase
if substring.lower() in string.lower():
    print('Yes')
else:
    print('No')

Output:

Yes
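For text that is not plain ASCII, lower() is not always enough for a caseless comparison. Python strings also provide casefold(), which applies more aggressive case folding. A small sketch with an illustrative German example (the sample strings are assumptions made for this demonstration):

```python
# Illustrative example: the German letter 'ß' is folded to 'ss' by casefold()
string = 'Die Straße ist lang'
substring = 'STRASSE'

# lower() leaves 'ß' unchanged, so the substring is not found
found_with_lower = substring.lower() in string.lower()

# casefold() folds 'ß' to 'ss', so the substring is found
found_with_casefold = substring.casefold() in string.casefold()

print(found_with_lower)
print(found_with_casefold)
```

For English-only text the two approaches give the same answer, so lower() as used above remains fine there.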

Method #2: Using not in operator

Similarly, we can use the “not in” operator to test the opposite scenario, that is, to see if a string or character does not exist in another string.

1)Case sensitive

Below is the implementation:

# given string
string = 'This is BTechGeeks online platform'
# given substring
substring = 'online'
# checking if the substring is present in string
if substring not in string:
    print('No')
else:
    print('Yes')

Output:

Yes

2)Case insensitive

We can check them by converting both given string and substring to lower case or uppercase.

Below is the implementation:

# given string
string = 'This is BTechGeeks Online platform'
# given substring
substring = 'online'
# checking if the substring is present in string by converting to lowercase
if substring.lower() not in string.lower():
    print('No')
else:
    print('Yes')

Output:

Yes

Method #3:Using str.find()

The str.find() method returns the lowest index at which the substring occurs, or -1 if the substring is not present. An index of 0 is a valid match (the substring is at the very start of the string), so the substring is present whenever the result is not -1.

1)Case sensitive

Below is the implementation:

# given string
string = 'This is BTechGeeks online platform'
# given substring
substring = 'online'
# using find method
result = string.find(substring)
# if result is not -1 then print yes
if result != -1:
    print('Yes')
else:
    print('No')

Output:

Yes

2)Case insensitive

We can check them by converting both given string and substring to lower case or uppercase.

Below is the implementation:

# given string to lower
string = 'This is BTechGeeks Online platform'.lower()
# given substring to lower
substring = 'online'.lower()
# using find method
result = string.find(substring)
# if result is not -1 then print yes
if result != -1:
    print('Yes')
else:
    print('No')

Output:

Yes
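A match at the very start of the string has index 0, so a presence test based on str.find() needs to compare the result against -1 rather than against 0. A small sketch of the difference (the variable names are illustrative):

```python
string = 'This is BTechGeeks online platform'
# substring located at the very start of the string
substring = 'This'

result = string.find(substring)

# Comparing with "> 0" wrongly reports a match at index 0 as missing
found_with_greater_than_zero = result > 0

# Comparing with -1 handles every valid index, including 0
found_with_not_minus_one = result != -1

print(result)
print(found_with_greater_than_zero)
print(found_with_not_minus_one)
```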

Method #4:Using index() function

This method performs the same task as str.find(), but unlike str.find() it does not return -1 when the substring is not present; it raises a ValueError instead. To use it as a presence check, we therefore have to catch that exception.

1)Case sensitive

Below is the implementation:

# given string
string = 'This is BTechGeeks online platform'
# given substring
substring = 'online'
# using try and except
# using index function
try:
    result = string.index(substring)
    print('Yes')
except ValueError:
    print('No')

Output:

Yes

2)Case insensitive

We can check them by converting both given string and substring to lower case or uppercase.

Below is the implementation:

# given string to lower
string = 'This is BTechGeeks Online platform'.lower()
# given substring to lower
substring = 'online'.lower()
# using try and except
# using index function
try:
    result = string.index(substring)
    print('Yes')
except ValueError:
    print('No')

Output:

Yes
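Another option, not covered by the methods above, is the standard re module, which can ignore case without converting either string. The helper name `contains_ignore_case` is hypothetical and only used for this sketch:

```python
import re


def contains_ignore_case(string, substring):
    """Return True if substring occurs in string, ignoring case."""
    # re.escape() makes sure the substring is treated as literal text,
    # not as a regular expression pattern
    return re.search(re.escape(substring), string, re.IGNORECASE) is not None


string = 'This is BTechGeeks Online platform'

print(contains_ignore_case(string, 'online'))
print(contains_ignore_case(string, 'offline'))
```

For simple checks the lower() approach shown earlier is usually easier to read; the regular expression version is mainly useful when other pattern features are needed as well.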


Python : map() Function with Examples

In this article, we will look at how to use map() to transform the contents of an iterable sequence. We’ll also go over how to use the map() function with lambda functions and how to use map() to transform a dictionary.

Map() function in Python

1)map() function details

Syntax:

map(function, iterable)

Parameters:

function :   It is a function to which each element of a given iterable is passed.
iterable  :   It is an iterable that is to be mapped.

Return :

The map() function applies the given function to each item of the iterable and returns a map object, which is an iterator that yields the results.

You can then pass the return value from map() (map object) to functions like list() (to create a list), set() (to create a set), etc.
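Because the map object is an iterator, it can only be consumed once. A brief sketch of what that means in practice (the variable names are illustrative):

```python
numbers = [1, 2, 3]

# map() returns a lazy map object; nothing is computed yet
doubled = map(lambda n: n * 2, numbers)

# The first pass consumes the iterator and produces the results
first_pass = list(doubled)

# The second pass finds the iterator already exhausted
second_pass = list(doubled)

# The map object can be passed to other constructors too, such as set()
as_set = set(map(abs, [-1, 1, 2]))

print(first_pass)
print(second_pass)
print(as_set)
```

If the results are needed more than once, it is safer to convert the map object to a list first and reuse the list.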

2)Working of map function

map() iterates across all the elements of the given sequence. It calls the given callback function on every item and yields the returned value as the next element of the result. In Python 3 this happens lazily: the transformed values are produced only as the map object is consumed, for example by list().

Examples:

3)Squaring each list element using map

By writing a function that squares the number, we can square each list element.

Below is the implementation:

# function which squares the number
def squareNumber(number):
    # square the number
    return number*number


# given list
givenlist = [5, 2, 6, 9]
# using map function to square each list element
squarelist = list(map(squareNumber, givenlist))
# print squarelist
print(squarelist)

Output:

[25, 4, 36, 81]

4)Reverse each string in list

By writing a function which reverses the string, we can reverse each list element.

Below is the implementation:

# function which reverses the string
def revString(string):
    # reverse the string
    return string[::-1]


# given list
givenlist = ['hello', 'this', 'is', 'BTechGeeks']
# using map function to reverse each string in list
reverselist = list(map(revString, givenlist))
# print reverselist
print(reverselist)

Output:

['olleh', 'siht', 'si', 'skeeGhceTB']

5)Adding two lists

We can add two lists element-wise by passing both lists to map() along with a lambda function that takes two arguments.

Below is the implementation:

# given lists
list1 = [9, 3, 4]
list2 = [4, 8, 6]
# using map function to sum the two lists
sumlist = list(map(lambda a, b: a + b, list1, list2))
# print sumlist
print(sumlist)

Output:

[13, 11, 10]
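The same approach can be used to transform a dictionary by mapping over its items. The sketch below uses a made-up `prices` dictionary, and also shows what happens when the iterables passed to map() have different lengths:

```python
# Made-up dictionary used only for this illustration
prices = {'pen': 10, 'pencil': 5, 'file': 20}

# Each item is a (key, value) tuple; return a new tuple with the value doubled,
# then rebuild a dictionary from the transformed pairs
doubled_prices = dict(map(lambda item: (item[0], item[1] * 2), prices.items()))

# With several iterables, map() stops as soon as the shortest one is exhausted
pair_sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20]))

print(doubled_prices)
print(pair_sums)
```

A dictionary comprehension would do the same job as the first call; map() is shown here only to stay with the topic of the article.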

 

Python Pandas : Select Rows in DataFrame by conditions on multiple columns

About Pandas DataFrame

A DataFrame is a 2-dimensional data structure, like a 2-dimensional array or a table with rows and columns.

This article is all about showing different ways to select rows in DataFrame based on condition on single or multiple columns.

import pandas as pd
students = [ ('Shyam', 'books' , 24) ,
             ('ankur', 'pencil' , 28) ,
             ('Rekha', 'pen' , 30) ,
             ('Sarika', 'books', 62) ,
             ('Lata', 'file' , 33) ,
             ('Mayank', 'pencil' , 30) ] 
dataframeobj = pd.DataFrame(students, columns = ['Name' , 'Product', 'Sale'])
print(dataframeobj)

Output will be:

     Name  Product  Sale
0   Shyam    books    24
1   ankur   pencil    28
2   Rekha      pen    30
3  Sarika    books    62
4    Lata     file    33
5  Mayank   pencil    30

Select Rows based on value in column

Let’s see how to select rows based on some conditions in a DataFrame.

Select the rows in the above example for which the ‘Product’ column contains the value ‘books’:

import pandas as pd
students = [ ('Shyam', 'books' , 24) ,
             ('ankur', 'pencil' , 28) ,
             ('Rekha', 'pen' , 30) ,
             ('Sarika', 'books', 62) ,
             ('Lata', 'file' , 33) ,
             ('Mayank', 'pencil' , 30) ] 
dataframeobj = pd.DataFrame(students, columns = ['Name' , 'Product', 'Sale'])
subsetDataFrame = dataframeobj[dataframeobj['Product'] == 'books']
print(subsetDataFrame)

Output:

     Name  Product  Sale
0   Shyam    books    24
3  Sarika    books    62

In the above example, the expression subsetDataFrame = dataframeobj[dataframeobj['Product'] == 'books'] returns only the rows in which the ‘Product’ column contains ‘books’.

Let’s see how this works step by step.

On its own, the condition dataframeobj['Product'] == 'books' produces a Series of True and False values:

0     True
1    False
2    False
3     True
4    False
5    False
Name: Product, dtype: bool

It gives True where the condition matches and False otherwise.

If we pass this Series object to the [] operator of the DataFrame, it returns a new DataFrame with only those rows that have True in the passed Series object, i.e.

     Name  Product  Sale
0   Shyam    books    24
3  Sarika    books    62

If we select any other product name it will return value accordingly.

Select Rows based on any of the multiple values in column

Select the rows from the above example for which the ‘Product’ column contains either ‘pen’ or ‘pencil’, i.e.

import pandas as pd
students = [ ('Shyam', 'books' , 24) ,
             ('ankur', 'pencil' , 28) ,
             ('Rekha', 'pen' , 30) ,
             ('Sarika', 'books', 62) ,
             ('Lata', 'file' , 33) ,
             ('Mayank', 'pencil' , 30) ] 
dataframeobj = pd.DataFrame(students, columns = ['Name' , 'Product', 'Sale'])
subsetDataFrame = dataframeobj[dataframeobj['Product'].isin(['pen', 'pencil']) ]
print(subsetDataFrame)

We passed a list of product names to the isin() function, which returns True for each row whose value matches any item in the list and False otherwise.

Therefore, it will return a DataFrame in which the ‘Product’ column contains either ‘pen’ or ‘pencil’ only, i.e.

Output:

     Name  Product  Sale
1   ankur   pencil    28
2   Rekha      pen    30
5  Mayank   pencil    30
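The opposite selection, rows whose ‘Product’ is not in the list, can be sketched by negating the boolean Series with the ~ operator. The variable name `otherProducts` is illustrative:

```python
import pandas as pd

students = [('Shyam', 'books', 24),
            ('ankur', 'pencil', 28),
            ('Rekha', 'pen', 30),
            ('Sarika', 'books', 62),
            ('Lata', 'file', 33),
            ('Mayank', 'pencil', 30)]
dataframeobj = pd.DataFrame(students, columns=['Name', 'Product', 'Sale'])

# ~ flips True and False, keeping the rows that are NOT 'pen' or 'pencil'
otherProducts = dataframeobj[~dataframeobj['Product'].isin(['pen', 'pencil'])]
print(otherProducts)
```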

Select DataFrame Rows Based on multiple conditions on columns

In this method we are going to select the rows in the above example for which the ‘Sale’ column contains a value greater than 20 and less than 33. For this, we combine two conditions with the & operator, wrapping each condition in parentheses.

import pandas as pd
students = [ ('Shyam', 'books' , 24) ,
             ('ankur', 'pencil' , 28) ,
             ('Rekha', 'pen' , 30) ,
             ('Sarika', 'books', 62) ,
             ('Lata', 'file' , 33) ,
             ('Mayank', 'pencil' , 30) ] 
dataframeobj = pd.DataFrame(students, columns = ['Name' , 'Product', 'Sale'])
filteredDataframe = dataframeobj[(dataframeobj['Sale'] > 20) & (dataframeobj['Sale'] < 33) ]
print(filteredDataframe)

It will return the following DataFrame object, in which the ‘Sale’ column contains values greater than 20 and less than 33:

     Name  Product  Sale
0   Shyam    books    24
1   ankur   pencil    28
2   Rekha      pen    30
5  Mayank   pencil    30
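The conditions do not have to refer to the same column. As a sketch of conditions on two different columns (the variable names `bothConditions` and `eitherCondition` are illustrative), & requires both conditions to hold and | requires at least one:

```python
import pandas as pd

students = [('Shyam', 'books', 24),
            ('ankur', 'pencil', 28),
            ('Rekha', 'pen', 30),
            ('Sarika', 'books', 62),
            ('Lata', 'file', 33),
            ('Mayank', 'pencil', 30)]
dataframeobj = pd.DataFrame(students, columns=['Name', 'Product', 'Sale'])

# Rows where Product is 'pencil' AND Sale is at least 30
bothConditions = dataframeobj[(dataframeobj['Product'] == 'pencil') &
                              (dataframeobj['Sale'] >= 30)]

# Rows where Product is 'books' OR Sale is greater than 30
eitherCondition = dataframeobj[(dataframeobj['Product'] == 'books') |
                               (dataframeobj['Sale'] > 30)]

print(bothConditions)
print(eitherCondition)
```

The parentheses around each condition are needed because & and | bind more tightly than comparison operators in Python.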

Conclusion:

In this article we have seen different methods to select rows in a dataframe based on conditions. Hope you find this informative.


Pandas : Convert Data frame index into column using dataframe.reset_index() in python

In this article, we will explore ways to convert the index of a data frame, or the indexes of a multi-index data frame, into columns.

There is a function provided in the Pandas Data frame class to reset the indexes of the data frame.

Dataframe.reset_index()

DataFrame.reset_index(self, level=None, drop=False, inplace=False, col_level=0, col_fill='')

It returns a data frame with the new index after resetting the indexes of the data frame.

  • level: By default, reset_index() resets all the indexes of the data frame. In the case of a multi-index dataframe, if we want to reset only some specific indexes, we can specify them as an int, a str, or a list of str, i.e., index names.
  • drop: If False, the index is converted to a column; if True, the index is removed from the dataframe instead.
  • inplace: If True, it modifies the data frame in place instead of returning a new one.
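As a rough illustration of the drop parameter described above, the sketch below uses a made-up two-row dataframe (not the data used in the rest of this article):

```python
import pandas as pd

# Made-up two-row dataframe with 'ID' as its index
df = pd.DataFrame({'ID': [11, 12], 'Name': ['jack', 'Riti']}).set_index('ID')

# drop=False (the default): the index becomes a regular column
kept = df.reset_index()

# drop=True: the index is discarded instead of becoming a column
dropped = df.reset_index(drop=True)

print(list(kept.columns))
print(list(dropped.columns))
```

In both cases the result gets a new default integer index, and the original dataframe is left unchanged because inplace was not set.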

Let’s use this function to convert the indexes of dataframe to columns.

The first thing we will do is create a dataframe and initialize its index.

Code:
import pandas as pd
# List of Tuples
empoyees = [(11, 'jack', 34, 'Sydney', 70000) ,
(12, 'Riti', 31, 'Delhi' , 77000) ,
(13, 'Aadi', 16, 'Mumbai', 81000) ,
(14, 'Mohit', 31,'Delhi' , 90000) ,
(15, 'Veena', 12, 'Delhi' , 91000) ,
(16, 'Shaunak', 35, 'Mumbai', 75000 ),
(17, 'Shaun', 35, 'Colombo', 63000)]
# Create a DataFrame object
empDfObj = pd.DataFrame(empoyees, columns=['ID' , 'Name', 'Age', 'City', 'Salary'])
# Set 'ID' as the index of the dataframe
empDfObj.set_index('ID', inplace=True)
print(empDfObj)


Now, we will try different things with this dataframe.

Convert index of a Dataframe into a column of dataframe

To convert the index ‘ID‘ of the dataframe empDfObj into a column, call the reset_index() function on that dataframe,

Code:
modified = empDfObj.reset_index()
print("Modified Dataframe : ")
print(modified)


Since we haven’t provided the inplace argument, by default it returned a modified copy of the dataframe, in which the index ‘ID’ is converted into a column named ‘ID’ and a new default integer index is assigned automatically.

Now, we will pass the inplace argument as True to modify the dataframe in place.

Code:
empDfObj.reset_index(inplace=True)
print(empDfObj)


Now, we will set the column ‘ID’ as the index of the dataframe again.

Code:
empDfObj.set_index('ID', inplace=True)

Remove index  of dataframe instead of converting into column

Previously, we converted the index of the dataframe into a column, but now we just want to remove it. We can do that by passing the drop argument as True in the reset_index() function,

Code:
modified = empDfObj.reset_index(drop=True)
print("Modified Dataframe : ")
print(modified)


We can see that it removed the dataframe index.

Resetting indexes of a Multi-Index Dataframe

Let’s convert the dataframe object empDfObj  into a multi-index dataframe with two indexes i.e. ID & Name.

Code:
empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Salary'])
# set multiple columns as the index of the dataframe to
# make it multi-index dataframe.
empDfObj.set_index(['ID', 'Name'], inplace=True)
print(empDfObj)


Convert all the indexes of Multi-index Dataframe to the columns of Dataframe

In the previous section, we made a dataframe with a multi-index; here we will convert the indexes of the multi-index dataframe into columns of the dataframe.

To do this, all we have to do is just call the reset_index() on the dataframe object.

Code:
modified = empDfObj.reset_index()
print(modified)


It converted the indexes ID and Name to columns of the same names.

Suppose we want to convert only one of the multiple indexes. We can do that by passing a single index name in the level argument.

Code:
modified = empDfObj.reset_index(level='ID')
print("Modified Dataframe: ")
print(modified)


It converted the index ‘ID’ to a column with the same name. Similarly, we can follow the same procedure to convert the ‘Name’ index to a column.

As an exercise, try modifying the code to convert the ‘Name’ index to a column.
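One possible way to carry out this exercise is sketched below on a made-up two-row dataframe (not the article's data), so that the effect on the remaining index is easy to see:

```python
import pandas as pd

# Made-up data with a two-level index of 'ID' and 'Name'
df = pd.DataFrame({'ID': [11, 12],
                   'Name': ['jack', 'Riti'],
                   'Age': [34, 31]}).set_index(['ID', 'Name'])

# Only the 'Name' level is converted to a column; 'ID' stays as the index
modified = df.reset_index(level='Name')

print(modified)
```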

We can convert both the indexes into columns by passing a list of index names in the level parameter.

Code:
modified = empDfObj.reset_index(level=['ID', 'Name'])
print("Modified Dataframe: ")
print(modified)


The complete code:

import pandas as pd
def main():
    # List of Tuples
    empoyees = [(11, 'jack', 34, 'Sydney', 70000) ,
                (12, 'Riti', 31, 'Delhi' , 77000) ,
                (13, 'Aadi', 16, 'Mumbai', 81000) ,
                (14, 'Mohit', 31,'Delhi' , 90000) ,
                (15, 'Veena', 12, 'Delhi' , 91000) ,
                (16, 'Shaunak', 35, 'Mumbai', 75000 ),
                (17, 'Shaun', 35, 'Colombo', 63000)]
    # Create a DataFrame object
    empDfObj = pd.DataFrame(empoyees, columns=['ID' , 'Name', 'Age', 'City', 'Salary'])
    # Set 'ID' as the index of the dataframe
    empDfObj.set_index('ID', inplace=True)
    print("Contents of the Dataframe : ")
    print(empDfObj)
    print('Convert the index of Dataframe to the column')
    # Reset the index of dataframe
    modified = empDfObj.reset_index()
    print("Modified Dataframe : ")
    print(modified)
    print('Convert the index of Dataframe to the column - in place ')
    empDfObj.reset_index(inplace=True)
    print("Contents of the Dataframe : ")
    print(empDfObj)
    # Set 'ID' as the index of the dataframe
    empDfObj.set_index('ID', inplace=True)
    print('Remove the index of Dataframe instead of converting it to a column')
    # Remove index ID instead of converting into a column
    modified = empDfObj.reset_index(drop=True)
    print("Modified Dataframe : ")
    print(modified)
    print('Resetting indexes of a Multi-Index Dataframe')
    # Create a DataFrame object
    empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Salary'])
    # set multiple columns as the index of the dataframe to
    # make it multi-index dataframe.
    empDfObj.set_index(['ID', 'Name'], inplace=True)
    print("Contents of the Multi-Index Dataframe : ")
    print(empDfObj)
    print('Convert all the indexes of Multi-index Dataframe to the columns of Dataframe')
    # Reset all indexes of a multi-index dataframe
    modified = empDfObj.reset_index()
    print("Modified Multi-Index Dataframe : ")
    print(modified)
    print("Contents of the original Multi-Index Dataframe : ")
    print(empDfObj)
    # Convert only the 'ID' index to a column
    modified = empDfObj.reset_index(level='ID')
    print("Modified Dataframe: ")
    print(modified)
    # Convert only the 'Name' index to a column
    modified = empDfObj.reset_index(level='Name')
    print("Modified Dataframe: ")
    print(modified)
    # Convert both indexes to columns
    modified = empDfObj.reset_index(level=['ID', 'Name'])
    print("Modified Dataframe: ")
    print(modified)
if __name__ == '__main__':
    main()

Hope this article was useful for you.


Python : How to copy files from one location to another using shutil.copy()

In this article, we will discuss how to copy files from one directory to another using shutil.copy().

shutil.copy()

The shutil module in Python's standard library provides a function named shutil.copy().

shutil.copy(src, dst, *, follow_symlinks=True)

It copies the file pointed to by src to the file or directory pointed to by dst.

Parameters:

  • src is the path of the source file.
  • dst can be a directory path or a file path.
  • if src is a symbolic link then,
    • if follow_symlinks is True (the default), the file that the link points to is copied.
    • if follow_symlinks is False, dst is created as a new symbolic link that points to the same target.

It returns the path string of a newly created file.
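
As a small, hedged sketch of the call and its return value, the example below copies a file into an existing directory and prints the returned path. It works inside a temporary directory created with tempfile, and the file and directory names are assumptions made for illustration.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so nothing else on disk is touched
base = tempfile.mkdtemp()
src = os.path.join(base, 'sample1.txt')
dst_dir = os.path.join(base, 'test')
os.mkdir(dst_dir)
with open(src, 'w') as f:
    f.write('hello')

# dst is an existing directory, so the copy keeps the source's base name
new_path = shutil.copy(src, dst_dir)
print(new_path == os.path.join(dst_dir, 'sample1.txt'))  # True

# The copy has the same content as the source
with open(new_path) as f:
    copied_text = f.read()
print(copied_text)  # hello

shutil.rmtree(base)  # clean up the temporary directory
```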

The function lives in the shutil module, so the first step is to import it.

import shutil

Now, we will use this function to copy the files.

Copy a file to another directory

newPath = shutil.copy('sample1.txt', '/home/bahijaj/test')

The file ‘sample1.txt’ will be copied to the directory ‘/home/bahijaj/test’, and the function returns the path of the newly created file, that is,

/home/bahijaj/test/sample1.txt

  • If a file with the same name already exists in the destination directory, it will be overwritten.
  • If no directory named test exists inside /home/bahijaj, then the source file will be copied to a file named test in /home/bahijaj.
  • If the source file does not exist, a FileNotFoundError is raised.
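
The three cases in the list above can be checked with a short sketch. It runs in a temporary directory, and all file and directory names in it (other, nope.txt and so on) are hypothetical.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, 'sample1.txt')
with open(src, 'w') as f:
    f.write('new content')

# 1. An existing file with the same name in the destination is overwritten
dst_dir = os.path.join(base, 'test')
os.mkdir(dst_dir)
with open(os.path.join(dst_dir, 'sample1.txt'), 'w') as f:
    f.write('old content')
copied = shutil.copy(src, dst_dir)
with open(copied) as f:
    overwritten_text = f.read()
print(overwritten_text)  # new content

# 2. If the destination directory does not exist, dst is treated as a file name
missing = os.path.join(base, 'other')
as_file = shutil.copy(src, missing)
is_regular_file = os.path.isfile(as_file)
print(is_regular_file)  # True

# 3. A missing source file raises FileNotFoundError
try:
    shutil.copy(os.path.join(base, 'nope.txt'), dst_dir)
    raised = False
except FileNotFoundError:
    raised = True
print(raised)  # True

shutil.rmtree(base)  # clean up the temporary directory
```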

Copy a File to another directory with a new name

newPath = shutil.copy('sample1.txt', '/home/bahijaj/test/sample2.txt')

The file ‘sample1.txt’ will be copied to the destination directory under the new name ‘sample2.txt’.

A few points to note:

  • If the destination file already exists, it will be overwritten.
  • If the source file does not exist, a FileNotFoundError is raised.

Copy symbolic links using shutil.copy()

Suppose we have a symbolic link named link.csv which points to sample.csv.

link.csv -> sample.csv

Now, we will copy the symbolic link using shutil.copy() function.

shutil.copy(src, dst, *, follow_symlinks=True)

We can see that follow_symlinks is True by default, so the function follows the link and copies the file it points to.

newPath = shutil.copy('/home/bahijaj/test/link.csv', '/home/bahijaj/test/sample2.csv')

The new path will be:

/home/bahijaj/test/sample2.csv

sample2.csv is an actual copy of sample.csv, the file that link.csv points to.

If follow_symlinks is False,

newPath = shutil.copy('/home/bahijaj/test/link.csv', '/home/bahijaj/test/newlink.csv', follow_symlinks=False)

It will copy the symbolic link itself, i.e. newlink.csv will be a new link pointing to the same target file sample.csv:

newlink.csv -> sample.csv

If the source path does not exist, a FileNotFoundError is raised.
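
The difference between the two settings can be sketched in a temporary directory as follows. This assumes a platform where os.symlink() is permitted (for example Linux or macOS; on Windows it may need extra privileges), and the names real_copy and new_link are chosen only for this illustration.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
target = os.path.join(base, 'sample.csv')
link = os.path.join(base, 'link.csv')
with open(target, 'w') as f:
    f.write('a,b\n1,2\n')
os.symlink(target, link)  # link.csv -> sample.csv

# Default (follow_symlinks=True): the content of the target file is copied
real_copy = shutil.copy(link, os.path.join(base, 'sample2.csv'))
copy_is_link = os.path.islink(real_copy)
print(copy_is_link)  # False

# follow_symlinks=False: a new symbolic link is created instead
new_link = shutil.copy(link, os.path.join(base, 'newlink.csv'),
                       follow_symlinks=False)
new_is_link = os.path.islink(new_link)
print(new_is_link)  # True

# The new link points to the same target as the original link
link_target = os.readlink(new_link)
print(link_target == target)  # True

shutil.rmtree(base)  # clean up the temporary directory
```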

Complete Code:

import shutil

def main():
    # Copy file to another directory
    newPath = shutil.copy('sample1.txt', '/home/bahijaj/test')
    print("Path of copied file : ", newPath)
    # Copy a file with new name
    newPath = shutil.copy('sample1.txt', '/home/bahijaj/test/sample2.txt')
    print("Path of copied file : ", newPath)
    # Copy the target file pointed to by the symbolic link
    newPath = shutil.copy('/home/bahijaj/test/link.csv', '/home/bahijaj/test/sample2.csv')
    print("Path of copied file : ", newPath)
    # Copy the symbolic link itself as a new link
    newPath = shutil.copy('/home/bahijaj/test/link.csv', '/home/bahijaj/test/newlink.csv', follow_symlinks=False)
    print("Path of copied file : ", newPath)

if __name__ == '__main__':
    main()

Hope this article was useful for you. Enjoy Reading!

 


Pandas: Replace NaN with mean or average in Dataframe using fillna()

In this article, we will discuss the replacement of NaN values with a mean of the values in rows and columns using two functions: fillna() and mean().

In data analysis, large datasets often have missing values, and we need to fill them in before the analysis can continue reliably.

Pandas provides built-in methods to handle NaN (missing) values and produce a cleaner data set.

These functions are:

Dataframe.fillna():

This method is used to replace the NaN values in the data frame.

The mean() method:

mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Parameters:

  • axis: the axis along which the mean is computed. Use 0 (or 'index') to get one mean per column and 1 (or 'columns') to get one mean per row.
  • skipna: whether to exclude NaN/null values when computing the result.
  • level: if the axis is a MultiIndex (hierarchical), compute the mean along a particular level, collapsing into a Series. This parameter belongs to the older signature shown above and was removed in pandas 2.0.
  • numeric_only: whether to include only numeric (float, int, boolean) columns.
  • **kwargs: additional keyword arguments to be passed to the function.

This function returns the mean of the values.
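
To make the axis and skipna parameters concrete, here is a minimal sketch on a small, made-up frame (the data and variable names are assumptions for illustration, not part of the dataset used below).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'S1': [10.0, 20.0, np.nan],
                   'S2': [5.0, np.nan, 29.0]})

# Default axis: one mean per column, NaN values are skipped
col_means = df.mean()
print(col_means['S1'])  # 15.0
print(col_means['S2'])  # 17.0

# axis=1: one mean per row
row_means = df.mean(axis=1)
print(row_means.tolist())  # [7.5, 20.0, 29.0]

# skipna=False: a single NaN in a column makes that column's mean NaN
strict = df.mean(skipna=False)
print(strict.isna().all())  # True
```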

Let’s dig in deeper to get a thorough understanding!

Pandas: Replace NaN with column mean

We can replace the NaN values in the whole dataset or just in a column by getting the mean values of the column.

For instance, we will take a dataset that has the information about 4 students S1 to S4 with marks in different subjects.


Code:

import numpy as np
import pandas as pd
# A dictionary with lists as values
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
sample_dict = {'S1': [10, 20, np.nan, np.nan],
               'S2': [5, np.nan, np.nan, 29],
               'S3': [15, np.nan, np.nan, 11],
               'S4': [21, 22, 23, 25],
               'Subjects': ['Maths', 'Finance', 'History', 'Geography']}
# Create a DataFrame from dictionary
df = pd.DataFrame(sample_dict)
# Set column 'Subjects' as Index of DataFrame
df = df.set_index('Subjects')
print(df)

Suppose we have to calculate the mean value of the S2 column; we will see that a single value of float type is returned.

Mean values of S2 column

Code:

mean_value = df['S2'].mean()
print('Mean of values in column S2:')
print(mean_value)

Replace NaN values in a column with mean of column values

Let’s see how to replace the NaN values in column S2 with the mean of column values.


Code:

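# Assign the result back to the column; calling fillna(inplace=True) on a
# selected column may not update df in recent pandas versions
df['S2'] = df['S2'].fillna(value=df['S2'].mean())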
print('Updated Dataframe:')
print(df)

Here mean() is called on the S2 column, so the value argument receives the mean of that column, and the NaN values in S2 are replaced with it.

Replace all NaN values in a Dataframe with mean of column values

Now, we will see how to replace all the NaN values in a data frame with the mean of the values in column S2.

We can simply call fillna() on the entire data frame instead of on a particular column.

Code:

df.fillna(value=df['S2'].mean(), inplace=True)
print('Updated Dataframe:')
print(df)

We can see that all the NaN values got replaced with the mean value of the S2 column. Passing inplace=True makes the change directly in the original data frame.
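
The call above fills every gap with the single mean of S2. If the goal is instead to fill each column with its own mean, one common alternative (a sketch under that assumption, not the method used above) is to pass the Series returned by df.mean() to fillna(), which matches the means to columns by name. The frame below rebuilds the sample data with ‘Subjects’ as the index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'S1': [10, 20, np.nan, np.nan],
                   'S2': [5, np.nan, np.nan, 29],
                   'S3': [15, np.nan, np.nan, 11],
                   'S4': [21, 22, 23, 25]},
                  index=['Maths', 'Finance', 'History', 'Geography'])

# df.mean() returns one mean per column; fillna aligns it by column name
filled = df.fillna(df.mean())
print(filled.loc['History'].tolist())  # [15.0, 17.0, 13.0, 23.0]
```

The ‘History’ row had NaN in S1, S2 and S3, and each gap now holds the mean of its own column (15, 17 and 13).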

Pandas: Replace NANs with mean of multiple columns

We will reinitialize our data frame with NaN values.


Code:

df = pd.DataFrame(sample_dict)
# Set column 'Subjects' as Index of DataFrame
df = df.set_index('Subjects')
# Dataframe with NaNs
print(df)

If we want to work with multiple columns, we select those columns before calling the mean() function.

Mean of values in column S2 & S3

Code:

mean_values=df[['S2','S3']].mean()
print(mean_values)

It returned the calculated mean of each of the two columns, S2 and S3.

Now, we will replace the NaN values in columns S2 and S3 with the mean values of these columns.


Code:

df[['S2','S3']] = df[['S2','S3']].fillna(value=df[['S2','S3']].mean())
print('Updated Dataframe:')
print(df)

Pandas: Replace NANs with row mean

We can apply the same approach to rows. Previously, we replaced the NaN values with the mean of a column; here we will replace the NaN values in a row with the mean of that row.

For this, we need to use .loc[‘index name’] to access a row and then use the fillna() and mean() methods.

Code:

df.loc['History'] = df.loc['History'].fillna(value=df.loc['History'].mean())
print('Updated Dataframe:')
print(df)
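
The same row-wise step can be applied to every row at once. The following is a hedged sketch using DataFrame.apply() with axis=1 on a small, made-up frame; the data and the name filled are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'S1': [10, 20, np.nan],
                   'S2': [5, np.nan, 17],
                   'S4': [21, 22, 23]},
                  index=['Maths', 'Finance', 'History'])

# Apply the same fillna/mean step to each row in turn
filled = df.apply(lambda row: row.fillna(row.mean()), axis=1)
print(filled.loc['Finance'].tolist())  # [20.0, 21.0, 22.0]
print(filled.loc['History'].tolist())  # [20.0, 17.0, 23.0]
```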

Conclusion

So, these were different ways to replace NaN values in a column, row or complete data frame with mean or average values.

Hope this article was useful for you!

 


Python : How to find keys by value in dictionary ?

In this article, we will see how to find all the keys associated with a single value or multiple values.

For instance, we have a dictionary of words.

dictOfWords = {"hello": 56,
"at" : 23 ,
"test" : 43,
"this" : 97,
"here" : 43,
"now" : 97
}

Now, we want all the keys in the dictionary having value 43, which means “here” and “test”.

Let’s see how to get it.

Find keys by value in the dictionary

dict.items() is a method that returns all the key-value pairs in a dictionary. We iterate over these pairs and compare each value with the one we are looking for. If the value matches, we add the key to a separate list.

def getKeysByValue(dictOfElements, valueToFind):
    listOfKeys = list()
    listOfItems = dictOfElements.items()
    for item in listOfItems:
        if item[1] == valueToFind:
            listOfKeys.append(item[0])
    return listOfKeys

We will use this function to get the keys by value 43.

listOfKeys = getKeysByValue(dictOfWords, 43)
print("Keys with value equal to 43")
# Iterate over the list of keys
for key in listOfKeys:
    print(key)


We can achieve the same thing with a list comprehension.

listOfKeys = [key for (key, value) in dictOfWords.items() if value == 43]
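
Both versions scan the whole dictionary on every lookup. When many different values need to be looked up, one possible alternative (a sketch that goes beyond the approach shown above) is to build a reverse mapping once with collections.defaultdict. The name keysByValue is hypothetical.

```python
from collections import defaultdict

dictOfWords = {"hello": 56, "at": 23, "test": 43,
               "this": 97, "here": 43, "now": 97}

# Build the reverse mapping once: value -> list of keys with that value
keysByValue = defaultdict(list)
for key, value in dictOfWords.items():
    keysByValue[value].append(key)

print(keysByValue[43])  # ['test', 'here']
print(keysByValue[97])  # ['this', 'now']
```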

Find keys in the dictionary by value list

Now, we want to find the keys in the dictionary whose value matches any of the values in a given list, for example:

[43, 97]

We will do the same thing as above, but this time, while iterating over the sequence, we check whether each value is present in the given list.

def getKeysByValues(dictOfElements, listOfValues):
    listOfKeys = list()
    listOfItems = dictOfElements.items()
    for item in listOfItems:
        if item[1] in listOfValues:
            listOfKeys.append(item[0])
    return listOfKeys

We will use the above function:

listOfKeys = getKeysByValues(dictOfWords, [43, 97])
# Iterate over the list of keys
for key in listOfKeys:
    print(key)
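
The same check can also be written as a list comprehension. In this sketch the values to find are held in a set, which is an assumption made here so that each membership test stays fast even for a long list of values; the name valuesToFind is hypothetical.

```python
dictOfWords = {"hello": 56, "at": 23, "test": 43,
               "this": 97, "here": 43, "now": 97}

# A set makes the "value in ..." test fast for many values
valuesToFind = {43, 97}
listOfKeys = [key for key, value in dictOfWords.items()
              if value in valuesToFind]
print(listOfKeys)  # ['test', 'this', 'here', 'now']
```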


Complete Code:

def getKeysByValue(dictOfElements, valueToFind):
    '''Get a list of keys from dictionary which has the given value'''
    listOfKeys = list()
    listOfItems = dictOfElements.items()
    for item in listOfItems:
        if item[1] == valueToFind:
            listOfKeys.append(item[0])
    return listOfKeys

def getKeysByValues(dictOfElements, listOfValues):
    '''
    Get a list of keys from dictionary which has value that matches with any value in given list of values
    '''
    listOfKeys = list()
    listOfItems = dictOfElements.items()
    for item in listOfItems:
        if item[1] in listOfValues:
            listOfKeys.append(item[0])
    return listOfKeys

def main():
    # Dictionary of strings and int
    dictOfWords = {
        "hello": 56,
        "at": 23,
        "test": 43,
        "this": 97,
        "here": 43,
        "now": 97}
    print("Original Dictionary")
    print(dictOfWords)

    # Get list of keys with value 43
    listOfKeys = getKeysByValue(dictOfWords, 43)
    print("Keys with value equal to 43")
    # Iterate over the list of keys
    for key in listOfKeys:
        print(key)

    # Get list of keys with value 43 using list comprehension
    print("Keys with value equal to 43")
    listOfKeys = [key for (key, value) in dictOfWords.items() if value == 43]
    # Iterate over the list of keys
    for key in listOfKeys:
        print(key)

    # Get list of keys with any of the given values
    print("Keys with value equal to any one from the list [43, 97] ")
    listOfKeys = getKeysByValues(dictOfWords, [43, 97])
    # Iterate over the list of keys
    for key in listOfKeys:
        print(key)

if __name__ == '__main__':
    main()

 

Hope this article was useful for you.

Enjoy Reading!
