# Bahija Siddiqui

## Create an empty NumPy Array of given length or shape and data type in Python

In this article, we will explore different ways to create an empty 1D, 2D, and 3D NumPy array with different data types such as int, string, etc.

NumPy provides the empty() function for creating such an array.

numpy.empty(shape, dtype=float, order='C')
• The main arguments are shape (an int or a tuple of ints) and dtype. The order argument selects row-major ('C') or column-major ('F') memory layout.
• It returns a new array of the given shape and data type without initialising its entries, which means the returned array contains arbitrary (garbage) values.
• If the dtype argument is not provided, the array uses float as the default data type.
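Because the entries are uninitialised, an array from empty() is best treated as scratch space that gets filled before it is read. The short sketch below illustrates this and contrasts it with numpy.zeros(); the variable names are only illustrative.

```python
import numpy as np

# np.empty only allocates memory; the contents are whatever was there before.
scratch = np.empty((2, 3), dtype=np.int64)
print(scratch.shape, scratch.dtype)

# Fill the array before reading from it.
scratch[:] = 7
print(scratch)

# When predictable initial values are needed, np.zeros is the safer choice.
zeros = np.zeros((2, 3), dtype=np.int64)
print(zeros)
```

The usual reason to pick empty() over zeros() is to skip the initialisation step when every element will be overwritten anyway.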

Now, we will use this empty() function to create an empty array of different data types and shape.


## Create an empty 1D Numpy array of a given length

To create a 1D NumPy array of a given length, we pass an integer as the shape argument.

For example, we will pass 5 as the shape argument to the empty() function.

Code:

import numpy as np
# Create an empty 1D Numpy array of length 5
empty_array = np.empty(5)
print(empty_array)

## Create an empty Numpy array of given shape using numpy.empty()

In the above code, we saw how to create a 1D empty array. Now we will see how to create 2D and 3D NumPy arrays using the numpy.empty() function.

### Create an empty 2D Numpy array using numpy.empty()

To create the 2D NumPy array, we will pass the shape of the 2D array that is rows and columns as a tuple to the numpy.empty() function.

For instance, here we will create a 2D NumPy array with 5 rows and 3 columns.

Code:

empty_array = np.empty((5, 3))
print(empty_array)

It returned an empty NumPy array of 5 rows and 3 columns. Since we did not provide any data type, the function used float as the default.

### Create an empty 3D Numpy array using numpy.empty()

As we have seen with the 2D array, we will do the same thing to create an empty 3D NumPy array. We will create a 3D NumPy array with 2 matrices of 3 rows and 3 columns each.

Code:

empty_array = np.empty((2, 3, 3))
print(empty_array)

The above code creates a 3D NumPy array with 2 matrices of 3 rows and 3 columns each, without initialising the values.

In all the above examples, we have not provided any data type argument. Therefore, by default, all the values which were returned were in the float data type.

Now in the next section, we customize the data type. Let’s see how to do that.

## Create an empty Numpy array with custom data type

To create an empty NumPy array with a different data type, all we have to do is pass that data type as the dtype argument of the numpy.empty() function.

Let’s look at examples with different data types.

### Create an empty Numpy array of 5 Integers

To create a NumPy array of 5 integers, we pass int as the dtype argument of the numpy.empty() function.

Code:

# Create an empty Numpy array of 5 integers
empty_array = np.empty(5, dtype=int)
print(empty_array)

### Create an empty Numpy array of 5 Complex Numbers

Now, to create the empty NumPy array of 5 complex numbers, all we have to do is write the data type complex in the dtype argument in numpy.empty() function.

Code:

empty_array = np.empty(5, dtype=complex)
print(empty_array)

### Create an empty Numpy array of 5 strings

Here, we pass a string type as the dtype argument of the numpy.empty() function. The dtype 'S3' stands for byte strings with a fixed length of 3.

Code:

empty_array = np.empty(5, dtype='S3')
print(empty_array)
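The fixed width in a string dtype matters once values are assigned. As a small illustration (the sample values are made up for this sketch), the code below fills an 'S3' array of byte strings and a 'U3' array of Unicode strings; values longer than 3 characters are cut to fit.

```python
import numpy as np

byte_strings = np.empty(3, dtype='S3')   # fixed-width byte strings, 3 bytes each
text_strings = np.empty(3, dtype='U3')   # fixed-width Unicode strings, 3 characters each

# Assign values; anything longer than the fixed width is truncated.
byte_strings[:] = [b'abc', b'de', b'fghij']
text_strings[:] = ['abc', 'de', 'fghij']

print(byte_strings)
print(text_strings)
```

If the strings may be longer, a wider dtype such as 'U10' avoids the truncation.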

The complete code:

import numpy as np

def main():
    print('*** Create an empty Numpy array of given length ***')
    # Create an empty 1D Numpy array of length 5
    empty_array = np.empty(5)
    print(empty_array)
    print('*** Create an empty Numpy array of given shape ***')
    # Create an empty 2D Numpy array or matrix with 5 rows and 3 columns
    empty_array = np.empty((5, 3))
    print(empty_array)
    # Create an empty 3D Numpy array
    empty_array = np.empty((2, 3, 3))
    print(empty_array)
    print('*** Create an empty Numpy array with custom data type ***')
    # Create an empty Numpy array of 5 integers
    empty_array = np.empty(5, dtype=int)
    print(empty_array)
    # Create an empty Numpy array of 5 Complex Numbers
    empty_array = np.empty(5, dtype=complex)
    print(empty_array)
    # Create an empty Numpy array of 5 byte strings of length 3
    empty_array = np.empty(5, dtype='S3')
    print(empty_array)

if __name__ == '__main__':
    main()

Happy learning guys!

## Python: Find indexes of an element in pandas dataframe | Python Pandas Index.get_loc()

In this tutorial, we will learn how to find the row and column index positions of a value in a pandas dataframe. We will also look at the pandas Index.get_loc() function, including its syntax, parameters, and a sample program.

## Pandas Index.get_loc() Function in Python

The pandas Index.get_loc() function returns an integer location, slice, or boolean mask for the requested label. It works with both sorted and unsorted indexes. It also offers options for handling a value that is not present in the Index.

### Syntax:

Index.get_loc(key, method=None, tolerance=None)

### Parameters:

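• key: label
• method: {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional. This argument and tolerance were deprecated in pandas 1.4 and removed in pandas 2.0, where Index.get_indexer() handles inexact matches.
  • default: exact matches only.
  • pad / ffill: if there is no exact match, use the PREVIOUS index value.
  • backfill / bfill: if there is no exact match, use the NEXT index value.
  • nearest: if there is no exact match, use the NEAREST index value. Tied distances are broken by preferring the larger index value.
• tolerance: optional, the maximum distance allowed between the requested label and the matched index value for inexact matches.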

### Return Value:

loc : int if unique index, slice if monotonic index, else mask
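The three return types can be seen with a small sketch. The index contents below are illustrative values chosen for this example: one index with unique labels, one with sorted duplicates, and one with unsorted duplicates.

```python
import pandas as pd

unique_idx = pd.Index(['Labrador', 'Beagle', 'Husky'])
sorted_dups = pd.Index(['Beagle', 'Husky', 'Husky', 'Lhasa'])
unsorted_dups = pd.Index(['Husky', 'Beagle', 'Husky', 'Lhasa'])

pos = unique_idx.get_loc('Beagle')       # unique label: an integer position
span = sorted_dups.get_loc('Husky')      # duplicates in a monotonic index: a slice
mask = unsorted_dups.get_loc('Husky')    # duplicates in a non-monotonic index: a boolean mask

print(pos)
print(span)
print(mask)
```

A label that is not present raises a KeyError, so callers that expect missing labels usually wrap the call in try/except.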

### Example using Index.get_loc() function:

# importing pandas as pd
import pandas as pd

# Creating the Index
idx = pd.Index(['Lhasa', 'Husky', 'Beagle'])

# Print the Index
print(idx)

# Get the integer location of the label 'Husky'
print(idx.get_loc('Husky'))


## Creating a Dataframe in Python

The initial step is creating a dataframe.

Code:

# List of Tuples
empoyees = [('jack', 34, 'Sydney', 155),
            ('Riti', 31, 'Delhi', 177),
            ('Aadi', 16, 'Mumbai', 81),
            ('Mohit', 31, 'Delhi', 167),
            ('Veena', 81, 'Delhi', 144),
            ('Shaunak', 35, 'Mumbai', 135),
            ('Shaun', 35, 'Colombo', 111)
            ]
# Create a DataFrame object
empDfObj = pd.DataFrame(empoyees, columns=['Name', 'Age', 'City', 'Marks'])
print(empDfObj)

Output:

Now, we want to find the location where the value ’81’ exists.

(4, 'Age')
(2, 'Marks')

We can see that value ’81’ exists at two different places in the data frame.

1. At row index 4 & column “Age”
2. At row index 2 & column “Marks”

Now, we will proceed to get the result of this.

## Find all indexes of an item in pandas dataframe

The function we create accepts a dataframe object and a value as arguments.

It returns a list of the index positions of all occurrences of that value.

Code:

def getIndexes(dfObj, value):
    ''' Get index positions of value in dataframe i.e. dfObj.'''
    listOfPos = list()
    # Get bool dataframe with True at positions where the given value exists
    result = dfObj.isin([value])
    # Get list of columns that contains the value
    seriesObj = result.any()
    columnNames = list(seriesObj[seriesObj == True].index)
    # Iterate over list of columns and fetch the rows indexes where value exists
    for col in columnNames:
        rows = list(result[col][result[col] == True].index)
        for row in rows:
            listOfPos.append((row, col))
    # Return a list of tuples indicating the positions of value in the dataframe
    return listOfPos

Output:

We got the exact row and column names of all the locations where the value ’81’ exists.

We will see what happened inside the getIndexes function.

### How did it work?

Let’s go through the getIndexes() function step by step.

Step 1: Get bool dataframe with True at positions where the value is 81 in the dataframe using pandas.DataFrame.isin()

DataFrame.isin(self, values)

The isin() function accepts a list of values and returns a bool dataframe of the same size as the original. It contains True wherever one of the given values exists and False everywhere else.

We will see the bool dataframe where the value is ’81’.

# Get bool dataframe with True at positions where value is 81
result = empDfObj.isin([81])
print('Bool Dataframe representing existence of value 81 as True')
print(result)

Output:

It is of the same size as empDfObj. As 81 exists at 2 places inside the dataframe, this bool dataframe contains True at only those two places. In all other places, it contains False.

Step 2: Get the list of columns that contains the value

We will get the names of the columns that contain the value 81. We achieve this by fetching the names of the columns in the bool dataframe that contain a True value.

Code:

# Get list of columns that contains the value i.e. 81
seriesObj = result.any()
columnNames = list(seriesObj[seriesObj == True].index)

print('Names of columns which contains 81:', columnNames)

Output:

Step 3: Iterate over selected columns and fetch the indexes of the rows which contains the value

We will iterate over each selected column and for each column, we will find the row which contains the True value.

The combinations of column names and row indexes where True exists are the index positions of 81 in the dataframe.

Code:

# Iterate over each column and fetch the row indexes where the value exists
for col in columnNames:
    rows = list(result[col][result[col] == True].index)
    for row in rows:
        print('Index : ', row, ' Col : ', col)

Output:

This is how the getIndexes() function finds the exact index positions of the given value and stores each position as a (row, column) tuple. In the end, it returns a list of tuples representing the value's index positions in the dataframe.
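The same positions can also be collected without explicit loops. The sketch below is an alternative, not the function used in this article: it stacks the bool dataframe into a Series keyed by (row, column) pairs and keeps the True entries. The name get_indexes_stacked and the small sample dataframe are made up for the illustration. Note that it scans row by row, so the order of the tuples can differ from getIndexes(), which goes column by column.

```python
import pandas as pd

def get_indexes_stacked(df, value):
    """Return (row, column) positions of value, scanning row by row."""
    mask = df.isin([value])      # bool dataframe, True where value occurs
    stacked = mask.stack()       # Series indexed by (row, column) pairs
    return list(stacked[stacked].index)

sample_df = pd.DataFrame(
    [('jack', 34, 155), ('Riti', 81, 177), ('Aadi', 16, 81)],
    columns=['Name', 'Age', 'Marks'])

positions = get_indexes_stacked(sample_df, 81)
print(positions)
```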

## Find index positions of multiple elements in the DataFrame

Suppose we have multiple elements,

[81, 'Delhi', 'abc']

Now we want to find index positions of all these elements in our dataframe empDfObj, like this,

81  :  [(4, 'Age'), (2, 'Marks')]
Delhi  :  [(1, 'City'), (3, 'City'), (4, 'City')]
abc  :  []


Let’s use the getIndexes() and dictionary comprehension to find the indexes of all occurrences of multiple elements in the dataframe empDfObj.

listOfElems = [81, 'Delhi', 'abc']
# Use dict comprehension to club index positions of multiple elements in dataframe
dictOfPos = {elem: getIndexes(empDfObj, elem) for elem in listOfElems}
print('Position of given elements in Dataframe are : ')
for key, value in dictOfPos.items():
    print(key, ' : ', value)


Output:

dictOfPos is a dictionary of elements and their index positions in the dataframe. As ‘abc‘ doesn’t exist in the dataframe, therefore, its list is empty in dictionary dictOfPos.

Want to become an expert in the Python programming language? The Python Data Analysis using Pandas tutorial takes your knowledge of Python concepts from basic to advanced level.


## Python: How to append a new row to an existing csv file?

This tutorial will help you learn how to append a new row to an existing CSV file using the csv module's writer and DictWriter classes.

## How to Append a new row to an existing csv file?

There are multiple ways in Python to append rows to a CSV file. Here we will discuss two effective methods. Before looking at them, we need a CSV file to work with.

For instance, here we have a CSV file named students.csv with the following contents:

Id,Name,Course,City,Session
21,Mark,Python,London,Morning
22,John,Python,Tokyo,Evening
23,Sam,Python,Paris,Morning

For reading and writing CSV files, Python provides the csv module. It has two different classes for writing CSV files: writer and DictWriter.

We can append rows to a CSV file with either of them, but one is more convenient than the other in some situations. We will see why in the next sections.


## Append a list as a new row to an old CSV file using csv.writer()

The csv module has a writer class that writes rows to CSV files.

Let’s take a list of values:

# List of values for the new row
row_contents = [32,'Shaun','Java','Tokyo','Morning']

To add this list to an existing CSV file, we have to follow certain steps:

• Import the writer class from the csv module.
• Open our csv file in append mode to get a file object.
• Pass this file object to csv.writer() to get a writer object.
• Pass the list to the writer object's writerow() function, which adds the list's contents as a new row in the associated csv file.
• Once the new row is added, close the file object.

Following these steps appends the list as a row to the CSV file. The function below wraps them up:

from csv import writer

def append_list_as_row(file_name, list_of_elem):
    # Open file in append mode
    with open(file_name, 'a+', newline='') as write_obj:
        # Create a writer object from csv module
        csv_writer = writer(write_obj)
        # Add contents of list as last row in the csv file
        csv_writer.writerow(list_of_elem)

Calling append_list_as_row('students.csv', row_contents) adds the list as a new last row of students.csv.
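To try the append step without touching a real file, the following self-contained sketch works on a throwaway copy of the sample data. The temporary directory and the helper name append_row are assumptions made for this illustration.

```python
import csv
import os
import tempfile

def append_row(path, row):
    # newline='' lets the csv module control line endings itself
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerow(row)

# Create a small students.csv in a temporary directory.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, 'students.csv')
with open(csv_path, 'w', newline='') as f:
    csv.writer(f).writerows([
        ['Id', 'Name', 'Course', 'City', 'Session'],
        [21, 'Mark', 'Python', 'London', 'Morning'],
    ])

append_row(csv_path, [32, 'Shaun', 'Java', 'Tokyo', 'Morning'])

# Read the file back to check the last row.
with open(csv_path, newline='') as f:
    rows = list(csv.reader(f))
print(rows[-1])
```

Appending relies on the existing file ending with a line terminator. If the last line has none, the new row is written onto the end of that line.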

### Appending a row to csv with missing entries?

Suppose we have a list that does not contain all the values and we have to append it into the CSV file.

Suppose the list is [33, 'Sahil', 'Morning'], where the Course and City values are missing.

Example:

# A list with missing entries
row_contents = [33, 'Sahil', 'Morning']
# Appending a row to csv with missing entries
append_list_as_row('students.csv', row_contents)

Output:

We can see that the data was appended at the wrong positions: the session value 'Morning' ended up in the Course column.

csv’s writer class has no functionality to check if any of the intermediate column values are missing in the list or if they are in the correct order. It will just add the items in the list as column values of the last row in sequential order.

Therefore while adding a list as a row using csv.writer() we need to make sure that all elements are provided and are in the correct order.

If any element is missing like in the above example, then we should pass empty strings in the list like this,

row_contents = [33, 'Sahil', '' , '', 'Morning']

For a CSV file with many columns or many rows to add, inserting empty strings for every missing value becomes tedious.

To spare us this work, the csv module provides the DictWriter class.

## Append a dictionary as a row to an existing csv file using DictWriter in python

As the name suggests, we can append a dictionary as a row to an existing CSV file using DictWriter in Python. Let’s see how we can use them.

Suppose we have a dictionary like the one below,

{'Id': 81,'Name': 'Sachin','Course':'Maths','City':'Mumbai','Session':'Evening'}

We can see that the keys are the column names of the CSV file and the values are the entries for the new row.

To append it, we have to follow some steps given below:

• Import the DictWriter class from the csv module.
• Open our csv file in append mode to get a file object.
• Pass the file object and a list of csv column names to csv.DictWriter() to get a DictWriter object.
• Pass our dictionary to the DictWriter object's writerow() function, which adds it as a new row in the associated csv file.
• Once the new row is added, close the file object.

The above steps append our dictionary as a new row in the csv file. To make our life easier, we have created a separate function that performs them,

Code:

from csv import DictWriter

def append_dict_as_row(file_name, dict_of_elem, field_names):
    # Open file in append mode
    with open(file_name, 'a+', newline='') as write_obj:
        # Create a writer object from csv module
        dict_writer = DictWriter(write_obj, fieldnames=field_names)
        # Add dictionary as row in the csv
        dict_writer.writerow(dict_of_elem)


Output:

We can see that it added the row successfully. But what if our dictionary has missing entries, or its items are in a different order?

The advantage of DictWriter is that it handles both cases automatically: values are matched to columns by key, and columns with missing entries remain empty. Let’s first use the function with a complete dictionary:

field_names = ['Id','Name','Course','City','Session']
row_dict = {'Id': 81,'Name': 'Sachin','Course':'Maths','City':'Mumbai','Session':'Evening'}
# Append a dict as a row in csv file
append_dict_as_row('students.csv', row_dict, field_names)

Output:

We can see this module has its wonders.
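The case with missing entries and a different key order can be checked with a short, self-contained sketch. The temporary file and the partial row below are assumptions made for this illustration; the column names are the ones used above.

```python
import csv
import os
import tempfile

field_names = ['Id', 'Name', 'Course', 'City', 'Session']

# Start a throwaway CSV file that only has the header row.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, 'students.csv')
with open(csv_path, 'w', newline='') as f:
    csv.DictWriter(f, fieldnames=field_names).writeheader()

# Keys in a different order, with 'Course' and 'City' left out.
partial_row = {'Session': 'Morning', 'Name': 'Sahil', 'Id': 33}
with open(csv_path, 'a', newline='') as f:
    csv.DictWriter(f, fieldnames=field_names).writerow(partial_row)

# Read the row back as a dictionary keyed by column name.
with open(csv_path, newline='') as f:
    last_row = list(csv.DictReader(f))[-1]
print(last_row)
```

The missing columns are written as empty strings, which is the default value of DictWriter's restval argument, and 'Morning' lands in the Session column instead of the Course column.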

## Pandas : Convert Data frame index into column using dataframe.reset_index() in python

In this article, we will explore ways to convert the index of a data frame, or the indexes of a multi-index data frame, into columns.

The pandas DataFrame class provides a function to reset the index of a data frame.

## Dataframe.reset_index()

DataFrame.reset_index(self, level=None, drop=False, inplace=False, col_level=0, col_fill='')

It resets the index of the data frame and returns a data frame with the new index.

• level: By default, reset_index() resets all the indexes of the data frame. In the case of a multi-index dataframe, if we want to reset only specific indexes, we can specify them as an int, str, or list of these, i.e. index positions or names.
• drop: If False, the index is converted into a column. If True, the index is removed from the dataframe.
• inplace: If True, it modifies the data frame in place.
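Before the full walkthrough, here is a compact sketch of the effect of the drop argument on a tiny dataframe. The two-row data is made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti'], 'Age': [34, 31]},
                  index=pd.Index([11, 12], name='ID'))

as_column = df.reset_index()           # 'ID' becomes a regular column
dropped = df.reset_index(drop=True)    # 'ID' is discarded

print(as_column.columns.tolist())
print(dropped.columns.tolist())
print(df.index.name)                   # the original keeps its index without inplace=True
```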

Let’s use this function to convert the indexes of dataframe to columns.

The first thing we will do is create a dataframe and set its index.

Code:
import pandas as pd

empoyees = [(11, 'jack', 34, 'Sydney', 70000),
            (12, 'Riti', 31, 'Delhi', 77000),
            (13, 'Aadi', 16, 'Mumbai', 81000),
            (14, 'Mohit', 31, 'Delhi', 90000),
            (15, 'Veena', 12, 'Delhi', 91000),
            (16, 'Shaunak', 35, 'Mumbai', 75000),
            (17, 'Shaun', 35, 'Colombo', 63000)]
# Create a DataFrame object
empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Salary'])
# Set 'ID' as the index of the dataframe
empDfObj.set_index('ID', inplace=True)
print(empDfObj)

Now, we will try different things with this dataframe.

### Convert index of a Dataframe into a column of dataframe

To convert the index ‘ID‘ of the dataframe empDfObj into a column, call the reset_index() function on that dataframe,

Code:
modified = empDfObj.reset_index()
print("Modified Dataframe : ")
print(modified)

Since we did not provide the inplace argument, by default it returned a modified copy of the dataframe, in which the index ID is converted into a column named 'ID' and a new default index is assigned automatically.

Now, we will pass the inplace argument as True to modify the dataframe in place.

Code:
empDfObj.reset_index(inplace=True)
print(empDfObj)

Now, we will set the column 'ID' as the index of the dataframe again.

Code:
empDfObj.set_index('ID', inplace=True)

### Remove index  of dataframe instead of converting into column

Previously, we converted the index of the dataframe into a column. Now we just want to remove it. We can do that by passing the drop argument as True to the reset_index() function,

Code:
modified = empDfObj.reset_index(drop=True)
print("Modified Dataframe : ")
print(modified)

We can see that it removed the ID index from the dataframe and replaced it with the default integer index.

### Resetting indexes of a Multi-Index Dataframe

Let’s convert the dataframe object empDfObj into a multi-index dataframe with two indexes, i.e. ID and Name.

Code:
empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Salary'])
# Set multiple columns as the index of the dataframe to
# make it a multi-index dataframe.
empDfObj.set_index(['ID', 'Name'], inplace=True)
print(empDfObj)

### Convert all the indexes of Multi-index Dataframe to the columns of Dataframe

In the previous section, we created a multi-index dataframe. Here we will convert the indexes of that multi-index dataframe into columns.

To do this, all we have to do is call reset_index() on the dataframe object.

Code:
modified = empDfObj.reset_index()
print(modified)



It converted the indexes ID and Name into columns of the same names.

Suppose we want to convert only one of the multiple indexes. We can do that by passing a single index name in the level argument.

Code:
modified = empDfObj.reset_index(level='ID')
print("Modified Dataframe: ")
print(modified)

It converted the index 'ID' into a column with the same name. We can follow the same procedure to convert the Name index into a column.

As an exercise, try adapting the code to convert the Name index into a column.

We can also convert both indexes into columns by passing a list of index names in the level parameter.

Code:
modified = empDfObj.reset_index(level=['ID', 'Name'])
print("Modified Dataframe: ")
print(modified)

The complete code:

import pandas as pd

def main():
    # List of Tuples
    empoyees = [(11, 'jack', 34, 'Sydney', 70000),
                (12, 'Riti', 31, 'Delhi', 77000),
                (13, 'Aadi', 16, 'Mumbai', 81000),
                (14, 'Mohit', 31, 'Delhi', 90000),
                (15, 'Veena', 12, 'Delhi', 91000),
                (16, 'Shaunak', 35, 'Mumbai', 75000),
                (17, 'Shaun', 35, 'Colombo', 63000)]
    # Create a DataFrame object
    empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Salary'])
    # Set 'ID' as the index of the dataframe
    empDfObj.set_index('ID', inplace=True)
    print("Contents of the Dataframe : ")
    print(empDfObj)

    print('Convert the index of Dataframe to the column')
    # Reset the index of dataframe
    modified = empDfObj.reset_index()
    print("Modified Dataframe : ")
    print(modified)

    print('Convert the index of Dataframe to the column - in place ')
    empDfObj.reset_index(inplace=True)
    print("Contents of the Dataframe : ")
    print(empDfObj)
    # Set 'ID' as the index of the dataframe
    empDfObj.set_index('ID', inplace=True)

    print('Remove the index of Dataframe instead of converting it to a column')
    # Remove index ID instead of converting into a column
    modified = empDfObj.reset_index(drop=True)
    print("Modified Dataframe : ")
    print(modified)

    print('Resetting indexes of a Multi-Index Dataframe')
    # Create a DataFrame object
    empDfObj = pd.DataFrame(empoyees, columns=['ID', 'Name', 'Age', 'City', 'Salary'])
    # Set multiple columns as the index of the dataframe to
    # make it a multi-index dataframe.
    empDfObj.set_index(['ID', 'Name'], inplace=True)
    print("Contents of the Multi-Index Dataframe : ")
    print(empDfObj)

    print('Convert all the indexes of Multi-index Dataframe to the columns of Dataframe')
    # Reset all indexes of a multi-index dataframe
    modified = empDfObj.reset_index()
    print("Modified Multi-Index Dataframe : ")
    print(modified)
    print("Contents of the original Multi-Index Dataframe : ")
    print(empDfObj)

    # Convert only the 'ID' index into a column
    modified = empDfObj.reset_index(level='ID')
    print("Modified Dataframe: ")
    print(modified)
    # Convert only the 'Name' index into a column
    modified = empDfObj.reset_index(level='Name')
    print("Modified Dataframe: ")
    print(modified)
    # Convert both indexes into columns
    modified = empDfObj.reset_index(level=['ID', 'Name'])
    print("Modified Dataframe: ")
    print(modified)

if __name__ == '__main__':
    main()

Hope this article was useful for you and you grabbed the knowledge from it.


## Python : How to copy files from one location to another using shutil.copy()

In this article, we will discuss how to copy files from one directory to another using shutil.copy().

### shutil.copy()

Python's shutil module provides a function named shutil.copy().

shutil.copy(src, dst, *, follow_symlinks=True)

It copies the file pointed to by src to the file or directory pointed to by dst.

Parameters:

• src is the path of the source file.
• dst can be a directory path or a file path.
• If src is a symbolic link, then
  • if follow_symlinks is True, the file that the link points to is copied.
  • if follow_symlinks is False, dst is created as a symbolic link to the same target.

It returns the path of the newly created file as a string.

The first step is to import the required module.

import shutil

Now, we will use this function to copy the files.

## Copy a file to another directory

newPath = shutil.copy('sample1.txt', '/home/bahija/test')

The file 'sample1.txt' will be copied to the directory '/home/bahija/test', and the function will return the path of the newly created file, that is,

/home/bahija/test/sample1.txt

• If a file with the same name already exists in the destination directory, it will be overwritten.
• If no directory named test exists inside /home/bahija, the source file will be copied to a file named test.
• If the source file does not exist, it will raise a FileNotFoundError.

## Copy a File to another directory with a new name

To copy a file with a new name, pass the full destination file path as dst:

newPath = shutil.copy('sample1.txt', '/home/bahijaj/test/sample2.txt')

The copy of 'sample1.txt' will be saved in the other directory under the new name 'sample2.txt'.

A few points to note:

• The destination file will be overwritten if it already exists.
• If the source file is not available, it will raise a FileNotFoundError.
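Both cases can be tried safely in a temporary directory. The sketch below is only an illustration: the temporary paths and file contents are made up, and it shows how the returned path differs when dst is an existing directory and when it is a file path.

```python
import os
import shutil
import tempfile

# Create a source file in a throwaway working directory.
work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, 'sample1.txt')
with open(src, 'w') as f:
    f.write('hello')

dest_dir = os.path.join(work_dir, 'test')
os.mkdir(dest_dir)

# dst is an existing directory: the file keeps its name inside it.
copied_into_dir = shutil.copy(src, dest_dir)

# dst is a file path: the copy gets the new name.
copied_renamed = shutil.copy(src, os.path.join(dest_dir, 'sample2.txt'))

print(copied_into_dir)
print(copied_renamed)
```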

## Copy symbolic links using shutil.copy()

Suppose we are using a symbolic link named link.csv which points towards sample.csv.

link.csv -> sample.csv

Now, we will copy the symbolic link using shutil.copy() function.

shutil.copy(src, dst, *, follow_symlinks=True)

We can see that follow_symlinks is True by default, so it copies the file that the link points to into the destination.

newPath = shutil.copy('/home/bahijaj/test/link.csv', '/home/bahijaj/test/sample2.csv')

The new path will be:

/home/bahijaj/test/sample2.csv

sample2.csv is an actual copy of sample.csv, the file that link.csv points to.

newPath = shutil.copy('/home/bahijaj/test/link.csv', '/home/bahijaj/test/newlink.csv', follow_symlinks=False)

With follow_symlinks=False, it copies the symbolic link itself, i.e. newlink.csv will be a link pointing to the same target file sample.csv.

If the source file does not exist, it will raise an error.
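The difference between the two settings can be checked with os.path.islink(). This is a minimal sketch that assumes a platform where the current user may create symbolic links (for example Linux or macOS); the temporary directory and file contents are made up for the illustration.

```python
import os
import shutil
import tempfile

work_dir = tempfile.mkdtemp()
target = os.path.join(work_dir, 'sample.csv')
with open(target, 'w') as f:
    f.write('Id,Name\n')

link = os.path.join(work_dir, 'link.csv')
os.symlink(target, link)   # link.csv -> sample.csv

# Default (follow_symlinks=True): follow the link and copy the file contents.
real_copy = shutil.copy(link, os.path.join(work_dir, 'sample2.csv'))

# follow_symlinks=False: create another link to the same target.
link_copy = shutil.copy(link, os.path.join(work_dir, 'newlink.csv'),
                        follow_symlinks=False)

print(os.path.islink(real_copy), os.path.islink(link_copy))
```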

Complete Code:

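import shutil

def main():
    # Copy file to another directory
    newPath = shutil.copy('sample1.txt', '/home/bahijaj/test')
    print("Path of copied file : ", newPath)
    # Copy a file with new name
    newPath = shutil.copy('sample1.txt', '/home/bahijaj/test/sample2.txt')
    print("Path of copied file : ", newPath)
    # Copy target file pointed by symbolic link
    newPath = shutil.copy('/home/bahijaj/test/link.csv', '/home/bahijaj/test/sample2.csv')
    print("Path of copied file : ", newPath)
    # Copy the symbolic link itself
    newPath = shutil.copy('/home/bahijaj/test/link.csv', '/home/bahijaj/test/newlink.csv', follow_symlinks=False)
    print("Path of copied file : ", newPath)

if __name__ == '__main__':
    main()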

## Pandas: Replace NaN with mean or average in Dataframe using fillna()

In this article, we will discuss the replacement of NaN values with a mean of the values in rows and columns using two functions: fillna() and mean().

In data analytics, we often work with large datasets in which some values are missing, and we have to fill those values to continue the analysis.

pandas provides built-in methods to handle NaN or missing values and obtain a cleaner data set.

These functions are:

## Dataframe.fillna():

This method is used to replace the NaN values in the data frame.

### The mean() method:

mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)


Parameters:

• axis: the axis along which the mean is computed, 0 (or 'index') for the mean of each column and 1 (or 'columns') for the mean of each row.
• skipna: whether to exclude NaN/null values when computing the result.
• level: if the axis is a MultiIndex (hierarchical), compute along a particular level, collapsing into a Series.
• numeric_only: whether to include only numeric columns in the computation.
• **kwargs: additional keyword arguments to be passed to the function.

This function returns the mean of the values.
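The effect of axis and skipna is easiest to see on a tiny dataframe. The values below are made up for this sketch.

```python
import numpy as np
import pandas as pd

marks = pd.DataFrame({'S1': [10.0, 20.0, np.nan],
                      'S2': [5.0, np.nan, 29.0]},
                     index=['Maths', 'Finance', 'History'])

column_means = marks.mean()                     # default axis=0: one mean per column
row_means = marks.mean(axis=1)                  # axis=1: one mean per row
strict_mean = marks['S1'].mean(skipna=False)    # NaN propagates when it is not skipped

print(column_means)
print(row_means)
print(strict_mean)
```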

Let’s dig in deeper to get a thorough understanding!

### Pandas: Replace NaN with column mean

We can replace the NaN values in the whole dataset or just in a column by getting the mean values of the column.

For instance, we will take a dataset that has the information about 4 students S1 to S4 with marks in different subjects.

Code:


import numpy as np
import pandas as pd
# A dictionary with list as values
sample_dict = {'S1': [10, 20, np.nan, np.nan],
               'S2': [5, np.nan, np.nan, 29],
               'S3': [15, np.nan, np.nan, 11],
               'S4': [21, 22, 23, 25],
               'Subjects': ['Maths', 'Finance', 'History', 'Geography']}
# Create a DataFrame from dictionary
df = pd.DataFrame(sample_dict)
# Set column 'Subjects' as Index of DataFrame
df = df.set_index('Subjects')
print(df)

Suppose we calculate the mean value of the S2 column. A single value of float type is returned.

Code:


mean_value = df['S2'].mean()
print('Mean of values in column S2:')
print(mean_value)

### Replace NaN values in a column with mean of column values

Let’s see how to replace the NaN values in column S2 with the mean of column values.

Code:

df['S2'] = df['S2'].fillna(value=df['S2'].mean())
print('Updated Dataframe:')
print(df)

Here the mean() method is called on the S2 column, so the value argument receives the mean of that column's values, and the NaN values in S2 are replaced with it.

### Replace all NaN values in a Dataframe with mean of column values

Now, we will see how to replace all the NaN values in a data frame with the mean of the S2 column values.

We can simply apply the fillna() function to the entire data frame instead of a particular column.

Code:

df.fillna(value=df['S2'].mean(), inplace=True)
print('Updated Dataframe:')
print(df)

We can see that all the NaN values got replaced with the mean value of the S2 column. Passing inplace=True makes the change directly in the data frame.
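A common variation is to fill every column with its own mean rather than with the mean of S2. The sketch below shows one way to do this on a small made-up dataframe (df_demo is an illustrative name): passing the Series returned by mean() to fillna() makes pandas match the values by column name.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({'S1': [10, 20, np.nan, np.nan],
                        'S2': [5, np.nan, np.nan, 29],
                        'S4': [21, 22, 23, 25]},
                       index=['Maths', 'Finance', 'History', 'Geography'])

# df_demo.mean() is a Series keyed by column name, so fillna aligns it
# column by column and each NaN gets the mean of its own column.
filled = df_demo.fillna(df_demo.mean())
print(filled)
```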

### Pandas: Replace NANs with mean of multiple columns

We will reinitialize our data frame with NaN values.

Code:

df = pd.DataFrame(sample_dict)
# Set column 'Subjects' as Index of DataFrame
df = df.set_index('Subjects')
# Dataframe with NaNs
print(df)

If we want to work with multiple columns, we select those columns before calling the mean() function.

Code:

mean_values=df[['S2','S3']].mean()
print(mean_values)

It returned the calculated means of the two columns S2 and S3.

Now, we will replace the NaN values in columns S2 and S3 with the mean values of these columns.

Code:

df[['S2','S3']] = df[['S2','S3']].fillna(value=df[['S2','S3']].mean())
print('Updated Dataframe:')
print(df)

### Pandas: Replace NANs with row mean

We can apply the same method to rows. Previously, we replaced the NaN values with the mean of a column. Here we will replace the NaN values in a row with the mean of that row.

For this, we use .loc['index name'] to access a row and then use the fillna() and mean() methods.

Code:

df.loc['History'] = df.loc['History'].fillna(value=df.loc['History'].mean())
print('Updated Dataframe:')
print(df)
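To treat every row this way instead of one named row, the same idea can be applied row by row. This is a sketch on made-up data, using DataFrame.apply() with axis=1 so that the function receives one row at a time.

```python
import numpy as np
import pandas as pd

df_rows = pd.DataFrame({'S1': [10, 20, np.nan],
                        'S2': [5, np.nan, np.nan],
                        'S4': [21, 22, 23]},
                       index=['Maths', 'Finance', 'History'])

# For each row, replace its NaN values with the mean of that row.
filled_rows = df_rows.apply(lambda row: row.fillna(row.mean()), axis=1)
print(filled_rows)
```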

Conclusion

So, these were different ways to replace NaN values in a column, row or complete data frame with mean or average values.

## Python : How to find keys by value in dictionary ?

In this article, we will see how to find all the keys associated with a single value or multiple values.

For instance, we have a dictionary of words.

dictOfWords = {"hello": 56,
"at" : 23 ,
"test" : 43,
"this" : 97,
"here" : 43,
"now" : 97
}

Now, we want all the keys in the dictionary having value 43, which means “here” and “test”.

Let’s see how to get it.

## Find keys by value in the dictionary

dict.items() is the method that returns all the key-value pairs in a dictionary. We iterate over this sequence and check each pair against the condition. If the value matches, we add the key to a separate list.

def getKeysByValue(dictOfElements, valueToFind):
    listOfKeys = list()
    listOfItems = dictOfElements.items()
    for item in listOfItems:
        if item[1] == valueToFind:
            listOfKeys.append(item[0])
    return listOfKeys

We will use this function to get the keys by value 43.

listOfKeys = getKeysByValue(dictOfWords, 43)
print("Keys with value equal to 43")
# Iterate over the list of keys
for key in listOfKeys:
    print(key)

We can achieve the same thing with a list comprehension.

listOfKeys = [key for (key, value) in dictOfWords.items() if value == 43]

### Find keys in the dictionary by value list

Now, we want to find the keys in the dictionary whose value matches any value in a given list:

[43, 97]

We will do the same thing as above, but this time we check whether each value is contained in the given list of values.

def getKeysByValues(dictOfElements, listOfValues):
    listOfKeys = list()
    listOfItems = dictOfElements.items()
    for item in listOfItems:
        if item[1] in listOfValues:
            listOfKeys.append(item[0])
    return listOfKeys

We will use the above function:

listOfKeys = getKeysByValues(dictOfWords, [43, 97])
# Iterate over the list of keys
for key in listOfKeys:
    print(key)
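
When many lookups are needed on the same dictionary, scanning all items each time can be avoided by building a reverse index once. This is a sketch of an alternative the article does not cover; the name keysByValue is hypothetical, and it assumes the dictionary values are hashable.

```python
from collections import defaultdict

dictOfWords = {"hello": 56, "at": 23, "test": 43,
               "this": 97, "here": 43, "now": 97}

# Build a reverse index once: value -> list of keys holding that value
keysByValue = defaultdict(list)
for key, value in dictOfWords.items():
    keysByValue[value].append(key)

print(keysByValue[43])  # ['test', 'here']
print(keysByValue[97])  # ['this', 'now']
```

Each lookup is then a single dictionary access, at the cost of keeping a second mapping in memory.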

Complete Code:

def getKeysByValue(dictOfElements, valueToFind):
    '''Get a list of keys from dictionary which has the given value'''
    listOfKeys = list()
    listOfItems = dictOfElements.items()
    for item in listOfItems:
        if item[1] == valueToFind:
            listOfKeys.append(item[0])
    return listOfKeys


def getKeysByValues(dictOfElements, listOfValues):
    '''Get a list of keys from dictionary which has value that matches with any value in given list of values'''
    listOfKeys = list()
    listOfItems = dictOfElements.items()
    for item in listOfItems:
        if item[1] in listOfValues:
            listOfKeys.append(item[0])
    return listOfKeys


def main():
    # Dictionary of strings and int
    dictOfWords = {
        "hello": 56,
        "at": 23,
        "test": 43,
        "this": 97,
        "here": 43,
        "now": 97}
    print("Original Dictionary")
    print(dictOfWords)

    # Get list of keys with value 43
    listOfKeys = getKeysByValue(dictOfWords, 43)
    print("Keys with value equal to 43")
    # Iterate over the list of keys
    for key in listOfKeys:
        print(key)

    # Get list of keys with value 43 using list comprehension
    print("Keys with value equal to 43")
    listOfKeys = [key for (key, value) in dictOfWords.items() if value == 43]
    # Iterate over the list of keys
    for key in listOfKeys:
        print(key)

    # Get list of keys with any of the given values
    print("Keys with value equal to any one from the list [43, 97] ")
    listOfKeys = getKeysByValues(dictOfWords, [43, 97])
    # Iterate over the list of keys
    for key in listOfKeys:
        print(key)


if __name__ == '__main__':
    main()

## numpy.where() – Explained with examples

In this article, we will look at several examples of the numpy.where() function and how it works in Python:

• Using numpy.where() with a single condition
• Using numpy.where() with multiple conditions
• Using numpy.where() to select indexes of elements that satisfy multiple conditions
• Using numpy.where() without a condition expression

NumPy's where() function lets us select elements from two different sequences based on a condition evaluated on another array.

## Syntax of np.where()

numpy.where(condition[, x, y])

Explanation:

• condition is an expression that evaluates to a NumPy array of bool.
• x and y are optional arrays; either both are passed or neither is.
• If x and y are passed, it returns an array whose elements are taken from x where the condition is True and from y where it is False.
• If x and y are not passed and only the condition is passed, it returns the indices of the elements that are True in the bool array.
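
The two calling forms described above can be sketched as follows. The array values is hypothetical sample data, and passing the scalar 0 for y relies on NumPy broadcasting it against the condition.

```python
import numpy as np

values = np.array([3, 8, 5, 10])  # hypothetical sample data
mask = values > 5                 # bool array: [False  True False  True]

# Three-argument form: take from x where mask is True, otherwise from y
picked = np.where(mask, values, 0)
print(picked)       # [ 0  8  0 10]

# One-argument form: a tuple holding one index array per dimension
indices = np.where(mask)
print(indices[0])   # [1 3]
```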

Let’s dig in to see some examples.

### Using numpy.where() with single condition

Let’s say we have two lists of the same size and a NumPy array.

arr = np.array([11, 12, 13, 14])
high_values = ['High', 'High', 'High', 'High']
low_values = ['Low', 'Low', 'Low', 'Low']

Now, we want to convert this NumPy array into an array of the same size whose values are taken from the lists high_values and low_values. If a value in arr is greater than 12, it is replaced with ‘High‘; otherwise (12 or less) it is replaced with ‘Low‘.

So ultimately, the array will look like this:

['Low' 'Low' 'High' 'High']


We could also do this with a for loop, but numpy.where() is designed for exactly this kind of task.

We will use numpy.where() to get the result.

import numpy as np

arr = np.array([11, 12, 13, 14])
high_values = ['High', 'High', 'High', 'High']
low_values = ['Low', 'Low', 'Low', 'Low']
# numpy where() with condition, x and y arguments
result = np.where(arr > 12, high_values, low_values)
print(result)

We combined the two lists into a single array with the where() function, which chose each element depending on whether the corresponding value in arr is greater than 12.

The first two values came out as ‘Low’ because 11 and 12 are not greater than 12. Similarly, the last two values are ‘High’ because 13 and 14 are greater than 12.

Let’s see how it worked.

Here numpy.where() received three arguments.

The first argument is the condition on the NumPy array, which evaluates to a bool array.

arr > 12 ==> [False False True True]

Then numpy.where() went through the bool array. For every True it took the corresponding element from the first list (high_values), and for every False it took the corresponding element from the second list (low_values), i.e.

[False False True True] ==> ['Low' 'Low' 'High' 'High']

This is how we created a new array from the existing sequences.
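
When every True position should receive the same value and every False position another, the full-length lists are not strictly needed. As a sketch of this shortcut (which the article itself does not use), scalars can be passed for x and y and NumPy broadcasts them against the condition:

```python
import numpy as np

arr = np.array([11, 12, 13, 14])

# Scalars are broadcast to the shape of the condition
result = np.where(arr > 12, 'High', 'Low')
print(result)  # ['Low' 'Low' 'High' 'High']
```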

### Using numpy.where() with multiple conditions

In the above example, we used a single condition. Here we will see an example with multiple conditions.

arr = np.array([11, 12, 14, 15, 16, 17])
# pass a combined condition together with the x and y lists
result = np.where((arr > 12) & (arr < 16),
                  ['A', 'A', 'A', 'A', 'A', 'A'],
                  ['B', 'B', 'B', 'B', 'B', 'B'])
print(result)

We combined two conditions on the array arr with the & operator, which produced a single bool array. Then numpy.where() went through this bool array: for every True it took the corresponding element from the first list, and for every False it took the corresponding element from the second list. The new array is built from the values selected from both lists, i.e.

• The conditional expression is True for the values 14 and 15, so they are replaced by the values from the first list.
• The conditional expression is False for 11, 12, 16 and 17, so they are replaced by the values from the second list.

Now, we will pass different values in the two lists and see what the array returns.

arr = np.array([11, 12, 14, 15, 16, 17])
# pass a combined condition together with two lists of different values
result = np.where((arr > 12) & (arr < 16),
                  ['A', 'B', 'C', 'D', 'E', 'F'],
                  [1, 2, 3, 4, 5, 6])
print(result)

### Use np.where() to select indexes of elements that satisfy multiple conditions

We will take a new NumPy array:

arr = np.array([11, 12, 13, 14, 15, 16, 17, 15, 11, 12, 14, 15, 16, 17])

Now, we will find the indexes of the elements that satisfy a condition: an element should be greater than 12 and less than 16. For this, we will pass only the condition to numpy.where():

arr = np.array([11, 12, 13, 14, 15, 16, 17, 15, 11, 12, 14, 15, 16, 17])
# pass condition expression only
result = np.where((arr > 12) & (arr < 16))
print(result)

It returns a tuple containing an array of the indexes at which the condition is True in the original array.

How did it work?

Here, the condition expression is evaluated to a bool NumPy array, which is then passed to numpy.where(). where() returns a tuple with one array per dimension. As our array has only one dimension, the tuple contains a single element: an array of the indices where the bool array is True, i.e. the indexes of the items in arr whose value is greater than 12 and less than 16.
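
As a short sketch of how the returned tuple can be used, the index array can be read from position 0 of the tuple, and the tuple itself can be used to index the original array and recover the matching values. The name idx is illustrative only.

```python
import numpy as np

arr = np.array([11, 12, 13, 14, 15, 16, 17, 15, 11, 12, 14, 15, 16, 17])

# Condition only: returns a one-element tuple for a 1D array
idx = np.where((arr > 12) & (arr < 16))

print(idx[0])    # [ 2  3  4  7 10 11]
print(arr[idx])  # [13 14 15 15 14 15]
```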

### Using np.where() without any condition expression

In this case, we will directly pass the bool array as a conditional expression.

result = np.where([True, False, False],[1, 2, 4],[7, 8, 9])
print(result)

As numpy.where() goes through the bool values, it takes the element from the first list for every True and the element from the second list for every False.

So, it returns an array with elements from the first list where the condition is True and elements from the second list elsewhere.

## Python: Read a CSV file line by line with or without header

In this article, we will be learning about how to read a CSV file line by line with or without a header. Along with that, we will be learning how to select a specified column while iterating over a file.

Let us take an example where we have a file named students.csv.

Id,Name,Course,City,Session
21,Mark,Python,London,Morning
22,John,Python,Tokyo,Evening
23,Sam,Python,Paris,Morning
32,Shaun,Java,Tokyo,Morning

What we want is to read all the rows of this file line by line.

Note that we will not read this CSV file into a list of lists, because that costs both memory and time and causes problems with large data. We want a solution that reads one line at a time, so that little memory is consumed.

Let’s get started with it!

In Python, the csv module gives us two ways to read a CSV file: csv.reader and csv.DictReader. We will use them one by one to read a CSV file line by line.

With csv.reader, we get a reader object through which we can iterate over the lines of a CSV file. Each line is returned as a list of values, where each value in the list is a cell value.

Code:

from csv import reader
# open file in read mode
with open('students.csv', 'r') as read_obj:
    # pass the file object to reader() to get the reader object
    csv_reader = reader(read_obj)
    # Iterate over each row in the csv using reader object
    for row in csv_reader:
        # row variable is a list that represents a row in csv
        print(row)

The above code iterated over each row of the CSV file. It fetched the content of each row as a list and printed that list.

### How did it work?

It performed a few steps:

1. Opened the students.csv file and created a file object.
2. Passed the file object to reader() to get a reader object.
3. Iterated over the reader object with a for loop, which reads each row of the CSV as a list of values.
4. At last, we printed this list.

With this approach, only one line is held in memory at a time while iterating through a CSV file.
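
The one-row-at-a-time behaviour can be seen by pulling rows from the reader manually with next(). This is a small sketch under one assumption: an in-memory io.StringIO object stands in for the students.csv file so that the example is self-contained.

```python
import io
from csv import reader

# In-memory stand-in for students.csv (assumption: same layout as above)
csv_text = (
    "Id,Name,Course,City,Session\n"
    "21,Mark,Python,London,Morning\n"
    "22,John,Python,Tokyo,Evening\n"
)
file_obj = io.StringIO(csv_text)

csv_reader = reader(file_obj)
first = next(csv_reader)       # only the header row has been parsed so far
second = next(csv_reader)      # now the first data row
remaining = list(csv_reader)   # whatever is left

print(first)
print(second)
print(len(remaining))
```

Each call to next() parses exactly one more row, which is what the for loop does behind the scenes.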

What if we want to skip the header and print only the data rows? In the previous example, the printed values included the header; in this example, we will skip the header and print the remaining rows.

Code:

from csv import reader
# skip first line i.e. read header first and then iterate over each row of csv as a list
with open('students.csv', 'r') as read_obj:
    csv_reader = reader(read_obj)
    header = next(csv_reader, None)
    # Check file as empty
    if header is not None:
        # Iterate over each row after the header in the csv
        for row in csv_reader:
            # row variable is a list that represents a row in csv
            print(row)

The header is not printed: the code reads the header first and then prints every remaining row as a list.

## Read csv file line by line using csv module DictReader object

Now, we will see an example using csv.DictReader. A DictReader object iterates over the lines of a CSV file and returns each row as a dictionary that maps the column names to the values of that row.

Code:

from csv import DictReader
# open file in read mode
with open('students.csv', 'r') as read_obj:
    # pass the file object to DictReader() to get the DictReader object
    csv_dict_reader = DictReader(read_obj)
    # iterate over each line as a dictionary
    for row in csv_dict_reader:
        # row variable is a dictionary that represents a row in csv
        print(row)

The above code iterated over all the rows of the CSV file. It fetched the content of each row as a dictionary and printed it.

### How did it work?

It performed a few steps:

1. Opened the students.csv file and created a file object.
2. Passed the file object to DictReader() to get a DictReader object.
3. Iterated over the DictReader object with a for loop, which reads each row of the CSV as a dictionary. Each pair in this dictionary contains the column name and the column value for that row.

It also saves memory, as only one row at a time is held in memory.
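
Because every row arrives as a dictionary, rows can be filtered by column name while they are being read. The sketch below makes the same assumption as before (io.StringIO stands in for students.csv) and collects the names of the students whose City is Tokyo; the variable tokyo_names is hypothetical.

```python
import io
from csv import DictReader

# In-memory stand-in for students.csv (assumption: same content as above)
csv_text = (
    "Id,Name,Course,City,Session\n"
    "21,Mark,Python,London,Morning\n"
    "22,John,Python,Tokyo,Evening\n"
    "23,Sam,Python,Paris,Morning\n"
    "32,Shaun,Java,Tokyo,Morning\n"
)

tokyo_names = []
for row in DictReader(io.StringIO(csv_text)):
    # Every value is read as a string, including the numeric Id column
    if row['City'] == 'Tokyo':
        tokyo_names.append(row['Name'])

print(tokyo_names)  # ['John', 'Shaun']
```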

## Get column names from the header in the CSV file

The DictReader class has a fieldnames attribute that returns the column names of a CSV file as a list.

Code:

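from csv import DictReader
# open file in read mode
with open('students.csv', 'r') as read_obj:
    # pass the file object to DictReader() to get the DictReader object
    csv_dict_reader = DictReader(read_obj)
    # get column names from a csv file
    column_names = csv_dict_reader.fieldnames
    print(column_names)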

## Read specific columns from a csv file while iterating line by line

#### Read specific columns (by column name) in a CSV file while iterating row by row

We will iterate over all the rows of the CSV file line by line but will print only two columns of each row.

Code:

from csv import DictReader
with open('students.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    # iterate over each line as a dictionary and print only a few columns by column name
    for row in csv_dict_reader:
        print(row['Id'], row['Name'])

DictReader returns a dictionary for each line during iteration. In this dictionary, the keys are column names and the values are the cell values for those columns. So, to select specific columns in every row, we index the dictionary with the column names.

#### Read specific columns (by column Number) in a CSV file while iterating row by row

We will iterate over all the rows of the CSV file line by line but will print the contents of the 2nd and 3rd column.

Code:

from csv import reader
with open('students.csv', 'r') as read_obj:
    csv_reader = reader(read_obj)
    # iterate over each line as a list and print only a few columns by column number
    for row in csv_reader:
        print(row[1], row[2])

With csv.reader, each row of the CSV file is fetched as a list of values, where each value represents a column value. So, to select the 2nd and 3rd columns of each row, we take the elements at index 1 and 2 of the list.

The complete code:

from csv import reader, DictReader

def main():
    print('*** Read csv file line by line using csv module reader object ***')
    print('*** Iterate over each row of a csv file as list using reader object ***')
    # open file in read mode
    with open('students.csv', 'r') as read_obj:
        # pass the file object to reader() to get the reader object
        csv_reader = reader(read_obj)
        # Iterate over each row in the csv using reader object
        for row in csv_reader:
            # row variable is a list that represents a row in csv
            print(row)
    # skip first line i.e. read header first and then iterate over each row of csv as a list
    with open('students.csv', 'r') as read_obj:
        csv_reader = reader(read_obj)
        header = next(csv_reader, None)
        # Check file as empty
        if header is not None:
            # Iterate over each row after the header in the csv
            for row in csv_reader:
                # row variable is a list that represents a row in csv
                print(row)
    print('*** Read csv file line by line using csv module DictReader object ***')
    # open file in read mode
    with open('students.csv', 'r') as read_obj:
        # pass the file object to DictReader() to get the DictReader object
        csv_dict_reader = DictReader(read_obj)
        # iterate over each line as a dictionary
        for row in csv_dict_reader:
            # row variable is a dictionary that represents a row in csv
            print(row)
    print('*** select elements by column name while reading csv file line by line ***')
    # open file in read mode
    with open('students.csv', 'r') as read_obj:
        # pass the file object to DictReader() to get the DictReader object
        csv_dict_reader = DictReader(read_obj)
        # iterate over each line as a dictionary
        for row in csv_dict_reader:
            # row variable is a dictionary that represents a row in csv
            print(row['Name'], ' is from ', row['City'], ' and he is studying ', row['Course'])
    print('*** Get column names from header in csv file ***')
    # open file in read mode
    with open('students.csv', 'r') as read_obj:
        # pass the file object to DictReader() to get the DictReader object
        csv_dict_reader = DictReader(read_obj)
        # get column names from a csv file
        column_names = csv_dict_reader.fieldnames
        print(column_names)
    print('*** Read specific columns from a csv file while iterating line by line ***')
    print('*** Read specific columns (by column name) in a csv file while iterating row by row ***')
    with open('students.csv', 'r') as read_obj:
        csv_dict_reader = DictReader(read_obj)
        # iterate over each line as a dictionary and print only a few columns by column name
        for row in csv_dict_reader:
            print(row['Id'], row['Name'])
    print('*** Read specific columns (by column Number) in a csv file while iterating row by row ***')
    with open('students.csv', 'r') as read_obj:
        csv_reader = reader(read_obj)
        # iterate over each line as a list and print only a few columns by column number
        for row in csv_reader:
            print(row[1], row[2])

if __name__ == '__main__':
    main()

## Pandas: Apply a function to single or selected columns or rows in Dataframe

In this article, we will apply a given function to selected rows and columns of a DataFrame.

For example, we have a dataframe object,

import pandas as pd
import numpy as np

matrix = [(22, 34, 23),
          (33, 31, 11),
          (44, 16, 21),
          (55, 32, 22),
          (66, 33, 27),
          (77, 35, 11)
          ]
# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))

Contents of this dataframe object dfObj are:

Original Dataframe
    x   y   z
a  22  34  23
b  33  31  11
c  44  16  21
d  55  32  22
e  66  33  27
f  77  35  11

Now, what if we want to apply different functions to all the elements of one or more columns or rows? For example:

• Multiply all the values in column ‘x’ by 2
• Multiply all the values in row ‘c’ by 10
• Add 10 to all the values in columns ‘y’ & ‘z’

We will use different techniques to see how we can do this.

## Apply a function to a single column in Dataframe

What if we want to square all the values in one of the columns, for example x, y or z?

There are several ways to do this. We will discuss a few of them below:

#### Method 1: Using Dataframe.apply()

We will apply a lambda function to all the columns using Dataframe.apply(). Inside the lambda function, we check whether the column name is the one we want (x, y or z) and, if so, square its values. Here we take column z.

Code:

modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'z' else x)
print("Modified Dataframe : Squared the values in column 'z'", modDfObj, sep='\n')

Output:

Modified Dataframe : Squared the values in column 'z'
    x   y    z
a  22  34  529
b  33  31  121
c  44  16  441
d  55  32  484
e  66  33  729
f  77  35  121

#### Method 2: Using [] operator

Using the [] operator, we select the column from the dataframe and apply the numpy.square() method to it. Then we assign the result back to the column.

dfObj['z'] = dfObj['z'].apply(np.square)

It will square all the values in column ‘z’.

#### Method 3: Using numpy.square()

dfObj['z'] = np.square(dfObj['z'])

This function will also square all the values in ‘z’.
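
To check that the three methods agree, the squared column can be computed in each way without assigning back to the dataframe. This is only a sketch: it uses a shortened, three-row version of the dataframe, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Shortened version of the article's dataframe (first three rows only)
matrix = [(22, 34, 23), (33, 31, 11), (44, 16, 21)]
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abc'))

# Method 1: DataFrame.apply() with a check on the column name
via_df_apply = dfObj.apply(lambda col: np.square(col) if col.name == 'z' else col)['z']
# Method 2: select the column, then Series.apply()
via_series_apply = dfObj['z'].apply(np.square)
# Method 3: pass the column directly to numpy.square()
via_numpy = np.square(dfObj['z'])

print(via_df_apply.tolist())      # [529, 121, 441]
print(via_series_apply.tolist())  # [529, 121, 441]
print(via_numpy.tolist())         # [529, 121, 441]
```

Since none of the results is assigned back, dfObj keeps its original values here.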

## Apply a function to a single row in Dataframe

We have seen how this works for columns. The same goes for rows. We will square all the values in row ‘b’, again using different methods.

#### Method 1: Using Dataframe.apply()

We will apply a lambda function to all the rows using Dataframe.apply() with axis=1. Inside the lambda function, we check the row label and square the values of the matching row.

Code:

modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'b' else x, axis=1)
print("Modified Dataframe : Squared the values in row 'b'", modDfObj, sep='\n')

Output:

Modified Dataframe : Squared the values in row 'b'
      x    y    z
a    22   34   23
b  1089  961  121
c    44   16   21
d    55   32   22
e    66   33   27
f    77   35   11

#### Method 2 : Using [] Operator

We will do the same as above for a row: select the row with the dataframe.loc[] operator, apply the numpy.square() method to it and assign the result back to the row.

dfObj.loc['b'] = dfObj.loc['b'].apply(np.square)

It will square all the values in the row ‘b’.

#### Method 3 : Using numpy.square()

dfObj.loc['b'] = np.square(dfObj.loc['b'])

This will also square the values in row ‘b’.

## Apply a function to certain columns in Dataframe

We can apply the function to whichever columns we want. For instance, we can square the values in columns ‘x’ and ‘y’.

Code:

modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x)
print("Modified Dataframe : Squared the values in column x & y :", modDfObj, sep='\n')

All we have to do is modify the if condition in the lambda function so that it matches the names of the columns we want to square.

## Apply a function to certain rows in Dataframe

We can apply the function to specified rows, for instance rows ‘b’ and ‘c’.

Code:

modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['b', 'c'] else x, axis=1)
print("Modified Dataframe : Squared the values in row b & c :", modDfObj, sep='\n')
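
The three tasks listed at the start of this article (multiply column ‘x’ by 2, multiply row ‘c’ by 10, add 10 to columns ‘y’ and ‘z’) can be handled with plain arithmetic on the selected column, row or columns. The following is a sketch rather than the article's own solution; each task works on a fresh copy so that the results do not affect each other, and the result names are hypothetical.

```python
import pandas as pd

matrix = [(22, 34, 23), (33, 31, 11), (44, 16, 21),
          (55, 32, 22), (66, 33, 27), (77, 35, 11)]
dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))

# Multiply all the values in column 'x' by 2
doubled_x = dfObj.copy()
doubled_x['x'] = doubled_x['x'] * 2

# Multiply all the values in row 'c' by 10
scaled_c = dfObj.copy()
scaled_c.loc['c'] = scaled_c.loc['c'] * 10

# Add 10 to all the values in columns 'y' and 'z'
plus_ten = dfObj.copy()
plus_ten[['y', 'z']] = plus_ten[['y', 'z']] + 10

print(doubled_x, scaled_c, plus_ten, sep='\n')
```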

The complete code is:

import pandas as pd
import numpy as np


def main():
    # List of Tuples
    matrix = [(22, 34, 23),
              (33, 31, 11),
              (44, 16, 21),
              (55, 32, 22),
              (66, 33, 27),
              (77, 35, 11)]
    # Create a DataFrame object
    dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
    print("Original Dataframe", dfObj, sep='\n')
    print('********* Apply a function to a single row or column in DataFrame ********')
    print('*** Apply a function to a single column *** ')
    # Method 1:
    # Apply function numpy.square() to square the values of one column only i.e. with column name 'z'
    modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'z' else x)
    print("Modified Dataframe : Squared the values in column 'z'", modDfObj, sep='\n')
    # Method 2:
    # Apply a function to one column and assign it back to the column in dataframe
    dfObj['z'] = dfObj['z'].apply(np.square)
    # Method 3:
    # Apply a function to one column and assign it back to the column in dataframe
    dfObj['z'] = np.square(dfObj['z'])
    print('*** Apply a function to a single row *** ')
    dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
    # Method 1:
    # Apply function numpy.square() to square the values of one row only i.e. row with index name 'b'
    modDfObj = dfObj.apply(lambda x: np.square(x) if x.name == 'b' else x, axis=1)
    print("Modified Dataframe : Squared the values in row 'b'", modDfObj, sep='\n')
    # Method 2:
    # Apply a function to one row and assign it back to the row in dataframe
    dfObj.loc['b'] = dfObj.loc['b'].apply(np.square)
    # Method 3:
    # Apply a function to one row and assign it back to the row in dataframe
    dfObj.loc['b'] = np.square(dfObj.loc['b'])
    print('********* Apply a function to certains row or column in DataFrame ********')
    dfObj = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
    print('Apply a function to certain columns only')
    # Apply function numpy.square() to square the values of 2 columns only i.e. with column names 'x' and 'y' only
    modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x)
    print("Modified Dataframe : Squared the values in column x & y :", modDfObj, sep='\n')
    print('Apply a function to certain rows only')
    # Apply function numpy.square() to square the values of 2 rows only i.e. with row index name 'b' and 'c' only
    modDfObj = dfObj.apply(lambda x: np.square(x) if x.name in ['b', 'c'] else x, axis=1)
    print("Modified Dataframe : Squared the values in row b & c :", modDfObj, sep='\n')


if __name__ == '__main__':
    main()

Output:

Original Dataframe
    x   y   z
a  22  34  23
b  33  31  11
c  44  16  21
d  55  32  22
e  66  33  27
f  77  35  11
********* Apply a function to a single row or column in DataFrame ********
*** Apply a function to a single column ***
Modified Dataframe : Squared the values in column 'z'
    x   y    z
a  22  34  529
b  33  31  121
c  44  16  441
d  55  32  484
e  66  33  729
f  77  35  121
*** Apply a function to a single row ***
Modified Dataframe : Squared the values in row 'b'
      x    y    z
a    22   34   23
b  1089  961  121
c    44   16   21
d    55   32   22
e    66   33   27
f    77   35   11
********* Apply a function to certains row or column in DataFrame ********
Apply a function to certain columns only
Modified Dataframe : Squared the values in column x & y :
      x     y   z
a   484  1156  23
b  1089   961  11
c  1936   256  21
d  3025  1024  22
e  4356  1089  27
f  5929  1225  11
Apply a function to certain rows only
Modified Dataframe : Squared the values in row b & c :
      x    y    z
a    22   34   23
b  1089  961  121
c  1936  256  441
d    55   32   22
e    66   33   27
f    77   35   11