
Python Data Persistence – Excel with Pandas


The Pandas library is extremely popular with data scientists because it provides easy-to-use tools for data manipulation and analysis. Pandas offers several data structures, of which the DataFrame is the most commonly used. A DataFrame represents a two-dimensional tabular data structure with labeled columns, which may be of different data types.

Before we explore the DataFrame object and its relationship with Excel, we have to ensure that the Pandas package is installed in the current Python environment. If you are using the Anaconda distribution, Pandas is already installed. Otherwise, install it with the pip utility in the virtual environment created for this chapter.

E:\excelenv>scripts\activate
(excelenv) E:\excelenv>scripts\pip3 install pandas

During installation, a few more libraries, such as NumPy, are also installed because Pandas uses them internally.
As mentioned earlier, the DataFrame object of Pandas is a two-dimensional table-like structure with labeled columns that may be of different data types (much like a SQL table). It can be constructed from various data objects, such as Python lists or dictionaries. Of particular relevance to us in this chapter is creating a DataFrame object from a list of dictionary items.
Let us first define a list in which each item is a dictionary object with three key-value pairs, as shown below:

Example

>>> pricelist=[{'ProductID':1, 'Name':'Laptop', 'price':25000},
... {'ProductID':2, 'Name':'TV', 'price':40000},
... {'ProductID':3, 'Name':'Router', 'price':2000},
... {'ProductID':4, 'Name':'Scanner', 'price':5000},
... {'ProductID':5, 'Name':'Printer', 'price':9000}]

Use this list object as an argument to the constructor of the DataFrame object (example 10.15).

>>> import pandas as pd
>>> df=pd.DataFrame(pricelist)
>>> df
Name ProductID price
0 Laptop 1 25000
1 TV 2 40000
2 Router 3 2000
3 Scanner 4 5000
4 Printer 5 9000


Incidentally, a DataFrame can be converted to and from many other data formats, including JSON, CSV, pickle, and SQL. As a quick example, we shall read SQLite table data using the read_sql_query() function.

Example

>>> import pandas as pd
>>> import sqlite3
>>> con=sqlite3.connect('mydb.sqlite')
>>> df = pd.read_sql_query("SELECT * FROM Products;", con)
>>> df
ProductID Name Price
0 1 Laptop 27500
1 3 Router 3000
2 4 Scanner 5500
3 5 Printer 11000
4 6 Mobile 16500
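The query above presumes that mydb.sqlite already holds a Products table. A minimal sketch of preparing such a table with the standard-library sqlite3 module is given below. It uses an in-memory database instead of a file, the rows are copied from the output shown above, and the column types are an assumption. The same kind of connection object could then be passed to read_sql_query().

```python
import sqlite3

# In-memory database so the sketch leaves no file behind;
# the text uses a file named mydb.sqlite instead.
con = sqlite3.connect(":memory:")

# Column types are assumed; the text does not show the table definition.
con.execute(
    "CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT, Price INTEGER)"
)

# Rows copied from the DataFrame output shown above.
rows = [
    (1, "Laptop", 27500),
    (3, "Router", 3000),
    (4, "Scanner", 5500),
    (5, "Printer", 11000),
    (6, "Mobile", 16500),
]
con.executemany("INSERT INTO Products VALUES (?, ?, ?)", rows)
con.commit()

# Plain DB-API fetch of the same SELECT that read_sql_query() runs.
fetched = con.execute("SELECT * FROM Products ORDER BY ProductID;").fetchall()
for record in fetched:
    print(record)
```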

By the end of this chapter, you should have a fair idea of how to use Python to manipulate Excel workbook documents. While the openpyxl package is all about automating the functionality of the Excel software, data in Excel sheets can also be brought into Pandas data frames for high-level manipulation and analysis, and then exported back.

The next two chapters of this book deal with the exciting world of NoSQL databases and the way Python can interact with two very popular NoSQL databases: MongoDB and Cassandra.


Python Data Persistence – Installation of MongoDB


MongoDB server software is available in two forms: Community edition (open source release) and Enterprise edition (having additional features such as administration, and monitoring).

The MongoDB community edition is available for Windows, Linux, and macOS at https://www.mongodb.com/download-center/community. Choose the version appropriate for the OS and architecture of your machine and install it as per the instructions on the official website. Examples in this chapter assume that MongoDB is installed on Windows in the e:\mongodb folder.
Start the MongoDB server from the command terminal using the following command:

E:\mongodb\bin>mongod
. . . 
waiting for connections on port 27017

The server is now listening for connection requests from clients on port 27017 of the localhost. (The server's startup logs are omitted in the above display.) To stop it, press Ctrl-C. By default, mongod stores its databases in the \data\db directory of the drive from which it is started. You can specify an alternative location with the --dbpath option as follows:

Example

E:\mongodb\bin>mongod --dbpath e:\test

Now, start Mongo shell in another terminal.

E:\mongodb\bin>mongo 
MongoDB shell version v4.0.6 
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("0d848b11-acf7-4d30-83df-242d1d7fa693") }
MongoDB server version: 4.0.6
---
>

The Mongo shell is a JavaScript interface to the MongoDB server. It is similar to the SQLite shell or the MS SQL console that we saw in earlier chapters. The CRUD operations on a MongoDB database can be performed from here.


Python Data Persistence – Creating a workbook


An object of the Workbook class represents an empty workbook with one worksheet. Obtain that worksheet through the workbook's active property so that data can be added to it.

Example

>>> from openpyxl import Workbook
>>> wb=Workbook()
>>> sheet1=wb.active
>>> sheet1.title='PriceList'

Each cell in the worksheet is identified by a string made up of its column name and row number. The top-left cell is 'A1'. The normal assignment operator is used to store data in a cell.

>>> sheet1['A1']='Hello World'

(NB: These operations as well as others that will be described in this chapter will not be immediately visualized in the Python environment itself. The workbook so created needs to be saved and then opened using Excel application to see the effect)

There is another way to assign a value to a cell. The cell() method accepts row and column parameters with integer values. Columns A, B, C, and so on are denoted by 1, 2, 3, and so on. Rows are also numbered from 1.

>>> sheet1.cell(row=1, column=1).value='Hello World'
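To make the relationship between the two addressing styles concrete, the following sketch converts a cell name such as 'C5' into the (row, column) integers that cell() expects. It is plain Python for illustration only; the helper names column_letter_to_index and split_cell_name are hypothetical and are not part of openpyxl, which ships its own conversion utilities.

```python
def column_letter_to_index(letters):
    """Convert an Excel column name such as 'A' or 'AB' to a 1-based number."""
    index = 0
    for ch in letters.upper():
        # Column names behave like base-26 numbers with digits A..Z = 1..26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def split_cell_name(name):
    """Split a cell name such as 'B3' into (row, column) integers."""
    letters = "".join(ch for ch in name if ch.isalpha())
    digits = "".join(ch for ch in name if ch.isdigit())
    return int(digits), column_letter_to_index(letters)


print(split_cell_name("A1"))
print(split_cell_name("C5"))
print(split_cell_name("AA10"))
```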

Contents of cells are retrieved from their value attribute.

>>> sheet1['A1'].value
'Hello World'

Use the save() method to store the workbook object as an Excel document. Later, open it in Excel to verify the result (figure 10.2).



Python Data Persistence – Read Data from Worksheet


To read data from an existing Excel document, we need to load it with the load_workbook() function.

>>> from openpyxl import load_workbook
>>> wb=load_workbook(filename='test.xlsx')

Get the desired worksheet (here, the active one) and retrieve the value of any cell.

Example

>>> sheet1=wb.active
>>> sheet1.cell(row=1, column=1).value
'Hello World'
>>> #or
>>> sheet1['A1'].value
'Hello World'

The following script writes the data held in a list object to a worksheet. Each item in the list is a tuple comprising ProductID, name, and price.

Example

#saveworkbook.py
from openpyxl import Workbook
wb = Workbook()
sheet1 = wb.active
sheet1.title='PriceList'
#heading in the first row
sheet1.cell(column=1, row=1, value='Pricelist')
pricelist=[('ProductID', 'Name', 'Price'),
           (1,'Laptop',25000),(2, 'TV',40000),
           (3, 'Router',2000),(4, 'Scanner',5000),
           (5, 'Printer',9000),(6, 'Mobile',15000)]
#column headers go in row 2, data rows follow
for col in range(1,4):
    for row in range(1,len(pricelist)+1):
        sheet1.cell(column=col, row=1+row,
                    value=pricelist[row-1][col-1])
wb.save(filename="test.xlsx")
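The index arithmetic in the nested loops can also be expressed with enumerate(), which avoids hard-coded range limits. The sketch below only records which (row, column) position each value would occupy. The dictionary named cells is a stand-in chosen for illustration, not an openpyxl object; with openpyxl, each assignment would become a sheet1.cell(...) call.

```python
# Shortened price list; the first tuple holds the column headers.
pricelist = [
    ("ProductID", "Name", "Price"),
    (1, "Laptop", 25000),
    (2, "TV", 40000),
]

# Row 1 is reserved for the sheet heading, so the data starts at row 2.
cells = {}
for row_no, record in enumerate(pricelist, start=2):
    for col_no, value in enumerate(record, start=1):
        cells[(row_no, col_no)] = value

for (row_no, col_no), value in cells.items():
    print(row_no, col_no, value)
```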

The Excel document in the current directory looks like this: (figure 10.3)


Let us find out how to perform certain formatting actions on worksheet data.


Python Data Persistence – Querying Cassandra Table


Predictably, CQL also has the SELECT statement to fetch data from a Cassandra table. Its simplest use is to fetch data from all columns in a table.

cqlsh:mykeyspace> select * from products;

 productid | name      | price
-----------+-----------+-------
         5 | 'Printer' |  9000
         1 | 'Laptop'  | 25000
         2 | 'TV'      | 40000
         4 | 'Scanner' |  5000
         6 | 'Mobile'  | 15000
         3 | 'Router'  |  2000

(6 rows)

All conventional logical operators are allowed in the filter criteria specified with the WHERE clause. The following statement returns the products priced above 10000.

cqlsh:mykeyspace> select * from products where price>10000 allow filtering;

 productid | name     | price
-----------+----------+-------
         1 | 'Laptop' | 25000
         2 | 'TV'     | 40000
         6 | 'Mobile' | 15000

(3 rows)

The use of ALLOW FILTERING is necessary here. By default, CQL only allows select queries in which all the records read are returned in the result set. Such queries have predictable performance. The ALLOW FILTERING option explicitly permits (some) queries that require filtering. If the filter criteria consist of partition key columns, only the = and IN operators are allowed.

The UPDATE and DELETE statements of CQL are used as in SQL. However, both must have filter criteria based on the primary key. (Note the use of '--' as the comment marker.)

cqlsh:mykeyspace> --update syntax
cqlsh:mykeyspace> update new_products set price=45000
where productID=2;
cqlsh:mykeyspace> --delete syntax
cqlsh:mykeyspace> delete from new_products where
productID=6;


Python Data Persistence – Python Cassandra Driver


Cassandra's Python module is provided by Apache itself. It works with the latest CQL version 3 and uses Cassandra's native protocol. In addition to a core API, which is similar in many ways to DB-API, this Python driver also has an ORM API.

To install this module, use the pip installer as always.

E:\python37>scripts\pip3 install cassandra-driver

Verify successful installation with the following commands:

Example

>>> import cassandra 
>>> print (cassandra.__version__)
3.17.0

To execute CQL queries, we have to set up a Cluster object, first.

Example

>>> from cassandra.cluster import Cluster 
>>> clstr=Cluster( )

Next up, we need to start a session by establishing a connection with our keyspace in the cluster.

Example

>>> session=clstr.connect('mykeyspace')

The ubiquitous execute() method of the session object is used to perform all CQL operations. For instance, a plain SELECT query over the 'products' table in 'mykeyspace' returns a result set object. All rows can be traversed using a typical for loop.

Example

#cassandra-select.py
from cassandra.cluster import Cluster
clstr=Cluster()
session=clstr.connect('mykeyspace')
rows=session.execute("select * from products;")
for row in rows:
    print('Manufacturer: {} ProductID:{} Name:{} price:{}'.format(row[1], row[0], row[2], row[3]))

Output

E:\python37>python cassandra-select.py
Manufacturer: 'Epson' ProductID:5 Name:'Printer' price:9000
Manufacturer: 'IBall' ProductID:10 Name:'Keyboard' price:1000
Manufacturer: 'Acer' ProductID:1 Name:'Laptop' price:25000
Manufacturer: 'Acer' ProductID:8 Name:'Tab' price:10000
Manufacturer: 'Samsung' ProductID:2 Name:'TV' price:40000
Manufacturer: 'Epson' ProductID:4 Name:'Scanner' price:5000
Manufacturer: 'IBall' ProductID:7 Name:'Mouse' price:500
Manufacturer: 'Samsung' ProductID:6 Name:'Mobile' price:15000
Manufacturer: 'Samsung' ProductID:9 Name:'AC' price:35000
Manufacturer: 'IBall' ProductID:3 Name:'Router' price:2000


Python Data Persistence – Parameterized Queries


The cassandra.query submodule defines the following Statement classes:
SimpleStatement: A simple, unprepared CQL query contained in a query string. For example:

Example

from cassandra.query import SimpleStatement 

stmt=SimpleStatement("select * from products;") 

rows=session.execute(stmt)

BatchStatement: A batch combines multiple DML operations (such as INSERT, UPDATE, and DELETE) and executes them at once to achieve atomicity. For the following example, first create a 'customers' table in the current keyspace.

cqlsh:mykeyspace> create table customers
              ... (
              ... custID int primary key,
              ... name text,
              ... GSTIN text
              ... );

Customer data is provided in the form of a list of tuples. An individual INSERT query is populated with each tuple and added to a BatchStatement. The batch is then executed at once.

Example

#cassandra-batch.py
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement, BatchStatement
clstr=Cluster()
session=clstr.connect('mykeyspace')
custlist=[(1,'Ravikumar','27AAJPL7103N1ZF'),
          (2,'Patel','24ASDFG1234N1ZN'),
          (3,'Nitin','27AABBC7895N1ZT'),
          (4,'Nair','32MMAF8963N1ZK'),
          (5,'Shah','24BADEF2002N1ZB'),
          (6,'Khurana','07KABCS1002N1ZV'),
          (7,'Irfan','05IIAAV5103N1ZA'),
          (8,'Kiran','12PPSDF22431ZC'),
          (9,'Divya','15ABCDE1101N1ZA'),
          (10,'John','29AAEEC4258E1ZR')]
batch=BatchStatement()
#add one parameterized INSERT per customer tuple
for cst in custlist:
    batch.add(SimpleStatement("INSERT INTO customers (custID,name,GSTIN) VALUES (%s, %s, %s)"),
              (cst[0], cst[1], cst[2]))
session.execute(batch)

 

Run the above code and then check rows in the ‘customers’ table in the CQL shell.

cqlsh:mykeyspace> select * from customers;

 custid | gstin           | name
--------+-----------------+-----------
      5 | 24BADEF2002N1ZB | Shah
     10 | 29AAEEC4258E1ZR | John
      1 | 27AAJPL7103N1ZF | Ravikumar
      8 | 12PPSDF22431ZC  | Kiran
      2 | 24ASDFG1234N1ZN | Patel
      4 | 32MMAF8963N1ZK  | Nair
      7 | 05IIAAV5103N1ZA | Irfan
      6 | 07KABCS1002N1ZV | Khurana
      9 | 15ABCDE1101N1ZA | Divya
      3 | 27AABBC7895N1ZT | Nitin

(10 rows)

PreparedStatement: A prepared statement contains a query string that is parsed by Cassandra and then saved for later use. Subsequently, the driver only needs to send the values of the parameters to bind. This reduces network traffic and CPU utilization because Cassandra does not have to re-parse the query each time. The Session.prepare() method returns a PreparedStatement instance.

Example

#cassandra-prepare.py
from cassandra.cluster import Cluster
from cassandra.query import PreparedStatement
clstr=Cluster()
session=clstr.connect('mykeyspace')
stmt=session.prepare("INSERT INTO customers (custID, name,GSTIN) VALUES (?,?,?)")
boundstmt=stmt.bind([11,'HarishKumar','12PQRDF22431ZN'])
session.execute(boundstmt)

The prepared statement can be executed repeatedly, each time bound to a new set of parameters. Note that the PreparedStatement uses '?' as a placeholder and not '%s' as in the SimpleStatement used with the batch.
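The idea of keeping the query text fixed and binding new values on each execution is not specific to Cassandra. As an analogy only, the sketch below uses the standard-library sqlite3 module, which also uses '?' placeholders. It does not use the Cassandra driver, and the in-memory table and the second customer row are assumptions made so that the example can run without a cluster.

```python
import sqlite3

# Stand-in database; the text uses a Cassandra keyspace instead.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE customers (custID INTEGER PRIMARY KEY, name TEXT, GSTIN TEXT)"
)

insert_sql = "INSERT INTO customers (custID, name, GSTIN) VALUES (?, ?, ?)"

# The same statement text is reused; only the bound values change.
con.execute(insert_sql, (11, "HarishKumar", "12PQRDF22431ZN"))
# Made-up second customer, not taken from the text.
con.execute(insert_sql, (12, "Meera", "27ABCDE1234F1Z5"))
con.commit()

names = [row[0] for row in con.execute("SELECT name FROM customers ORDER BY custID")]
print(names)
```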

 


Python Data Persistence – Table with Compound Partition Key


In the above example, the products table was defined with a single-column primary key, which also serves as the partition key. Rows in such a table are stored on different nodes depending on the hash value of the primary key. However, data is distributed across the cluster slightly differently when the table has a compound primary key. The primary key of the following table comprises two columns.

cqlsh:mykeyspace> create table products
              ... (
              ... productID int,
              ... manufacturer text,
              ... name text,
              ... price int,
              ... primary key(manufacturer, productID)
              ... );

For this table, ‘manufacturer’ is the partition key and ‘productID’ behaves as a clustering key. As a result, products with the same ‘manufacturer’ are stored on the same node. Let us understand this with the help of the following example. The table contains the following data.

Example

cqlsh:mykeyspace> select * from products;

 productid | manufacturer | name       | price
-----------+--------------+------------+-------
         5 | 'Epson'      | 'Printer'  |  9000
        10 | 'IBall'      | 'Keyboard' |  1000
         1 | 'Acer'       | 'Laptop'   | 25000
         8 | 'Acer'       | 'Tab'      | 10000
         2 | 'Samsung'    | 'TV'       | 40000
         4 | 'Epson'      | 'Scanner'  |  5000
         7 | 'IBall'      | 'Mouse'    |   500
         6 | 'Samsung'    | 'Mobile'   | 15000
         9 | 'Samsung'    | 'AC'       | 35000
         3 | 'IBall'      | 'Router'   |  2000

(10 rows)

Rows in the above table will be stored among nodes such that products from the same manufacturer are together, (figure 12.5)
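The placement rule can be illustrated with a toy model in plain Python. The sketch below is not how Cassandra is implemented: it assumes a cluster of three nodes and picks a node from an MD5 hash of the partition key, whereas Cassandra's real partitioner and token assignment are more involved. It only demonstrates the consequence described above, namely that rows sharing a partition key always land together.

```python
import hashlib
from collections import defaultdict

NODE_COUNT = 3  # assumed cluster size, chosen only for the illustration


def node_for(partition_key):
    """Pick a node number from a hash of the partition key (toy scheme)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NODE_COUNT


# (productID, manufacturer, name) taken from the table above.
products = [
    (5, "Epson", "Printer"), (10, "IBall", "Keyboard"), (1, "Acer", "Laptop"),
    (8, "Acer", "Tab"), (2, "Samsung", "TV"), (4, "Epson", "Scanner"),
    (7, "IBall", "Mouse"), (6, "Samsung", "Mobile"), (9, "Samsung", "AC"),
    (3, "IBall", "Router"),
]

nodes = defaultdict(list)
for product_id, manufacturer, name in products:
    # Only the partition key (manufacturer) decides the node.
    nodes[node_for(manufacturer)].append((manufacturer, product_id, name))

for node, rows in sorted(nodes.items()):
    # Sorting groups each manufacturer and orders its rows by productID.
    print(node, sorted(rows))
```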



Python Data Persistence – List Comprehension


Python supports many functional programming features, and list comprehension is one of them. The technique follows mathematical set-builder notation. It is a concise way of creating a new list by applying an operation to each item of an existing list, and it is usually faster than building the same list with a for loop.
Suppose we want to compute the square of each number in a list and store squares in another list object. We can do it by a for loop as shown below:

Example

#new list with loop
list1=[4,7,2,5,8]
list2=[]
for num in list1:
    sqr=num*num
    list2.append(sqr)
print ('new list:', list2)

The new list stores the squares of the items in the existing list.
The list comprehension method achieves the same result more concisely. A list comprehension statement uses the following syntax:

newlist = [x for x in sequence]

We use the above format to construct a list of squares by using list comprehension.

Example

>>> list1=[4, 7, 2, 5, 8]
>>> list2=[num*num for num in list1]
>>> list2
[16, 49, 4, 25, 64]
>>>
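One way to check the efficiency claim on your own machine is the standard-library timeit module. The sketch below wraps both approaches in functions (the function names are chosen here for illustration), confirms that they produce the same list, and then times them. Timings vary by machine and Python version, so no figures are claimed.

```python
import timeit

list1 = [4, 7, 2, 5, 8]


def squares_with_loop(numbers):
    """Build the list of squares with a for loop and append()."""
    result = []
    for num in numbers:
        result.append(num * num)
    return result


def squares_with_comprehension(numbers):
    """Build the same list with a list comprehension."""
    return [num * num for num in numbers]


# Both approaches must produce the same list.
print(squares_with_loop(list1) == squares_with_comprehension(list1))

# Time 100,000 calls of each function.
loop_time = timeit.timeit(lambda: squares_with_loop(list1), number=100_000)
comp_time = timeit.timeit(lambda: squares_with_comprehension(list1), number=100_000)
print("loop:", loop_time, "comprehension:", comp_time)
```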

Each item produced by a list comprehension can itself be a dictionary or a tuple object.

Example

>>> list1=[4, 7, 2, 5, 8]
>>> dict1=[{num:num*num} for num in list1]
>>> dict1
[{4: 16}, {7: 49}, {2: 4}, {5: 25}, {8: 64}]
>>>
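The example above yields a list of one-item dictionaries. If a single dictionary or a tuple is wanted instead, Python offers closely related forms. The following sketch shows a dictionary comprehension, a generator expression passed to tuple(), and a list of tuples; the variable names are chosen here for illustration.

```python
list1 = [4, 7, 2, 5, 8]

# Dictionary comprehension: one dictionary mapping each number to its square.
squares = {num: num * num for num in list1}
print(squares)

# A generator expression passed to tuple() builds a tuple instead of a list.
squares_tuple = tuple(num * num for num in list1)
print(squares_tuple)

# List comprehension whose items are (number, square) tuples.
pairs = [(num, num * num) for num in list1]
print(pairs)
```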

List comprehension works with any iterable. Nested loops can also be used in a list comprehension expression. The following statement builds a list of all combinations of characters from two strings:

Example

>>> list1=[x+y for x in 'ABC' for y in '123']
>>> list1
['A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3']
>>>

The resulting list stores all combinations of one character from each string.

We can even have an if condition in a list comprehension. The following statement results in a list of all non-vowel characters (spaces included) in a string.

Example

>>> consonants=[char for char in "Simple is better than complex" if char not in ['a', 'e', 'i', 'o', 'u']]
>>> consonants
['S', 'm', 'p', 'l', ' ', 's', ' ', 'b', 't', 't', 'r', ' ', 't', 'h', 'n', ' ', 'c', 'm', 'p', 'l', 'x']
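The condition above keeps spaces and would also keep uppercase vowels. A slightly stricter variant, sketched below, keeps only alphabetic characters and compares their lowercase form against a set of vowels. The names text and vowels are chosen here for illustration.

```python
text = "Simple is better than complex"
vowels = set("aeiou")

# Keep alphabetic characters whose lowercase form is not a vowel.
consonants = [ch for ch in text if ch.isalpha() and ch.lower() not in vowels]
print(consonants)
print(len(consonants))
```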

Conditionals and looping constructs are the two most important tools in a programmer’s armory. Along with them, we also learned about controlling repetition with break and continue statements.
The next chapter introduces the concept of structured programming through the use of functions and modules.


Python Data Persistence – Using range


Python's built-in range() function returns an immutable sequence of numbers that can be iterated over by a for loop. The sequence generated by range() depends on three parameters: start, stop, and step.

The start and step parameters are optional. If they are omitted, start defaults to 0 and step to 1. The range contains the numbers from start up to, but not including, stop, separated by step. Consider example 2.15:

Example

range(10) generates 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

range(1, 5) results in 1, 2, 3, 4

range(20, 30, 2) returns 20, 22, 24, 26, 28
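A range object does not store its numbers; it computes them on demand, yet it still behaves like a sequence. The short sketch below, using the last range from the examples above, shows how to materialise it as a list and how len(), indexing, membership tests, and a negative step work.

```python
r = range(20, 30, 2)

print(list(r))            # materialise the lazy sequence as a list
print(len(r))             # number of items in the range
print(r[1])               # ranges support indexing
print(26 in r, 27 in r)   # membership tests

# A negative step counts downwards; stop is still excluded.
print(list(range(10, 0, -3)))
```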

We can use this range object as an iterable, as in example 2.16. It displays the squares of all odd numbers between 11 and 20. Remember that the last number in the range is always less than the stop parameter. The step is 2 here, so starting from 11 only odd numbers are generated.

Example

#for-3.py
for num in range(11, 21, 2):
    sqr=num*num
    print('square of {} is {}'.format(num, sqr))

Output:

E:\python37>python for-3.py 
square of 11 is 121 
square of 13 is 169 
square of 15 is 225 
square of 17 is 289 
square of 19 is 361

In the previous chapter, you used the len() function, which returns the number of items in a sequence object. In the next example, we use len() to construct a range of indices of the items in a list. We then traverse the list with the help of the index.

Example

#for-4.py
numbers=[4, 7, 2, 5, 8]
for indx in range(len(numbers)):
    sqr=numbers[indx]*numbers[indx]
    print('square of {} is {}'.format(numbers[indx], sqr))

Output:

E:\python37>python for-4.py
square of 4 is 16
square of 7 is 49
square of 2 is 4
square of 5 is 25
square of 8 is 64

E:\python37>
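When both the index and the item are needed, the built-in enumerate() function is a common alternative to range(len(...)). The sketch below repeats the traversal that way; the results list is added here only so that the values can be inspected afterwards.

```python
numbers = [4, 7, 2, 5, 8]
results = []

# enumerate() yields (index, item) pairs, so no manual indexing is needed.
for indx, num in enumerate(numbers):
    results.append((indx, num, num * num))
    print('index {}: square of {} is {}'.format(indx, num, num * num))
```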

Have a look at another example of using a for loop over a range. The following script calculates the factorial of a number. Note that the factorial of n (written n! in mathematical notation) is the product of all integers from 1 to n.

Example

#factorial.py
n=int(input("enter number.."))
#calculating factorial of n
f=1
for i in range(1, n+1):
    f=f*i
print('factorial of {} = {}'.format(n, f))

Output:

E:\python37>python factorial.py 
enter number..5 
factorial of 5 = 120
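The same loop can be wrapped in a function and checked against math.factorial() from the standard library. This is a minimal sketch; the function name factorial and the sample values are chosen here for illustration.

```python
import math


def factorial(n):
    """Return n! computed with a for loop over range(1, n + 1)."""
    f = 1
    for i in range(1, n + 1):
        f = f * i
    return f


# For n = 0 the loop body never runs, so the result is 1, as expected for 0!.
for n in (0, 1, 5, 10):
    print(n, factorial(n), factorial(n) == math.factorial(n))
```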
