Author name: Prasanna

Python Data Persistence – PyMongo – Relationships

Python Data Persistence – PyMongo – Relationships

MongoDB is a non-relational database. However, you can still establish relationships between documents in a database. MongoDB uses two different approaches for this purpose. One is an embedded approach and the other is a referencing approach.

Embedded Relationship

In this case, the documents appear in a nested manner where another document is used as the value of a certain key. The following code represents a ‘customer’ document showing a customer (with ‘_ id ’= 1) buys two products. A list of two product documents is the value of the ‘prods’ key.

Example

>>> cust.insert_one({'_id':1,'name':'Ravi',
                                          'prods':[
                                                 { 'Name':'TV',
'price':40000},


{'Name':'Scanner','price':5000}
                                ]
              })

Querying such an embedded document is straightforward as all data is available in the parent document itself.

Example

>>> doc=cust .find_one ({ '_id' : 1}, { 'prods ' : 1})
>>> doc
{' id': 1, 'prods': [{'Name': 'TV', 'price': 40000},
{'Name': 'Scanner', 'price': 5000}]}

The embedded approach has a major drawback. The database is not normalized and, hence, data redundancy arises. Assize grows, it may affect the performance of reading/write operations.

Reference Relationship

This approach is somewhat similar to the relations in a SQL-based database. The collections (equivalent to the RDBMS table) are normalized for optimum performance. One document refers to the other with its ‘_id’ key.

Recollecting that instead of automatically generated random values for ‘_id’, they can be explicitly specified while inserting a document in a collection, following is the constitution of ‘products’ collection.

Example

>>> list (prod.find ( ) '
[{ '_4d' : 1, 'Name' 'Laptop', 'price': 25000}, {'_
id': 2, Name': 'TV', 'price': 40000}, {'_id': 3,
'Name': Router', price': 2000}, {'_id': 4, 'Name':
'Scanner , 'price' 5000}, {' id': 5, 'Name':
'Printer , 'price' 9000}]

We now create a ‘customers’ collection.

Example

>>> db.create_collection('customers 1) 
>>> cust=db['customers']

The following document is inserted with one key ‘prods’ being a list of ‘ _id’ s from products collection.

Example

>>> cust .insert_one({'_id':1, 'Name' 'Ravi',
'prods': [2,4] })

However, in such a case, you may have to run two queries: one on the parent collection, and another on related collection. First, fetch the _ids of the related table.

Example

>>> doc=cust .find_one ({'_id' : 1}, {'prods ' : 1})
>>> prods
[2, 4]

Then, iterate over the list and access the required field from the related document.

Example

>>> for each in prods:
doc=prod .find_one ({ ' id' :each})
print (doc['Name'])
TV
Scanner

The reference approach can be used to build one-to-one or one-to-many types of relationships. The choice of approach (embedded or reference) largely depends on data usage, the projected growth of the size of the document, and the atomicity of the transaction.

In this chapter, we had an overview of the MongoDB database and its Python interface in the form of the PyMongo module. In the next chapter, another NoSQL database – Cassandra – is going to be explained along with its association with Python.

Python Data Persistence – PyMongo – Relationships Read More »

Python Data Presistence – PyMongo – Add Collection

Python Data Presistence – PyMongo – Add Collection

Create a new database object by using any name currently not in the list.

Example

>>> db=client.newdb

The Database is actually created when the first document is inserted. The following statement will implicitly create a ‘products’ collection and multiple documents from the given list of dictionary objects.

Example

>>> pricelist=[{ 'ProductID : 1 / 'Name' :'Laptop',
'price' 25000},{ 'ProductID :2, 'Name' :'TV',
'price' 40000},{ 'ProductID : 3, 'Name' :'Router',
'price' 2000},{' ProductID' 4, 'Name': 'Scanner',
'price' 'price' 5000},{' 9000}] ProductID' 5, Name': 'Printer',
>>> db.products.insert_many(pricelist)

You can confirm the insertion operation by find ( ) method in the mongo shell, as we have done earlier.

We create a collection object explicitly by using the create_collection( ) method of the database object.

Example

>>> db.create_collection(1 customers')

Now, we can add one or more documents in it. The following script adds documents in ‘customers’ collection.

Example

from pymongo import MongoClient client=MongoClient() 
db=client.newdb db.create_collection("customers") 
cust=db['customers' ]
 custlist=[{'CustID':1,1 Name' :'Ravikumar', 'GS- TIN':'27AAJPL7103N1ZF'}, 
{'CustID' :2, 'Name': 'Patel', 'GSTIN': '24ASDFG1234N- 1ZN'}, 
{'CustID' : 3, ' Name' : 'Nitin', 'GSTIN': '27AABBC7895N- 1ZT'}, 
{'CustID' : 4, ' Name' : 'Nair',' GSTIN':' 32MMAF8963N1ZK'} 
{'CustID' :5, ' Name' : 'Shah',' GSTIN':' 24BADEF2002N- 1ZB'}, 
{'CustID' :6, ' Name' : 'Khurana .', 'GSTIN ':'07KABCS1002N- lZV'}, 
{'CustID' :7, ' Name' : 'Irfan', 'GSTIN': '05IIAAV5103N- 1ZA'}, 
{'CustID' :8, ' Name' : 'Kiran', 'GSTIN': '12PPSD- F22431ZC' 
{'CustID' }. :9, ' Name' : 'Divya', 'GSTIN': '15ABCDE1101N- 1ZA'}, 
{'CustID' : 10 ,'Name' :'John', 'GS- TIN':'29AAEEC4258E1ZK'}] 
cust.insert_many(custlist) / client.close()

 

Python Data Presistence – PyMongo – Add Collection Read More »

Python Data Persistence – PyMongo Module

Python Data Persistence – PyMongo Module

PyMongo module is an official Python driver for MongoDB database developed by Mongo Inc. It can be used on Windows, Linux, as well as MacOS. As always, you need to install this module using the pip3 utility.

pip3 install pymongo

Before attempting to perform any operation on a database, ensure that you have started the server using ‘mongod’ command and the server is listening at port number 22017.

To let your Python interpreter interact with the server, establish a connection with the object of MongoClient class.

Example

>>> from pymongo import MongoClient 
>>> client=MongoClient( )

The following syntax is also valid for setting up connection with server.

>>> client = MongoClient('localhost', 27017)
#or
client = MongoClient('mongodb://localhost:27017')

In order to display currently available databases use list_database_ namesO method of MongoClient class.

Example

>>> client.list_database_names( )
[' admin' , ' config' , 1 local' , ' mydb' ]

Python Data Persistence – PyMongo Module Read More »

Python Data Persistence – User-defined Types

Python Data Persistence – User-defined Types

While executing the queries, Python data types are implicitly parsed to corresponding CQL types as per the following table:(figure 12.1)

Python TypeCQL Type
NoneNULL
BoolBoolean
FloatFloat, double
Int, longInt, bigint , variant , smallest, tinyint , counter
Decimal, decimalDecimal
Str , UnicodeAscii, varchar , test
Buffer, byte arrayBlob
DateDate
DatetimeTimestamp
TimeTime
List , tuple

Generator

List
Set , frozenestSet
Dict , OrderedDistMap
Uuid.UUIDTimeuuid.uuid

 

In addition to the above built-in CQL data types, the Cassandra table may have a column of a user-defined type to which an object of Python class can be mapped.

Cassandra provides a CREATE TYPE statement to define a new user-defined type which is used as a type for a column in a table defined with the CREATE TABLE statement.

In the script given below (Cassandra-udt.py), we define a Cassandra user-defined type named ‘contacts’ and use it as the data type of the ‘contact’ column in the ‘users’ table. The register__user_type ( ) method of cluster object helps us to map Python class ‘Contactlnfo’ to the user-defined type.

Example

#cassandra-udt.py 
from cassandra.cluster import Cluster 
cluster = Cluster(protocol_version=3) 
session = cluster.connect( ) 
session.set_keyspace(1mykeyspace1) 
session.execute("CREATE TYPE contact (email text, phone text)") 
session.execute("CREATE TABLE users (userid int PRIMARY KEY, name text, contact frozen<contact>)") 
class Contactlnfo:          
      def__init__(self, email, phone):                       
           self.email = email                       
            self.phone = phone 
cluster.register_user_type('mykeyspace', 'contact', Contactlnfo) 
# insert a row using an instance of Contctlnfo session.execute("INSERT INTO users (userid, name, contact) 
VALUES (%s, %s, %s)",                           
   (1, .'Admin', Contactlnfo
("admin@ testserver.com", '9988776655')))

The following display of the CQL shell confirms the insertion operation of the above script.

cq1sh:mykeyspace> select * from users; 
    userid          |                            contact                                            |  name
------------+------------------------------------------------------------+-----------
        1        | {email: '[email protected]', phone: '9988776655'}  |   Admin 
(1 rows)

In this chapter, we learned about the basic features of the Cassandra database, and importantly how to perform read/write operations on it with Python.

Final thoughts

This is also the last chapter of this book. Starting with the basics of Python, we learned how to work with different data storage formats. This work is neither a Python handbook nor a SQL/NoSQL database guide. Instead, it is intended to be a simple and practical explanation of Python interfaces with different data persistence avenues available today.

This book concentrates primarily on basic CRUD operations on SQL and NoSQL databases and other data file formats. However, these techniques are merely complementary to the real data crunching a data scientist needs to perform. The next logical thing for a curious reader would be to acquire data manipulation and visualization skills for which Python has very rich and powerful libraries.
1 hope you enjoyed this book as much as 1 did bring to you. Feedback, suggestions, and corrections if any are most welcome so that subsequent editions can be improved.
Thanks.

Python Data Persistence – User-defined Types Read More »

Python Data Persistence – MongoDB – Querying Collection

Python Data Persistence – MongoDB – Querying Collection

Retrieving data from the database is always an important operation. MongoDB’s collection has found ( ) method with which documents are fetched. Without any argument, the find ( ) method returns a result set of all documents in a collection. In effect, it is equivalent to ‘SELECT * FROM <table>‘ in SQL.

> db . products . find () 
{ "_id" : Objectld("5c8d420c7bebaca49b767db3"), 
"ProductID" : 1, "Name" : "Laptop", "price" : 25000 
} 
{ "_id" : Objectld("5c8d420c7bebaca49b767db4"), 
"ProductID" : 2, "Name" : "TV", "price" : 40000 } 
{ "_id" : Objectld("5c8d420c7bebaca49b767db5"), 
"ProductID" : 3, "Name" : "Router", "price" : 2000 } 
{ "__id" : Objectld("5c8d420c7bebaca49b767db6"), 
"ProductID" : 4, "Name" : "Scanner", "price" : 5000 } 
{ "_id" : Objectld("5c8d420c7bebaca49b767db7"), 
"ProductID" : 5, "Name" : "Printer", "price" : 9000 
}

Note that, ‘_id’ key is automatically added to each document. The value of each _id is of ObjectId type and is unique for each document.
Invariably, you would want to apply to the result set returned by find ( ). It is done by putting the key-value pair in its parenthesis. In its generalized form, the conditional query is written as follows:

Example

db. collection .find ({ "key" : "value"})

The following statement retrieves a document whose ‘Name’ key has ‘TV’ value.

Example

> db .products .find ({"Name": "TV"}) 
{ "_id" : Objectld("5c8d420c7bebaca4 9b76 7db4"), 
"ProductID" : 2, "Name" : "TV", "price" : 40000 }

MongoDB doesn’t use traditional logical operator symbols. Instead, it has its own operators, as listed below:

The operators are used in the find ( ) method to apply the filter. The following statement returns products with a price> 10000.

> db.products .find ({"price" : {$gt: 10000 }}) 
{ 
"_id" : Objectld("5c8d420c7bebaca49b767db3"), 
"ProductID" : 1, "Name" : "Laptop", "price" : 25000 
} 
{ "_id" : Objectld("5c8d420c7bebaca49b767db4"), 
"ProductID" : 2, "Name" : "TV", "price" : 40000
}

The Sand, as well as Sor operators, are available for compound logical expressions. Their usage is, as follows:

Example

db. collection.find ($and: [ {"keyl" :"value 1"},
{"key2":"value2"} ] )

Use the following command to fetch products with prices between 1000 and 10000.

> db . products . find ( { $and: [ { "price" : { $gt: 1000 } } , {"price":{$lt:10000}} ] }) 
{ "_id" : Objectld("5c8d420c7bebaca49b767db5 " ) , 
"ProductID" : 3, "Name" : "Router", "price" : 2000 
} 
{ "_id" : Objectld("5c8d420c7bebaca49b767db6"), 
"ProductID" : 4, "Name" : "Scanner", "price" : 5000 
} 
{ "_id" : Obj ectld("5c8d42 0c7bebaca4 9b767db7") ,
 "ProductID" : 5, "Name" : "Printer", "price" : 9000
}

 

Python Data Persistence – MongoDB – Querying Collection Read More »

Python Data Persistence – Charts

Python Data Persistence – Charts

One of the most attractive features of the MS Excel application is to dynamically generate various types of charts based upon data in worksheets. The openpyxl package has a chart module that offers the required functionality. In this section, we shall see how we can render charts programmatically.

Python Data Presistence - Charts chapter 10 img 1

The chart module defines classes for all types of charts such as BarChart and LineChart. The chart requires data range and category range to be defined. These ranges are defines with the Reference () function. It stipulates row and column numbers of top-left and bottom-right cells of the desired range.

In the above worksheet, B2:D7 is the data range including the column labels, and A1: A7 range is the categories range.

Example

from openpyxl import load_workbook 
wb = load_workbook(1 example.xlsx') 
ws = wb.active 
from openpyxl.chart import BarChart, Reference 
values = Ref erence (ws, min_col=2, min__row=2, max_ col=4, max_row=7) 
ctg=Reference(ws, min_col=l,min_row=3, max_col=l, max_row=7)

The chart object is configured by add_data() and set_ categories ( ) methods. The add_data() method takes the data range as the first parameter. If the titles_f rom_data parameter is set to True, the first row in the data range is used for the series legend title. The title of the chart, X-axis and Y-axis are also set by respective properties.

Example

c1 = BarChart( )
c1.add_data(values, titles_from_data=True
c1 . title = "Bar Chart"
c1.x_axis.title = 'Months1
c1. y__axis . title = 'Sales'
ws . add__chart (cl, "A10")
c1.set_categories(ctg)
wb. save (filename= ' example .xlsx' )

Run the above script and then open the workbook document. It will now have the bar chart stored in it. (figure 10.7)

Python Data Presistence - Charts chapter 10 img 2

Another example of a Line chart is explained below. The chart configuration has only one change. The chart object itself is of LineChart ( ) type.

Example

from openpyxl,chart import LineChart 
c2 = LineChart()
c2.add_data(values, titles_from_data=True)#legends
c2.title = "Line Chart"
c2,x_axis.title = 'Months'
c2.y_axis.title = 'Sales'
ws.add_chart(c2, "F2")
c2.set_categories(ctg)

The line chart is stored in the ‘example.xlsx’. Open it to view, (figure 10.8)

Python Data Presistence - Charts chapter 10 img 3

Python Data Persistence – Charts Read More »

Python Data Persistence – Excel with Pandas

Python Data Persistence – Excel with Pandas

Pandas library is extremely popular with data scientists as it provides easy-to-use tools for data manipulation and analysis. Different types of data structures are available in Pandas. Of which, the data frame is most commonly used. Dataframe in Pandas represents a two-dimensional tabular data structure with labeled columns which may be of different data types.

Before we explore the DataFrame object and its relationship with Excel, we have to ensure that the Pandas package is installed in the current Python environment. If you are using Anaconda distribution, Pandas is already installed in it. Otherwise, you may have to get it using pip utility in the virtual environment we have created for this chapter.

E:\excelenv>scripts\activate
(excelenv) E:\excelenv>scripts\pip3 install pandas

During installation, few more libraries like NumPy and others are also installed as they are internally used by Pandas.
As mentioned earlier, the DataFrame object of Pandas is a two-dimensional table-like structure, with labeled columns that may be of different data types (This is similar to SQL table isn’t it?). It can be constructed using various data objects as inputs such as Python lists or dictionaries. Of particular relevance to us in this chapter is creating a DataFrame object using a list of dictionary items.
Let us first define a list, each item in which is a dictionary object. It has three k-v pairs as shown below:

Example

>>> pricelist=[{1ProductID1 1, 'Name ':'Laptop', 'price': 25000} ,
{'ProductID' : 2 , 'Name':' TV' ,'price' : 40000} ,
{'ProductID' : 3 , 'Name 1 : ' Router', 'price’ : 2 0 0 0 } ,
{'ProductID' :4, 'Name':' Scanner', 'price' : 5000 } ,
{'ProductID1 : 5, 'Name':' Printer 1,'price' : 9000} ]

Use this list object as an argument to the constructor of the DataFrame object. Example 10.15

>>> import pandas as pd
>>> df=pd.DataFrame(pricelist)
>>> df
Name ProductID price
0 Laptop 1 25000
1 TV 2 40000
2 Router 3 2000
3 Scanner 4 5000
4 Printer 5 9000

Example

Incidentally, conversion to/from DataFrame and many other data formats is possible. This includes JSON, CSV, pickle, SQL, and so on. As a quick example, we shall try to read SQLite table data, using the read_sql_ query () function.

Example

>>> import pandas as pd
>>> import sqlite3
>>> con=sqlite3.connect('mydb.sqlite')
>>> df = pd.read_sql_query("SELECT * FROM Products;", con)
>>> df
ProductID Name Price
0 1 Laptop 27500
1 3 Router 3000
2 4 Scanner 5500
3 5 Printer 11000
4 6 Mobile 16500

At the conclusion of this chapter, you must have got a fair idea of how you can use Python to manipulate Excel workbook documents. While openpyxl package is all about automating the functionality of Excel software, data in Excel sheets can be brought in Pandas data frames for high-level manipulations and analysis and exported back.

The next two chapters of this book deal with the exciting world of NOSQL databases and the way Python can interact with two of very popular NOSQL databases – MongoDB, and Cassandra.

Python Data Persistence – Excel with Pandas Read More »

Python Data Persistence – Installation of MongoDB

Python Data Persistence – Installation of MongoDB

MongoDB server software is available in two forms: Community edition (open source release) and Enterprise edition (having additional features such as administration, and monitoring).

The MongoDB community edition is available for Windows, Linux as well as MacOS operating systems at https://www.mongodb.com/ download-center/community. Choose the appropriate version as per the OS and architecture of your machine and install it as per the instructions on the official website. Examples in this chapter assume that MongoDB is installed on Windows in the e:\mongodb folder.
Start MongoDB server from command terminal using the following command:

E:\mongodb\bin>mongod
. . . 
waiting for connections on port 27017

The server is now listening to connection requests from clients at port number 22017 of the localhost. (Server’s startup logs are omitted in the above display). To stop it, press ctrl-C. MongoDB databases are stored in the bin\data directory. You can specify alternative location though by specifying –dbpath option as follows:

Example

E:\mongodb\bin>mongod --dbpath e:\test

Now, start Mongo shell in another terminal.

E:\mongodb\bin>mongo 
MongoDB shell version v4.0.6 
connecting to:
mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb Implicit session: 
session { "id" : UUID("0d848bll- acf7-4d30-83df-242dld7fa693") }
MongoDB server version: 4.0.6
---
>

Mongo shell is a Javascript interface to MongoDB server. It is similar to the SQLite shell or MsSQL console, as we have seen earlier chapter. The CRUD operations on the MongoDB database can be performed from here.

Python Data Persistence – Installation of MongoDB Read More »

Python Data Persistence – Querying Cassandra Table

Python Data Persistence – Querying Cassandra Table

Predictably. CQL also lias the SELECT statement to fetch data from a Cassandra table. The easiest usage is employing to fetch data from all columns in a table.

cq1sh:mykeyspace> select * from products;
productid   |    name        |  price
------------+-------------+---------
      5          |   ’Printer'      |  9000
      1           |  'Laptop'     |   25000
      2          |  'TV'             |  40000
      4           |  'Scanner'    |   5000
      6           | 'Mobile'      |   15000
      3           | 'Router'       |  2000
(6 rows)

All conventional logical operators are allowed in the filter criteria specified with the WHERE clause. The following statement returns product names with prices greater than 10000.

cq1sh:mykeyspace> select * from products where price>10000 allow filtering;
productid   |    name        |  price
------------+-------------+---------
      1          |  'Laptop'     |   25000
      2          |  'TV'            |  40000
      6          | 'Mobile'      |   15000
(3 rows)

Use of ALLOW FILTERING is necessary here. By default, CQL only allows select queries where all records read will be returned in the result set. Such queries have predictable performance. The ALLOW FILTERING option allows to explicitly allow (some) queries that require filtering. If the filter criteria consist of partition key columns only = and IN operators are allowed.

UPDATE and DELETE statements of CQL are used as in SQL. However, both must have filter criteria based on the primary key. (Note the use of’—’ as a commenting symbol)

cq1sh:mykeyspace> - -update syntax
cq1sh:mykeyspace> update new products set price=45000
where productID=2;
cq1sh:mykeyspace> --delete syntax
cq1sh:mykeyspace> delete from new products where
productID=6;

Python Data Persistence – Querying Cassandra Table Read More »

Python Data Persistence – Read Data from Worksheet

Python Data Persistence – Read Data from Worksheet

To read data from an existing Excel document, we need to load it with the load_workbook ( ) function.

>>> from openpyxl import load_workbook 
>>> wb=load_workbook (filename= ' test. xlsx ' )

Set the desired worksheet as active and retrieve the value of any cell.

Example

>>> sheet1.cell (row=1, column=1).value
'Hello World'
>>> #or
>>> sheetl['A1'] . value
'Hello World'

Following script writes data in a list object, each item being a tuple comprising of ProductID, name, and price.

Example

#saveworkbook.py 
from openpyxl import Workbook 
wb = Workbook( ) 
sheet1 = wb.active 
sheet1.title='PriceList' 
sheet1.cell(column=l, row=l, value='Pricelist') pricelist=[('ProductID', 'Name', 'Price'), 
                          (1,'Laptop',25000),(2, 'TV',40000), 
                          (3, 'Router' ,2000) , (4, 'Scanner',5000) , 
                          (5, 'Printer 1,9000) , (6, 'Mobile',15000)] 
        for col in range(1,4): 
                  for row in range(1,7): 
                     sheet1.cell(column=col, row=1+row, 
value=pricelist[row-1] [col-1]) 
wb. save (filename = "test.xlsx")

The Excel document in the current directory looks like this: (figure 10.3)

Python Data Presistence - Read Data from Worksheet chapter 10 img 1

Let us find out how to perform certain formatting actions on worksheet data.

Python Data Persistence – Read Data from Worksheet Read More »