Python

Python Data Persistence – ORM – Update Data

Python Data Persistence – ORM – Update Data

Modifying attributes of an object is very easy in SQLAlchemy. First, you have to fetch the desired object, either by the primary key (using the get ( ) method) or by applying the proper filter. All you have to do is assign a new value to its attribute and commit the change.

Following code will fetch an object from the ‘Products’ table whose ProductJD=2 (Product name is TV and price is 40000 as per sample data)

Example

p=q.get(2) 
SELECT "Products"."ProductID" AS "Products_ ProductID", 
"Products".name AS "Products_name", 
"Products".price AS "Products_price" 
FROM "Products" 
WHERE "Products"."ProductID" =? 
2, )

Change the price to 45000 and commit the session.

p.price=45000 
sessionobj.commit( )

SQLAlchemy internally executes the following UPDATE statement:

UPDATE "Products" SET price=? 
WHERE "Products"."ProductID" = ? 
(45000, 2)
 COMMIT

 

Python Data Persistence – ORM – Update Data Read More »

Python Data Persistence – ORM – Filter Criteria

Python Data Persistence – ORM – Filter Criteria

The query object has a filter ( ) method that implements the WHERE clause as used in the raw SQL SELECT statement. The argument to filter can be any Boolean expression. In the following snippet, the filter is ‘price>20000’.

rows=q.filter (Product .price>20000)

This will translate into corresponding SQL statements as under:

Example

SELECT "Products"."ProductID" AS "Products_ ProductID", "Products". 
name AS "Products_name", "Products".price AS "Products_price" 
FROM "Products" 
WHERE "Products".price >?
(20000, )

SQLAlchemy supports the use of wild cards for filter operations on string columns. The LIKE keyword in SQL is implemented by applying a ( ) filter. Products. name. like (‘ %er’) filters rows with product name ending with ‘er’>

rows=q.filter (Product .name. like (1 %er') )

In effect above statement is equivalent to the following SQL query:

Example

SELECT "Products"."ProductID" AS "Products_ ProductID", 
"Products".name AS "Products_name", "Products".price AS "Products_price" 
FROM "Products" 
WHERE "Products".name LIKE ? 
(' %er' ,)

 

As you will expect, the following output will be displayed:

name: Router price: 2000 
name: Scanner Price: 5000 
name: Printer price: 9000

The filter ( ) can have AND/OR conjunctions implemented by and_ ( ) and or_( ).

Following filter returns products with a price between 10000 and 30000

from sqlalchemy 
import and_ rows=q.filter(and_
(Product.price>10000, Product. price<30000) )

Here is the generated SQL:

Example

SELECT "Products"."ProductID" AS "Products_ ProductID", 
"Products".name AS "Products_name", "Products" .price AS "Products__price" 
FROM "Products" 
WHERE "Products".price > ? AND "Products".price < ? 
(10000, 30000)

The OR operation is performed by the following statement

Example

from sqlalchemy import or_ 
rows=q.filter(or_(Product.price>20000, Product.name. like (' %er' ) ) )

which is equivalent to the following SQL statement:

Example

SELECT "Products"."ProductID" AS "Products_ ProductID", 
"Products".name AS "Products_name", "Products".price AS "Products_price" 
FROM "Products", 
"Customers" WHERE "Products".price <=? 
OR "Customers".name LIKE? 
(5000, '%er')

 

Python Data Persistence – ORM – Filter Criteria Read More »

Python Data Persistence – ORM – Querying

Python Data Persistence – ORM – Querying

In order to fetch data from a database table, we need to obtain a query object. The query () method is defined in the Session class. It needs the mapped class as an argument.

q=sessionobj.query(Product)

The query object itself has access to various methods to fetch rows from the underlying table and return objects of the mapped class.

The query .get ( ) method accepts the primary key as an argument and returns the corresponding object. For example, the following statement returns object with ProductID=2 (ProductID being the primary key of Products table)

p=q . get(2)

Because the ‘echo’ parameter is set to True in the Engine constructor, the console shows the corresponding SQL statement generated by SQLAIchemy as below:

Example

BEGIN (implicit) 
SELECT "Products"."ProductID" AS 
"Products_ ProductID", "Products".name AS "Products_name", "Products".price AS "Products_price" 
FROM "Products" 
WHERE "ProductsProductID" = ? sqlalchemy.engine.base.Engine (2,)

Attributes of the object (p.name and p.price) can now be displayed. Still better, provide a__str__( ) method in Products class. Modify it in myclasses.py script as under:

Example

class Product(base):
__tablename ___= 'Products'

ProductID = Column(Integer, primary_key=True)
name = Column(String)
price = Column(Integer)
def__str__(self):
return 'name:{ } price: { } ' . format ( self . name , self . price )

The query, all ( ) method returns a list of all objects which can be traversed using a loop. Here is a fetchllrecs.py script:

Example

#fetchallrecs.py 
from sqlalchemy import Column, Integer, String 
from sqlalchemy import create_engine 
from myclasses import Product,base, Customers 
engine = create_engine('sqlite:///mydb.sqlite', echo=True) 
base.metadata.create_all(engine) 
from sqlalchemy.orm import sessionmaker 
Session = sessionmaker(bind=engine) 
sessionobj = Session ( ) 
q=sessionobj.query(Products) 
rows=q.all ( ) 
for row in rows: 
print (row)

Shown below is the output of this script along with shortened log in the console window (figure 9.4):

(SQLAlchemyEnv) E: \ SQLA1chemyEnv>python 
fetchallrecs.py PRAGMA table info( BEGIN (implicit) "Products") 
SELECT "Products". "ProductID" AS "Products_ 
ProductID", "Products".name AS 
"Products_name", "Products".price AS 
"Products_price" FROM "Products"
 ( ) 
 name: Laptop price: 25000 
 name: TV price: 40000 
 name: Router price: 2000 
 name: Scanner Price : 5000 
 name: Printer price : 9000 
 name: Mobile price: 15000

 

Python Data Persistence – ORM – Querying Read More »

Python Data Persistence – Python – Cassandra

Python Data Persistence – Python – Cassandra

In this last chapter, we are going to deal with another important NOSQL database – Cassandra. Today some of the biggest IT giants (including FaceBook, Twitter, Cisco, and so on) use Cassandra because of its high scalability, consistency, and fault-tolerance. Cassandra is a distributed database from Apache Software Foundation. It is a wide column store database. A large amount of data is stored across many commodity servers which makes data highly available.

Cassandra Architecture

The fundamental unit of data storage is a node. A node is a single server in which data is stored in the form of the keyspace. For understanding, you can think of keyspace as a single database. Just as any server running a SQL engine can host multiple databases, a node can have many key spaces. Again, like in a SQL database, keyspace may have multiple column families which are similar to tables.

However, the architecture of Cassandra is logically as well as physically different from any SQL-oriented server (Oracle, MySQL, PostgreSQL, and so on). Cassandra is designed to be a foolproof database without a single point of failure. Hence, data in one node is replicated across a peer-to-peer network of nodes. The network is called a data center, and if required, multiple data centers are interconnected to form a cluster. Replication strategy and replication factor can be defined at the time of the creation of a keyspace. (figure 12.1) ‘

Python Data Presistence - Python - Cassandra chapter 12 img 1
Each ‘write’ operation over a keyspace is stored in Commit Log, which acts as a crash-recovery system. After recording here, data is stored in a Mem-table. Mem-table is just a cache or buffer in the memory. Data from the mem-table is periodically flushed in SSTables, which are physical disk files on the node.

Cassandra’s data model too is entirely different from a typical relational database. It is often, described as a column store or column-oriented NOSQL database. A keyspace holds one or more column families, similar to the table in RDBMS. Each table (column family) is a collection of rows, each of which stores columns in an ordered manner. Column, therefore, is the basic unit of data in Cassandra. Each column is characterized by its name, value, and timestamp.

The difference between a SQL table and Cassandra’s table is that the latter is schema-free. You don’t need to define your column structure ahead of time. As a result, each row in a Cassandra table may have columns with different names and variable numbers.

Python Data Presistence - Python - Cassandra chapter 12 img 2

Installation

The latest version of Cassandra is available for download athttp://cassandra. apache.org/download/. Community distributions of Cassandra (DDC) can be found at https://academy.datastax.com/planet-cassandra/cassandra. Code examples in this chapter are tested on DataStax distribution installed on Windows OS.

Just as any relational database uses SQL for performing operations on data in tables, Cassandra has its own query language CQL which stands for Cassandra Query Language. The DataStax distribution comes with a useful front-end IDE for CQL. All operations such as creating keyspace and table, running different queries, and so on can be done both visually as well as using text queries. The following diagram shows a view of DataStax DevCenter IDE.(figure 12.3)

Python Data Presistence - Python - Cassandra chapter 12 img 3

CQL Shell

Cassandra installation also provides a shell inside which you can execute CQL queries. It is similar to> MySQL console, SQLite console, or Oracle’s SQL Plus terminal. (figure 12.4)

Python Data Presistence - Python - Cassandra chapter 12 img 4

We shall first learn to perform basic CRUD operations with Cassandra from inside CQLSH and then use Python API for the purpose.

Inserting Rows

INSERT statement in CQL is exactly similar to one in SQL. However, the column list before the ‘VALUES’ clause is not optional as is the case in SQL. That is because, in Cassandra, the table may have a variable number of columns.

cq1sh:mykeyspace> insert into products (productID, name, price) 
values (1, 1 Laptop 1,25000);

Issue INSERT statement multiple numbers of times to populate ‘products’ table with sample data given in chapter 9. You can also import data from a CSV file using the copy command, as follows:

cq1sh:mykeyspace> copy products (productID, name, price)
                             . . . from ’pricelist.csv’ with 
the delimiter,’ and header=true;

 

Python Data Persistence – Python – Cassandra Read More »

Python Data Persistence – PyMongo – Relationships

Python Data Persistence – PyMongo – Relationships

MongoDB is a non-relational database. However, you can still establish relationships between documents in a database. MongoDB uses two different approaches for this purpose. One is an embedded approach and the other is a referencing approach.

Embedded Relationship

In this case, the documents appear in a nested manner where another document is used as the value of a certain key. The following code represents a ‘customer’ document showing a customer (with ‘_ id ’= 1) buys two products. A list of two product documents is the value of the ‘prods’ key.

Example

>>> cust.insert_one({'_id':1,'name':'Ravi',
                                          'prods':[
                                                 { 'Name':'TV',
'price':40000},


{'Name':'Scanner','price':5000}
                                ]
              })

Querying such an embedded document is straightforward as all data is available in the parent document itself.

Example

>>> doc=cust .find_one ({ '_id' : 1}, { 'prods ' : 1})
>>> doc
{' id': 1, 'prods': [{'Name': 'TV', 'price': 40000},
{'Name': 'Scanner', 'price': 5000}]}

The embedded approach has a major drawback. The database is not normalized and, hence, data redundancy arises. Assize grows, it may affect the performance of reading/write operations.

Reference Relationship

This approach is somewhat similar to the relations in a SQL-based database. The collections (equivalent to the RDBMS table) are normalized for optimum performance. One document refers to the other with its ‘_id’ key.

Recollecting that instead of automatically generated random values for ‘_id’, they can be explicitly specified while inserting a document in a collection, following is the constitution of ‘products’ collection.

Example

>>> list (prod.find ( ) '
[{ '_4d' : 1, 'Name' 'Laptop', 'price': 25000}, {'_
id': 2, Name': 'TV', 'price': 40000}, {'_id': 3,
'Name': Router', price': 2000}, {'_id': 4, 'Name':
'Scanner , 'price' 5000}, {' id': 5, 'Name':
'Printer , 'price' 9000}]

We now create a ‘customers’ collection.

Example

>>> db.create_collection('customers 1) 
>>> cust=db['customers']

The following document is inserted with one key ‘prods’ being a list of ‘ _id’ s from products collection.

Example

>>> cust .insert_one({'_id':1, 'Name' 'Ravi',
'prods': [2,4] })

However, in such a case, you may have to run two queries: one on the parent collection, and another on related collection. First, fetch the _ids of the related table.

Example

>>> doc=cust .find_one ({'_id' : 1}, {'prods ' : 1})
>>> prods
[2, 4]

Then, iterate over the list and access the required field from the related document.

Example

>>> for each in prods:
doc=prod .find_one ({ ' id' :each})
print (doc['Name'])
TV
Scanner

The reference approach can be used to build one-to-one or one-to-many types of relationships. The choice of approach (embedded or reference) largely depends on data usage, the projected growth of the size of the document, and the atomicity of the transaction.

In this chapter, we had an overview of the MongoDB database and its Python interface in the form of the PyMongo module. In the next chapter, another NoSQL database – Cassandra – is going to be explained along with its association with Python.

Python Data Persistence – PyMongo – Relationships Read More »

Python Data Presistence – PyMongo – Add Collection

Python Data Presistence – PyMongo – Add Collection

Create a new database object by using any name currently not in the list.

Example

>>> db=client.newdb

The Database is actually created when the first document is inserted. The following statement will implicitly create a ‘products’ collection and multiple documents from the given list of dictionary objects.

Example

>>> pricelist=[{ 'ProductID : 1 / 'Name' :'Laptop',
'price' 25000},{ 'ProductID :2, 'Name' :'TV',
'price' 40000},{ 'ProductID : 3, 'Name' :'Router',
'price' 2000},{' ProductID' 4, 'Name': 'Scanner',
'price' 'price' 5000},{' 9000}] ProductID' 5, Name': 'Printer',
>>> db.products.insert_many(pricelist)

You can confirm the insertion operation by find ( ) method in the mongo shell, as we have done earlier.

We create a collection object explicitly by using the create_collection( ) method of the database object.

Example

>>> db.create_collection(1 customers')

Now, we can add one or more documents in it. The following script adds documents in ‘customers’ collection.

Example

from pymongo import MongoClient client=MongoClient() 
db=client.newdb db.create_collection("customers") 
cust=db['customers' ]
 custlist=[{'CustID':1,1 Name' :'Ravikumar', 'GS- TIN':'27AAJPL7103N1ZF'}, 
{'CustID' :2, 'Name': 'Patel', 'GSTIN': '24ASDFG1234N- 1ZN'}, 
{'CustID' : 3, ' Name' : 'Nitin', 'GSTIN': '27AABBC7895N- 1ZT'}, 
{'CustID' : 4, ' Name' : 'Nair',' GSTIN':' 32MMAF8963N1ZK'} 
{'CustID' :5, ' Name' : 'Shah',' GSTIN':' 24BADEF2002N- 1ZB'}, 
{'CustID' :6, ' Name' : 'Khurana .', 'GSTIN ':'07KABCS1002N- lZV'}, 
{'CustID' :7, ' Name' : 'Irfan', 'GSTIN': '05IIAAV5103N- 1ZA'}, 
{'CustID' :8, ' Name' : 'Kiran', 'GSTIN': '12PPSD- F22431ZC' 
{'CustID' }. :9, ' Name' : 'Divya', 'GSTIN': '15ABCDE1101N- 1ZA'}, 
{'CustID' : 10 ,'Name' :'John', 'GS- TIN':'29AAEEC4258E1ZK'}] 
cust.insert_many(custlist) / client.close()

 

Python Data Presistence – PyMongo – Add Collection Read More »

Python Data Persistence – PyMongo Module

Python Data Persistence – PyMongo Module

PyMongo module is an official Python driver for MongoDB database developed by Mongo Inc. It can be used on Windows, Linux, as well as MacOS. As always, you need to install this module using the pip3 utility.

pip3 install pymongo

Before attempting to perform any operation on a database, ensure that you have started the server using ‘mongod’ command and the server is listening at port number 22017.

To let your Python interpreter interact with the server, establish a connection with the object of MongoClient class.

Example

>>> from pymongo import MongoClient 
>>> client=MongoClient( )

The following syntax is also valid for setting up connection with server.

>>> client = MongoClient('localhost', 27017)
#or
client = MongoClient('mongodb://localhost:27017')

In order to display currently available databases use list_database_ namesO method of MongoClient class.

Example

>>> client.list_database_names( )
[' admin' , ' config' , 1 local' , ' mydb' ]

Python Data Persistence – PyMongo Module Read More »

Python Data Persistence – MongoDB – Querying Collection

Python Data Persistence – MongoDB – Querying Collection

Retrieving data from the database is always an important operation. MongoDB’s collection has found ( ) method with which documents are fetched. Without any argument, the find ( ) method returns a result set of all documents in a collection. In effect, it is equivalent to ‘SELECT * FROM <table>‘ in SQL.

> db . products . find () 
{ "_id" : Objectld("5c8d420c7bebaca49b767db3"), 
"ProductID" : 1, "Name" : "Laptop", "price" : 25000 
} 
{ "_id" : Objectld("5c8d420c7bebaca49b767db4"), 
"ProductID" : 2, "Name" : "TV", "price" : 40000 } 
{ "_id" : Objectld("5c8d420c7bebaca49b767db5"), 
"ProductID" : 3, "Name" : "Router", "price" : 2000 } 
{ "__id" : Objectld("5c8d420c7bebaca49b767db6"), 
"ProductID" : 4, "Name" : "Scanner", "price" : 5000 } 
{ "_id" : Objectld("5c8d420c7bebaca49b767db7"), 
"ProductID" : 5, "Name" : "Printer", "price" : 9000 
}

Note that, ‘_id’ key is automatically added to each document. The value of each _id is of ObjectId type and is unique for each document.
Invariably, you would want to apply to the result set returned by find ( ). It is done by putting the key-value pair in its parenthesis. In its generalized form, the conditional query is written as follows:

Example

db. collection .find ({ "key" : "value"})

The following statement retrieves a document whose ‘Name’ key has ‘TV’ value.

Example

> db .products .find ({"Name": "TV"}) 
{ "_id" : Objectld("5c8d420c7bebaca4 9b76 7db4"), 
"ProductID" : 2, "Name" : "TV", "price" : 40000 }

MongoDB doesn’t use traditional logical operator symbols. Instead, it has its own operators, as listed below:

The operators are used in the find ( ) method to apply the filter. The following statement returns products with a price> 10000.

> db.products .find ({"price" : {$gt: 10000 }}) 
{ 
"_id" : Objectld("5c8d420c7bebaca49b767db3"), 
"ProductID" : 1, "Name" : "Laptop", "price" : 25000 
} 
{ "_id" : Objectld("5c8d420c7bebaca49b767db4"), 
"ProductID" : 2, "Name" : "TV", "price" : 40000
}

The Sand, as well as Sor operators, are available for compound logical expressions. Their usage is, as follows:

Example

db. collection.find ($and: [ {"keyl" :"value 1"},
{"key2":"value2"} ] )

Use the following command to fetch products with prices between 1000 and 10000.

> db . products . find ( { $and: [ { "price" : { $gt: 1000 } } , {"price":{$lt:10000}} ] }) 
{ "_id" : Objectld("5c8d420c7bebaca49b767db5 " ) , 
"ProductID" : 3, "Name" : "Router", "price" : 2000 
} 
{ "_id" : Objectld("5c8d420c7bebaca49b767db6"), 
"ProductID" : 4, "Name" : "Scanner", "price" : 5000 
} 
{ "_id" : Obj ectld("5c8d42 0c7bebaca4 9b767db7") ,
 "ProductID" : 5, "Name" : "Printer", "price" : 9000
}

 

Python Data Persistence – MongoDB – Querying Collection Read More »

Python Data Persistence – User-defined Types

Python Data Persistence – User-defined Types

While executing the queries, Python data types are implicitly parsed to corresponding CQL types as per the following table:(figure 12.1)

Python TypeCQL Type
NoneNULL
BoolBoolean
FloatFloat, double
Int, longInt, bigint , variant , smallest, tinyint , counter
Decimal, decimalDecimal
Str , UnicodeAscii, varchar , test
Buffer, byte arrayBlob
DateDate
DatetimeTimestamp
TimeTime
List , tuple

Generator

List
Set , frozenestSet
Dict , OrderedDistMap
Uuid.UUIDTimeuuid.uuid

 

In addition to the above built-in CQL data types, the Cassandra table may have a column of a user-defined type to which an object of Python class can be mapped.

Cassandra provides a CREATE TYPE statement to define a new user-defined type which is used as a type for a column in a table defined with the CREATE TABLE statement.

In the script given below (Cassandra-udt.py), we define a Cassandra user-defined type named ‘contacts’ and use it as the data type of the ‘contact’ column in the ‘users’ table. The register__user_type ( ) method of cluster object helps us to map Python class ‘Contactlnfo’ to the user-defined type.

Example

#cassandra-udt.py 
from cassandra.cluster import Cluster 
cluster = Cluster(protocol_version=3) 
session = cluster.connect( ) 
session.set_keyspace(1mykeyspace1) 
session.execute("CREATE TYPE contact (email text, phone text)") 
session.execute("CREATE TABLE users (userid int PRIMARY KEY, name text, contact frozen<contact>)") 
class Contactlnfo:          
      def__init__(self, email, phone):                       
           self.email = email                       
            self.phone = phone 
cluster.register_user_type('mykeyspace', 'contact', Contactlnfo) 
# insert a row using an instance of Contctlnfo session.execute("INSERT INTO users (userid, name, contact) 
VALUES (%s, %s, %s)",                           
   (1, .'Admin', Contactlnfo
("admin@ testserver.com", '9988776655')))

The following display of the CQL shell confirms the insertion operation of the above script.

cq1sh:mykeyspace> select * from users; 
    userid          |                            contact                                            |  name
------------+------------------------------------------------------------+-----------
        1        | {email: '[email protected]', phone: '9988776655'}  |   Admin 
(1 rows)

In this chapter, we learned about the basic features of the Cassandra database, and importantly how to perform read/write operations on it with Python.

Final thoughts

This is also the last chapter of this book. Starting with the basics of Python, we learned how to work with different data storage formats. This work is neither a Python handbook nor a SQL/NoSQL database guide. Instead, it is intended to be a simple and practical explanation of Python interfaces with different data persistence avenues available today.

This book concentrates primarily on basic CRUD operations on SQL and NoSQL databases and other data file formats. However, these techniques are merely complementary to the real data crunching a data scientist needs to perform. The next logical thing for a curious reader would be to acquire data manipulation and visualization skills for which Python has very rich and powerful libraries.
1 hope you enjoyed this book as much as 1 did bring to you. Feedback, suggestions, and corrections if any are most welcome so that subsequent editions can be improved.
Thanks.

Python Data Persistence – User-defined Types Read More »

Python Data Persistence – Charts

Python Data Persistence – Charts

One of the most attractive features of the MS Excel application is to dynamically generate various types of charts based upon data in worksheets. The openpyxl package has a chart module that offers the required functionality. In this section, we shall see how we can render charts programmatically.

Python Data Presistence - Charts chapter 10 img 1

The chart module defines classes for all types of charts such as BarChart and LineChart. The chart requires data range and category range to be defined. These ranges are defines with the Reference () function. It stipulates row and column numbers of top-left and bottom-right cells of the desired range.

In the above worksheet, B2:D7 is the data range including the column labels, and A1: A7 range is the categories range.

Example

from openpyxl import load_workbook 
wb = load_workbook(1 example.xlsx') 
ws = wb.active 
from openpyxl.chart import BarChart, Reference 
values = Ref erence (ws, min_col=2, min__row=2, max_ col=4, max_row=7) 
ctg=Reference(ws, min_col=l,min_row=3, max_col=l, max_row=7)

The chart object is configured by add_data() and set_ categories ( ) methods. The add_data() method takes the data range as the first parameter. If the titles_f rom_data parameter is set to True, the first row in the data range is used for the series legend title. The title of the chart, X-axis and Y-axis are also set by respective properties.

Example

c1 = BarChart( )
c1.add_data(values, titles_from_data=True
c1 . title = "Bar Chart"
c1.x_axis.title = 'Months1
c1. y__axis . title = 'Sales'
ws . add__chart (cl, "A10")
c1.set_categories(ctg)
wb. save (filename= ' example .xlsx' )

Run the above script and then open the workbook document. It will now have the bar chart stored in it. (figure 10.7)

Python Data Presistence - Charts chapter 10 img 2

Another example of a Line chart is explained below. The chart configuration has only one change. The chart object itself is of LineChart ( ) type.

Example

from openpyxl,chart import LineChart 
c2 = LineChart()
c2.add_data(values, titles_from_data=True)#legends
c2.title = "Line Chart"
c2,x_axis.title = 'Months'
c2.y_axis.title = 'Sales'
ws.add_chart(c2, "F2")
c2.set_categories(ctg)

The line chart is stored in the ‘example.xlsx’. Open it to view, (figure 10.8)

Python Data Presistence - Charts chapter 10 img 3

Python Data Persistence – Charts Read More »