Python Data Persistence – Excel with Pandas
Pandas library is extremely popular with data scientists as it provides easy-to-use tools for data manipulation and analysis. Different types of data structures are available in Pandas. Of which, the data frame is most commonly used. Dataframe in Pandas represents a two-dimensional tabular data structure with labeled columns which may be of different data types.
Before we explore the DataFrame object and its relationship with Excel, we have to ensure that the Pandas package is installed in the current Python environment. If you are using Anaconda distribution, Pandas is already installed in it. Otherwise, you may have to get it using pip utility in the virtual environment we have created for this chapter.
E:\excelenv>scripts\activate (excelenv) E:\excelenv>scripts\pip3 install pandas
During installation, few more libraries like NumPy and others are also installed as they are internally used by Pandas.
As mentioned earlier, the DataFrame object of Pandas is a two-dimensional table-like structure, with labeled columns that may be of different data types (This is similar to SQL table isn’t it?). It can be constructed using various data objects as inputs such as Python lists or dictionaries. Of particular relevance to us in this chapter is creating a DataFrame object using a list of dictionary items.
Let us first define a list, each item in which is a dictionary object. It has three k-v pairs as shown below:
Example
>>> pricelist=[{1ProductID1 1, 'Name ':'Laptop', 'price': 25000} , {'ProductID' : 2 , 'Name':' TV' ,'price' : 40000} , {'ProductID' : 3 , 'Name 1 : ' Router', 'price’ : 2 0 0 0 } , {'ProductID' :4, 'Name':' Scanner', 'price' : 5000 } , {'ProductID1 : 5, 'Name':' Printer 1,'price' : 9000} ]
Use this list object as an argument to the constructor of the DataFrame object. Example 10.15
>>> import pandas as pd >>> df=pd.DataFrame(pricelist) >>> df Name ProductID price 0 Laptop 1 25000 1 TV 2 40000 2 Router 3 2000 3 Scanner 4 5000 4 Printer 5 9000
Example
Incidentally, conversion to/from DataFrame and many other data formats is possible. This includes JSON, CSV, pickle, SQL, and so on. As a quick example, we shall try to read SQLite table data, using the read_sql_ query () function.
Example
>>> import pandas as pd >>> import sqlite3 >>> con=sqlite3.connect('mydb.sqlite') >>> df = pd.read_sql_query("SELECT * FROM Products;", con) >>> df ProductID Name Price 0 1 Laptop 27500 1 3 Router 3000 2 4 Scanner 5500 3 5 Printer 11000 4 6 Mobile 16500
At the conclusion of this chapter, you must have got a fair idea of how you can use Python to manipulate Excel workbook documents. While openpyxl package is all about automating the functionality of Excel software, data in Excel sheets can be brought in Pandas data frames for high-level manipulations and analysis and exported back.
The next two chapters of this book deal with the exciting world of NOSQL databases and the way Python can interact with two of very popular NOSQL databases – MongoDB, and Cassandra.