veer

Python Data Persistence – A Quick Guide

Almost every other computer application, whether it is a web based application, a standalone data logger, a mobile app or a desktop application with or without GUI, stores and retrieves data from some persistent storage device such as hard disk or a flash drive. Such storage device may either be connected to computer or it may be available on a network. Without this ability to recurrently access, update and retrieve stored data, most computer applications would have been reduced to programmable calculators!

Python Data Persistence – A Quick Guide

Data storage format depends on the logical structure of data and on the processing logic. Data may be stored in flat computer files, in tables of relational databases or different store formats of NOSQL databases. You will know more about these terms in subsequent chapters of this Page.

Back-end process of a computer application stores, modifies and retrieves data in response to front-end user’s requirements. Almost every programming language offers tools to interact with files/databases. This book aims to familiarize the reader with Python’s functions and modules that handle persistent data processing.

Python Data Persistence – Getting Started

Python Data Persistence – Program Flow Control

Python Data Persistence – Structured Python

Python Data Persistence Object Oriented Programming OOP

Python Data Persistence – File IO

Python Data Persistence – Object Serialization

Python Data Persistence – RDBMS Concepts

Python Data Persistence – Python DB-API

Python Data Persistence – Python – SQLAlchemy

Python Data Persistence – Python and Excel

Python Data Persistence – Python – PyMongo

Python Data Persistence – Python – Cassandra

Read Also: 

Why Python?

Popularity of Python has increased by many fold recently because of the emergence of powerful libraries for data analysis, visualization and machine learning. These libraries use data stored in different formats such as text files and relational databases. Hence to be a proficient data scientist, it is important to have a sound understanding of Python tools for data persistence.

Features of Python

Python is easy!: So why has Python been so popular? First and foremost, Python is very easy to learn and use. “Simple is better than Complex”. This is one of the guiding principles of Python’s design philosophy. It has clean and simple syntax resembling to a natural language. It saves a lot of development time.

Open source: Python is free and open source having a very active and supportive developer community. As a result of plenty of documentation resources, guides tutorials and forums me available in public domain. Any newcomer can access, learn, collaborate and seek advice from community.

Object oriented: Python is completely object oriented, although it supports multiple programming paradigms including functional programming style.

Extensible: Python can be easily integrated with other languages such as C/C++, Java, .NET, etc. Corporate support: Python enjoys great corporate support. Google for example promotes Python in a big way as Python is extensively used in many of its applications. Many other companies use Python as their development platform.

How To Scrape LinkedIn Public Company Data

How To Scrape LinkedIn Public Company Data – Beginners Guide

Nowadays everybody is familiar with how big the LinkedIn community is. LinkedIn is one of the largest professional social networking sites in the world which holds a wealth of information about industry insights, data on professionals, and job data.

Now, the only way to get the entire data out of LinkedIn is through Web Scraping.

Why Scrape LinkedIn public data?

There are multiple reasons why one wants to scrape the data out of LinkedIn. The scrape data can be useful when you are associated with the project or for hiring multiple people based on their profile while looking at their data and selecting among them who all are applicable and fits for the company best.

This scraping task will be less time-consuming and will automate the process of searching for millions of data in a single file which will make the task easy.

Another benefit of scraping is when one wants to automate their job search. As every online site has thousands of job openings for different kinds of jobs, so it must be hectic for people who are looking for a job in their field only. So scraping can help them automate their job search by applying filters and extracting all the information at only one page.

In this tutorial, we will be scraping the data from LinkedIn using Python.

Prerequisites:

In this tutorial, we will use basic Python programming as well as some python packages- LXML and requests.

But first, you need to install the following things:

  1. Python accessible here (https://www.python.org/downloads/)
  2. Python requests accessible here(http://docs.python-requests.org/en/master/user/install/)
  3. Python LXML( Study how to install it here: http://lxml.de/installation.html)

Once you are done with installing here, we will write the python code to extract the LinkedIn public data from company pages.

This below code will only run on python 2 and not above them because the sys function is not supported in it.

import json

import re

from importlib import reload

import lxml.html

import requests

import sys

reload(sys)

sys.setdefaultencoding('cp1251')




HEADERS = {'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',

          'accept-encoding': 'gzip, deflate, sdch',

          'accept-language': 'en-US,en;q=0.8',

          'upgrade-insecure-requests': '1',

          'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36'}

file = open('company_data.json', 'w')

file.write('[')

file.close()

COUNT = 0




def increment():

   global COUNT

   COUNT = COUNT+1




def fetch_request(url):

   try:

       fetch_url = requests.get(url, headers=HEADERS)

   except:

       try:

           fetch_url = requests.get(url, headers=HEADERS)

       except:

           try:

               fetch_url = requests.get(url, headers=HEADERS)

           except:

               fetch_url = ''

   return fetch_url




def parse_company_urls(company_url):




   if company_url:

       if '/company/' in company_url:

           parse_company_data(company_url)

       else:

           parent_url = company_url

           fetch_company_url=fetch_request(company_url)

           if fetch_company_url:

               sel = lxml.html.fromstring(fetch_company_url.content)

               COMPANIES_XPATH = '//div[@class="section last"]/div/ul/li/a/@href'

               companies_urls = sel.xpath(COMPANIES_XPATH)

               if companies_urls:

                   if '/company/' in companies_urls[0]:

                       print('Parsing From Category ', parent_url)

                       print('-------------------------------------------------------------------------------------')

                   for company_url in companies_urls:

                       parse_company_urls(company_url)

           else:

               pass







def parse_company_data(company_data_url):




   if company_data_url:

       fetch_company_data = fetch_request(company_data_url)

       if fetch_company_data.status_code == 200:

           try:

               source = fetch_company_data.content.decode('utf-8')

               sel = lxml.html.fromstring(source)

               # CODE_XPATH = '//code[@id="stream-promo-top-bar-embed-id-content"]'

               # code_text = sel.xpath(CODE_XPATH).re(r'<!--(.*)-->')

               code_text = sel.get_element_by_id(

                   'stream-promo-top-bar-embed-id-content')

               if len(code_text) > 0:

                   code_text = str(code_text[0])

                   code_text = re.findall(r'<!--(.*)-->', str(code_text))

                   code_text = code_text[0].strip() if code_text else '{}'

                   json_data = json.loads(code_text)

                   if json_data.get('squareLogo', ''):

                       company_pic = 'https://media.licdn.com/mpr/mpr/shrink_200_200' + \

                                     json_data.get('squareLogo', '')

                   elif json_data.get('legacyLogo', ''):

                       company_pic = 'https://media.licdn.com/media' + \

                                     json_data.get('legacyLogo', '')

                   else:

                       company_pic = ''

                   company_name = json_data.get('companyName', '')

                   followers = str(json_data.get('followerCount', ''))




                   # CODE_XPATH = '//code[@id="stream-about-section-embed-id-content"]'

                   # code_text = sel.xpath(CODE_XPATH).re(r'<!--(.*)-->')

                   code_text = sel.get_element_by_id(

                       'stream-about-section-embed-id-content')

               if len(code_text) > 0:

                   code_text = str(code_text[0]).encode('utf-8')

                   code_text = re.findall(r'<!--(.*)-->', str(code_text))

                   code_text = code_text[0].strip() if code_text else '{}'

                   json_data = json.loads(code_text)

                   company_industry = json_data.get('industry', '')

                   item = {'company_name': str(company_name.encode('utf-8')),

                           'followers': str(followers),

                           'company_industry': str(company_industry.encode('utf-8')),

                           'logo_url': str(company_pic),

                           'url': str(company_data_url.encode('utf-8')), }

                   increment()

                   print(item)

                   file = open('company_data.json', 'a')

                   file.write(str(item)+',\n')

                   file.close()

           except:

               pass

       else:

           pass
fetch_company_dir = fetch_request('https://www.linkedin.com/directory/companies/')

if fetch_company_dir:

   print('Starting Company Url Scraping')

   print('-----------------------------')

   sel = lxml.html.fromstring(fetch_company_dir.content)

   SUB_PAGES_XPATH = '//div[@class="bucket-list-container"]/ol/li/a/@href'

   sub_pages = sel.xpath(SUB_PAGES_XPATH)

   print('Company Category URL list')

   print('--------------------------')

   print(sub_pages)

   if sub_pages:

       for sub_page in sub_pages:

           parse_company_urls(sub_page)

else:

   pass
How To Scrape Amazon Data Using Python Scrapy

How To Scrape Amazon Data Using Python Scrapy

Will it not be good if all the information related to some product will be placed in only one table? I guess it will be really awesome and accessible if we can get the entire information at one place.

Since, Amazon is a huge website containing millions of data so scraping the data is quite challenging. Amazon is a tough website to scrape for beginners and people often get blocked by Amazon’s anti-scraping technology.

In this blog, we will be aiming to provide the information about the scrapy and how to scrape the Amazon website using it.

What is Scrapy?

Scrapy is a free and open-source web-crawling Python’s framework. It was originally designed for web scraping, extracting the data using API’s and or general-purpose web crawler.

This framework is used in data mining, information processing or historical archival. The applications of this framework is used widely in different industries and has been proven very useful. It not only scrapes the data from the website, but it is able to scrape the data from the web services also. For example, Amazon API, Facebook API, and many more.

How to install Scrapy?

Firstly, there are some third-party softwares which needs to be installed in order to install the Scrapy module.

  • Python: As Scrapy has the base of the Python language, one has to install it first.
  • pip: pip is a python package manager tool which maintains a package repository and installs python libraries, and its dependencies automatically. It is better to install pip according to system OS, and then try to follow the standard way of installing Scrapy.

There are different ways in which we can download Scrapy globally as well as locally but the most standard way of downloading it is by using pip.

Run the below command to install Scrapy using pip:

Pip install scrapy

How to get started with Scrapy?

Since we know that Scrapy is an application framework and it provides multiple commands to create an application and use them. But before everything, we have to set up a new Scrapy project. Enter a directory where you’d like to store your code and run:

Scrapy startproject new_project

This will create a directory:

Scrapy is an application framework which follows object oriented programming style for the definition of items and spiders for overall applications.

The project structure contains different the following files:

  1. Scrapy.cfg : This file is a root directory of the project, which includes project name with the project settings.
  2. Test_project : It is an application directory with many different files which actually make running and scraping responsible from the web URLs.
  3. Items.py :Items are containers that will be loaded with the scraped data and they work like simple python dictionaries.Items provide additional prevention against typos and populating undeclared fields.
  4. Pipelines.py :After the scraping of an item has been done by the spider, it is sent to the item pipeline which processes it through several components. Each class has to implement a method called process_item for the processing of scraped items.
  5. Settings.py : It allows the customization of the behaviour of all scrapy components, including the core, extensions, pipelines, and spiders themselves.
  6. Spiders : It is a directory which contains all spiders/crawlers as python classes.

Scrape Amazon Data: How to Scrape an Amazon Web Page

For a better understanding of how the scrapy works, we will be scraping the product name, price, category, and it’s availability from the Amazon.com website.

Let’s name this project amazon_pro. You can use the project name according to your choice.

Start by writing the below code:

Scrapy startproject test_project

The directory will be created in the local folder by the name mentioned above.

Now we need three things which will help in the scraping process.

  1. Update items.py field which we want to scrape. For example names, price, availability, and so on.
  2. We have to create a new spider with all the necessary elements in it like allowed domains, start_urls and parse method.
  3. For data processing, we have to update pipelines.py file.

Now after creating the spider, follow thee further steps given in the terminal:

  1. Scrapy genspider amazon amazon.com

Now, we need to define the name, URLs, and possible domains to scrape the data.

How to Scrape an Amazon Web Page 2

An item object is defined in the parse method and is filled with required information using the utility of XPath response object. It is a search function that is used to find elements in the HTML tree structure. Lastly let’s yield the items object, so that scrapy can do further processing on it.

Next, after scraping data, scrapy calls Item pipelines to process them. These are called Pipeline classes and we can use these classes to store data in a file or database or in any other way. It is a default class like Items that scrapy generates for users.

How to Scrape an Amazon Web Page 3

The process_item is implemented by the pipeline classes method and items are being yielded by a Spider each and every time. It takes the item and spider class as arguments and returns a dict object. So for this example, we are just returning item dict as it is.

Now, we have to enable the ITEM_PIPELINES from settings.py file.

How to Scrape an Amazon Web Page 4

Now, after completing the entire code, we need to scrape the item by sending requests and accepting response objects.

We will call a spider by its unique name and scrapy will easily search from it.

Scrapy crawl amazon

Now, after the items have been scraped, we can save it to different formats using their extensions. For example, .json, .csv, and many more.

Scrapy crawl amazon -o data.csv

The above command will save the scraped data in the csv format in data.csv file.

Here is the output of the above code:

[ 
{"product_category": "Electronics,Computers & Accessories,Data Storage,External Hard Drives", "product_sale_price": "$949.95", "product_name": "G-Technology G-SPEED eS PRO High-Performance Fail-Safe RAID Solution for HD/2K Production 8TB (0G01873)", "product_availability": "Only 1 left in stock."},
{"product_category": "Electronics,Computers & Accessories,Data Storage,USB Flash Drives", "product_sale_price": "", "product_name": "G-Technology G-RAID with Removable Drives High-Performance Storage System 4TB (Gen7) (0G03240)", "product_availability": "Available from these sellers."},
{"product_category": "Electronics,Computers & Accessories,Data Storage,USB Flash Drives", "product_sale_price": "$549.95", "product_name": "G-Technology G-RAID USB Removable Dual Drive Storage System 8TB (0G04069)", "product_availability": "Only 1 left in stock."},
{"product_category": "Electronics,Computers & Accessories,Data Storage,External Hard Drives", "product_sale_price": "$89.95", "product_name": "G-Technology G-DRIVE ev USB 3.0 Hard Drive 500GB (0G02727)", "product_availability": "Only 1 left in stock."}
]

We have successfully scraped the data from Amazon.com using scrapy.

How to Scrape Wikipedia Articles with Python

How to Scrape Wikipedia Articles with Python

We are going to make a scraper which will scrape the wikipedia page.

The scraper will get directed to the wikipedia page and then it will go to the random link.

I guess it will be fun looking at the pages the scraper will go.

Setting up the scraper:

Here, I will be using Google Colaboratory, but you can use Pycharm or any other platform you want to do your python coding.

I will be making a colaboratory named Wikipedia. If you will use any python platform then you need to create a .py file followed by the any name for your file.

To make the HTTP requests, we will be installing the requests module available in python.

Pip install requests

 We will be using a wiki page for the starting point.

Import requests

Response = requests.get(url = "https://en.wikipedia.org/wiki/Web_scraping")

print(response.status_code)

 When we run the above command, it will show 200 as a status code.

How to Scrape Wikipedia Articles with Python 1

Okay!!! Now we are ready to step on the next thing!!

Extracting the data from the page:

We will be using beautifulsoup to make our task easier. Initial step is to install the beautiful soup.

Pip install beautifulsoup4

Beautiful soup allows you to find an element by the ID tag.

Title = soup.find( id=”firstHeading”)

 Bringing everything together, our code will look like:

How to Scrape Wikipedia Articles with Python 3

As we can see, when the program is run, the output is the title of the wiki article i.e Web Scraping.

 Scraping other links:

Other than scraping the title of the article, now we will be focusing on the rest of the things we want.

We will be grabbing <a> tag to another wikipedia article and scrape that page.

To do this, we will scrape all the <a> tags within the article and then I will shuffle it.

Do not forget to import the random module.

How to Scrape Wikipedia Articles with Python 3

You can see, the link is directed to some other wikipedia article page named as IP address.

Creating an endless scraper:

Now, we have to make the scraper scrape the new links.

For doing this, we have to move everything into scrapeWikiArticle function.

How to Scrape Wikipedia Articles with Python 4

The function scrapeWikiArticle will extract the links and and title. Then again it will call this function and will create an endless cycle of scrapers that bounce around the wikipedia.

After running the program, we got:

How to Scrape Wikipedia Articles with Python 5

Wonderful! In only few steps, we got the “web scraping” to “Wikipedia articles with NLK identifiers”.

Conclusion:

We hope that this article is useful to you and you learned how to extract random wikipedia pages. It revolves around wikipedia by following random links.

How to Code a Scraping Bot with Selenium and Python

How to Code a Scraping Bot with Selenium and Python

Selenium is a powerful tool for controlling web browsers through programs and performing browser automation. Selenium is also used in python for scraping the data. It is also useful for interacting with the page before collecting the data, this is the case that we will discuss in this article.

In this article, we will be scraping the investing.com to extract the historical data of dollar exchange rates against one or more currencies.

There are other tools in python by which we can extract the financial information. However, here we want to explore how selenium helps with data extraction.

The Website we are going to Scrape:

Understanding of the website is the initial step before moving on to further things.

Website consists of historical data for the exchange rate of dollars against euros.

In this page, we will find a table in which we can set the date range which we want.

That is the thing which we will be using.

We only want the currencies exchange rate against the dollar. If that’s not the case then replace the “usd” in the URL.

The Scraper’s Code:

The initial step is starting with the imports from the selenium, the Sleep function to pause the code for some time and the pandas to manipulate the data whenever necessary.

How to Code a Scraping Bot with Selenium and Python

Now, we will write the scraping function. The function will consists of:

  • A list of currency codes.
  • A start date.
  • An End date.
  • A boolean function to export the data into .csv file. We will be using False as a default.

We want to make a scraper that scrapes the data about the multiple currencies. We also have to initialise the empty list to store the scraped data.

How to Code a Scraping Bot with Selenium and Python 1

As we can see that the function has the list of currencies and our plan is to iterate over this list and get the data.

For each currency we will create a URL, instantiate the driver object, and we will get the page by using it.

Then the window function will be maximized but it will only be visible when we will keep the option.headless as False.

Otherwise, all the work will be done by the selenium without even showing you.

How to Code a Scraping Bot with Selenium and Python 2

Now, we want to get the data for any time period.

Selenium provides some awesome functionalities for getting connected to the website.

We will click on the date and fill the start date and end dates with the dates we want and then we will hit apply.

We will use WebDriverWait, ExpectedConditions, and By to make sure that the driver will wait for the elements we want to interact with.

The waiting time is 20 seconds, but it is to you whichever the way you want to set it.

We have to select the date button and it’s XPath.

The same process will be followed by the start_bar, end_bar, and apply_button.

The start_date field will take in the date from which we want the data.

End_bar will select the date till which we want the data.

When we will be done with this, then the apply_button will come into work.

How to Code a Scraping Bot with Selenium and Python 3

Now, we will use the pandas.read_html file to get all the content of the page. The source code of the page will be revealed and then finally we will quit the driver.

How to Code a Scraping Bot with Selenium and Python 4

How to handle Exceptions In Selenium:

The collecting data process is done. But selenium is sometimes a little unstable and fail to perform the function we are performing here.

To prevent this we have to put the code in the try and except block so that every time it faces any problem the except block will be executed.

So, the code will be like:

for currency in currencies:

        while True:

            try:

                # Opening the connection and grabbing the page

                my_url = f'https://br.investing.com/currencies/usd-{currency.lower()}-historical-data'

                option = Options()

                option.headless = False

                driver = webdriver.Chrome(options=option)

                driver.get(my_url)

                driver.maximize_window()

                  

                # Clicking on the date button

                date_button = WebDriverWait(driver, 20).until(

                            EC.element_to_be_clickable((By.XPATH,

                            "/html/body/div[5]/section/div[8]/div[3]/div/div[2]/span")))

               

                date_button.click()

               

                # Sending the start date

                start_bar = WebDriverWait(driver, 20).until(

                            EC.element_to_be_clickable((By.XPATH,

                            "/html/body/div[7]/div[1]/input[1]")))

                           

                start_bar.clear()

                start_bar.send_keys(start)




                # Sending the end date

                end_bar = WebDriverWait(driver, 20).until(

                            EC.element_to_be_clickable((By.XPATH,

                            "/html/body/div[7]/div[1]/input[2]")))

                           

                end_bar.clear()

                end_bar.send_keys(end)

              

                # Clicking on the apply button

                apply_button = WebDriverWait(driver,20).until(

                      EC.element_to_be_clickable((By.XPATH,

                      "/html/body/div[7]/div[5]/a")))

               

                apply_button.click()

                sleep(5)

               

                # Getting the tables on the page and quiting

                dataframes = pd.read_html(driver.page_source)

                driver.quit()

                print(f'{currency} scraped.')

                break

           

            except:

                driver.quit()

                print(f'Failed to scrape {currency}. Trying again in 30 seconds.')

                sleep(30)

                Continue

For each DataFrame in this dataframes list, we will check if the name matches, Now we will append this dataframe to the list we assigned in the beginning.

Then we will need to export a csv file. This will be the last step and then we will be over with the extraction.

How to Code a Scraping Bot with Selenium and Python 5

Wrapping up:

This is all about extracting the data from the website.So far this code gets the historical data of the exchange rate of a list of currencies against the dollar and returns a list of DataFrames and several .csv files.

https://www.investing.com/currencies/usd-eur-historical-data

Python Built in Functions List with Syntax and Examples

Python is a Programming Language that has three types of functions namely user-defined functions, lambda functions, built-in functions. In this article, we will deal with the Python Built-in Functions that are ready to use along with their syntax and examples. So without further delay let’s get started with what are python built-in functions, everything you need to know about the python built-in functions list.

What are Python Built-In Functions?

Python Built-in Functions are the ones whose functionality is predefined in the language. Python interpreter provides us with a lot of functions already existing that are ready to use.

List of Built-in Functions in Python

Below is the list of Python Built-in Functions that you need to know. Learn about each one of them along with their respective descriptions through the following list. In order to use them all, you need to do is call them and pass the concerned argument as specified by us in the description of each built-in function in the list below. That’s it they perform the respective operation when executed.

Python String Methods Examples

Python List Methods Examples

Python Dictionary Methods Examples

Python Set Methods Examples

Python Mathematical Methods Examples

Python cmath Methods Examples

Python Statistics Methods Examples

Python Random Methods Examples

Python Calendar Module Examples

Python Itertools Module Examples

Python Collections Module Examples

Python Programs

We wish the information existing on the page with regards to Python Built-in Functions has been extremely useful to you. In case of any doubts do ask us or leave your suggestions through the comment section so that we can get back to you. Keep connected to our site to have latest updates on other Python Functions too in no time.

Stack Tutorial: an Implementation Beginner’s Guide

Prerequisites

In order to understand this stack tutorial, you’ll need to learn the Stack data structure which requires the following:

  1. Python 3
  2. Basic data structure concepts like List (Click here to refresh List concepts)
  3. OOP concepts

What is a Stack?

Howdy! If you are reading this article, you are about to learn one of the very basic and most useful data structure concepts. If you know other languages like C or C++, implementation of stack might be tricky (since you will have to keep track of pointers!) but not with Python. Python is so amazing that you can use lists to easily implement them but you will also learn how to implement using pointers to adopt a language agnostic way. But first, let’s understand what they are. If you are already familiar with this, you can skip to the implementation section.

When you hear the word Stack, the first thing that comes to your mind may be a stack of books, and we can use this analogy to explain stacks easily! Some of the commonalities include:

  1. There is a book at the top of the stack (if there is only one book in the stack, then that will be considered the topmost book).
  2. Only when you remove the topmost book can you get access to the bottom ones. No Jenga games here! (Also assume that you can only lift one book at a time).
  3. Once you remove all the books from the top one by one, there will be none left and hence you cannot remove any more books.

Check out this fun game called Towers of Hanoi which beautifully demonstrates how a Stack works. Read the instructions carefully and turn off the music (it is awfully loud!).

What is a Stack

To describe the above points programmatically:

  1. Keep track of the topmost element as this will give you the information about the number of elements in the stack and whether the stack is empty/full (if the stack is empty then top will be set to 0 or a negative number)
  2. The last element to enter the stack will always be the first to leave (Last In First Out – LIFO)
  3. If all the elements are removed, then the stack is empty and if you try to remove elements from an empty stack, a warning or an error message is thrown.
  4. If the stack has reached its maximum limit and you try to add more elements, a warning or error message is thrown.

Things to remember:

  1. The entry and exit of elements happens only from one end of the stack (top)
  2. Push – Adding an element to the Stack
  3. Pop – Removing an element from the Stack
  4. Random access is not allowed – you cannot add or remove an element from the middle.

Note: Always keep track of the Top. This tells us the status of the stack.

How to implement Stack?

Now that you know what a Stack is, let’s get started with the implementation!

Stack implementation using List

Here we are going to define a class Stack and add methods to perform the below operations:

  1. Push elements into a Stack
  2. Pop elements from a Stack and issue a warning if it’s empty
  3. Get the size of the Stack
  4. Print all the elements of the Stack
class Stack:

    #Constructor creates a list
    def __init__(self):
        self.stack = list()

    #Adding elements to stack
    def push(self,data):
        #Checking to avoid duplicate entries
        if data not in self.stack:
            self.stack.append(data)
            return True
        return False

    #Removing last element from the stack
    def pop(self):
        if len(self.stack)<=0:
            return ("Stack Empty!")
        return self.stack.pop()
        
    #Getting the size of the stack
    def size(self):
        return len(self.stack)

myStack = Stack()
print(myStack.push(5)) #prints True
print(myStack.push(6)) #prints True
print(myStack.push(9)) #prints True
print(myStack.push(5)) #prints False since 5 is there
print(myStack.push(3)) #prints True
print(myStack.size())  #prints 4 
print(myStack.pop())   #prints 3
print(myStack.pop())   #prints 9
print(myStack.pop())   #prints 6
print(myStack.pop())   #prints 5
print(myStack.size())  #prints 0
print(myStack.pop())   #prints Stack Empty!

NOTE: We are not worried about the size of the stack since it is represented by a list which can dynamically change its size.

Stack implementation using Array

Python Lists have made it so easy to implement Stack. However, if you want to implement Stack language agnostically, you have to assume that lists are like arrays (fixed in size) and use a Top pointer to keep a tab on the status of the stack. Check this animation to understand how it works.

Algorithm

  1. Declare a list and an integer MaxSize, denoting the maximum size of the Stack
  2. Top is initially set to 0
  3. Push operation:
    1. Check if Top is less than the MaxSize of the Stack
      1. If yes, append data to stack and increment top by 1
      2. If no, print stack full message
  4. Pop operation:
    1. Check if Top is greater than 0:
      1. If yes, pop the last element from the list and decrement top by 1
      2. If no, print stack empty message
  5. Size operation:
    1. The value of the Top pointer is the size of the Stack

Program

class Stack:
    
    #Constructor 
    def __init__(self):
        self.stack = list()
        self.maxSize = 8
        self.top = 0
    
    #Adds element to the Stack
    def push(self,data):
        if self.top>=self.maxSize:
            return ("Stack Full!")
        self.stack.append(data)
        self.top += 1
        return True
        
    #Removes element from the stack
    def pop(self):
        if self.top<=0:
            return ("Stack Empty!")
        item = self.stack.pop()
        self.top -= 1
        return item
        
    #Size of the stack
    def size(self):
        return self.top

s = Stack()
print(s.push(1))#prints True
print(s.push(2))#prints True
print(s.push(3))#prints True
print(s.push(4))#prints True
print(s.push(5))#prints True
print(s.push(6))#prints True
print(s.push(7))#prints True
print(s.push(8))#prints True
print(s.push(9))#prints Stack Full!
print(s.size())#prints 8        
print(s.pop())#prints 8
print(s.pop())#prints 7
print(s.pop())#prints 6
print(s.pop())#prints 5
print(s.pop())#prints 4
print(s.pop())#prints 3
print(s.pop())#prints 2
print(s.pop())#prints 1
print(s.pop())#prints Stack Empty!

Note: Element 9 was not added and hence size remains 8.

Apart from the methods described above, you can add methods to return the top element, check if the stack is empty etc.

Conclusion

One of the main application of Stacks is in recursion. Be sure to check this tutorial to know what recursion is.  Once you are familiar with this concept, try to solve these puzzles using recursion – sentence reversal and balancing brackets. Happy Pythoning!

How to use Queue: A beginner’s guide

Prerequisites

To learn about the Queue data structure, you should first have a good understanding of the following:

  1. Python 3
  2. Basic data structure concepts like List (Click here to refresh List concepts)
  3. OOP concepts

Introduction

This tutorial will help you understand a Queue data structure and how to implement it. These concepts are often tested in interviews and have a wide variety of applications. Python implementation of Queue is relatively simple when compared to other languages. Here you will learn how to do it in the Pythonic way and in a language agnostic way.

How to use Queue

A Queue is a simple data structure concept that can be easily applied in our day to day life, like when you stand in a line to buy coffee at Starbucks. Let’s make a few observations based on this example:

  1. People enter the line from one end and leave at the other end
  2. The person to arrive first leaves first and the person to arrive last leaves last
  3. Once all the people are served, there are none left waiting to leave the line

How to use Queue A beginner’s guide

Now, let’s look at the above points programmatically:

  1. Queues are open from both ends meaning elements are added from the back and removed from the front
  2. The element to be added first is removed first (First In First Out – FIFO)
  3. If all the elements are removed, then the queue is empty and if you try to remove elements from an empty queue, a warning or an error message is thrown.
  4. If the queue is full and you add more elements to the queue, a warning or error message must be thrown.

Things to remember:

  1. The point of entry and exit are different in a Queue.
  2. Enqueue – Adding an element to a Queue
  3. Dequeue – Removing an element from a Queue
  4. Random access is not allowed – you cannot add or remove an element from the middle.

Implementation

We are going to see three different implementations. One is using Lists, another is using the library deque and the last is by using an array. Let’s take a look one by one…

  1. Queue implementation using List

Here we are going to define a class Queue and add methods to perform the below operations:

  1. Enqueue elements to the beginning of the Queue and issue a warning if it’s full
  2. Dequeue elements from the end of the Queue and issue a warning if it’s empty
  3. Assess the size of the Queue
  4. Print all the elements of the Queue
class Queue:

  #Constructor creates a list
  def __init__(self):
      self.queue = list()

  #Adding elements to queue
  def enqueue(self,data):
      #Checking to avoid duplicate entry (not mandatory)
      if data not in self.queue:
          self.queue.insert(0,data)
          return True
      return False

  #Removing the last element from the queue
  def dequeue(self):
      if len(self.queue)>0:
          return self.queue.pop()
      return ("Queue Empty!")

  #Getting the size of the queue
  def size(self):
      return len(self.queue)

  #printing the elements of the queue
  def printQueue(self):
      return self.queue

myQueue = Queue()
print(myQueue.enqueue(5)) #prints True
print(myQueue.enqueue(6)) #prints True
print(myQueue.enqueue(9)) #prints True
print(myQueue.enqueue(5)) #prints False
print(myQueue.enqueue(3)) #prints True
print(myQueue.size())     #prints 4
print(myQueue.dequeue())  #prints 5
print(myQueue.dequeue())  #prints 6
print(myQueue.dequeue())  #prints 9
print(myQueue.dequeue())  #prints 3
print(myQueue.size())     #prints 0
print(myQueue.dequeue())  #prints Queue Empty!

Call the method printQueue() wherever necessary to ensure that it’s working.

Note: You will notice that we are not removing elements from the beginning and adding elements at the end. The reason for this is covered in the ‘implementation using arrays’ section below.

  1. Queue implementation using deque

Deque is a library which, when imported, provides ready-made commands such as the append() command for enqueuing and the popleft() command for dequeuing.

#Importing the library
from collections import deque

#Creating a Queue
queue = deque([1,5,8,9])

#Enqueuing elements to the Queue
queue.append(7) #[1,5,8,9,7]
queue.append(0) #[1,5,8,9,7,0]

#Dequeuing elements from the Queue
queue.popleft() #[5,8,9,7,0]
queue.popleft() #[8,7,9,0]

#Printing the elements of the Queue
print(queue)

Try using the popleft() command after the queue is empty and see what you get. Post the ways in which you can handle this issue.

NOTE: Implementation 2 is more efficient as the insert operation in the list is costly! This is because whenever an element is inserted at position 0, all of the other elements have to be shifted by one position (it is similar to when people sitting on a bench push down to accommodate another person). We will discuss the efficiency of operations in detail in later tutorials.

  1. Queue implementation using array

Python Lists have made it so easy to implement Queue. However, if you want to implement Queues language agnostically, you have to bear in mind the following points:

  1. Elements are added from the end and removed at the beginning of the Queue.
  2. Treat lists like arrays (fixed in size) – we can achieve it by virtually restricting the size of the list. This is done by ensuring that the list doesn’t grow beyond a fixed limit or size.
  3. Use a Tail pointer to keep a tab of the elements added to the Queue – the Tail pointer will always point to the next available space. For instance when there are three elements in the queue, Tail will point to the fourth place. When the queue is full, the tail pointer will be greater than the declared size.
  4. Use a Head pointer to keep a tab on the elements removed from the Queue – the Head pointer will point to the element to be dequeued next. For instance, if there are three elements in a queue, the Head pointer will be pointing to the first element. After one dequeue operation, the Head pointer will point to the second element in the queue. No element will be actually removed from the queue. This is because once an element is removed, the list automatically shifts all the other elements by one position to the left. This means that the position 0 will always contain an element, which is not how an actual queue works.
  5. Use a Reset method – this method is called to reset the queue, Tail and Head. For instance, if there are three elements in the queue then Head = 0, Tail = 4. Now, if we dequeue all three elements, the queue will be empty meaning Head = Tail = 4. So if you enqueue an element, it will happen at position 4 which is not correct. Hence it becomes necessary to reset these pointers to 0. Note that since we are not actually deleting elements, the list still contains the ‘deleted” elements, hence a new list needs to be created as well.

Algorithm

  1. Declare a list and an integer MaxSize, denoting a virtual maximum size of the Queue
  2. Head and Tail are initially set to 0
  3. Size method
    1.  Calculates the number of elements in the queue -> Size = Tail – Head
  4. Reset method:
    1. Resets Tail and Head to 0
    2. Creates a new Queue (initializes queue to a new list)
  5. Enqueue operation:
    1. Check if Size is less than the MaxSize:
      1. If yes, append data to Queue and then increment Tail by 1
      2. If no, print Queue full message
  6. Dequeue operation:
    1. Check if Size is greater than 0:
      1. If yes, pop the first element from the list and increment Head by 1
      2. If no:
        1. Call Reset method
        2. Print Queue empty message

Program

class Queue:

    #Constructor
    def __init__(self):
        self.queue = list()
        self.maxSize = 8
        self.head = 0
        self.tail = 0

    #Adding elements
    def enqueue(self,data):
        #Checking if the queue is full
        if self.size() >= self.maxSize:
            return ("Queue Full")
        self.queue.append(data)
        self.tail += 1
        return True     

    #Deleting elements 
    def dequeue(self):
        #Checking if the queue is empty
        if self.size() <= 0:
            self.resetQueue()
            return ("Queue Empty") 
        data = self.queue[self.head]
        self.head+=1
        return data
                
    #Calculate size
    def size(self):
        return self.tail - self.head
    
    #Reset queue
    def resetQueue(self):
        self.tail = 0
        self.head = 0
        self.queue = list()
    
q = Queue()
print(q.enqueue(1))#prints True
print(q.enqueue(2))#prints True
print(q.enqueue(3))#prints True
print(q.enqueue(4))#prints True
print(q.enqueue(5))#prints True
print(q.enqueue(6))#prints True
print(q.enqueue(7))#prints True
print(q.enqueue(8))#prints True
print(q.enqueue(9))#prints Queue Full!
print(q.size())#prints 8        
print(q.dequeue())#prints 8
print(q.dequeue())#prints 7 
print(q.dequeue())#prints 6
print(q.dequeue())#prints 5
print(q.dequeue())#prints 4
print(q.dequeue())#prints 3
print(q.dequeue())#prints 2
print(q.dequeue())#prints 1
print(q.dequeue())#prints Queue Empty
#Queue is reset here 
print(q.enqueue(1))#prints True
print(q.enqueue(2))#prints True
print(q.enqueue(3))#prints True
print(q.enqueue(4))#prints True

Note: Element 9 was not added to the Queue and hence the size of the Queue remains 8

Apart from the methods described above, you can add methods which could return the element at the start of the queue, check if the queue is empty etc.

Conclusion

That’s it for this tutorial. Be sure to learn the applications of Queue. Plus, stay tuned with us here on PythonCentral to learn more about other types of Queues like Circular Queue and Priority Queue. Happy Pythoning!

Circular Queue: An implementation tutorial

Prerequisites

To learn about Circular Queue, you should first have a good understanding of the following:

  1. Python 3
  2. Linear Queues (you can learn more here)
  3. Basic Python data structure concepts – lists
  4. Basic math operations – modulo(%)

What is a Circular Queue?

Before you go ahead and read this tutorial, I highly recommend you to read our previous tutorial on Queues as we will be building off of those concepts. Circular Queues are widely used and are often tested on job interviews. A Circular Queue can be seen as an improvement over the Linear Queue because:

  1. There is no need to reset Head and Tail pointers since they reset themselves. This means that once the Head or Tail reaches the end of the Queue, it resets itself to 0.
  2. The Tail and Head can point to the same location – this means the Queue is empty
  3. The Head can be greater than the Tail or vice-versa. This is possible because the Head and Tail pointers are allowed to cross each other.

Check out this animation to understand the circular queue a bit better.

Observations based on the above animation:

  1. Head pointer – Points to the front of the Queue. Or in other words, it points to the element to be removed if the dequeue operation is called.
  2. Tail pointer – Points to the next empty spot in which the new element can be inserted. In the above animation, if you tried to fill the queue completely you wouldn’t be able to enqueue after the 13th position. This is because the Tail has no empty spot to point to after an element is inserted in the 14th position. The queue is considered full, even though there is one empty spot left. You should also try doing three or four dequeue operations and then enqueueing an element. Here you will observe that the elements are inserted from the 14th position and then it restarts from 0. It is for this reason that it is called a circular queue.
  3. Number of elements:
    1. Tail>=Head: Number of elements = Tail – Head. For instance, if Head = 2 and Tail = 5, then the number of elements will be 5 – 2 = 3
    2. Head>Tail: Number of elements = (Size of Queue) – (Head-Tail) =  (Size of Queue) – Head + Tail. For instance, Head = 14, Tail = 5 and Size of Queue = 15, then the number of elements = 15 – (14 – 5) = 6

How to implement circular queue?

I hope you now feel confident that you know what a circular queue is. Let’s see how to implement it using the language agnostic method. To do this, we need to treat Lists like arrays, hence we will restrict its size.

Note: During a dequeue operation, the Head pointer will be incremented by 1 but no element will actually be removed from the queue. This is because once an element is removed, the list automatically shifts all the other elements by one position to the left. This means that the position 0 will always contain an element which is not how an actual queue/circular queue works.

Algorithm

The following steps can be seen as a flow chart to the operation of the Circular Queue:

  1. Initialize the queue, size of the queue (maxSize), head and tail pointers
  2. Enqueue:
    1. Check if the number of elements (size) is equal to the size of the queue (maxSize):
      1. If yes, throw error message “Queue Full!”
      2. If no, append the new element and increment the tail pointer
  3. Dequeue:
    1. Check if the number of elements (size) is equal to 0:
      1. If yes, throw error message “Queue Empty!”
      2. If no, increment head pointer
  4. Size:
    1. If tail>=head, size = tail – head
    2. if head>tail, size = maxSize – (head – tail)

Note: The range for the head and tail pointer should be between 0 and maxSize – 1,  hence we are using the logic that if we divide x by 5, then the remainder can never be greater than 5. In other words, it should be between 0 and 4. So apply this logic to the formulae tail = (tail+1)%maxSize and head = (head+1)%maxSize. Observe that this is helps us to avoid reinitializing tail and head to 0 when the queue becomes full.

Program

class CircularQueue:

    #Constructor
    def __init__(self):
        self.queue = list()
        self.head = 0
        self.tail = 0
        self.maxSize = 8

    #Adding elements to the queue
    def enqueue(self,data):
        if self.size() == self.maxSize-1:
            return ("Queue Full!")
        self.queue.append(data)
        self.tail = (self.tail + 1) % self.maxSize
        return True

    #Removing elements from the queue
    def dequeue(self):
        if self.size()==0:
            return ("Queue Empty!") 
        data = self.queue[self.head]
        self.head = (self.head + 1) % self.maxSize
        return data

    #Calculating the size of the queue
    def size(self):
        if self.tail>=self.head:
            return (self.tail-self.head)
        return (self.maxSize - (self.head-self.tail))

q = CircularQueue()
print(q.enqueue(1))
print(q.enqueue(2))
print(q.enqueue(3))
print(q.enqueue(4))
print(q.enqueue(5))
print(q.enqueue(6))
print(q.enqueue(7))
print(q.enqueue(8))
print(q.enqueue(9))
print(q.dequeue())
print(q.dequeue())
print(q.dequeue())
print(q.dequeue())
print(q.dequeue())
print(q.dequeue())
print(q.dequeue())
print(q.dequeue())
print(q.dequeue())

Application

There are several uses for Circular Queues, such as:

  1. Computer Architecture (Scheduler)
  2. Disk drivers
  3. Video buffering
  4. Printer job scheduling

Conclusion

Circular Queue may appear a little confusing at first but the only way to get the hang of it is to keep practicing. Try out different enqueue and dequeue operations in the animation link provided above to see how it works. That’s it for this tutorial. Happy Pythoning!

Search implementations: Linear and Binary

Prerequisites

To learn about Linear and Binary search, you’ll need to have a good understanding of:

  1. Python 3
  2. Basic Python data structure concepts – lists

Introduction

Often we will have to find an element from a given data structure like lists, linked lists or binary trees. An efficient searching technique saves a great amount of time and improves performance. In this tutorial, we are going to see two very commonly used searching algorithms.

Linear Search

So if you were to search “Harry Potter”, from a shelf in a poorly lit room how would you go about it? You would start at one end, take one book at a time and check if it’s Harry Potter or not. You will use a brute force methodology which checks every book until the Harry Potter is found or the end of the shelf is reached. Best case scenario would be when Harry Potter is the first book and worst case would be the book not in there at all. Either way, you can know this only by checking each and every book. This is exactly what Linear Search is.

Binary Search

What if you were to search for Harry Potter from a shelf of books which are ordered alphabetically? Would you start searching from the start? Definitely not! You would start somewhere near the middle and check the first letter of the books and then go back or forward from there to find the ‘H’ titles. This means that you won’t be looking at all the books, thus saving time. You also won’t need to go through the entire shelf to know whether the book is there or not. At a given point you are eliminating many books which you will never look at. Binary Search is similar to this.

NOTE: Linear Search can be done on both sorted and unsorted items but Binary Search can only be done on a sorted set of items.

Implementation

Now that you know what Linear and Binary Search methodologies are, let us look at how these searches would work on a list of numbers.

Linear Search

Given list A = [6,3,9,0,5,8,2] find if 0 is present in this list or not.

Algorithm

  1. Take one number at a time from list A
  2. Compare it with 0 and check if it is a match:
    1. If yes, return True
    2. If not, return False
  3. If the end of the list is met, return False

Code

def linearSearch(ls,data):

   for item in ls:
       if item == data:
           return True
   return False

print(linearSearch([6,3,9,5,8,2],0))

Binary Search

The idea is to keep comparing the element with the middle value. This way with each search we eliminate one half of the list.

Algorithm

  1. Keep track of two pointers First and Last, these are incremented or decremented to limit the part of the list to be searched.
  2. Find the middle element of the list: mid = ( length of the list )/2
  3. Compare the middle element with the value to be found
  4. Check if the middle element is lesser than the value to be found:
    1. If yes, the element must lie on the second half of the list
    2. If no, the element must lie on the first half of the list
  5. Repeat steps 1 through 3 until the element is found or the end of the list is reached

NOTE: The list continues to get divided into two and the middle element gets compared until the element is found or no more elements are left to compare with.

Code

Given list B = [2,5,7,8,9,11,14,16] find if 14 is present in this list or not.

def binarySearch(ls,data):
   first = 0
   last = len(ls)-1
   done = False   

    while first<=last and not done:
       mid = (first+last)//2
          if ls[mid] == data:
           done = True
       else:
           if ls[mid] > data:
               last = last-1
           else:
               first = first+1
   return done 

print(binarySearch([2,5,7,8,9,11,14,16],4))
Round First Last Mid ls[mid] Result
1 0 7 3 8 8<14
2 4 7 5 11 11<14
3 6 7 6 14 14=14

NOTE: To find the mid element, “(first+last)//2” is used instead of “(first+last)/2”. This gives us whole numbers especially when the length of the list is odd. Try 9/2 and 9//2 in your Python IDLE to get a better understanding.

Try out this animation for a better understanding of both these search algorithms. Try to find the first element of the array. Which is faster?

Conclusion

From the above explanation, it must be clear that Binary Search is consistently faster than Linear Search. If you are familiar with o(n) notation, Linear Search is o(n) and Binary Search is log(n)

You can perform Linear Searches on a sorted list as well with a small optimization. You can stop the search once the number to be found exceeds the element being compared to. For instance, you want to search for 12 from the list [2,4,5,13,16,19], once you reach 13 you can stop because there is no way that 12 is going to come after 13 in a sorted list.

In this tutorial, we have discussed only the iterative method of Binary Search. Try to implement the recursive approach on your own.  Also, learn about the uses of these searches and when to apply them. That’s all for this tutorial. Happy Pythoning!