{"id":2640,"date":"2021-04-17T11:48:24","date_gmt":"2021-04-17T06:18:24","guid":{"rendered":"https:\/\/python-programs.com\/?p=2640"},"modified":"2021-11-22T18:45:13","modified_gmt":"2021-11-22T13:15:13","slug":"building-an-rss-feed-scraper-with-python","status":"publish","type":"post","link":"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/","title":{"rendered":"Building an RSS feed Scraper with Python"},"content":{"rendered":"

What is RSS?<\/h2>\n

RSS stands for Really Simple Syndication or Rich Site Summary. It is a type of web feed that allows users and applications to receive regular updates from a website or blog of their choice. Websites use their RSS feeds to publish frequently updated information such as blog entries and news headlines.<\/p>\n

We can use an RSS feed to extract information from a particular website. In this article, I will show how to extract data from a website's RSS feed.<\/p>\n

Installing packages<\/h3>\n

You can install the required packages using pip, as shown below.<\/p>\n

pip install requests\r\npip install bs4\r\n# lxml supplies the XML parser that BeautifulSoup uses for features=\"xml\"\r\npip install lxml<\/pre>\n

Importing libraries:<\/h3>\n

Now that our project setup is ready, we can start writing the code.<\/p>\n

Within our rssScrapy.py we\u2019ll import the packages we\u2019ve installed using pip.<\/p>\n

import requests\r\n\r\nfrom bs4 import BeautifulSoup<\/pre>\n

These imports allow us to use the functions provided by the Requests and BeautifulSoup libraries.<\/p>\n

I am going to use an RSS feed from a news website, the Times of India.<\/p>\n

Link-\"https:\/\/timesofindia.indiatimes.com\/rssfeeds\/1221656.cms\"<\/pre>\n

This is basically an XML file.<\/p>\n

\"Building-an-RSS-feed-scraper-with-Python_xml-file\"<\/p>\n

Now I will show you how to scrape this XML file.<\/p>\n

\n
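import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\n# RSS feed of the Times of India\r\nurl = \"https:\/\/timesofindia.indiatimes.com\/rssfeeds\/1221656.cms\"\r\n# Fetch the feed; raise an exception on an HTTP error status\r\nresp = requests.get(url, timeout=10)\r\nresp.raise_for_status()\r\n# Parse the response body as XML instead of HTML\r\nsoup = BeautifulSoup(resp.content, features=\"xml\")\r\nprint(soup.prettify())<\/pre>\n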

I have imported the necessary libraries and defined url, which holds the link to the news website's RSS feed. I then made a GET request with that URL and stored the response in the resp object.<\/p>\n

We now have a response object and a BeautifulSoup object. By default, BeautifulSoup parses documents as HTML, but we want to parse XML, so we pass features=\"xml\". Here is the XML file we have parsed.<\/p>\n

\"Building-an-RSS-feed-scraper-with-Python_output\"<\/p>\n<\/div>\n

We do not need all of the data in the feed. We want the title, description, and publish date of each news item. For this, we create a list that contains the content of every item tag, using findAll (the older name for find_all): items=soup.findAll('item')<\/code><\/p>\n

You can also check how many items were found with len(items)<\/code><\/p>\n

Here is the complete code for scraping the news RSS feed:<\/p>\n

import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\nurl = \"https:\/\/timesofindia.indiatimes.com\/rssfeeds\/1221656.cms\"\r\nresp = requests.get(url)\r\nsoup = BeautifulSoup(resp.content, features=\"xml\")\r\n\r\n# Every article in the feed lives in its own item tag\r\nitems = soup.findAll('item')\r\n\r\n# Collect one dictionary per article\r\nnews_items = []\r\nfor item in items:\r\n    news_item = {}\r\n    news_item['title'] = item.title.text\r\n    news_item['description'] = item.description.text\r\n    news_item['link'] = item.link.text\r\n    news_item['guid'] = item.guid.text\r\n    news_item['pubDate'] = item.pubDate.text\r\n    news_items.append(news_item)\r\n\r\n# Print the third article as a sample\r\nprint(news_items[2])<\/pre>\n

Here I used item.title.text<\/code> to scrape the title, because title is a child tag of the parent item tag. The remaining fields are read the same way.<\/p>\n

Each article in the RSS feed keeps all of its information within item<\/em> tags <item>...<\/item><\/code>
\nand follows the structure below:<\/p>\n

<item>\r\n    <title>...<\/title>\r\n    <link>...<\/link>\r\n    <pubDate>...<\/pubDate>\r\n    <comments>...<\/comments>\r\n    <description>...<\/description>\r\n<\/item><\/pre>\n

We\u2019ll be taking advantage of the consistent item<\/em>\u00a0tags to parse our information.<\/p>\n
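Not every feed includes every child tag in each item, and reading a tag that is absent (for example item.comments.text) raises an AttributeError. As a minimal sketch, the hypothetical helper tag_text below returns a default value instead. It is not part of the original script, and the sample feed is made up. The demo parses it with the standard library's xml.etree.ElementTree so that it runs without a network connection; the helper relies only on find() and .text, which BeautifulSoup tags also provide.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed: the second item has no description tag.
SAMPLE_FEED = '''<rss>
  <channel>
    <item>
      <title>First headline</title>
      <description>Short summary</description>
    </item>
    <item>
      <title>Second headline</title>
    </item>
  </channel>
</rss>'''


def tag_text(item, name, default=''):
    # Hypothetical helper: return the stripped text of a child tag,
    # or the default when the tag is missing or empty.
    child = item.find(name)
    if child is None or not child.text:
        return default
    return child.text.strip()


root = ET.fromstring(SAMPLE_FEED)
sample_items = root.findall('./channel/item')
summaries = [
    {'title': tag_text(item, 'title'),
     'description': tag_text(item, 'description', default='(none)')}
    for item in sample_items
]
print(summaries)
```

In the scraper itself, a call such as tag_text(item, 'guid') could stand in for item.guid.text wherever a tag is not guaranteed to be present.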

I have also created an empty list, news_items, to which the dictionary for each article is appended.<\/p>\n
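Once news_items is filled, the list of dictionaries can be written out for later use. Here is a minimal sketch using the standard library's csv module. The two sample entries are made up, and the output goes to an in-memory buffer rather than a file so the example has no side effects.

```python
import csv
import io

# Made-up entries shaped like the dictionaries built in the loop above.
news_items = [
    {'title': 'First headline', 'link': 'https://example.com/1',
     'pubDate': 'Sat, 17 Apr 2021 11:48:24 +0530'},
    {'title': 'Second headline', 'link': 'https://example.com/2',
     'pubDate': 'Sun, 18 Apr 2021 09:00:00 +0530'},
]

# Write to an in-memory buffer; swap in open('news.csv', 'w', newline='')
# to write a real file (the file name is only an example).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['title', 'link', 'pubDate'])
writer.writeheader()
writer.writerows(news_items)

csv_text = buffer.getvalue()
print(csv_text)
```

The csv module quotes the pubDate values automatically because they contain commas, so the file can be read back with csv.DictReader without extra handling.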

This is how we can parse a particular news item.<\/p>\n
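The pubDate value comes back as plain text. RSS dates normally follow the RFC 822 style (for example Sat, 17 Apr 2021 11:48:24 +0530), which the standard library can turn into a datetime for sorting or filtering. A small sketch, assuming the feed uses that format; the sample string is made up.

```python
from email.utils import parsedate_to_datetime

# Made-up pubDate string in the RFC 822 style used by RSS feeds.
raw_pub_date = 'Sat, 17 Apr 2021 11:48:24 +0530'

published = parsedate_to_datetime(raw_pub_date)

# The result is a timezone-aware datetime object.
print(published.isoformat())
print(published.year, published.month, published.day)
```

If a feed publishes dates in another format, parsedate_to_datetime raises an exception, so it is worth checking one real value from the feed first.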

\"Building-an-RSS-feed-scraper-with-Python_final-output\"<\/p>\n

Conclusion:<\/h3>\n

We have successfully created an RSS feed scraping tool using Python, Requests, and BeautifulSoup. This allows us to parse XML information into a suitable format for us to work with in the future.<\/p>\n","protected":false},"excerpt":{"rendered":"

What is RSS? RSS stands for Really Simple Syndication or Rich Site Summary. It is a type of web feed that allows users and applications to receive regular updates from a website or blog of their choice. Various websites use their RSS feeds to publish frequently updated information like blog entries, news headlines etc., …<\/p>\n

Building an RSS feed Scraper with Python<\/span> Read More »<\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[5],"tags":[],"yoast_head":"\nBuilding an RSS feed Scraper with Python - Python Programs<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building an RSS feed Scraper with Python - Python Programs\" \/>\n<meta property=\"og:description\" content=\"What is RSS? RSS stands for Really Simple Syndication or Rich Site Summary. It is a type of web feed that allows users and applications to receive regular updates from a website or blog of their choice. 
Various website use their RSS feed to publish the frequently updated information like blog entries, news headlines etc, … Building an RSS feed Scraper with Python Read More »\" \/>\n<meta property=\"og:url\" content=\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/\" \/>\n<meta property=\"og:site_name\" content=\"Python Programs\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/btechgeeks\" \/>\n<meta property=\"article:published_time\" content=\"2021-04-17T06:18:24+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-11-22T13:15:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/python-programs.com\/wp-content\/uploads\/2021\/04\/Building-an-RSS-feed-scraper-with-Python_xml-file-e1618574028930.png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@btech_geeks\" \/>\n<meta name=\"twitter:site\" content=\"@btech_geeks\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Shikha Mishra\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https:\/\/python-programs.com\/#organization\",\"name\":\"BTech Geeks\",\"url\":\"https:\/\/python-programs.com\/\",\"sameAs\":[\"https:\/\/www.instagram.com\/btechgeeks\/\",\"https:\/\/www.linkedin.com\/in\/btechgeeks\",\"https:\/\/in.pinterest.com\/btechgeek\/\",\"https:\/\/www.youtube.com\/channel\/UC9MlCqdJ3lKqz2p5114SDIg\",\"https:\/\/www.facebook.com\/btechgeeks\",\"https:\/\/twitter.com\/btech_geeks\"],\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/python-programs.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/python-programs.com\/wp-content\/uploads\/2020\/11\/BTechGeeks.png\",\"contentUrl\":\"https:\/\/python-programs.com\/wp-content\/uploads\/2020\/11\/BTechGeeks.png\",\"width\":350,\"height\":70,\"caption\":\"BTech Geeks\"},\"image\":{\"@id\":\"https:\/\/python-programs.com\/#\/schema\/logo\/image\/\"}},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/python-programs.com\/#website\",\"url\":\"https:\/\/python-programs.com\/\",\"name\":\"Python Programs\",\"description\":\"Python Programs with Examples, How To Guides on Python\",\"publisher\":{\"@id\":\"https:\/\/python-programs.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/python-programs.com\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#primaryimage\",\"url\":\"https:\/\/python-programs.com\/wp-content\/uploads\/2021\/04\/Building-an-RSS-feed-scraper-with-Python_xml-file-e1618574028930.png\",\"contentUrl\":\"https:\/\/python-programs.com\/wp-content\/uploads\/2021\/04\/Building-an-RSS-feed-scraper-with-Python_xml-file-e1618574028930.png\",\"width\":1878,\"height\":972,\"caption\":\"Building-an-RSS-feed-scraper-with-Python_xml-file\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#webpage\",\"url\":\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/\",\"name\":\"Building an RSS feed Scraper with Python - Python Programs\",\"isPartOf\":{\"@id\":\"https:\/\/python-programs.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#primaryimage\"},\"datePublished\":\"2021-04-17T06:18:24+00:00\",\"dateModified\":\"2021-11-22T13:15:13+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/python-programs.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Building an RSS feed Scraper with 
Python\"}]},{\"@type\":\"Article\",\"@id\":\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#webpage\"},\"author\":{\"@id\":\"https:\/\/python-programs.com\/#\/schema\/person\/7a690ac49394cc96d2e839bf9a746594\"},\"headline\":\"Building an RSS feed Scraper with Python\",\"datePublished\":\"2021-04-17T06:18:24+00:00\",\"dateModified\":\"2021-11-22T13:15:13+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#webpage\"},\"wordCount\":460,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/python-programs.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/python-programs.com\/wp-content\/uploads\/2021\/04\/Building-an-RSS-feed-scraper-with-Python_xml-file-e1618574028930.png\",\"articleSection\":[\"Python\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#respond\"]}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/python-programs.com\/#\/schema\/person\/7a690ac49394cc96d2e839bf9a746594\",\"name\":\"Shikha Mishra\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/python-programs.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/10a27cfafdf21564c686b80411336ece?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/10a27cfafdf21564c686b80411336ece?s=96&d=mm&r=g\",\"caption\":\"Shikha Mishra\"},\"url\":\"https:\/\/python-programs.com\/author\/shikha\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Building an RSS feed Scraper with Python - Python Programs","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/","og_locale":"en_US","og_type":"article","og_title":"Building an RSS feed Scraper with Python - Python Programs","og_description":"What is RSS? RSS stands for Really Simple Syndication or Rich Site Summary. It is a type of web feed that allows users and applications to receive regular updates from a website or blog of their choice. Various website use their RSS feed to publish the frequently updated information like blog entries, news headlines etc, … Building an RSS feed Scraper with Python Read More »","og_url":"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/","og_site_name":"Python Programs","article_publisher":"https:\/\/www.facebook.com\/btechgeeks","article_published_time":"2021-04-17T06:18:24+00:00","article_modified_time":"2021-11-22T13:15:13+00:00","og_image":[{"url":"https:\/\/python-programs.com\/wp-content\/uploads\/2021\/04\/Building-an-RSS-feed-scraper-with-Python_xml-file-e1618574028930.png"}],"twitter_card":"summary_large_image","twitter_creator":"@btech_geeks","twitter_site":"@btech_geeks","twitter_misc":{"Written by":"Shikha Mishra","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Organization","@id":"https:\/\/python-programs.com\/#organization","name":"BTech Geeks","url":"https:\/\/python-programs.com\/","sameAs":["https:\/\/www.instagram.com\/btechgeeks\/","https:\/\/www.linkedin.com\/in\/btechgeeks","https:\/\/in.pinterest.com\/btechgeek\/","https:\/\/www.youtube.com\/channel\/UC9MlCqdJ3lKqz2p5114SDIg","https:\/\/www.facebook.com\/btechgeeks","https:\/\/twitter.com\/btech_geeks"],"logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/python-programs.com\/#\/schema\/logo\/image\/","url":"https:\/\/python-programs.com\/wp-content\/uploads\/2020\/11\/BTechGeeks.png","contentUrl":"https:\/\/python-programs.com\/wp-content\/uploads\/2020\/11\/BTechGeeks.png","width":350,"height":70,"caption":"BTech Geeks"},"image":{"@id":"https:\/\/python-programs.com\/#\/schema\/logo\/image\/"}},{"@type":"WebSite","@id":"https:\/\/python-programs.com\/#website","url":"https:\/\/python-programs.com\/","name":"Python Programs","description":"Python Programs with Examples, How To Guides on Python","publisher":{"@id":"https:\/\/python-programs.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/python-programs.com\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#primaryimage","url":"https:\/\/python-programs.com\/wp-content\/uploads\/2021\/04\/Building-an-RSS-feed-scraper-with-Python_xml-file-e1618574028930.png","contentUrl":"https:\/\/python-programs.com\/wp-content\/uploads\/2021\/04\/Building-an-RSS-feed-scraper-with-Python_xml-file-e1618574028930.png","width":1878,"height":972,"caption":"Building-an-RSS-feed-scraper-with-Python_xml-file"},{"@type":"WebPage","@id":"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#webpage","url":"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/","name":"Building an RSS feed Scraper with Python - Python Programs","isPartOf":{"@id":"https:\/\/python-programs.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#primaryimage"},"datePublished":"2021-04-17T06:18:24+00:00","dateModified":"2021-11-22T13:15:13+00:00","breadcrumb":{"@id":"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/python-programs.com\/"},{"@type":"ListItem","position":2,"name":"Building an RSS feed Scraper with Python"}]},{"@type":"Article","@id":"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#article","isPartOf":{"@id":"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#webpage"},"author":{"@id":"https:\/\/python-programs.com\/#\/schema\/person\/7a690ac49394cc96d2e839bf9a746594"},"headline":"Building an 
RSS feed Scraper with Python","datePublished":"2021-04-17T06:18:24+00:00","dateModified":"2021-11-22T13:15:13+00:00","mainEntityOfPage":{"@id":"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#webpage"},"wordCount":460,"commentCount":0,"publisher":{"@id":"https:\/\/python-programs.com\/#organization"},"image":{"@id":"https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#primaryimage"},"thumbnailUrl":"https:\/\/python-programs.com\/wp-content\/uploads\/2021\/04\/Building-an-RSS-feed-scraper-with-Python_xml-file-e1618574028930.png","articleSection":["Python"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/python-programs.com\/building-an-rss-feed-scraper-with-python\/#respond"]}]},{"@type":"Person","@id":"https:\/\/python-programs.com\/#\/schema\/person\/7a690ac49394cc96d2e839bf9a746594","name":"Shikha Mishra","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/python-programs.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/10a27cfafdf21564c686b80411336ece?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/10a27cfafdf21564c686b80411336ece?s=96&d=mm&r=g","caption":"Shikha 
Mishra"},"url":"https:\/\/python-programs.com\/author\/shikha\/"}]}},"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/posts\/2640"}],"collection":[{"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/comments?post=2640"}],"version-history":[{"count":4,"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/posts\/2640\/revisions"}],"predecessor-version":[{"id":2986,"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/posts\/2640\/revisions\/2986"}],"wp:attachment":[{"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/media?parent=2640"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/categories?post=2640"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/tags?post=2640"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}