{"id":27159,"date":"2022-04-06T22:04:33","date_gmt":"2022-04-06T16:34:33","guid":{"rendered":"https:\/\/python-programs.com\/?p=27159"},"modified":"2022-04-06T22:04:33","modified_gmt":"2022-04-06T16:34:33","slug":"chunking-text-using-enchant-in-python","status":"publish","type":"post","link":"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/","title":{"rendered":"Chunking Text using Enchant in Python"},"content":{"rendered":"<p><strong>Enchant Module in Python:<\/strong><\/p>\n<p>Enchant is a Python module that checks a word\u2019s spelling and provides suggestions for correcting it. It determines whether or not a word is in the dictionary.<\/p>\n<p>To tokenize text, Enchant also provides the <strong>enchant.tokenize<\/strong> module. Tokenizing is the process of splitting a body of text into its individual words. However, not every part of the text should always be tokenized. Assume we have an HTML file; upon plain tokenization, the tag names are returned as tokens too. 
Typically, HTML tags do not contribute to the content of the article, so the text should be tokenized with the tags excluded.<\/p>\n<p>HTMLChunker is the only chunker that is currently implemented in enchant.tokenize.<\/p>\n<p>In simple words, here we exclude the <strong>HTML tags<\/strong> during tokenization.<\/p>\n<h2>Chunking Text using Enchant in Python<\/h2>\n<p><strong>Approach:<\/strong><\/p>\n<ul>\n<li>Import the get_tokenizer function from the enchant.tokenize module using the import keyword.<\/li>\n<li>Import the HTMLChunker class from the enchant.tokenize module using the import keyword.<\/li>\n<li>Give the text with HTML tags to be tokenized as static input and store it in a variable.<\/li>\n<li>Pass the language code as an argument to the get_tokenizer() function to get a tokenizer and store it in another variable.<\/li>\n<li>Take an empty list to store the tokens.<\/li>\n<li>Print a heading for the tokens of the given text without chunking.<\/li>\n<li>Loop over the tokens produced by passing the given text as an argument to the tokenizer, using a for loop.<\/li>\n<li>Append each token to the newly created list using the append() function.<\/li>\n<li>Print the token list (each token is a pair of the word and its position in the text). Without chunking, the HTML tag names appear as tokens.<\/li>\n<li>Here chunking means excluding the HTML tags from the tokens.<\/li>\n<li>Pass the language code and a tuple of chunkers (HTMLChunker) as arguments to the get_tokenizer() function to get a tokenizer with HTML chunking and store it in another variable.<\/li>\n<li>Take an empty list to store the tokens with chunking.<\/li>\n<li>Print a heading for the tokens of the given text after chunking.<\/li>\n<li>Loop over the tokens produced by passing the given text as an argument to tokenizer_withchunking, using a for loop.<\/li>\n<li>Append each token to the newly created chunking list using the append() function.<\/li>\n<li>Print the token list with chunking (it prints the tokens and their positions with HTML 
chunking).<\/li>\n<li>Here HTML chunking means excluding the HTML tags during tokenization.<\/li>\n<li>The Exit of the Program.<\/li>\n<\/ul>\n<p><strong>Below is the implementation:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\"># Import the get_tokenizer function from the enchant.tokenize module using the import keyword\r\nfrom enchant.tokenize import get_tokenizer\r\n# Import the HTMLChunker class from the enchant.tokenize module using the import keyword\r\nfrom enchant.tokenize import HTMLChunker\r\n\r\n# Give the text with HTML tags to be tokenized as static input and store it in a variable.\r\ngvn_txt = \"&lt;div&gt; &lt;h2&gt; welcome to Python-programs &lt;\/h2&gt; &lt;br&gt; &lt;\/div&gt;\"\r\n\r\n# Pass the language code as an argument to the get_tokenizer() function to\r\n# get a tokenizer and store it in another variable.\r\ntokenizer = get_tokenizer(\"en_US\")\r\n\r\n# Take an empty list to store the tokens.\r\ntokens_lst = []\r\n\r\n# Print a heading for the tokens of the given text without chunking\r\nprint(\"The tokens of the given text without chunking:\")\r\n# Loop over the tokens produced by passing the given text to the tokenizer\r\nfor wrds in tokenizer(gvn_txt):\r\n    # Append each token, a (word, position) pair, to the list using the append() function\r\n    tokens_lst.append(wrds)\r\n\r\n# Print the token list (without chunking, the HTML tag names appear as tokens)\r\nprint(tokens_lst)\r\n\r\n# Pass the language code and a tuple of chunkers (HTMLChunker) as arguments to the get_tokenizer() function to\r\n# get a tokenizer with HTML chunking and store it in another variable.\r\ntokenizer_withchunking = get_tokenizer(\"en_US\", chunkers=(HTMLChunker,))\r\nprint()\r\n\r\n# Take an empty list to store the tokens with chunking\r\ntokenslist_chunk = []\r\n\r\n# Printing the tokens of the given text after 
chunking\r\nprint(\"The tokens of the given text after chunking:\")\r\n\r\n# Loop over the tokens produced by passing the given text to tokenizer_withchunking\r\n# using the for loop\r\nfor wrds in tokenizer_withchunking(gvn_txt):\r\n    # Append each token to the chunking list using the append() function\r\n    tokenslist_chunk.append(wrds)\r\n\r\n# Print the token list with chunking (the HTML tags are excluded during tokenization)\r\nprint(tokenslist_chunk)\r\n<\/pre>\n<p><strong>Output:<\/strong><\/p>\n<pre>The tokens of the given text without chunking:\r\n[('div', 1), ('h', 7), ('welcome', 11), ('to', 19), ('Python', 22), ('programs', 29), ('h', 40), ('br', 45), ('div', 51)]\r\n\r\nThe tokens of the given text after chunking:\r\n[('welcome', 11), ('to', 19), ('Python', 22), ('programs', 29)]<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Enchant Module in Python: Enchant is a Python module that checks a word\u2019s spelling and provides suggestions for correcting it. It determines whether or not a word is in the dictionary. To tokenize text, Enchant also provides the enchant.tokenize module. 
Tokenizing is the process of &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"\" href=\"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/\"> <span class=\"screen-reader-text\">Chunking Text using Enchant in Python<\/span> Read More &raquo;<\/a><\/p>\n","protected":false},"author":7,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[5],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v18.9 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Chunking Text using Enchant in Python - Python Programs<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Chunking Text using Enchant in Python - Python Programs\" \/>\n<meta property=\"og:description\" content=\"Enchant Module in Python: Enchant is a Python module that checks a word\u2019s spelling and provides suggestions for correcting it. It determines whether or not a word is in the dictionary. To tokenize text, Enchant also provides the enchant.tokenize module. 
Tokenizing is the process of &hellip; Chunking Text using Enchant in Python Read More &raquo;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/\" \/>\n<meta property=\"og:site_name\" content=\"Python Programs\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/btechgeeks\" \/>\n<meta property=\"article:published_time\" content=\"2022-04-06T16:34:33+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@btech_geeks\" \/>\n<meta name=\"twitter:site\" content=\"@btech_geeks\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Vikram Chiluka\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https:\/\/python-programs.com\/#organization\",\"name\":\"BTech Geeks\",\"url\":\"https:\/\/python-programs.com\/\",\"sameAs\":[\"https:\/\/www.instagram.com\/btechgeeks\/\",\"https:\/\/www.linkedin.com\/in\/btechgeeks\",\"https:\/\/in.pinterest.com\/btechgeek\/\",\"https:\/\/www.youtube.com\/channel\/UC9MlCqdJ3lKqz2p5114SDIg\",\"https:\/\/www.facebook.com\/btechgeeks\",\"https:\/\/twitter.com\/btech_geeks\"],\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/python-programs.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/python-programs.com\/wp-content\/uploads\/2020\/11\/BTechGeeks.png\",\"contentUrl\":\"https:\/\/python-programs.com\/wp-content\/uploads\/2020\/11\/BTechGeeks.png\",\"width\":350,\"height\":70,\"caption\":\"BTech 
Geeks\"},\"image\":{\"@id\":\"https:\/\/python-programs.com\/#\/schema\/logo\/image\/\"}},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/python-programs.com\/#website\",\"url\":\"https:\/\/python-programs.com\/\",\"name\":\"Python Programs\",\"description\":\"Python Programs with Examples, How To Guides on Python\",\"publisher\":{\"@id\":\"https:\/\/python-programs.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/python-programs.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#webpage\",\"url\":\"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/\",\"name\":\"Chunking Text using Enchant in Python - Python Programs\",\"isPartOf\":{\"@id\":\"https:\/\/python-programs.com\/#website\"},\"datePublished\":\"2022-04-06T16:34:33+00:00\",\"dateModified\":\"2022-04-06T16:34:33+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/python-programs.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Chunking Text using Enchant in 
Python\"}]},{\"@type\":\"Article\",\"@id\":\"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#webpage\"},\"author\":{\"@id\":\"https:\/\/python-programs.com\/#\/schema\/person\/fb109fd98e2d5a15fd4dac9970797602\"},\"headline\":\"Chunking Text using Enchant in Python\",\"datePublished\":\"2022-04-06T16:34:33+00:00\",\"dateModified\":\"2022-04-06T16:34:33+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#webpage\"},\"wordCount\":414,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/python-programs.com\/#organization\"},\"articleSection\":[\"Python\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#respond\"]}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/python-programs.com\/#\/schema\/person\/fb109fd98e2d5a15fd4dac9970797602\",\"name\":\"Vikram Chiluka\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/python-programs.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/c56c77f578d45de43af6feb443618ed7?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/c56c77f578d45de43af6feb443618ed7?s=96&d=mm&r=g\",\"caption\":\"Vikram Chiluka\"},\"url\":\"https:\/\/python-programs.com\/author\/vikram\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Chunking Text using Enchant in Python - Python Programs","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/","og_locale":"en_US","og_type":"article","og_title":"Chunking Text using Enchant in Python - Python Programs","og_description":"Enchant Module in Python: Enchant is a Python module that checks a word\u2019s spelling and provides suggestions for correcting it. It determines whether or not a word is in the dictionary. To tokenize text, Enchant also provides the enchant.tokenize module. Tokenizing is the process of &hellip; Chunking Text using Enchant in Python Read More &raquo;","og_url":"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/","og_site_name":"Python Programs","article_publisher":"https:\/\/www.facebook.com\/btechgeeks","article_published_time":"2022-04-06T16:34:33+00:00","twitter_card":"summary_large_image","twitter_creator":"@btech_geeks","twitter_site":"@btech_geeks","twitter_misc":{"Written by":"Vikram Chiluka","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Organization","@id":"https:\/\/python-programs.com\/#organization","name":"BTech Geeks","url":"https:\/\/python-programs.com\/","sameAs":["https:\/\/www.instagram.com\/btechgeeks\/","https:\/\/www.linkedin.com\/in\/btechgeeks","https:\/\/in.pinterest.com\/btechgeek\/","https:\/\/www.youtube.com\/channel\/UC9MlCqdJ3lKqz2p5114SDIg","https:\/\/www.facebook.com\/btechgeeks","https:\/\/twitter.com\/btech_geeks"],"logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/python-programs.com\/#\/schema\/logo\/image\/","url":"https:\/\/python-programs.com\/wp-content\/uploads\/2020\/11\/BTechGeeks.png","contentUrl":"https:\/\/python-programs.com\/wp-content\/uploads\/2020\/11\/BTechGeeks.png","width":350,"height":70,"caption":"BTech Geeks"},"image":{"@id":"https:\/\/python-programs.com\/#\/schema\/logo\/image\/"}},{"@type":"WebSite","@id":"https:\/\/python-programs.com\/#website","url":"https:\/\/python-programs.com\/","name":"Python Programs","description":"Python Programs with Examples, How To Guides on Python","publisher":{"@id":"https:\/\/python-programs.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/python-programs.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#webpage","url":"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/","name":"Chunking Text using Enchant in Python - Python 
Programs","isPartOf":{"@id":"https:\/\/python-programs.com\/#website"},"datePublished":"2022-04-06T16:34:33+00:00","dateModified":"2022-04-06T16:34:33+00:00","breadcrumb":{"@id":"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/python-programs.com\/"},{"@type":"ListItem","position":2,"name":"Chunking Text using Enchant in Python"}]},{"@type":"Article","@id":"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#article","isPartOf":{"@id":"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#webpage"},"author":{"@id":"https:\/\/python-programs.com\/#\/schema\/person\/fb109fd98e2d5a15fd4dac9970797602"},"headline":"Chunking Text using Enchant in Python","datePublished":"2022-04-06T16:34:33+00:00","dateModified":"2022-04-06T16:34:33+00:00","mainEntityOfPage":{"@id":"https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#webpage"},"wordCount":414,"commentCount":0,"publisher":{"@id":"https:\/\/python-programs.com\/#organization"},"articleSection":["Python"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/python-programs.com\/chunking-text-using-enchant-in-python\/#respond"]}]},{"@type":"Person","@id":"https:\/\/python-programs.com\/#\/schema\/person\/fb109fd98e2d5a15fd4dac9970797602","name":"Vikram 
Chiluka","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/python-programs.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/c56c77f578d45de43af6feb443618ed7?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c56c77f578d45de43af6feb443618ed7?s=96&d=mm&r=g","caption":"Vikram Chiluka"},"url":"https:\/\/python-programs.com\/author\/vikram\/"}]}},"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/posts\/27159"}],"collection":[{"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/comments?post=27159"}],"version-history":[{"count":3,"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/posts\/27159\/revisions"}],"predecessor-version":[{"id":27162,"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/posts\/27159\/revisions\/27162"}],"wp:attachment":[{"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/media?parent=27159"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/categories?post=27159"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/python-programs.com\/wp-json\/wp\/v2\/tags?post=27159"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}