{"id":27073,"date":"2022-04-08T00:42:40","date_gmt":"2022-04-07T19:12:40","guid":{"rendered":"https:\/\/python-programs.com\/?p=27073"},"modified":"2022-04-08T00:42:40","modified_gmt":"2022-04-07T19:12:40","slug":"filtering-text-using-enchant-in-python","status":"publish","type":"post","link":"https:\/\/python-programs.com\/filtering-text-using-enchant-in-python\/","title":{"rendered":"Filtering Text using Enchant in Python"},"content":{"rendered":"
Enchant Module in Python:<\/strong><\/p>\n Enchant is a Python module that checks a word’s spelling and provides suggestions for correcting misspelled words. In other words, it determines whether or not a word is present in the dictionary and, if it is not, offers likely corrections.<\/p>\n To tokenize text, Enchant provides the enchant.tokenize module. Tokenizing is the process of splitting the body of a text into individual words. However, not every word should be tokenized every time: when spell-checking, it is common practice to ignore email addresses and URLs. This can be accomplished by using filters to modify the tokenization process.<\/p>\n The following filters are currently available: EmailFilter, URLFilter, and WikiWordFilter. A WikiWord is a word made up of two or more words that start with capital letters and are run together.<\/p>\n Each example below follows the same approach: tokenize the text with a plain tokenizer, tokenize it again with the corresponding filter applied, and compare the two token lists.<\/p>\n
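To make the idea of tokenization concrete, the (word, offset) pairs that enchant's tokenizer yields can be approximated with the standard library alone. This is an illustrative sketch — the function name and regex below are mine, not enchant's actual implementation:

```python
import re

def simple_tokenize(text):
    # Yield (word, offset) pairs, the same shape as enchant's tokenizer output.
    # Illustrative approximation only; enchant's real tokenizer is more thorough.
    for match in re.finditer(r"[A-Za-z]+", text):
        yield (match.group(), match.start())

print(list(simple_tokenize("My email Id is")))
# → [('My', 0), ('email', 3), ('Id', 9), ('is', 12)]
```

The offset is the character position of each word in the original string, which is what lets a spell-checker point back at the exact location of a misspelling.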
\n
1)EmailFilter<\/h3>\n
\n EmailFilter removes tokens that are part of an email address, so that the spell-checker does not flag them. Approach: tokenize the text with a plain tokenizer obtained from get_tokenizer(\"en_US\"), then tokenize it again with get_tokenizer(\"en_US\", [EmailFilter]) and compare the results. Below is the implementation:<\/p>\n
# Import get_tokenizer from the enchant.tokenize module using the import keyword\r\nfrom enchant.tokenize import get_tokenizer\r\n# Import EmailFilter from the enchant.tokenize module using the import keyword\r\nfrom enchant.tokenize import EmailFilter\r\n\r\n# Give the text to be tokenized as static input and store it in a variable.\r\ngvn_text = \"My email Id is pythonprograms@gmail.com\"\r\n\r\n# Get a tokenizer for the given language using the get_tokenizer() function\r\ntokenizer = get_tokenizer(\"en_US\")\r\n\r\n# Print the tokens without filtering\r\nprint(\"Printing the tokens without filtering it:\")\r\n# Create a new empty list to store the tokens\r\ntokenslist = []\r\n# Loop over the tokens produced by passing the text to the tokenizer\r\nfor wrds in tokenizer(gvn_text):\r\n    # Append the token (word, position) tuple to the list using the append() function\r\n    tokenslist.append(wrds)\r\n# Print the token list (tokens without filtering)\r\nprint(tokenslist)\r\n# Pass the language and the filter to apply (here, EmailFilter) to the get_tokenizer()\r\n# function to get a tokenizer that uses that filter\r\ntokenizerfilter = get_tokenizer(\"en_US\", [EmailFilter])\r\n# Print the tokens after applying the filter\r\nprint(\"\\nPrinting the tokens after applying filtering to the given tokens:\")\r\n# Create a new empty list to store the filtered tokens\r\nfilteredtokenslist = []\r\n# Loop over the tokens produced by the filtered tokenizer\r\nfor wrds in tokenizerfilter(gvn_text):\r\n    # Append the token (word, position) tuple to filteredtokenslist using the append() function\r\n    filteredtokenslist.append(wrds)\r\n# Print the token list after filtering (tokens with filtering)\r\nprint(filteredtokenslist)<\/pre>\n
Printing the tokens without filtering it:\r\n[('My', 0), ('email', 3), ('Id', 9), ('is', 12), ('pythonprograms', 15), ('gmail', 30), ('com', 36)]\r\n\r\nPrinting the tokens after applying filtering to the given tokens:\r\n[('My', 0), ('email', 3), ('Id', 9), ('is', 12)]<\/pre>\n
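The effect shown above can be imitated with plain standard-library code: split the text on whitespace, drop any chunk that looks like an email address, and tokenize only what remains. This is a sketch of the idea only — the regex and helper name below are illustrative assumptions, not enchant's actual EmailFilter implementation:

```python
import re

# Simplified email pattern, for illustration only; enchant's real EmailFilter
# uses a more thorough pattern.
EMAIL_RE = re.compile(r"^\S+@\S+\.\S+$")

def tokenize_skipping_emails(text):
    # Split on whitespace, drop chunks that look like email addresses,
    # and collect (word, offset) pairs for the remaining alphabetic runs.
    tokens = []
    for chunk in re.finditer(r"\S+", text):
        if EMAIL_RE.match(chunk.group()):
            continue  # skip the whole email address
        for word in re.finditer(r"[A-Za-z]+", chunk.group()):
            tokens.append((word.group(), chunk.start() + word.start()))
    return tokens

print(tokenize_skipping_emails("My email Id is pythonprograms@gmail.com"))
# → [('My', 0), ('email', 3), ('Id', 9), ('is', 12)]
```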
2)URLFilter<\/h3>\n
\n URLFilter removes tokens that are part of a URL. Approach: tokenize the text with a plain tokenizer obtained from get_tokenizer(\"en_US\"), then tokenize it again with get_tokenizer(\"en_US\", [URLFilter]) and compare the results. Below is the implementation:<\/p>\n
# Import get_tokenizer from the enchant.tokenize module using the import keyword\r\nfrom enchant.tokenize import get_tokenizer\r\n# Import URLFilter from the enchant.tokenize module using the import keyword\r\nfrom enchant.tokenize import URLFilter\r\n\r\n# Give the text (containing a URL) to be tokenized as static input and store it in a variable.\r\ngvn_text = \"The given URL is = https:\/\/python-programs.com\/\"\r\n\r\n# Get a tokenizer for the given language using the get_tokenizer() function\r\ntokenizer = get_tokenizer(\"en_US\")\r\n\r\n# Print the tokens without filtering\r\nprint(\"Printing the tokens without filtering it.\")\r\n# Create a new empty list to store the tokens\r\ntokenslist = []\r\n# Loop over the tokens produced by passing the text to the tokenizer\r\nfor wrds in tokenizer(gvn_text):\r\n    # Append the token (word, position) tuple to the list using the append() function\r\n    tokenslist.append(wrds)\r\n# Print the token list (tokens without filtering)\r\nprint(tokenslist)\r\n# Pass the language and the filter to apply (here, URLFilter) to the get_tokenizer()\r\n# function to get a tokenizer that uses that filter\r\ntokenizerfilter = get_tokenizer(\"en_US\", [URLFilter])\r\n# Print the tokens after applying the filter\r\nprint(\"\\nPrinting the tokens after applying filtering to the given tokens:\")\r\n# Create a new empty list to store the filtered tokens\r\nfilteredtokenslist = []\r\n# Loop over the tokens produced by the filtered tokenizer\r\nfor wrds in tokenizerfilter(gvn_text):\r\n    # Append the token (word, position) tuple to filteredtokenslist using the append() function\r\n    filteredtokenslist.append(wrds)\r\n# Print the token list after filtering (tokens with filtering)\r\nprint(filteredtokenslist)<\/pre>\n
Printing the tokens without filtering it.\r\n[('The', 0), ('given', 4), ('URL', 10), ('is', 14), ('https', 19), ('python', 27), ('programs', 34), ('com', 43)]\r\n\r\nPrinting the tokens after applying filtering to the given tokens:\r\n[('The', 0), ('given', 4), ('URL', 10), ('is', 14)]<\/pre>\n
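As with the email example, the URL-skipping behaviour can be imitated in the standard library by dropping whitespace-delimited chunks that look like URLs before word-splitting. The pattern and helper name here are illustrative assumptions, not enchant's actual URLFilter implementation:

```python
import re

# Crude URL test, for illustration only; enchant's real URLFilter
# uses a much fuller pattern.
URL_RE = re.compile(r"^(https?://|www\.)", re.IGNORECASE)

def tokenize_skipping_urls(text):
    # Split on whitespace, drop chunks that look like URLs,
    # and collect (word, offset) pairs for the remaining alphabetic runs.
    tokens = []
    for chunk in re.finditer(r"\S+", text):
        if URL_RE.match(chunk.group()):
            continue  # skip the whole URL
        for word in re.finditer(r"[A-Za-z]+", chunk.group()):
            tokens.append((word.group(), chunk.start() + word.start()))
    return tokens

print(tokenize_skipping_urls("The given URL is = https://python-programs.com/"))
# → [('The', 0), ('given', 4), ('URL', 10), ('is', 14)]
```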
3)WikiWordFilter<\/h3>\n
\n WikiWordFilter removes WikiWords, i.e. runs of two or more capitalized words fused together, such as PythonProgramsCoding. Approach: tokenize the text with a plain tokenizer obtained from get_tokenizer(\"en_US\"), then tokenize it again with get_tokenizer(\"en_US\", [WikiWordFilter]) and compare the results. Below is the implementation:<\/p>\n
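A WikiWord — two or more capitalized words run together — can be recognised with a short regular expression. The pattern below is an approximation of that definition for illustration; enchant's actual WikiWordFilter pattern may differ in details:

```python
import re

# A WikiWord is two or more capitalized runs fused together,
# e.g. "PythonProgramsCoding". Approximate pattern for illustration only.
WIKIWORD_RE = re.compile(r"^(?:[A-Z][a-z]+){2,}$")

def is_wikiword(word):
    # True if the word consists of two or more capitalized runs fused together
    return WIKIWORD_RE.match(word) is not None

print(is_wikiword("PythonProgramsCoding"))  # → True
print(is_wikiword("all"))                   # → False
```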
# Import get_tokenizer from the enchant.tokenize module using the import keyword\r\nfrom enchant.tokenize import get_tokenizer\r\n# Import WikiWordFilter from the enchant.tokenize module using the import keyword\r\nfrom enchant.tokenize import WikiWordFilter\r\n\r\n# Give the text (containing a WikiWord) to be tokenized as static input and store it in a variable.\r\ngvn_text = \"PythonProgramsCoding....Hello all\"\r\n\r\n# Get a tokenizer for the given language using the get_tokenizer() function\r\ntokenizer = get_tokenizer(\"en_US\")\r\n\r\n# Print the tokens without filtering\r\nprint(\"Printing the tokens without filtering it.\")\r\n# Create a new empty list to store the tokens\r\ntokenslist = []\r\n# Loop over the tokens produced by passing the text to the tokenizer\r\nfor wrds in tokenizer(gvn_text):\r\n    # Append the token (word, position) tuple to the list using the append() function\r\n    tokenslist.append(wrds)\r\n# Print the token list (tokens without filtering)\r\nprint(tokenslist)\r\n# Pass the language and the filter to apply (here, WikiWordFilter) to the get_tokenizer()\r\n# function to get a tokenizer that uses that filter\r\ntokenizerfilter = get_tokenizer(\"en_US\", [WikiWordFilter])\r\n# Print the tokens after applying the filter\r\nprint(\"\\nPrinting the tokens after applying filtering to the given tokens:\")\r\n# Create a new empty list to store the filtered tokens\r\nfilteredtokenslist = []\r\n# Loop over the tokens produced by the filtered tokenizer\r\nfor wrds in tokenizerfilter(gvn_text):\r\n    # Append the token (word, position) tuple to filteredtokenslist using the append() function\r\n    filteredtokenslist.append(wrds)\r\n# Print the token list after filtering (tokens with filtering)\r\nprint(filteredtokenslist)<\/pre>\n
Printing the tokens without filtering it.\r\n[('PythonProgramsCoding', 0), ('Hello', 24), ('all', 30)]\r\n\r\nPrinting the tokens after applying filtering to the given tokens:\r\n[('all', 30)]<\/pre>\n","protected":false},"excerpt":{"rendered":"