{"id":27350,"date":"2022-04-15T23:44:23","date_gmt":"2022-04-15T18:14:23","guid":{"rendered":"https:\/\/python-programs.com\/?p=27350"},"modified":"2022-04-15T23:44:23","modified_gmt":"2022-04-15T18:14:23","slug":"python-nltk-nltk-tokenize-conditionalfreqdist-function","status":"publish","type":"post","link":"https:\/\/python-programs.com\/python-nltk-nltk-tokenize-conditionalfreqdist-function\/","title":{"rendered":"Python NLTK nltk.tokenize.ConditionalFreqDist() Function"},"content":{"rendered":"
NLTK in Python:<\/strong><\/p>\n NLTK is a Python toolkit for natural language processing (NLP). It ships with a large collection of corpora and test datasets for various text processing tasks, and it can be used for tokenizing, parse tree visualization, and much more.<\/p>\n Tokenization<\/strong><\/p>\n Tokenization is the process of dividing a large body of text into smaller pieces known as tokens. These tokens are valuable for detecting patterns and are regarded as the first stage of stemming and lemmatization. Tokenization also aids in replacing sensitive data elements with non-sensitive ones.<\/p>\n Natural language processing is used to build applications such as text classification, intelligent chatbots, sentiment analysis, and language translation. To achieve these goals, it is essential to capture the patterns in the text.<\/p>\n The Natural Language Toolkit provides an important tokenization module, nltk.tokenize, which is further divided into sub-modules such as word_tokenize and sent_tokenize.<\/p>\n nltk.probability.ConditionalFreqDist() Function:<\/strong><\/p>\n Using the ConditionalFreqDist() class (defined in the nltk.probability module), we can count the frequency of words in a sentence, grouped under a condition such as word length.<\/p>\n Syntax:<\/strong><\/p>\n Parameters:\u00a0<\/strong>The constructor can be called with no arguments; it also optionally accepts an iterable of (condition, sample) pairs<\/p>\n Return Value:<\/strong><\/p>\n A ConditionalFreqDist object, which behaves like a dictionary mapping each condition to a FreqDist of word counts, is returned by the ConditionalFreqDist() function<\/p>\n Approach:<\/strong><\/p>\n Below is the implementation:<\/strong><\/p>\n Output:<\/strong><\/p>\n Output:<\/strong><\/p>\n Approach:<\/strong><\/p>\n Below is the implementation:<\/strong><\/p>\n Output:<\/strong><\/p>\n
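As a rough illustration of word tokenization, the idea can be sketched with only the standard library (a minimal stand-in; NLTK's word_tokenize handles punctuation, contractions, and edge cases far more carefully):

```python
import re

def simple_word_tokenize(text):
    # Split on runs of word characters -- a crude stand-in for
    # nltk.tokenize.word_tokenize, shown here only to illustrate the concept
    return re.findall(r"\w+", text)

print(simple_word_tokenize("NLTK makes tokenizing easy!"))
# -> ['NLTK', 'makes', 'tokenizing', 'easy']
```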
nltk.probability.ConditionalFreqDist()<\/pre>\n
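Conceptually, a conditional frequency distribution is a dictionary of frequency counters, one per condition, and the optional constructor argument is an iterable of (condition, sample) pairs. A minimal stdlib sketch of that idea (not the NLTK implementation, which adds tabulation, plotting, and more):

```python
from collections import Counter, defaultdict

# (condition, sample) pairs, e.g. (word length, word) --
# the same shape of input that ConditionalFreqDist(pairs) accepts
pairs = [(len(w), w) for w in "red fish blue fish".split()]

# condition -> Counter of samples
cfd = defaultdict(Counter)
for condition, sample in pairs:
    cfd[condition][sample] += 1

print(dict(cfd[4]))  # -> {'fish': 2, 'blue': 1}
```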
NLTK nltk.probability.ConditionalFreqDist() Function in Python<\/h2>\n
Method #1: Using ConditionalFreqDist() Function(Static Input)<\/h3>\n
\n
# Import the ConditionalFreqDist class from the nltk.probability module\r\nfrom nltk.probability import ConditionalFreqDist\r\n# Import word_tokenize from the nltk.tokenize module\r\nfrom nltk.tokenize import word_tokenize\r\n\r\n# Create an instance (object) of the ConditionalFreqDist class\r\ntkn = ConditionalFreqDist()\r\n\r\n# Give the string as static input and store it in a variable\r\ngvn_str = \"Python Programs Sample Codes in Python Codes\"\r\n# word_tokenize() splits the given string into tokens i.e. words;\r\n# loop over each word using a for loop\r\nfor wrd in word_tokenize(gvn_str):\r\n    # Use the length of the word as the condition\r\n    condition = len(wrd)\r\n    # Increment the count of this word under its condition (word length)\r\n    tkn[condition][wrd] += 1\r\n\r\n# Display the conditional frequency distribution\r\ntkn<\/pre>\n
ConditionalFreqDist(nltk.probability.FreqDist,\r\n{2: FreqDist({'in': 1}),\r\n5: FreqDist({'Codes': 2}),\r\n6: FreqDist({'Python': 2, 'Sample': 1}),\r\n8: FreqDist({'Programs': 1})})<\/pre>\n
# Get the frequency distribution of words having length 6\r\ntkn[6]<\/pre>\n
FreqDist({'Python': 2, 'Sample': 1})<\/pre>\n
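The per-condition lookup tkn[6] returns a FreqDist, which supports Counter-style queries such as most_common(). A stdlib sketch of the same lookup on the example string (assuming simple whitespace splitting rather than word_tokenize):

```python
from collections import Counter, defaultdict

# Rebuild the condition -> counts mapping with plain Counters
counts = defaultdict(Counter)
for word in "Python Programs Sample Codes in Python Codes".split():
    counts[len(word)][word] += 1

# Words of length 6, most frequent first -- mirrors tkn[6] in the NLTK example
print(counts[6].most_common())  # -> [('Python', 2), ('Sample', 1)]
```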
Method #2: Using ConditionalFreqDist() Function (User Input)<\/h3>\n
\n
# Import the ConditionalFreqDist class from the nltk.probability module\r\nfrom nltk.probability import ConditionalFreqDist\r\n# Import word_tokenize from the nltk.tokenize module\r\nfrom nltk.tokenize import word_tokenize\r\n\r\n# Create an instance (object) of the ConditionalFreqDist class\r\ntkn = ConditionalFreqDist()\r\n\r\n# Give the string as user input using the input() function and store it in a variable\r\ngvn_str = input(\"Enter some random string = \")\r\n# word_tokenize() splits the given string into tokens i.e. words;\r\n# loop over each word using a for loop\r\nfor wrd in word_tokenize(gvn_str):\r\n    # Use the length of the word as the condition\r\n    condition = len(wrd)\r\n    # Increment the count of this word under its condition (word length)\r\n    tkn[condition][wrd] += 1\r\n\r\n# Display the conditional frequency distribution\r\ntkn<\/pre>\n
Enter some random string = good morning all good good morning hello hi hi\r\nConditionalFreqDist(nltk.probability.FreqDist,\r\n{2: FreqDist({'hi': 2}),\r\n3: FreqDist({'all': 1}),\r\n4: FreqDist({'good': 3}),\r\n5: FreqDist({'hello': 1}),\r\n7: FreqDist({'morning': 2})})<\/pre>\n","protected":false},"excerpt":{"rendered":"