Natural language processing and text mining both make extensive use of text n-grams. An n-gram is a sequence of n consecutive words that fall inside the same window of the text.
When computing n-grams, you usually advance the window one word at a time (although in more complex scenarios you can move it by more than one word). N-grams are named after their size: unigrams (n = 1), bigrams (n = 2), and trigrams (n = 3).
N = 1: Welcome to Python Programs
unigrams: Welcome, to, Python, Programs
N = 2: Welcome to Python Programs
bigrams: Welcome to, to Python, Python Programs
N = 3: Welcome to Python Programs
trigrams: Welcome to Python, to Python Programs
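The sliding window described above can be sketched in plain Python, without any library. This is a minimal illustration, not NLTK's implementation: the function name word_ngrams and its step parameter are assumptions introduced here, and words are split on whitespace only.

```python
def word_ngrams(text, n, step=1):
    """Return the word n-grams of `text` as a list of tuples."""
    if n < 1 or step < 1:
        raise ValueError("n and step must be positive integers")
    words = text.split()
    # Slide a window of n words across the list, advancing `step` words each time.
    # If the text has fewer than n words, the range is empty and so is the result.
    return [tuple(words[i:i + n]) for i in range(0, len(words) - n + 1, step)]


sentence = "Welcome to Python Programs"
for n in (1, 2, 3):
    print(n, word_ngrams(sentence, n))
```

With step=1 this reproduces the unigrams, bigrams, and trigrams listed above. A larger step advances the window several words at a time, so some adjacent word pairs are skipped.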
For example, language models are often built not only from unigrams but also from bigrams and trigrams.
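To make the language-model use case concrete, here is a rough sketch of how bigram counts can be turned into a next-word estimate. The tiny corpus and the helper names bigram_counts and next_word_probability are made up for illustration. A real model would use far more text and some form of smoothing.

```python
from collections import Counter


def bigram_counts(text):
    """Count how often each pair of adjacent words occurs in `text`."""
    words = text.lower().split()
    # zip(words, words[1:]) pairs every word with the word that follows it.
    return Counter(zip(words, words[1:]))


def next_word_probability(counts, first, second):
    """Estimate P(second | first) as count(first, second) / count(first, anything)."""
    total = sum(c for (w1, _), c in counts.items() if w1 == first)
    return counts[(first, second)] / total if total else 0.0


# Hypothetical toy corpus, chosen only to keep the arithmetic easy to follow.
corpus = "welcome to python programs welcome to python tutorials"
counts = bigram_counts(corpus)
print(counts[("welcome", "to")])
print(next_word_probability(counts, "python", "programs"))
```

In this corpus "python" is followed once by "programs" and once by "tutorials", so the estimate for "programs" after "python" is one half.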
Examples:
Example1:
Input:
Given String = "Hello Welcome to Python Programs"
Given n value = 2
Output:
('Hello', 'Welcome')
('Welcome', 'to')
('to', 'Python')
('Python', 'Programs')
Example2:
Input:
Given String = "good morning all this is python programs"
Given n value = 3
Output:
('good', 'morning', 'all')
('morning', 'all', 'this')
('all', 'this', 'is')
('this', 'is', 'python')
('is', 'python', 'programs')
NLTK Program to Implement N-Grams
Method #1: Using NLTK Module (Static Input)
Approach:
- Import ngrams from the nltk module using the import keyword.
- Give the string as static input and store it in a variable.
- Give the n value as static input and store it in another variable.
- Split the given string into a list of words using the split() function.
- Pass the split list and the given n value as arguments to the ngrams() function and store the result in another variable.
- Iterate over the result using a for loop.
- Inside the loop, print each n-gram.
- Exit the program.
Below is the implementation:
# Import ngrams from the nltk module using the import keyword
from nltk import ngrams

# Give the string as static input and store it in a variable.
gvn_str = "Hello Welcome to Python Programs"
# Give the n value as static input and store it in another variable.
gvn_n_val = 2
# Split the given string into a list of words using the split() function
splt_lst = gvn_str.split()
# Pass the above split list and the given n value as the arguments to the ngrams()
# function and store it in another variable.
rslt_n_grms = ngrams(splt_lst, gvn_n_val)
# Loop in the above result obtained using the for loop
for itr in rslt_n_grms:
    # Inside the loop, print the iterator value.
    print(itr)
Output:
('Hello', 'Welcome')
('Welcome', 'to')
('to', 'Python')
('Python', 'Programs')
Method #2: Using NLTK Module (User Input)
Approach:
- Import ngrams from the nltk module using the import keyword.
- Give the string as user input using the input() function and store it in a variable.
- Give the n value as user input using the int(input()) function and store it in another variable.
- Split the given string into a list of words using the split() function.
- Pass the split list and the given n value as arguments to the ngrams() function and store the result in another variable.
- Iterate over the result using a for loop.
- Inside the loop, print each n-gram.
- Exit the program.
Below is the implementation:
# Import ngrams from the nltk module using the import keyword
from nltk import ngrams

# Give the string as user input using the input() function and store it in a variable.
gvn_str = input("Enter some random string = ")
# Give the n value as user input using the int(input()) function and store it in another variable.
gvn_n_val = int(input("Enter some random number(n) = "))
# Split the given string into a list of words using the split() function
splt_lst = gvn_str.split()
# Pass the above split list and the given n value as the arguments to the ngrams()
# function and store it in another variable.
rslt_n_grms = ngrams(splt_lst, gvn_n_val)
# Loop in the above result obtained using the for loop
for itr in rslt_n_grms:
    # Inside the loop, print the iterator value.
    print(itr)
Output:
Enter some random string = good morning all this is python programs
Enter some random number(n) = 3
('good', 'morning', 'all')
('morning', 'all', 'this')
('all', 'this', 'is')
('this', 'is', 'python')
('is', 'python', 'programs')
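Because the n value comes from the user in this method, it may be worth validating it before building the n-grams: int() raises ValueError on non-numeric text, and an n larger than the number of words leaves no complete window to print. The helper below is one possible sketch. The name parse_n and its error messages are assumptions made here and are not part of NLTK.

```python
def parse_n(raw, num_words):
    """Convert raw user input to a usable n, or raise ValueError with a clear message."""
    try:
        n = int(raw.strip())
    except ValueError:
        raise ValueError(f"n must be an integer, got {raw!r}") from None
    if n < 1:
        raise ValueError("n must be at least 1")
    if n > num_words:
        raise ValueError(f"n={n} exceeds the number of words ({num_words})")
    return n


words = "good morning all this is python programs".split()
print(parse_n("3", len(words)))
```

In the program above, the value returned by parse_n(input(...), len(splt_lst)) could be used in place of int(input(...)), provided the string is split before n is read.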