Natural language processing and text mining both make extensive use of text n-grams. An n-gram is a contiguous sequence of n words from a text, that is, the words that fall inside a window of size n.
When computing n-grams, you usually advance the window one word at a time (although in more complex scenarios you can advance it by n words). N-grams are useful for a variety of text-processing tasks.
N = 1: Welcome to Python Programs
The unigrams of this sentence are: Welcome, to, Python, Programs
N = 2: Welcome to Python Programs
The bigrams of this sentence are:
Welcome to
to Python
Python Programs
N = 3: Welcome to Python Programs
The trigrams of this sentence are: Welcome to Python, to Python Programs
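The sliding window described above can be sketched in plain Python, without any library. This is only an illustration: the function name word_ngrams and its step parameter are assumptions made for this example, not part of NLTK. The step parameter controls how many words the window advances each time, with 1 being the usual choice.

```python
def word_ngrams(text, n, step=1):
    """Return the word n-grams of text as a list of tuples.

    word_ngrams and step are hypothetical names used for illustration.
    """
    if n < 1 or step < 1:
        raise ValueError("n and step must be positive integers")
    words = text.split()
    # The last full window starts at index len(words) - n.
    # If the text has fewer than n words, the range is empty.
    return [tuple(words[i:i + n]) for i in range(0, len(words) - n + 1, step)]


sentence = "Welcome to Python Programs"
print(word_ngrams(sentence, 1))          # unigrams
print(word_ngrams(sentence, 2))          # bigrams
print(word_ngrams(sentence, 3))          # trigrams
print(word_ngrams(sentence, 2, step=2))  # window advances two words at a time
```

With step=2 and n=2 the windows do not overlap, so the sentence yields only ('Welcome', 'to') and ('Python', 'Programs').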
In language modeling, for example, n-grams are used to build not only unigram models but also bigram and trigram models.
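To make the language-model use a little more concrete, here is a minimal sketch of counting bigrams and turning the counts into a simple next-word estimate (count of the pair divided by count of all pairs that start with the first word). The helper names bigram_counts and next_word_probability and the tiny corpus are assumptions made for this illustration; a real model would need far more text and some form of smoothing.

```python
from collections import Counter


def bigram_counts(text):
    """Count how often each pair of adjacent words occurs (hypothetical helper)."""
    words = text.lower().split()
    # zip pairs every word with the word that follows it.
    return Counter(zip(words, words[1:]))


def next_word_probability(counts, prev_word, word):
    """Estimate P(word | prev_word) from raw bigram counts (hypothetical helper)."""
    # Total number of bigrams whose first word is prev_word.
    total = sum(c for (first, _), c in counts.items() if first == prev_word)
    return counts[(prev_word, word)] / total if total else 0.0


corpus = "the cat sat on the mat"  # illustrative toy corpus
counts = bigram_counts(corpus)
print(counts[("the", "cat")])                       # 1
print(next_word_probability(counts, "the", "cat"))  # 0.5
```

In this toy corpus "the" is followed once by "cat" and once by "mat", so each continuation gets an estimate of 0.5.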
Examples:
Example1:
Input:
Given String = "Hello Welcome to Python Programs"
Given n value = 2
Output:
('Hello', 'Welcome')
('Welcome', 'to')
('to', 'Python')
('Python', 'Programs')
Example2:
Input:
Given String = "good morning all this is python programs"
Given n value = 3
Output:
('good', 'morning', 'all')
('morning', 'all', 'this')
('all', 'this', 'is')
('this', 'is', 'python')
('is', 'python', 'programs')
NLTK Program to Implement N-Grams
Method #1: Using NLTK Module (Static Input)
Approach:
- Import ngrams from the nltk module using the import keyword.
- Give the string as static input and store it in a variable.
- Give the n value as static input and store it in another variable.
- Split the given string into a list of words using the split() function.
- Pass the above split list and the given n value as the arguments to the ngrams() function and store the result in another variable.
- Iterate over the above result using a for loop.
- Inside the loop, print the current n-gram.
- Exit the program.
Below is the implementation:
# Import ngrams from the nltk module using the import keyword
from nltk import ngrams
# Give the string as static input and store it in a variable.
gvn_str = "Hello Welcome to Python Programs"
# Give the n value as static input and store it in another variable.
gvn_n_val = 2
# Split the given string into list of words using the split() function
splt_lst = gvn_str.split()
# Pass the above split list and the given n value as the arguments to the ngrams()
# function and store it in another variable.
rslt_n_grms = ngrams(splt_lst, gvn_n_val)
# Iterate over the above result using a for loop
for itr in rslt_n_grms:
    # Inside the loop, print the current n-gram (a tuple of words).
    print(itr)
Output:
('Hello', 'Welcome')
('Welcome', 'to')
('to', 'Python')
('Python', 'Programs')
Method #2: Using NLTK Module (User Input)
Approach:
- Import ngrams from the nltk module using the import keyword.
- Give the string as user input using the input() function and store it in a variable.
- Give the n value as user input using the int(input()) function and store it in another variable.
- Split the given string into a list of words using the split() function.
- Pass the above split list and the given n value as the arguments to the ngrams() function and store the result in another variable.
- Iterate over the above result using a for loop.
- Inside the loop, print the current n-gram.
- Exit the program.
Below is the implementation:
# Import ngrams from the nltk module using the import keyword
from nltk import ngrams
# Give the string as user input using the input() function and store it in a variable.
gvn_str = input("Enter some random string = ")
# Give the n value as user input using the int(input()) function and store it in another variable.
gvn_n_val = int(input("Enter some random number(n) = "))
# Split the given string into list of words using the split() function
splt_lst = gvn_str.split()
# Pass the above split list and the given n value as the arguments to the ngrams()
# function and store it in another variable.
rslt_n_grms = ngrams(splt_lst, gvn_n_val)
# Iterate over the above result using a for loop
for itr in rslt_n_grms:
    # Inside the loop, print the current n-gram (a tuple of words).
    print(itr)
Output:
Enter some random string = good morning all this is python programs
Enter some random number(n) = 3
('good', 'morning', 'all')
('morning', 'all', 'this')
('all', 'this', 'is')
('this', 'is', 'python')
('is', 'python', 'programs')
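Because Method #2 reads n from the user, int(input()) raises a ValueError when the text typed is not a whole number, and when n is larger than the number of words no full window fits, so nothing is printed. One possible way to guard against both cases is sketched below. The helper name parse_n and its error messages are assumptions made for this example, not part of NLTK.

```python
def parse_n(raw, word_count):
    """Convert raw user text to a usable n (hypothetical helper).

    Raises ValueError if raw is not a whole number or if n falls
    outside the range 1..word_count.
    """
    try:
        n = int(raw)
    except ValueError:
        raise ValueError(f"n must be a whole number, got {raw!r}") from None
    if n < 1:
        raise ValueError("n must be at least 1")
    if n > word_count:
        raise ValueError(f"n={n} is larger than the {word_count} words available")
    return n


words = "good morning all this is python programs".split()
print(parse_n("3", len(words)))  # 3
```

In the program above, the validated value could replace the direct int(input(...)) call, with the split list's length passed as word_count.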