Python NLTK Program to Implement N-Grams

Natural language processing and text mining both make extensive use of n-grams. An n-gram is a contiguous sequence of n words (or other tokens) taken from a text, that is, the words that fall together inside a window of n consecutive positions.

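When computing n-grams, you usually slide the window forward one word at a time, so consecutive n-grams overlap (in some scenarios the window moves n words at a time instead, giving non-overlapping chunks). N-grams have many uses, and building language models, described below, is one of the most common.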
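To make the "advance one word" versus "move n words" distinction concrete, here is a minimal plain-Python sketch. The function name `sliding_ngrams` and its `step` parameter are hypothetical illustrations: NLTK's `ngrams()` has no step argument and always advances one token at a time.

```python
def sliding_ngrams(words, n, step=1):
    """Return the n-grams of `words` as tuples, moving the window `step` words at a time."""
    if n < 1 or step < 1:
        raise ValueError("n and step must be positive integers")
    # The last window starts at index len(words) - n; range() is empty if n > len(words).
    return [tuple(words[i:i + n]) for i in range(0, len(words) - n + 1, step)]


words = "Welcome to Python Programs".split()
print(sliding_ngrams(words, 2))          # [('Welcome', 'to'), ('to', 'Python'), ('Python', 'Programs')]
print(sliding_ngrams(words, 2, step=2))  # [('Welcome', 'to'), ('Python', 'Programs')]
```

With `step=1` the windows overlap (the usual case); with `step=n` each word lands in exactly one n-gram.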

Consider the sentence "Welcome to Python Programs".

N = 1 (unigrams): Welcome, to, Python, Programs

N = 2 (bigrams): Welcome to, to Python, Python Programs

N = 3 (trigrams): Welcome to Python, to Python Programs

When developing language models, for example, n-grams are used to build not only unigram models but also bigram and trigram models.
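As a rough illustration of how bigrams feed a language model, the sketch below estimates P(next word | previous word) by counting bigrams. It assumes plain maximum-likelihood estimation with no smoothing, and the corpus is a made-up toy sentence; real language models typically add smoothing and sentence-boundary markers.

```python
from collections import Counter


def bigram_probabilities(words):
    """Maximum-likelihood estimates P(w2 | w1) = count(w1, w2) / count(w1 starting a bigram)."""
    bigram_counts = Counter(zip(words, words[1:]))
    # Count each word only where it starts a bigram, so each w1's probabilities sum to 1.
    prefix_counts = Counter(words[:-1])
    return {(w1, w2): c / prefix_counts[w1] for (w1, w2), c in bigram_counts.items()}


corpus = "the cat sat on the mat the cat ran".split()  # toy example corpus
probs = bigram_probabilities(corpus)
print(probs[("the", "cat")])  # 2/3: two of the three bigrams starting with "the" continue with "cat"
print(probs[("the", "mat")])  # 1/3
```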

Examples:

Example 1:

Input:

Given String = "Hello Welcome to Python Programs"
Given n value = 2

Output:

('Hello', 'Welcome')
('Welcome', 'to')
('to', 'Python') 
('Python', 'Programs')

Example 2:

Input:

Given String = "good morning all this is python programs"
Given n value = 3

Output:

('good', 'morning', 'all')
('morning', 'all', 'this')
('all', 'this', 'is') 
('this', 'is', 'python')
('is', 'python', 'programs')
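Both examples follow a simple rule: a sentence of L words, sliding one word at a time, yields L − n + 1 n-grams (or none if n > L). The 5-word string with n = 2 gives 4 bigrams, and the 7-word string with n = 3 gives 5 trigrams. A small check, where `ngram_count` is a hypothetical helper name:

```python
def ngram_count(text, n):
    """Number of n-grams in a whitespace-tokenized string: max(L - n + 1, 0)."""
    return max(len(text.split()) - n + 1, 0)


print(ngram_count("Hello Welcome to Python Programs", 2))          # 4
print(ngram_count("good morning all this is python programs", 3))  # 5
```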

NLTK Program to Implement N-Grams

Method #1: Using NLTK Module (Static Input)

Approach:

Import ngrams from the nltk module using the import keyword.
Give the string as static input and store it in a variable.
Give the n value as static input and store it in another variable.
Split the given string into a list of words using the split() method.
Pass the split list and the given n value as the arguments to the ngrams() function and store the result in another variable.
Loop over the resulting n-grams using a for loop.
Inside the loop, print each n-gram (a tuple of n words).
The Exit of the Program.

Below is the implementation:

# Import ngrams from the nltk module using the import keyword
from nltk import ngrams
# Give the string as static input and store it in a variable.
gvn_str = "Hello Welcome to Python Programs"
# Give the n value as static input and store it in another variable.
gvn_n_val = 2
# Split the given string into a list of words using the split() method.
splt_lst = gvn_str.split()
# Pass the split list and the given n value as the arguments to the ngrams()
# function; it returns an iterator over the n-grams, stored in another variable.
rslt_n_grms = ngrams(splt_lst, gvn_n_val)
# Loop over the resulting n-grams using the for loop.
for itr in rslt_n_grms:
    # Inside the loop, print each n-gram (a tuple of n words).
    print(itr)

Output:

('Hello', 'Welcome')
('Welcome', 'to')
('to', 'Python') 
('Python', 'Programs')
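For comparison, the same sliding window can be written in plain Python without NLTK by zipping n shifted copies of the word list together. This is an alternative technique, not NLTK's own source, and `zip_ngrams` is a hypothetical helper name.

```python
def zip_ngrams(words, n):
    """Plain-Python n-grams: zip n shifted copies of the word list together."""
    # words[i:] starts the i-th copy i positions later; zip stops at the shortest copy,
    # so only complete n-word windows are produced.
    return list(zip(*(words[i:] for i in range(n))))


splt_lst = "Hello Welcome to Python Programs".split()
for itr in zip_ngrams(splt_lst, 2):
    print(itr)
```

This prints the same four bigrams as the NLTK version. Note that `ngrams()` returns a one-pass iterator, so the for loop in Method #1 consumes it; wrap the call in `list()` if the n-grams need to be used more than once.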

Method #2: Using NLTK Module (User Input)

Approach:

Import ngrams from the nltk module using the import keyword.
Give the string as user input using the input() function and store it in a variable.
Give the n value as user input using the int(input()) function and store it in another variable.
Split the given string into a list of words using the split() method.
Pass the split list and the given n value as the arguments to the ngrams() function and store the result in another variable.
Loop over the resulting n-grams using a for loop.
Inside the loop, print each n-gram (a tuple of n words).
The Exit of the Program.

Below is the implementation:

# Import ngrams from the nltk module using the import keyword
from nltk import ngrams
# Give the string as user input using the input() function and store it in a variable.
gvn_str = input("Enter some random string = ")
# Give the n value as user input using the int(input()) function and store it in another variable.
gvn_n_val = int(input("Enter some random number(n) = "))
# Split the given string into a list of words using the split() method.
splt_lst = gvn_str.split()
# Pass the split list and the given n value as the arguments to the ngrams()
# function; it returns an iterator over the n-grams, stored in another variable.
rslt_n_grms = ngrams(splt_lst, gvn_n_val)
# Loop over the resulting n-grams using the for loop.
for itr in rslt_n_grms:
    # Inside the loop, print each n-gram (a tuple of n words).
    print(itr)

Output:

Enter some random string = good morning all this is python programs 
Enter some random number(n) = 3 
('good', 'morning', 'all')
('morning', 'all', 'this')
('all', 'this', 'is')
('this', 'is', 'python') 
('is', 'python', 'programs')
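With user input, two things can go wrong: int() raises ValueError for a non-numeric n such as "three", and if n is larger than the number of words entered, no n-grams exist, so the loop prints nothing. A minimal validation sketch, assuming you want a clear error instead of silent empty output; `parse_ngram_request` is a hypothetical helper, not part of NLTK.

```python
def parse_ngram_request(text, n_text):
    """Validate raw user input; return (words, n) or raise ValueError with a readable message."""
    words = text.split()
    n = int(n_text)  # raises ValueError for non-numeric input such as "three"
    if n < 1:
        raise ValueError("n must be at least 1")
    if n > len(words):
        raise ValueError(f"n = {n} is larger than the {len(words)} words entered")
    return words, n


# Hypothetical usage in the Method #2 program, before calling ngrams():
# splt_lst, gvn_n_val = parse_ngram_request(input("Enter some random string = "),
#                                           input("Enter some random number(n) = "))
```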