Python NLTK Program to Implement N-Grams

Natural language processing and text mining both make extensive use of text n-grams. An n-gram is a sequence of n consecutive words from a text, that is, the words that fall inside a sliding window of size n.

When computing n-grams, the window usually advances one word at a time, although more complex scenarios may advance it by several words. N-grams have many applications; the sentence below shows what they look like for N = 1, 2, and 3.

Sentence: Welcome to Python Programs

N = 1 (unigrams): Welcome, to, Python, Programs

N = 2 (bigrams): Welcome to, to Python, Python Programs

N = 3 (trigrams): Welcome to Python, to Python Programs
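
The sliding window behind these lists can be sketched in plain Python, without NLTK. This is a minimal illustration, not how NLTK implements it; the helper name `word_ngrams` and its `step` parameter are assumptions made for this example.

```python
def word_ngrams(text, n, step=1):
    """Return the word n-grams of `text` as a list of tuples.

    Hypothetical helper for illustration: `step` is how many words
    the window advances each time (1 is the usual choice).
    """
    if n < 1 or step < 1:
        raise ValueError("n and step must both be at least 1")
    words = text.split()
    # Slide a window of n words across the list, starting every `step` words.
    return [tuple(words[i:i + n]) for i in range(0, len(words) - n + 1, step)]


sentence = "Welcome to Python Programs"
for n in (1, 2, 3):
    print(n, word_ngrams(sentence, n))
```

If the text has fewer than n words, the range is empty and the helper returns an empty list.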

One common application is language modeling, where n-grams are used to build not only unigram models but also bigram and trigram models.
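
To make the language-model use concrete, the sketch below estimates bigram probabilities from raw counts in plain Python. The function name `bigram_probabilities` and the sample sentence are invented for illustration, and real language models typically add smoothing, which is omitted here.

```python
from collections import Counter


def bigram_probabilities(text):
    """Estimate P(next word | current word) from bigram counts.

    Hypothetical helper for illustration; no smoothing is applied.
    """
    words = text.lower().split()
    # Count each adjacent word pair.
    bigram_counts = Counter(zip(words, words[1:]))
    # Count how often each word appears as the first word of a pair.
    context_counts = Counter(words[:-1])
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


probs = bigram_probabilities("the cat sat on the mat")
print(probs[("the", "cat")])  # 0.5: "the" is followed by "cat" once out of two times
```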

Examples:

Example 1:

Input:

Given String = "Hello Welcome to Python Programs"
Given n value = 2

Output:

('Hello', 'Welcome')
('Welcome', 'to')
('to', 'Python') 
('Python', 'Programs')

Example 2:

Input:

Given String = "good morning all this is python programs"
Given n value = 3

Output:

('good', 'morning', 'all')
('morning', 'all', 'this')
('all', 'this', 'is') 
('this', 'is', 'python')
('is', 'python', 'programs')

NLTK Program to Implement N-Grams

Method #1: Using NLTK Module (Static Input)

Approach:

  • Import ngrams from the nltk module using the import keyword.
  • Give the string as static input and store it in a variable.
  • Give the n value as static input and store it in another variable.
  • Split the given string into a list of words using the split() function.
  • Pass the split list and the given n value as arguments to the ngrams() function and store the result in another variable.
  • Iterate over the result using a for loop.
  • Inside the loop, print the current n-gram, which is a tuple of n words.
  • Exit the program.

Below is the implementation:

# Import ngrams from the nltk module using the import keyword
from nltk import ngrams
# Give the string as static input and store it in a variable.
gvn_str = "Hello Welcome to Python Programs"
# Give the n value as static input and store it in another variable.
gvn_n_val = 2
# Split the given string into a list of words using the split() function
splt_lst = gvn_str.split()
# Pass the split list and the given n value as arguments to the ngrams()
# function. The result is an iterator that yields one tuple per n-gram.
rslt_n_grms = ngrams(splt_lst, gvn_n_val)
# Iterate over the n-grams using a for loop
for itr in rslt_n_grms:
    # Inside the loop, print the current n-gram (a tuple of n words).
    print(itr)

Output:

('Hello', 'Welcome')
('Welcome', 'to')
('to', 'Python') 
('Python', 'Programs')
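
The tuples printed above can be joined back into space-separated phrases such as "Welcome to". A minimal sketch follows; it uses the output tuples as literal data so that it runs without NLTK, and the names `bigram_tuples` and `phrases` are chosen only for this illustration.

```python
# The bigrams from the output above, written out as literal tuples.
bigram_tuples = [
    ("Hello", "Welcome"),
    ("Welcome", "to"),
    ("to", "Python"),
    ("Python", "Programs"),
]

# Join the words of each tuple with a single space.
phrases = [" ".join(gram) for gram in bigram_tuples]
print(phrases)
```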

Method #2: Using NLTK Module (User Input)

Approach:

  • Import ngrams from the nltk module using the import keyword.
  • Give the string as user input using the input() function and store it in a variable.
  • Give the n value as user input using the int(input()) function and store it in another variable.
  • Split the given string into a list of words using the split() function.
  • Pass the split list and the given n value as arguments to the ngrams() function and store the result in another variable.
  • Iterate over the result using a for loop.
  • Inside the loop, print the current n-gram, which is a tuple of n words.
  • Exit the program.

Below is the implementation:

# Import ngrams from the nltk module using the import keyword
from nltk import ngrams
# Give the string as user input using the input() function and store it in a variable.
gvn_str = input("Enter some random string = ")
# Give the n value as user input using the int(input()) function and store it in
# another variable. int() raises ValueError if the input is not a whole number.
gvn_n_val = int(input("Enter some random number(n) = "))
# Split the given string into a list of words using the split() function
splt_lst = gvn_str.split()
# Pass the split list and the given n value as arguments to the ngrams()
# function. The result is an iterator that yields one tuple per n-gram.
rslt_n_grms = ngrams(splt_lst, gvn_n_val)
# Iterate over the n-grams using a for loop
for itr in rslt_n_grms:
    # Inside the loop, print the current n-gram (a tuple of n words).
    print(itr)

Output:

Enter some random string = good morning all this is python programs 
Enter some random number(n) = 3 
('good', 'morning', 'all')
('morning', 'all', 'this')
('all', 'this', 'is')
('this', 'is', 'python') 
('is', 'python', 'programs')
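
Because Method #2 converts the user's input with int() directly, a non-numeric entry stops the program with a ValueError. One possible way to validate n before building the n-grams is sketched below. The helper name `parse_n` is hypothetical, and treating an n larger than the number of words as an error is a choice made for this sketch rather than something NLTK requires.

```python
def parse_n(raw_value, word_count):
    """Convert the user's text into a usable n, or raise ValueError.

    Hypothetical helper for illustration.
    """
    try:
        n = int(raw_value.strip())
    except ValueError:
        raise ValueError(f"n must be a whole number, got {raw_value!r}") from None
    if n < 1:
        raise ValueError("n must be at least 1")
    if n > word_count:
        raise ValueError(f"n ({n}) exceeds the number of words ({word_count})")
    return n


words = "good morning all this is python programs".split()
print(parse_n("3", len(words)))  # 3
```

In the interactive program, the first argument would be the string returned by input() and the second would be the length of the split word list.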