N-grams in Python with nltk

In this article, we will learn about n-grams and the implementation of n-grams in Python.

What are N-grams

N-grams are widely used in text mining and natural language processing. An n-gram is a contiguous sequence of n items (usually words) taken from a text. To compute n-grams, you slide a window of n words across the text, usually moving forward one word at a time (although in more complex scenarios you can move several words at a time).

For example, take the sentence “What are good short quotes”. If n = 3 (these n-grams are called trigrams), then the n-grams are:

  • What are good
  • are good short
  • good short quotes
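
The sliding window behind this example can be sketched in plain Python, without any library. This is a minimal illustration that assumes words are separated by whitespace; the helper name word_ngrams is made up for this sketch and is not part of nltk.

```python
def word_ngrams(text, n):
    """Return all contiguous n-word windows of `text` as tuples.

    Hypothetical helper for illustration; splits on whitespace only.
    """
    words = text.split()
    # There are len(words) - n + 1 windows; none if the text is shorter than n.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


trigrams = word_ngrams("What are good short quotes", 3)
for gram in trigrams:
    print(" ".join(gram))
```

A five-word sentence yields three trigrams because the window can start at only three positions.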

N-grams are used for many different tasks. For example, language models are built not only from single words (unigrams) but also from bigrams and trigrams. Google and Microsoft have developed web-scale n-gram models that can be used for various tasks such as spelling correction, hyphenation, and text summarization.
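
To make the language-model use case concrete, here is a rough sketch of how bigram counts can be turned into a probability estimate: the chance that one word follows another is approximated by how often that pair occurs, divided by how often the first word is followed by anything. The toy corpus and the function name bigram_probability are assumptions made for this illustration; real language models also apply smoothing, which is left out here.

```python
from collections import Counter


def bigram_probability(tokens, first, second):
    """Estimate P(second | first) from raw bigram counts (no smoothing).

    Hypothetical helper for illustration only.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    bigram_counts = Counter(bigrams)
    # How many times `first` appears as the left word of any bigram.
    context_counts = Counter(left for left, _ in bigrams)
    if context_counts[first] == 0:
        return 0.0
    return bigram_counts[(first, second)] / context_counts[first]


# Toy corpus, invented for this example.
tokens = "the cat sat on the mat and the cat slept".split()
print(bigram_probability(tokens, "the", "cat"))
```

In this toy corpus "the" is followed by "cat" twice and by "mat" once, so the estimate is two out of three.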

Sample program

The ngrams() function in nltk generates n-grams from a sequence of tokens. Let’s take a sample sentence and print its trigrams.

from nltk import ngrams

sentence = 'random sentences to test the implementation of n-grams in Python'

n = 3
# split the sentence into words and build the trigrams
trigrams = ngrams(sentence.split(), n)

# display the trigrams
for grams in trigrams:
    print(grams)

Output

('random', 'sentences', 'to')
('sentences', 'to', 'test') 
('to', 'test', 'the') 
('test', 'the', 'implementation') 
('the', 'implementation', 'of') 
('implementation', 'of', 'n-grams') 
('of', 'n-grams', 'in') 
('n-grams', 'in', 'Python')
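
Two common follow-ups are counting how often each n-gram occurs and marking sentence boundaries with padding. The sketch below assumes nltk is installed; it imports ngrams from nltk.util, which is the same function used above. The toy sentence and the "<s>" and "</s>" boundary markers are choices made for this illustration.

```python
from collections import Counter

from nltk.util import ngrams

# Toy sentence, invented for this example.
tokens = "the cat sat on the mat and the cat slept".split()

# ngrams() yields tuples, so the result can be fed straight into a Counter.
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(1))

# Padding adds boundary symbols so the first and last words also get
# an n-gram that records their position in the sentence.
padded = list(
    ngrams(
        ["the", "cat"],
        2,
        pad_left=True,
        pad_right=True,
        left_pad_symbol="<s>",
        right_pad_symbol="</s>",
    )
)
print(padded)
```

Note that ngrams() returns a generator, so wrap it in list() or Counter() if you need to use the result more than once.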

See also

Gender Identifier in Python using NLTK

Introduction to NLTK: Tokenization, Stemming, Lemmatization, POS Tagging
