N-grams in Python with nltk
In this article, we will learn what n-grams are and how to generate them in Python with nltk.
What are n-grams
Text n-grams are widely used in text mining and natural language processing. An n-gram is a contiguous sequence of n words taken from a text through a sliding window. When computing n-grams, you usually move the window forward one word at a time (although in more complex scenarios you can move it more than one word at a time).
For example, take the sentence “What are good short quotes”. If N = 3 (trigrams), the n-grams are:
- What are good
- are good short
- good short quotes
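The sliding window described above can be sketched in plain Python, without nltk. The helper name `sliding_ngrams` and its `step` parameter are illustrative assumptions, not part of any library; `step=1` gives the usual one-word shift.

```python
def sliding_ngrams(tokens, n, step=1):
    """Return the n-grams of `tokens` as tuples, moving the window `step` tokens at a time.

    Hypothetical helper for illustration; not part of nltk.
    """
    if n < 1 or step < 1:
        raise ValueError("n and step must be positive integers")
    # The last valid window starts at index len(tokens) - n.
    return [tuple(tokens[i:i + n]) for i in range(0, len(tokens) - n + 1, step)]


words = "What are good short quotes".split()
trigrams = sliding_ngrams(words, 3)

# Print each trigram as a space-separated phrase.
for gram in trigrams:
    print(" ".join(gram))
```

With the default step this reproduces the three trigrams listed above. A sentence shorter than n words yields an empty list.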
N-grams are used for many different tasks. For example, when developing language models, n-grams are used to build not only unigram models but also bigram and trigram models. Google and Microsoft have developed web-scale n-gram models that can be used for various tasks such as spell checking, hyphenation, and text summarization.
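As a rough illustration of how n-gram counts feed a language model, the sketch below estimates bigram probabilities from a tiny made-up corpus. The corpus and the function name `bigram_probability` are assumptions for this example, and the estimate is a plain unsmoothed ratio of counts, which real models usually refine.

```python
from collections import Counter

# Toy corpus, invented for this example.
corpus = "the cat sat on the mat and the cat slept".split()

# Count single words and adjacent word pairs.
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))


def bigram_probability(prev_word, word):
    """Estimate P(word | prev_word) as count(prev_word, word) / count(prev_word)."""
    if unigram_counts[prev_word] == 0:
        return 0.0  # prev_word never seen in the corpus
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]


print(bigram_probability("the", "cat"))
```

Here "the" occurs three times and is followed by "cat" twice, so the estimate for "cat" after "the" is two thirds.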
Sample program
The ngrams() function in nltk generates the n-grams of a sequence of tokens. Let’s take a sample sentence and print its trigrams.
from nltk import ngrams

sentence = 'random sentences to test the implementation of n-grams in Python'
n = 3

# split the sentence into words and build the trigrams
trigrams = ngrams(sentence.split(), n)

# display the trigrams
for grams in trigrams:
    print(grams)
Output
('random', 'sentences', 'to')
('sentences', 'to', 'test')
('to', 'test', 'the')
('test', 'the', 'implementation')
('the', 'implementation', 'of')
('implementation', 'of', 'n-grams')
('of', 'n-grams', 'in')
('n-grams', 'in', 'Python')