Tokenization in TextBlob in Python

In this article, we will learn about Tokenization in TextBlob in Python.

First, let’s understand what Tokenization is.
Tokenization refers to splitting a piece of text, such as a paragraph, into smaller units called tokens, which can be either words or sentences.

Tokenization can be implemented using the TextBlob library. This library is used to perform Natural Language Processing (NLP) tasks.

Installing and Importing TextBlob

Install the TextBlob library with the command given below –

pip install textblob
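
TextBlob relies on NLTK data for several of its features. If the examples below raise a missing-corpora error on your machine, you can download the required NLTK data with the command below –

python -m textblob.download_corpora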

A TextBlob object can be tokenized into –

  1. words
  2. sentences

Let’s now understand each through an example.

Tokenization of text into words in Python

from textblob import TextBlob

text = "Codespeedy is a programming blog."
tb = TextBlob(text)   # create a TextBlob object from the text
words = tb.words      # tokenize the text into words
print(words)
  1. Here we first imported the TextBlob class from the textblob library.
  2. Then we created a TextBlob object tb from the text.
  3. Then, using the words attribute of TextBlob, we tokenized the given sentence into words.

This gives us the following output –

['Codespeedy', 'is', 'a', 'programming', 'blog']
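
Notice that the punctuation (the final period) is not included in the tokens. The words attribute returns the tokens as a list-like object, so you can iterate over it or index into it like a regular Python list. A minimal sketch continuing from the code above –

# loop over the word tokens and print each one with its length
for word in words:
    print(word, len(word))

# index into the tokens like a normal list
print(words[0])   # 'Codespeedy'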

Tokenization of text into sentences in Python

from textblob import TextBlob

text = ("Codespeedy is a programming blog. "
        "Blog posts contain articles and tutorials on Python, CSS and even much more")
tb = TextBlob(text)   # create a TextBlob object from the text
sent = tb.sentences   # tokenize the text into sentences
print(sent)
  1. Here we first imported the TextBlob class from the textblob library.
  2. Then we created a TextBlob object tb from the text.
  3. Then, using the sentences attribute of TextBlob, we tokenized the given paragraph into sentences.

This gives us the following output –

[Sentence("Codespeedy is a programming blog."), Sentence("Blog posts contain articles and tutorials on Python, CSS and even much more")]

I hope you all liked the article!

Also read –

Introduction to Natural Language Processing – NLP
