Tokenization in TextBlob in Python
In this article, we will learn about Tokenization in TextBlob in Python.
First, let’s understand what Tokenization is.
Tokenization refers to splitting a piece of text into smaller units called tokens, which can be either words or sentences.
Tokenization can be implemented using the TextBlob library, a Python library for performing common Natural Language Processing (NLP) tasks.
Installing and Importing TextBlob
Install the TextBlob library with the command given below –
pip install textblob
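Note – TextBlob's tokenizers rely on NLTK corpora under the hood. If you see a MissingCorpusError when running the examples below, downloading the corpora with the following command should fix it –

python -m textblob.download_corpora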
A TextBlob object can be tokenized into –
- words
- sentences
Let’s now understand each through an example.
Tokenization of text into words in Python
from textblob import TextBlob

text = "Codespeedy is a programming blog."
tb = TextBlob(text)
words = tb.words
print(words)
- Here we first imported the TextBlob class from the textblob library.
- Then we created a TextBlob object tb.
- Then using the words attribute of TextBlob, we tokenize the given sentence into words.
This gives us the following output –
['Codespeedy', 'is', 'a', 'programming', 'blog']
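The words attribute returns a WordList, which behaves like a regular Python list, so you can iterate over it and index into it. A minimal sketch –

from textblob import TextBlob

tb = TextBlob("Codespeedy is a programming blog.")

# tb.words behaves like a normal Python list of tokens
for word in tb.words:
    print(word)

print(len(tb.words))   # number of word tokens: 5
print(tb.words[0])     # first token: Codespeedy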
Tokenization of text into sentences in Python
from textblob import TextBlob

text = ("Codespeedy is a programming blog. "
        "Blog posts contain articles and tutorials on Python, CSS and even much more")
tb = TextBlob(text)
sent = tb.sentences
print(sent)
- Here we first imported the TextBlob class from the textblob library.
- Then we created a TextBlob object tb.
- Then using the sentences attribute of TextBlob, we tokenize the given paragraph into sentences.
This gives us the following output –
[Sentence("Codespeedy is a programming blog."), Sentence("Blog posts contain articles and tutorials on Python, CSS and even much more")]
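Each element of tb.sentences is a Sentence object, which can be converted back to a plain string or tokenized further into words. A small sketch building on the example above –

from textblob import TextBlob

text = ("Codespeedy is a programming blog. "
        "Blog posts contain articles and tutorials on Python, CSS and even much more")
tb = TextBlob(text)

# each Sentence can be printed as plain text or split into its own words
for sentence in tb.sentences:
    print(str(sentence))
    print(sentence.words)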
I hope you all liked the article!