Word Cloud in Python

In this tutorial, we look at the word cloud, a graphical representation of text data that highlights the most important or most frequent words. A word cloud in Python sizes each word according to its frequency, so the text size shows at a glance the relative importance of the words across the entire dataset.

This is useful when we need to quickly show in a presentation how people feel about our product, drawing attention to the keywords we want the audience to notice. We can make it more creative by applying a mask, so that the words fill a shape of our choice, such as a circle, rectangle, skull, thumbs-up and many more.

 

Creating Word Cloud in Python

For this, the following packages must be installed:

matplotlib
pandas
nltk
wordcloud
numpy
pillow (provides the PIL module)

We have a collection of texts from spam messages, which we use to create a word cloud. To make it more attractive and informative, we use a mask in the shape of a thumbs down 👎🏻.

Importing Libraries:

import pandas as pd
import matplotlib.pyplot as plt
import nltk
from wordcloud import WordCloud
from PIL import Image
import numpy as np

%matplotlib inline

I have already downloaded the thumbs-down image, ‘thumbs-down.png’, into my working folder.
Let’s say we have a collection of spam-related words already stored in a list named flat_list_spam.
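The post does not show how flat_list_spam was built, so here is a minimal sketch of one way to get such a list. The sample messages, the small stop-word set and the tokenize helper are all hypothetical, and a plain regular expression stands in for whatever nltk-based processing the real dataset may have gone through.

```python
import re

# Hypothetical sample messages; the real spam dataset is not shown in this post.
spam_messages = [
    "WINNER!! You have won a free prize. Call now!",
    "Free entry: text WIN to claim your prize",
]

# A tiny illustrative stop-word set (an assumption); a fuller list would be used in practice.
STOP_WORDS = {"a", "to", "you", "have", "your", "the"}


def tokenize(message):
    """Lower-case a message and return its alphabetic words, minus stop words."""
    words = re.findall(r"[a-z]+", message.lower())
    return [w for w in words if w not in STOP_WORDS]


# Flatten the per-message word lists into one flat list of words.
flat_list_spam = [word for message in spam_messages for word in tokenize(message)]
print(flat_list_spam)
```

Any list of word strings works here, since the code that follows only joins the list into one long string.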

 

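THUMBS_DOWN_FILE = 'thumbs-down.png'
CUSTOM_FONT_FILE = 'OpenSansCondensed-Bold.ttf'  # custom font file, expected in the working folder

icon = Image.open(THUMBS_DOWN_FILE)
# Start from a blank white canvas of the same size as the icon
image_mask = Image.new(mode='RGB', size=icon.size, color=(255, 255, 255))
# Paste the icon using its own transparency as the paste mask (assumes the PNG has an alpha channel)
image_mask.paste(icon, mask=icon)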

rgb_array = np.array(image_mask) # converts the image object to an array

# Generate the text as a string for the word cloud
spam_str = ' '.join(flat_list_spam)

word_cloud = WordCloud(mask=rgb_array, background_color='white', max_font_size=300,
                      max_words=2000, colormap='gist_heat', font_path=CUSTOM_FONT_FILE)

word_cloud.generate(spam_str.upper())

plt.figure(figsize=[16, 8])
plt.imshow(word_cloud, interpolation='bilinear')
plt.axis('off')
plt.show()

I am not showing the output in this post, as I want you to run the code on your machine and see the results.

In the word cloud we can see words of various sizes: a few large, highlighted words and many medium and small ones. The larger a keyword appears in the word cloud, the more frequently it occurs in our list of words.
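To check which words should come out largest, we can count the words ourselves. The sketch below uses a short hypothetical stand-in for flat_list_spam and Python's collections.Counter. It only illustrates the frequency idea and is not the exact sizing rule the wordcloud package applies, which also depends on settings such as max_font_size.

```python
from collections import Counter

# Hypothetical stand-in for the flat_list_spam list used in this post.
flat_list_spam = ["free", "win", "free", "call", "prize", "free", "call", "now"]

# Count occurrences case-insensitively, since the text is upper-cased before generating the cloud.
counts = Counter(word.upper() for word in flat_list_spam)

# Scale every count by the highest count, so the most frequent word gets 1.0.
top_count = counts.most_common(1)[0][1]
relative = {word: n / top_count for word, n in counts.items()}

# Show the three most frequent words with their relative frequency.
for word, n in counts.most_common(3):
    print(f"{word}: {n} occurrences, relative frequency {relative[word]:.2f}")
```

Words near the top of this ranking are the ones to look for in large type when you run the word cloud code.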

 

Thanks for Reading🙂
