Word Cloud in Python
In this tutorial, we are going to look at a graphical representation of text data that highlights the most important or most frequent words. A word cloud in Python sizes each word according to its frequency, so the text size quickly conveys the relative importance of words across the entire dataset.
This is useful when we need to quickly show in a presentation how people feel about our product, and to draw attention to the keywords we want to highlight. The word cloud can be made more creative by applying a mask image. We can choose any shape for the mask, such as a circle, rectangle, skull, thumbs-up, and many more.
Creating Word Cloud in Python
To follow along, these packages must be installed: pandas, matplotlib, nltk, wordcloud, Pillow (imported as PIL), and numpy.
We have a collection of texts from spam messages, which we use to create the word cloud. To make it more attractive and informative, we use a mask in the shape of a thumbs down 👎🏻.
import pandas as pd
import matplotlib.pyplot as plt
import nltk
from wordcloud import WordCloud
from PIL import Image
import numpy as np

%matplotlib inline
I have already downloaded the thumbs-down image, 'thumbs-down.png', into my working folder.
Let's say we have a collection of spam-related words, which I have already stored in a list named flat_list_spam.
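If the words are not in a flat list yet, one possible way to get there is sketched below. It assumes the spam messages have already been tokenized into a list of lists; the name nested_list_spam and the sample words are hypothetical and only stand in for real data.

```python
from itertools import chain

# Hypothetical tokenized spam messages (one inner list per message).
# In practice these would come from your own dataset.
nested_list_spam = [
    ['free', 'prize', 'call', 'now'],
    ['win', 'free', 'cash'],
    ['call', 'free', 'offer'],
]

# Flatten the list of lists into a single list of words
flat_list_spam = list(chain.from_iterable(nested_list_spam))

print(len(flat_list_spam), 'words in total')
```

Repeated words are kept on purpose, since the word cloud relies on how often each word occurs.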
THUMBS_DOWN_FILE = 'thumbs-down.png'
CUSTOM_FONT_FILE = 'OpenSansCondensed-Bold.ttf'  # custom font for a better look

# Paste the icon onto a white background, using the icon itself as the paste mask
icon = Image.open(THUMBS_DOWN_FILE)
image_mask = Image.new(mode='RGB', size=icon.size, color=(255, 255, 255))
image_mask.paste(icon, mask=icon)
rgb_array = np.array(image_mask)  # converts the image object to an array

# Generate the text as a string for the word cloud
spam_str = ' '.join(flat_list_spam)

word_cloud = WordCloud(mask=rgb_array, background_color='white',
                       max_font_size=300, max_words=2000,
                       colormap='gist_heat', font_path=CUSTOM_FONT_FILE)
word_cloud.generate(spam_str.upper())

# Display the word cloud without axes
plt.figure(figsize=[16, 8])
plt.imshow(word_cloud, interpolation='bilinear')
plt.axis('off')
plt.show()
In the resulting word cloud, the words appear in different sizes: a few stand out, while others are medium or small. The larger a keyword appears in the word cloud, the more frequently it occurs in our list of words.
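To check which words should come out largest, the frequencies can be counted directly. The snippet below is a minimal sketch using collections.Counter on a small hypothetical list (the variable words stands in for flat_list_spam). The counts are only a rough guide, because WordCloud applies its own processing, such as stopword removal, before sizing the words.

```python
from collections import Counter

# Hypothetical word list standing in for flat_list_spam
words = ['free', 'call', 'free', 'win', 'free', 'call']

# Count occurrences in upper case, matching the upper-cased text
# that was passed to the word cloud
word_counts = Counter(word.upper() for word in words)

# Most frequent words first; these should be drawn largest
top_words = word_counts.most_common(3)
for word, count in top_words:
    print(word, count)
```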