Most frequent words in a text file in Python

Hello Python learners! In this session, we will learn how to find the most frequent words in a text. Instead of working on a plain string, we will do this on text read from a file. To follow along, we need to be familiar with files and the operations on them, so let’s learn about files first.

Handling files in Python

Data is often stored in files in an organized way. There are many kinds of files: text files, music files, videos, and various word processor and presentation documents are the ones we are most familiar with.

Text files contain only characters, whereas the other file formats also include formatting information specific to that format. The operations performed on the data in files include reading and writing. To perform any operation, the program must first open the file. The syntax to open a file is given below:

with open(«filename», «mode») as «variable»:
    «block»

Though there are several ways of opening a file, I prefer this one because we do not need to call close at the end; the file is closed automatically when the block ends.
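To see that no close statement is needed, here is a minimal sketch that writes a throwaway file and then checks the file object's closed attribute inside and after the with block. The file name demo.txt and the temporary folder are assumptions made only for this example.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on any existing file.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo.txt')   # hypothetical file name
with open(path, 'w') as f:
    f.write('hello files\n')

with open(path, 'r') as f:
    inside = f.closed          # False: the file is still open inside the block
    text = f.read()
after = f.closed               # True: the with statement closed the file for us

print(inside, after, text.strip())
```

The file is closed even if an error is raised inside the block, which is the main reason this form is usually preferred.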

For more on files, go through this link: handling files

Reading a file:

There are several techniques for reading files. One way is to read the entire contents of the file into a single string. There is also an iterative technique, in which one line of text is read in each iteration. We can also read every line of text and store them all in a list. The syntax for each technique is given below:

#to read the entire contents of text into a single string
with open('file1.txt', 'r') as f:
    contents = f.read()
#to read each line and store them as list
with open('file1.txt', 'r') as f:
    lines = f.readlines()
#for iterative method of reading text in files
with open('planets.txt', 'r') as f:
    for line in f:
        print(len(line))
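To make the difference between the three techniques concrete, the following sketch first writes a small sample file and then reads it in each of the three ways. The file contents and the temporary folder are made up for the demonstration.

```python
import os
import tempfile

# Write a small sample file; the name and contents are made up for this demo.
path = os.path.join(tempfile.mkdtemp(), 'planets.txt')
with open(path, 'w') as f:
    f.write('Mercury\nVenus\nEarth\n')

with open(path, 'r') as f:
    contents = f.read()            # one string, newlines included

with open(path, 'r') as f:
    lines = f.readlines()          # list of lines, each ending in '\n'

lengths = []
with open(path, 'r') as f:
    for line in f:                 # one line per iteration
        lengths.append(len(line))  # the length counts the trailing '\n'

print(repr(contents))   # 'Mercury\nVenus\nEarth\n'
print(lines)            # ['Mercury\n', 'Venus\n', 'Earth\n']
print(lengths)          # [8, 6, 6]
```

Note that every line keeps its newline character, which is why the lengths are one more than the number of letters.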

As our job is only to read the contents of the file and then find the most frequent word in it, we will not cover the write operation here. In case you want to learn it, go through this link: text file in Python

Now let’s get into our job of finding the most frequent words from a text read from a file.

Most frequent words in a text file with Python

First, create a text file and save it in the same directory as your Python program. When you give only a file name, Python looks for the file in the current working directory, which is the program's directory when you run the program from there. Make sure you have created and saved the file in the proper directory.
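If the program may be started from some other folder, one option is to build the file path from the location of the script itself. The sketch below is only an illustration: path_next_to is a hypothetical helper, and the script path in the example call is made up.

```python
import os
from pathlib import Path

def path_next_to(script_file, filename):
    # Hypothetical helper: build the path of `filename` in the same
    # folder as `script_file`, whatever the current working directory is.
    return Path(script_file).resolve().parent / filename

# A bare file name is looked up relative to this directory:
print(os.getcwd())

# Inside a script you would normally pass __file__ as the first argument.
print(path_next_to('/home/user/project/wordcount.py', 'file1.txt'))
```

The returned path can be passed to open in place of the bare file name.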

The algorithm we are going to follow is quite simple. First we open the file and read its contents. Then, for each word, we count how many times it occurs and store the result in a variable called count. We compare count with the maximum count, which is initialized to zero at the beginning. If count is less than the maximum count, we ignore the word. If it is equal, we add the word to a list, unless it is already there. Otherwise, if it is greater, we clear the list, place this word in the list, and update the maximum count.

Let us start by initializing the variables and opening the file:

fname=input("enter file name: ")
count=0             #count of a specific word
maxcount=0          #maximum among the counts of all words
l=[]                #list to store the words with maximum count
with open(fname,'r') as f:

We have opened the file as f, and we will use f whenever we need to refer to the file. The body of the with block comes in the next step.

Now we have to read the contents. As discussed previously, there are several techniques for that, but we should pick the one that suits our task best. As we are concerned with the words of the file, it is better to read the entire contents into one string and then split that string into a list of words using the split method.

Reading contents:

with open(fname,'r') as f:
    contents=f.read()
    words=contents.split()
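One thing to keep in mind is that split only cuts the text at whitespace, so punctuation stays attached to the words and capital letters are kept. The sketch below shows what that means for the sample text used in the Output section, together with one possible clean-up. Lower-casing and stripping punctuation from the ends of each word is my assumption here, not part of the original program.

```python
import string

text = "Hi, friends this program is found in codespeedy.\nThis program works perfectly"

words = text.split()          # splits on spaces and newlines alike
print(words[0], words[7])     # Hi, codespeedy.

# Optional clean-up: drop punctuation at the ends of each word and ignore case.
cleaned = [w.strip(string.punctuation).lower() for w in words]
cleaned = [w for w in cleaned if w]   # discard words that were only punctuation
print(cleaned.count('this'))          # 2
```

Without the clean-up, this and This are counted as different words. With it, this and program both occur twice, so both would be reported as most frequent.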

Finding the most frequent word:

Now that we have all the words in a list, we will implement the algorithm discussed earlier.

for i in range(len(words)):
    count=0                           #reset the count for each word
    for j in range(len(words)):
        if(words[i]==words[j]):       #finding count of each word
            count+=1
    if(count==maxcount):              #comparing with maximum count
        if(words[i] not in l):        #avoid adding the same word twice
            l.append(words[i])
    elif(count>maxcount):             #if count greater than maxcount
        l.clear()
        l.append(words[i])
        maxcount=count
print(l)                              #printing contents of l

Now we have the most frequent words in the list ‘l’, which is printed at the end.

Output:

Suppose you have a text file with contents like this:

Hi, friends this program is found in codespeedy.
This program works perfectly

Then your output will be

['program']
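The steps of this session can also be collected into a single function. The sketch below is a variation rather than the program above: it counts the words in one pass with a dictionary instead of the nested loops, which avoids comparing every word with every other word. The function name most_frequent_words and the temporary sample file are assumptions made for the example.

```python
import os
import tempfile

def most_frequent_words(path):
    # Read the whole file and split it into words at whitespace.
    with open(path, 'r') as f:
        words = f.read().split()

    # Count every word in a single pass using a dictionary.
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1

    if not counts:                 # empty file: nothing to report
        return []

    maxcount = max(counts.values())
    # Dictionaries keep insertion order (Python 3.7+), so words that tie
    # come out in the order of their first appearance.
    return [word for word, n in counts.items() if n == maxcount]

# Try it on the sample text from above, written to a temporary file.
sample = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(sample, 'w') as f:
    f.write('Hi, friends this program is found in codespeedy.\n'
            'This program works perfectly')
print(most_frequent_words(sample))   # ['program']
```

Returning a list keeps the behaviour of the original algorithm, where several words can share the maximum count.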

Hope you like this session guys.

3 responses to “Most frequent words in a text file in Python”

  1. Purnendu says:

    Post is quite good for pure fundamental concept of counting.

    The alternative way of this program will be:

    Using Python's built-in collections module and its Counter class, the large program shrinks to just 3 or 4 lines to find the most frequent word.

    Program:

    from collections import Counter
    given_string = "Hi, friends this program is found in codespeedy. This program works perfectly"
    words = given_string.split(" ")
    words_count = Counter(words).most_common()
    print("Most frequent word in the given sentence is : " + words_count[0][0] + "\nNumber of occurrence is:",words_count[0][1])

    Output:

    Most frequent word in the given sentence is : program
    Number of occurrence is: 2

  2. Ssking123 says:

    This code does not work… anyway, thanks for teaching file uploading

    • Saruque Ahamed Mollick says:

      Actually, the comment lines in the Python code were written with a double slash, like: //this is a comment
      I have made the necessary changes. The code should work properly now. Thanks for your comment buddy.
