Most frequent words in a text file in Python
Hello Python learners! In this session, we will learn how to find the most frequent words in a text file. Rather than working on a string typed into the program, we will read the text from a file, so it helps to be familiar with files and the operations on them first. So, let’s learn about files.
Handling files in python
Data is often stored in files in an organized way. There are many kinds of files: text files, music files, videos, and various word processor and presentation documents are the ones we are most familiar with.
Text files contain only characters, whereas the other file formats also include formatting information specific to that format. The operations performed on the data in files include reading and writing. To perform either operation, the program must first open the file. The syntax to open a file is given below:
with open(«filename», «mode») as «variable»:
    «block»
Though there are several ways of opening a file, I prefer this way because the file is closed automatically when the block ends, so we need not write a close statement at the end.
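As a small illustration of why no close statement is needed, the sketch below creates a throwaway file and checks whether it is still open inside and after the with block. The file name demo.txt and the temporary directory are assumptions made for this example only, and the write step is there just so the example has something to read.

```python
import os
import tempfile

# Hypothetical file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# 'w' mode creates the file (or overwrites it) for writing.
with open(path, "w") as f:
    f.write("hello files\n")
    inside = f.closed   # False: the file is still open inside the block

after = f.closed        # True: leaving the with block closed the file

# 'r' mode opens the same file for reading.
with open(path, "r") as f:
    text = f.read()

print(inside, after, repr(text))
```

The file is closed as soon as the indented block finishes, even if an error is raised inside it.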
To learn more about files, go through this link: handling files
Reading a file:
There are several techniques for reading files. One is to read the entire contents of the file into a single string. Another is to iterate over the file, reading one line of text per iteration. We can also read every line and store them all in a list. The syntax for each technique is given below:
#to read the entire contents of text into a single string
with open('file1.txt', 'r') as f:
    contents = f.read()

#to read each line and store them as list
with open('file1.txt', 'r') as f:
    lines = f.readlines()

#for iterative method of reading text in files
with open('planets.txt', 'r') as f:
    for line in f:
        print(len(line))
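To see what each technique actually returns, here is a minimal runnable sketch. It first creates a small sample file in a temporary directory, and the file contents are assumptions made purely for illustration.

```python
import os
import tempfile

# Hypothetical sample file created just for this comparison.
path = os.path.join(tempfile.mkdtemp(), "file1.txt")
with open(path, "w") as f:
    f.write("Mercury\nVenus\nEarth\n")

with open(path, "r") as f:
    contents = f.read()        # one string holding the whole file

with open(path, "r") as f:
    lines = f.readlines()      # list of lines, each keeping its '\n'

lengths = []
with open(path, "r") as f:
    for line in f:             # one line per iteration
        lengths.append(len(line))

print(repr(contents))
print(lines)
print(lengths)
```

Note that readlines and the iterative method both keep the newline character at the end of each line, which is why the lengths are one more than the number of letters.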
As our job is only to read the contents of a file and find the most frequent words in it, we will not cover the write operation here. In case you want to learn it, go through this link: text file in Python
Now let’s get into our job of finding the most frequent words from a text read from a file.
Most frequent words in a text file with Python
First, create a text file and save it in the same directory as your Python program. When you give only a file name, Python looks for the file in the current working directory, which is the program's directory when you run the program from there. Make sure the file is saved in the proper directory, or enter its full path instead.
The algorithm we are going to follow is quite simple. First we open the file and read its contents. Then, for each word, we count how many times it occurs and store that number in a variable called count. We compare count with the maximum count, maxcount, which is initialized to zero. If count is less than maxcount, we ignore the word. If it is equal, we add the word to the list. Otherwise, if it is greater, we clear the list, place this word in it, and update maxcount.
Let us start by initializing the variables and opening the file:
fname=input("enter file name")
count=0     #count of a specific word
maxcount=0  #maximum among the count of each words
l=[]        #list to store the words with maximum count
with open(fname,'r') as f:
    ...     #the code that reads the file goes here (next step)
We have opened the file as f, and we will use f whenever we need to refer to the file.
Now we have to read the contents. As discussed previously, there are several techniques for that, and we should pick the one best suited to our task. As we are concerned with the words of the file, it is simplest to read the entire contents into one string and then split that string into a list of words using the split method.
with open(fname,'r') as f:
    contents=f.read()
    words=contents.split()
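It is worth seeing exactly what split produces before counting anything. The sketch below applies it to a sample string, which is an assumption standing in for the contents of a file. Called without arguments, split breaks the string on any run of whitespace, including newlines, and leaves punctuation attached to the words.

```python
# Sample text standing in for the contents read from a file (assumed).
text = "Hi, friends this program is found in codespeedy.\nThis program works perfectly"

# With no arguments, split() separates on spaces, tabs and newlines alike.
words = text.split()

print(words)
print(len(words))
```

Because the comparison of words is exact, 'this' and 'This' are treated as different words, and so are 'codespeedy.' and 'codespeedy'.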
Finding the most frequent word:
Now that we have all the words in a list, we will implement the algorithm discussed earlier:
for i in range(len(words)):
    for j in range(len(words)):
        if(words[i]==words[j]):   #finding count of each word
            count+=1
    if(count==maxcount):          #comparing with maximum count
        if(words[i] not in l):    #skip a word that is already stored
            l.append(words[i])
    elif(count>maxcount):         #if count greater than maxcount
        l.clear()
        l.append(words[i])
        maxcount=count
    count=0                       #reset before counting the next word
print(l)                          #printing contents of l
Now, the list ‘l’ holds the most frequent words, and it is printed at the end.
Let us say you have a text file with contents like this:
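The nested loops compare every word with every other word, which becomes slow for large files. As an alternative, not the approach used in this session, the standard library's collections.Counter can count all the words in a single pass. The function name most_frequent_words and the sample string below are hypothetical, chosen only for this sketch.

```python
from collections import Counter

def most_frequent_words(text):
    """Return (words, count) for the most frequent whitespace-separated words."""
    counts = Counter(text.split())
    if not counts:
        return [], 0
    maxcount = max(counts.values())
    # Counter keeps first-seen order, so tied words come out in order of appearance.
    return [w for w, c in counts.items() if c == maxcount], maxcount

# Sample text standing in for the contents read from a file (assumed).
sample = "Hi, friends this program is found in codespeedy. This program works perfectly"
result = most_frequent_words(sample)
print(result)
```

To use it on a file, read the contents with f.read() as before and pass the resulting string to the function.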
Hi, friends this program is found in codespeedy. This program works perfectly
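Then your output will be a list containing the word program, the only word that occurs twice in this text. Note that this and This are counted as different words because the comparison is case-sensitive.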
Hope you like this session guys.