How to find sentence similarity in Python?

In this post, I am going to show you how to compute sentence similarity with Python. But why do we need to find the similarity between two sentences? A common reason is to compare a searched text against the available content. Search engines rely on this kind of comparison, and so do question-and-answer sites like Quora.

Here, I am going to discuss cosine similarity, one of the ways to measure similarity. Cosine similarity compares two vectors by calculating their inner product and dividing it by the product of their lengths, which gives the cosine of the angle between them. For this, we split each sentence into small tokens, and each sentence's set of tokens is then converted into a vector. After this, we use the following formula to calculate the similarity:
Similarity = (A . B) / (||A|| * ||B||), where A and B are vectors.
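
As a quick illustration of the formula, here is a minimal sketch in plain Python. The function name cosine_similarity and the sample vectors are chosen here for illustration only, and returning 0.0 for a zero vector is a convention assumed for this sketch (the formula itself is undefined in that case).

```python
import math

def cosine_similarity(a, b):
    """Return (A . B) / (||A|| * ||B||) for two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))      # inner product A . B
    norm_a = math.sqrt(sum(x * x for x in a))   # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))   # ||B||
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention: no direction, so no similarity
    return dot / (norm_a * norm_b)

# Two vectors that share one of their two non-zero components
print(cosine_similarity([1, 1, 0], [1, 0, 1]))
# Two vectors pointing in the same direction
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
```

The first call should come out close to 0.5 and the second close to 1, since vectors pointing in the same direction have an angle of zero between them.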

See how the Python code works to find sentence similarity

Below is our Python program:

from nltk.corpus import stopwords 
from nltk.tokenize import word_tokenize 
  
X = input("Enter first string: ").lower() 
Y = input("Enter second string: ").lower() 
   
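# Split each sentence into word tokens
X_list = word_tokenize(X)
Y_list = word_tokenize(Y)

# English stop words to ignore
sw = stopwords.words('english')
l1 = []
l2 = []

# Keep the unique tokens that are not stop words
X_set = {w for w in X_list if w not in sw}
Y_set = {w for w in Y_list if w not in sw}

# Build a 0/1 vector for each sentence over the combined set of words
rvector = X_set.union(Y_set)
for w in rvector:
    l1.append(1 if w in X_set else 0)
    l2.append(1 if w in Y_set else 0)

# Inner product of the two vectors
c = 0
for i in range(len(rvector)):
    c += l1[i] * l2[i]

# For 0/1 vectors, sum(l) equals the squared length of the vector.
# Guard against division by zero when a sentence has no usable tokens.
norm = (sum(l1) * sum(l2)) ** 0.5
cosine = c / norm if norm else 0.0
print("similarity: ", cosine)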

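Let’s understand how the above code works.
NLTK (the Natural Language Toolkit) is a Python library for working with text. Here it provides the tokenizer (word_tokenize) and the list of English stop words; the vectors themselves are built by our own code. The tokenizer and stop-word data have to be downloaded once with nltk.download() before the program can run, and the exact resource names depend on your NLTK version.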

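  1. Take two strings as input and convert them to lowercase.
  2. Create tokens out of those strings.
  3. Initialize two empty lists.
  4. Remove the stop words from the tokens and keep the unique words of each sentence.
  5. Build a 0/1 vector for each sentence over the combined set of words and append the values to the lists.
  6. Compare the two lists using the cosine formula.
  7. Print the result.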
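
The steps above can also be sketched as a small reusable function. This is only a rough stand-in that runs without NLTK: the name sentence_similarity, the tiny SAMPLE_STOP_WORDS set, and the use of str.split() in place of word_tokenize are all assumptions made for this sketch, so its results can differ from the program above (for example, it does not split punctuation into separate tokens).

```python
# Hypothetical, tiny stop-word set used only for this sketch;
# the program above uses stopwords.words('english') from NLTK instead.
SAMPLE_STOP_WORDS = {"i", "is", "a", "of", "the", "and"}

def sentence_similarity(first, second, stop_words=SAMPLE_STOP_WORDS):
    # Lowercase and tokenize (whitespace split stands in for word_tokenize)
    first_tokens = first.lower().split()
    second_tokens = second.lower().split()

    # Drop stop words and keep only the unique words of each sentence
    first_set = {w for w in first_tokens if w not in stop_words}
    second_set = {w for w in second_tokens if w not in stop_words}

    # With 0/1 vectors, the inner product is the number of shared words
    # and each squared vector length is the size of the word set.
    shared = len(first_set & second_set)
    norm = (len(first_set) * len(second_set)) ** 0.5
    return shared / norm if norm else 0.0

print(sentence_similarity("I like music", "Metal is a kind of music"))
```

Working with sets directly avoids building the two lists by hand, which is possible here only because every vector entry is 0 or 1.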

Here we have used the NLTK library to find sentence similarity in Python.

Output:
RESTART: C:\Users\Admin\Desktop\python_codespeedy\simlarity_btwn_sentences.py 
Enter first string: I like music.
Enter second string: Metal is a kind of Music
similarity:  0.3333333333333333
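
To see where this value comes from, the calculation can be traced by hand. The token sets below are written out manually under the assumption that word_tokenize returns the final "." as its own token and that "i", "is", "a" and "of" are in NLTK's English stop-word list; the variable names are chosen for this sketch.

```python
# Token sets assumed for the sample run, after lowercasing and stop-word removal.
# The "." stays in the first set because it is not a stop word.
x_set = {"like", "music", "."}
y_set = {"metal", "kind", "music"}

# Combined vocabulary, sorted only to make the printed vectors reproducible
vocabulary = sorted(x_set | y_set)

# 0/1 vectors: 1 if the sentence contains the word, else 0
l1 = [1 if w in x_set else 0 for w in vocabulary]
l2 = [1 if w in y_set else 0 for w in vocabulary]

# Only "music" is shared, so the inner product is 1
dot = sum(a * b for a, b in zip(l1, l2))

# Each vector has three 1s, so the denominator is sqrt(3 * 3) = 3
cosine = dot / (sum(l1) * sum(l2)) ** 0.5

print(vocabulary)
print(l1)
print(l2)
print(cosine)
```

One shared word divided by 3 gives the 0.333 shown in the output. Note that the punctuation mark counts as a word here, so stripping punctuation before comparing would change the result.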
