How to detect language in Python

Hey Techie! Today we are going to learn how to detect the language of a piece of text using Python.

Python has several third-party modules for detecting the language of a text.
The modules we are going to use are:

  1. langid
  2. langdetect
  3. textblob

Method 1

The langid module identifies the language a text is written in and returns its language code.

First, we need to install it by running the command below in the command prompt.

pip install langid

The langid module comes pre-trained on 97 languages, so it can detect 97 languages.

Let us code it up.

Example Code:

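import langid

# Sample sentences in English, Russian, Spanish, Chinese, and Japanese
k = ["CodeSpeedy is a great platform for tech students",
    "это компьютерный портал для гиков",
    "es un portal informático para geeks",
    "是面向极客的计算机科学门户",
    "は、ギーク向けのコンピューターサイエンスポータルです。"
    ]
for i in k:
    # classify() returns a (language_code, score) tuple; print only the code
    print(langid.classify(i)[0])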

We store sentences in different languages in a list, then loop over the list and pass each sentence to the classify() function, which returns the detected language code along with a score.

Output

en
ru
es
zh
ja

The codes displayed above are the two-letter ISO 639-1 codes of the languages present in the list.
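
If you would rather show a readable language name than a two-letter code, a small lookup table is enough. The sketch below is only an illustration: the LANGUAGE_NAMES dictionary and the language_name() helper are hypothetical names that do not come from any of the modules in this tutorial, and the table covers only the five codes seen in the output above.

```python
# Hypothetical lookup table (not part of langid): a small subset of
# ISO 639-1 codes mapped to English language names. Extend as needed.
LANGUAGE_NAMES = {
    "en": "English",
    "ru": "Russian",
    "es": "Spanish",
    "zh": "Chinese",
    "ja": "Japanese",
}


def language_name(code: str) -> str:
    """Return a readable name for a language code, or a fallback label."""
    return LANGUAGE_NAMES.get(code, f"unknown ({code})")


for code in ["en", "ru", "es", "zh", "ja"]:
    print(code, "->", language_name(code))
```

Any code that is missing from the table falls back to a label such as "unknown (fr)" instead of raising an error.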

Method 2

The langdetect module works similarly to langid, but it supports fewer languages: 55.

The langdetect module is not part of the Python standard library, so we need to install it first.

The command is:

pip install langdetect

In the langdetect module, we are going to use the detect() function.

Let us go through the code.

Example Code:

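from langdetect import detect

# Sample sentences in English, Russian, Spanish, Chinese, and Japanese
x = ["CodeSpeedy is a great platform for techies.",
    "это компьютерный портал для гиков",
    "es un portal informático para geeks",
    "是面向极客的计算机科学门户",
    "は、ギーク向けのコンピューターサイエンスポータルです。",
    ]
for i in x:
    # detect() returns the code of the most probable language
    print(detect(i))

Output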

en
ru
es
no
ja
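
Short or unusual inputs can trip a detector up: the "no" (Norwegian) reported for the Chinese sentence above looks like a misdetection, and langdetect raises an exception when a text has no usable features, such as an empty string. The sketch below shows one possible way to guard against that. The safe_detect() helper and the fake_detector() stand-in are hypothetical names made up for this example; the stand-in only mimics a detector so the sketch runs without any extra installation.

```python
from typing import Callable, Optional


def safe_detect(text: str,
                detector: Callable[[str], str],
                default: Optional[str] = None) -> Optional[str]:
    """Return detector(text), or `default` if the text is blank or detection fails."""
    cleaned = text.strip()
    if not cleaned:
        # Nothing to analyse, so skip the detector entirely
        return default
    try:
        return detector(cleaned)
    except Exception:
        # Caught broadly on purpose so the helper works with any detector
        return default


def fake_detector(text: str) -> str:
    """Hypothetical stand-in for a real detector such as langdetect's detect()."""
    if text.isdigit():
        raise ValueError("No features in text.")
    return "en"


print(safe_detect("   ", fake_detector))
print(safe_detect("12345", fake_detector, default="unknown"))
print(safe_detect("CodeSpeedy is a great platform", fake_detector))
```

With langdetect installed, you would pass its detect function instead of the stand-in, for example safe_detect(sentence, detect). langdetect is also non-deterministic on short or ambiguous text; its README suggests setting DetectorFactory.seed = 0 (after from langdetect import DetectorFactory) before calling detect() to get repeatable results.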

Method 3

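The textblob module does more than language identification.

It also supports noun phrase extraction, sentiment analysis, and classification, which are just as useful as language detection.
Note that detect_language() relies on the Google Translate web API, so it needs an internet connection. It was deprecated in TextBlob 0.16.0 and has since been removed, so this method only works with older versions of the library.
We can install this module by using the command below.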

pip install textblob

Example Code:

from textblob import TextBlob

# Sample sentences in English, Russian, Spanish, Chinese, and Japanese
x = ["CodeSpeedy is a great platform for techies.",
    "это компьютерный портал для гиков",
    "es un portal informático para geeks",
    "是面向极客的计算机科学门户",
    "は、ギーク向けのコンピューターサイエンスポータルです。"
    ]

for i in x:
    lang = TextBlob(i)
    # detect_language() exists only in older TextBlob releases
    print(lang.detect_language())

The textblob is another way of detecting the unknown language.

Output

en
ru
es
zh-CN
ja
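
The three methods do not always agree on the code they return. For the Chinese sentence, the outputs above show zh, no, and zh-CN. One way to compare such results is to reduce each code to its base language and take a majority vote. This is only a rough sketch: normalize_code() and majority_language() are hypothetical helpers written for this example, not functions from langid, langdetect, or textblob.

```python
from collections import Counter
from typing import Iterable, Optional


def normalize_code(code: str) -> str:
    """Reduce a code such as 'zh-CN' or 'zh_cn' to its base language, 'zh'."""
    return code.replace("_", "-").split("-")[0].lower()


def majority_language(codes: Iterable[str]) -> Optional[str]:
    """Return the most common base language among the codes, or None if empty."""
    counts = Counter(normalize_code(c) for c in codes)
    if not counts:
        return None
    # On a tie, the code that appeared first wins
    return counts.most_common(1)[0][0]


# Codes reported for the Chinese sentence by the three methods above
results = ["zh", "no", "zh-CN"]
print([normalize_code(c) for c in results])
print(majority_language(results))
```

A vote like this is only as good as the detectors behind it, so treat the result as a hint rather than a guarantee.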

At last, we are done with this tutorial.
