How to detect language in Python
Hey Techie! Today we are going to learn how to detect the language of an unknown piece of text using Python.
Python has several third-party modules for language detection.
The modules we are going to use are:
- langid
- langdetect
- textblob
Method 1
The langid module is used for identifying the language of a text.
First, we need to install it by running the command below in the command prompt.
pip install langid
The langid module comes pre-trained on 97 languages, so it can detect 97 languages.
Let us code it up.
Example Code:
import langid

k = [
    "CodeSpeedy is a great platform for tech students",
    "это компьютерный портал для гиков",
    "es un portal informático para geeks",
    "是面向极客的计算机科学门户",
    "は、ギーク向けのコンピューターサイエンスポータルです。",
]

for i in k:
    # classify() returns a (language code, score) tuple; print only the code
    print(langid.classify(i)[0])

We store sentences in different languages in a list and pass each one to the classify() function. classify() returns a tuple containing the language code and a score, so we print only the first element.
Output
en
ru
es
zh
ja
The values displayed above are the two-letter ISO 639-1 codes of the languages present in the list.
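Besides the language code, langid's classify() also returns a score. By default that score is an unnormalized log-probability, which is hard to read. The sketch below shows one possible way to turn it into a probability between 0 and 1 and to ignore low-confidence guesses. The helper name pick_confident and the 0.9 threshold are assumptions made for illustration, not part of langid.

```python
def pick_confident(text, classify_fn, threshold=0.9):
    """Return the language code if its probability reaches threshold, else None.

    classify_fn is any callable that returns a (code, probability) pair.
    pick_confident and the 0.9 default are illustrative assumptions.
    """
    lang, prob = classify_fn(text)
    return lang if prob >= threshold else None


try:
    from langid.langid import LanguageIdentifier, model

    # norm_probs=True makes classify() return a probability between 0 and 1
    identifier = LanguageIdentifier.from_modelstring(model, norm_probs=True)
    print(pick_confident("es un portal informático para geeks", identifier.classify))
except ImportError:
    print("langid is not installed; run: pip install langid")
```

Very short sentences tend to get lower probabilities, so you may need to tune the threshold for your own data.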
Method 2
The langdetect module works similarly to langid, but it detects only 55 languages.
langdetect is not part of the Python standard library, so we need to install it first.
The command would be:
pip install langdetect
Let us go through the code.
With the langdetect module, we are going to use the detect() function.
Example Code:
from langdetect import detect

x = [
    "CodeSpeedy is a great platform for techies.",
    " это компьютерный портал для гиков",
    "es un portal informático para geeks",
    "是面向极客的计算机科学门户",
    "は、ギーク向けのコンピューターサイエンスポータルです。",
]

for i in x:
    # detect() returns the language code as a string
    print(detect(i))
Output
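en
ru
es
no
ja
Note that langdetect is non-deterministic on short or ambiguous text, so the result for a given sentence (such as the "no" returned here for the Chinese sentence) can differ from one run to the next.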
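Since langdetect can give different answers for the same short text, it helps to fix its random seed before detecting. langdetect also raises an exception when the text has nothing it can work with, for example an empty string. Here is a minimal sketch that handles both cases. The helper name detect_all and the "unknown" placeholder are assumptions made for this example; DetectorFactory.seed and detect() come from langdetect itself.

```python
def detect_all(texts, detect_fn, default="unknown"):
    """Run detect_fn on each text; use `default` when detection is not possible.

    detect_all and the "unknown" placeholder are illustrative assumptions.
    """
    codes = []
    for text in texts:
        if not text.strip():
            # Blank input has nothing to detect
            codes.append(default)
            continue
        try:
            codes.append(detect_fn(text))
        except Exception:
            # langdetect raises LangDetectException for text without usable features
            codes.append(default)
    return codes


try:
    from langdetect import DetectorFactory, detect

    DetectorFactory.seed = 0  # fixed seed so repeated runs give the same answer
    print(detect_all(["es un portal informático para geeks", "   "], detect))
except ImportError:
    print("langdetect is not installed; run: pip install langdetect")
```

The detector is passed in as a parameter, so the same helper would also work with any other function that takes a string and returns a language code.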
Method 3
The textblob module does more than language identification.
It also supports noun phrase extraction, sentiment analysis, and classification, which are just as important as language detection.
We can install this module by using the command below.
pip install textblob
Example Code:
from textblob import TextBlob

x = [
    "CodeSpeedy is a great platform for techies.",
    " это компьютерный портал для гиков",
    "es un portal informático para geeks",
    "是面向极客的计算机科学门户",
    "は、ギーク向けのコンピューターサイエンスポータルです。",
]

for i in x:
    # Wrap each sentence in a TextBlob, then ask it for the language code
    lang = TextBlob(i)
    print(lang.detect_language())
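TextBlob is another way of detecting an unknown language. Note that detect_language() sends the text to the Google Translate web API, so it needs an internet connection. The method was deprecated in TextBlob 0.16.0 and may not be available in newer releases.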
Output
en
ru
es
zh-CN
ja
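All three modules return language codes rather than language names, and some codes carry a region suffix such as zh-CN. If you would rather show readable names, one simple option is a small lookup table. The sketch below assumes only the five languages used in this tutorial. The table LANGUAGE_NAMES and the helper language_name are hypothetical names made up for this example.

```python
# Small illustrative table; extend it with whatever codes you expect to see.
LANGUAGE_NAMES = {
    "en": "English",
    "ru": "Russian",
    "es": "Spanish",
    "zh": "Chinese",
    "ja": "Japanese",
}


def language_name(code):
    """Return a readable name for a code such as 'en', 'zh-CN' or 'zh-cn'."""
    base = code.split("-")[0].lower()  # drop a region suffix like "-CN"
    return LANGUAGE_NAMES.get(base, "unknown (" + code + ")")


for code in ["en", "ru", "es", "zh-CN", "ja"]:
    print(code, "->", language_name(code))
```

Codes that are missing from the table are reported as unknown instead of raising an error.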
That brings us to the end of this tutorial.