Convert speech to text in JavaScript

This tutorial shows how to convert speech into text in JavaScript using speech recognition, which is part of the Web Speech API. To do this, we capture the user's speech, convert it into text, and display the converted text in the browser.

Currently, speech recognition from the Web Speech API is available in Chrome for desktop and Android. Chrome has supported it since around version 33, but only through prefixed interfaces. For this reason, you need to include the prefixed versions of them, e.g. webkitSpeechRecognition.
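Because the constructor may be exposed under either name, a small feature check can decide which one to use before anything else runs. The following is only a sketch: the helper name getRecognitionConstructor and its root parameter are assumptions made for illustration, not part of the API. Passing the global object in as an argument simply makes the check easy to try outside a browser.

```javascript
// Hypothetical helper: returns the speech recognition constructor found on
// `root` (normally `window`), preferring the unprefixed name.
// Returns null when neither name is available.
function getRecognitionConstructor(root) {
  return root.SpeechRecognition || root.webkitSpeechRecognition || null;
}

// Plain objects stand in for `window` here to show the three cases.
function FakeStandard() {}
function FakePrefixed() {}

const unsupported = getRecognitionConstructor({});
const prefixedOnly = getRecognitionConstructor({ webkitSpeechRecognition: FakePrefixed });
const both = getRecognitionConstructor({
  SpeechRecognition: FakeStandard,
  webkitSpeechRecognition: FakePrefixed,
});

console.log(unsupported === null);          // no support detected
console.log(prefixedOnly === FakePrefixed); // prefixed name is used
console.log(both === FakeStandard);         // unprefixed name wins
```

In a page, the call would be getRecognitionConstructor(window), and a null result is a good moment to show the user a message instead of a button that does nothing.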

To make this easier to follow, we are going to create an HTML form and a button, although the same code can also be run from the browser console.

<form id="CodeSpeedy">
    <p class="text"></p>
</form>

<button onClick="GetSpeech()">Click and speak</button>

In the HTML above, I have created a form element with the id CodeSpeedy. Inside the form, there is a paragraph tag with the class text. The JavaScript that follows writes its output into the form element, so the text the user speaks appears there. After that, I created a button that the user must click before speaking.

Now we move on to the main part of the project, the JavaScript code. I will explain it snippet by snippet:

const GetSpeech = () => {
    // Show that the click was registered
    document.getElementById("CodeSpeedy").innerHTML = "clicked microphone";
    // Use the unprefixed constructor if present, otherwise the prefixed one
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

    let recognition = new SpeechRecognition();
}

First, I have created a function called GetSpeech that encapsulates all of the speech recognition functionality. Speech is captured only when this function is called.

SpeechRecognition is the interface that converts the user's speech into text so it can be shown on the webpage. Because browsers such as Google Chrome expose it under a prefix, the code falls back to webkitSpeechRecognition when the unprefixed name is not available.

The SpeechRecognition() constructor creates a new speech recognition object instance, which gives access to that object's event handlers and methods. The next snippets show the ones used here.
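Besides event handlers, the instance also carries a few settings that can be changed before recognition starts. The sketch below sets four of them: lang, continuous, interimResults and maxAlternatives are property names from the Web Speech API, while the helper configureRecognition, its options argument and the default values chosen are assumptions for illustration only.

```javascript
// Hypothetical helper: copies a few standard settings onto a recognition
// object and returns it. The defaults used here are assumptions.
function configureRecognition(recognition, options = {}) {
  recognition.lang = options.lang || "en-US";                   // language to recognize
  recognition.continuous = Boolean(options.continuous);         // keep listening after a result?
  recognition.interimResults = Boolean(options.interimResults); // report partial results?
  recognition.maxAlternatives = options.maxAlternatives || 1;   // alternatives per result
  return recognition;
}

// A plain object stands in for `new SpeechRecognition()` so the sketch
// can be tried outside a browser.
const configured = configureRecognition({}, { lang: "en-GB" });
console.log(configured);
```

In the tutorial's code, the call would go right after the instance is created, for example configureRecognition(recognition, { lang: "en-GB" }).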

recognition.onstart = () => {
    document.getElementById("CodeSpeedy").innerHTML = "Listening..... speak in microphone";
}

The onstart event handler runs when speech capture has started, and is used here to tell the user to speak into the microphone. The user will see “Listening..... speak in microphone” as the instruction.

recognition.onresult = (event) => {
    document.getElementById("CodeSpeedy").innerHTML = event.results[0][0].transcript;
}

Next, the onresult event handler fires when the user finishes speaking and a result is available. Its event parameter has a results property, which holds information about the vocal input. event.results[0][0] is the first alternative of the first result, and its transcript property gives the recognized text.

transcript is a string containing the recognized words.
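The nested indexing is easier to see with a small helper that walks the whole list instead of reading only the first entry. This is a minimal sketch: the name collectTranscript and the mock data are assumptions, and the mock only imitates the shape of event.results (a list of results, each a list of alternatives with transcript and confidence).

```javascript
// Hypothetical helper: takes the top alternative of every result and
// joins the transcripts into a single string.
function collectTranscript(results) {
  return Array.from(results)                      // works on array-like lists too
    .map((result) => result[0].transcript.trim()) // [0] is the first alternative
    .join(" ");
}

// Mock data shaped like event.results, for trying the helper outside a browser.
const mockResults = [
  [{ transcript: "hello world", confidence: 0.9 }],
  [{ transcript: " how are you", confidence: 0.8 }],
];

console.log(collectTranscript(mockResults)); // "hello world how are you"
```

With the default settings used in this tutorial there is normally a single result, so the helper and event.results[0][0].transcript give the same text.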

recognition.start();

The recognition.start() method activates listening and speech recognition.

Here is the full JavaScript code, which listens to the user’s speech and converts it into text.

const GetSpeech = () => {
    document.getElementById("CodeSpeedy").innerHTML = "clicked microphone";
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

    let recognition = new SpeechRecognition();

    recognition.onstart = () => {
        document.getElementById("CodeSpeedy").innerHTML = "Listening..... speak in microphone";
    }

    recognition.onresult = (event) => {
        document.getElementById("CodeSpeedy").innerHTML = event.results[0][0].transcript;
    }
    recognition.start();
}

If you run this program, you will see a button in the browser. Click the button, and the browser will ask for permission to use the microphone; click Allow. Then speak, and once you have finished, the spoken words will be displayed on the page as a string.
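If the microphone permission is refused, or nothing is heard, no result arrives and the page keeps showing the last message. The recognition object reports such cases through its onerror event, whose event.error property holds a short code. The sketch below turns some of those codes into readable text. The codes listed are defined by the Web Speech API, while the helper name describeRecognitionError and the wording of the messages are assumptions.

```javascript
// Messages for a few error codes defined by the Web Speech API.
// The wording is only a suggestion.
const ERROR_MESSAGES = new Map([
  ["not-allowed", "Microphone access was blocked. Allow it and try again."],
  ["no-speech", "No speech was detected. Please try again."],
  ["audio-capture", "No microphone was found."],
  ["network", "A network error interrupted recognition."],
]);

// Hypothetical helper: maps an error code to a message, with a generic
// fallback for codes that are not in the table.
function describeRecognitionError(code) {
  return ERROR_MESSAGES.get(code) || "Speech recognition failed: " + code;
}

// In the browser this could be wired up inside GetSpeech as:
//   recognition.onerror = (event) => {
//       document.getElementById("CodeSpeedy").innerHTML =
//           describeRecognitionError(event.error);
//   }
console.log(describeRecognitionError("not-allowed"));
console.log(describeRecognitionError("aborted"));
```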
