Get voice input with microphone in Python using PyAudio and SpeechRecognition

In this Python tutorial, we will show you how to take voice input from a microphone in Python using PyAudio and SpeechRecognition.

To do this task we require the following things installed on our machine.

  • Python
  • SpeechRecognition Package
  • PyAudio

That’s it.

To learn how to install these packages, see: Install essential packages to work with microphone in Python



One more thing to keep in mind: since we are going to work with a microphone, you need to know the device ID of your audio input device, because you have to tell your Python program which particular microphone to take voice input from.

If you don't yet know how to find the device ID, please read my previous tutorial:

Find all the microphone names and device index in Python using PyAudio

The above tutorial covers everything you need to set up before you start working with this tutorial.
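
For a quick lookup, the device list can also be read from the SpeechRecognition package itself. The following is a minimal sketch, assuming PyAudio is installed. Microphone.list_microphone_names() is provided by the package, while list_microphones, find_device_index and the keyword "usb" are hypothetical names chosen for illustration.

```python
def list_microphones():
    """Return the names of all audio devices, ordered by device index."""
    # Imported lazily so find_device_index can be used without audio hardware.
    import speech_recognition as s_r
    return s_r.Microphone.list_microphone_names()


def find_device_index(names, keyword):
    """Return the index of the first device whose name contains keyword.

    The match is case-insensitive. Returns None when nothing matches.
    """
    for index, name in enumerate(names):
        # Skip entries that have no readable name.
        if name and keyword.lower() in name.lower():
            return index
    return None
```

For example, find_device_index(list_microphones(), "usb") would give the index of the first device with "usb" in its name, which can then be used as the device ID.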

Now we assume that you are all set.

Take voice input from the user in Python using PyAudio – speech_recognizer

What we are going to do, in simple steps:

  • Take input from the mic
  • Convert the voice or speech to text
  • Store the text in a variable, or use it directly as user input

Several APIs are available online for speech recognition, also known as speech-to-text.

Sphinx can work offline, but I personally prefer Google Speech Recognition, as it gives more accurate results in my experience, likely because Google has a huge dataset.

Here I will work with Google Speech Recognition only, as it is not possible to cover every speech recognition API in a single tutorial.
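
If you want to compare the two engines later, the choice can be reduced to a single flag. This is only a sketch: transcribe and its offline flag are hypothetical names, the recognizer and audio objects are the ones created later in this tutorial, and recognize_sphinx additionally assumes that the PocketSphinx package is installed.

```python
def transcribe(recognizer, audio, offline=False):
    """Convert recorded audio to text with the chosen engine.

    recognizer: a speech_recognition.Recognizer instance
    audio:      the audio object returned by recognizer.listen()
    offline:    hypothetical flag; True uses Sphinx, False uses Google
    """
    if offline:
        # Runs locally; needs the PocketSphinx package to be installed.
        return recognizer.recognize_sphinx(audio)
    # Sends the audio to Google's web API; needs an internet connection.
    return recognizer.recognize_google(audio)
```

The rest of this tutorial uses the Google path only.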

Let’s start with the code below to check that everything is working.

import speech_recognition as s_r
print(s_r.__version__)

Output:

3.8.1

It will print the current version of your speech recognition package.

If everything is fine then go to the next part.

Set microphone to accept sound

my_mic = s_r.Microphone()

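Here you have to pass the device_index parameter with the index of your microphone, for example device_index=1. If it is left out, as in the line above, the default microphone is used.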

To know your device index follow the tutorial: Find all the microphone names and device index in Python using PyAudio

To recognize input from the microphone you need an instance of the Recognizer class. Let’s create one.

r = s_r.Recognizer()

So far, our program looks like this:

import speech_recognition as s_r
print(s_r.__version__)  # optional: just prints the installed version
r = s_r.Recognizer()
my_mic = s_r.Microphone(device_index=1)  # my device index is 1; use your own device index

Don’t run this program yet; there are still a few things left to do.

Now we have to capture audio from the microphone. To do that, we can use the code below:

with my_mic as source:
    print("Say now!!!!")
    audio = r.listen(source)
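
By default, listen waits as long as it takes for speech to start and to end. It also accepts two optional parameters: timeout, the maximum number of seconds to wait for speech to start, and phrase_time_limit, the maximum number of seconds a phrase may last. The sketch below shows one way to use them; listen_once and the values 5 and 10 are hypothetical choices, not part of the package.

```python
def listen_once(recognizer, microphone, timeout=5, phrase_time_limit=10):
    """Record one phrase, or return None if nobody starts speaking in time."""
    import speech_recognition as s_r  # lazy import: only needed when called

    with microphone as source:
        print("Say now!!!!")
        try:
            return recognizer.listen(
                source, timeout=timeout, phrase_time_limit=phrase_time_limit
            )
        except s_r.WaitTimeoutError:
            # No speech started within `timeout` seconds.
            return None
```

With the objects from this tutorial, the call would be audio = listen_once(r, my_mic).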

Now for the final step: converting the sound taken from the microphone into text.

Convert the sound or speech into text in Python

To convert using Google speech recognition we can use the following line:

r.recognize_google(audio)

It converts your speech to text and returns the result as a string.

You can simply print it using the below line:

print(r.recognize_google(audio))

Now the full program will look like this:

import speech_recognition as s_r
print(s_r.__version__)  # optional: just prints the installed version
r = s_r.Recognizer()
my_mic = s_r.Microphone(device_index=1)  # my device index is 1; use your own device index
with my_mic as source:
    print("Say now!!!!")
    audio = r.listen(source)  # take voice input from the microphone
print(r.recognize_google(audio))  # print the recognized text

If you run this, you should get an output.
If you don’t get any output after a few moments, or the program stops with a RequestError, check your internet connection. This program requires an internet connection because the audio is sent to Google for recognition.
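
A failed recognition does not have to end in a traceback. The sketch below wraps the call and reports the two errors that recognize_google raises in practice: UnknownValueError when the speech cannot be understood, and RequestError when the service cannot be reached. The function name recognize_or_explain and the messages are hypothetical.

```python
def recognize_or_explain(recognizer, audio):
    """Return the recognized text, or None after printing what went wrong."""
    import speech_recognition as s_r  # lazy import: only needed when called

    try:
        return recognizer.recognize_google(audio)
    except s_r.UnknownValueError:
        # The audio was received but no words could be made out.
        print("Could not understand the audio; try speaking more clearly.")
    except s_r.RequestError as error:
        # The request to the Google API failed, e.g. no internet connection.
        print("Could not reach the recognition service:", error)
    return None
```

In the program above, print(recognize_or_explain(r, audio)) could replace the last line.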

If your internet is fine but you still get no output, most likely your microphone is picking up background noise, so the recognizer never detects that you have stopped speaking.

Press Ctrl+C (and Enter, if needed) to stop the current execution.

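Now you have to reduce the noise in your input. To do that, call the following inside the with block, before r.listen(source):

r.adjust_for_ambient_noise(source)

This listens to the source for about a second and calibrates the recognizer to the background noise level.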
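
To see what the calibration changes, you can compare the recognizer's energy_threshold attribute before and after the call. The following is a small sketch; calibrate and its seconds argument are hypothetical names, while duration is the optional parameter of adjust_for_ambient_noise that sets the maximum calibration time.

```python
def calibrate(recognizer, microphone, seconds=1.0):
    """Calibrate to ambient noise; return the threshold before and after."""
    before = recognizer.energy_threshold
    with microphone as source:
        # Keep quiet during this call: it samples the background noise.
        recognizer.adjust_for_ambient_noise(source, duration=seconds)
    return before, recognizer.energy_threshold
```

With the objects from this tutorial, print(calibrate(r, my_mic)) shows both values; a noisier room should generally give a higher second value.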

Now the final program looks like this, and it should work:

import speech_recognition as s_r
print(s_r.__version__)  # optional: just prints the installed version
r = s_r.Recognizer()
my_mic = s_r.Microphone(device_index=1)  # my device index is 1; use your own device index
with my_mic as source:
    r.adjust_for_ambient_noise(source)  # calibrate to background noise first (takes about a second)
    print("Say now!!!!")  # prompt only after calibration, so speech is not sampled as noise
    audio = r.listen(source)  # take voice input from the microphone
print(r.recognize_google(audio))  # print the recognized text

Output:

Will print whatever you say!!

You can store the string in a variable if you want. Remember that r.recognize_google(audio) returns a string, so be careful when working with other data types.

my_string = r.recognize_google(audio)

You can use this to store your speech in a variable.
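
For instance, if you expect the user to say a number, the string has to be converted before you can calculate with it. The helper below is a minimal sketch; parse_spoken_int is a hypothetical name, and it only handles results written as digits (such as "42"). How numbers appear in the recognized text depends on the recognition service, so treat the None case as normal.

```python
def parse_spoken_int(text):
    """Return text as an int, or None if it is not a whole number in digits."""
    try:
        return int(text.strip())
    except ValueError:
        # Covers words ("forty two"), decimals ("4.5") and any other text.
        return None
```

For example, number = parse_spoken_int(my_string) gives an int you can do arithmetic with, or None if the speech was not a plain number.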

Do comment if you need any further help or have any suggestions to make this tutorial better.
