Building a Google Assistant-Like Chatbot Using Python

Hey there, fellow Python coder! It is fascinating how AI assistants like Google Assistant, Alexa, and Siri work, and I am sure you have wondered about building a similar application yourself.
Well, buckle up! In this tutorial, we will learn how to build our own voice-driven AI assistant, modeled on Google Assistant.
Note that the Google Assistant library for Python is no longer available. Hence, we will build a voice-assisted chatbot that only borrows the idea of Google Assistant.
Are you excited? Well, I am for sure.
You can also check out another of our tutorials: Voice Command Calculator in Python using speech recognition and PyAudio
Introduction to Google Assistant
A voice-based interface has several advantages. One of them is that it makes technology feel more like chatting with a buddy, so you can get things done without the hassle of buttons and screens.
Google Assistant is like a super-smart, voice-activated helper that lives inside your phone or other devices. You can talk to it, ask questions, and give it tasks, and it’ll do its best to help you out.
Code Implementation of Google Assistant using Python
Step 1: Importing Modules
We will start by importing modules that are necessary for the development of Google Assistant using the code below:
import speech_recognition as sr
from gtts import gTTS
from IPython.display import Audio, display, HTML
import time
Neither speech_recognition nor gtts is installed by default, so we need to install them using the pip install command (the packages are named SpeechRecognition and gTTS). Capturing audio from a microphone additionally requires the PyAudio package. Let’s understand the purpose of each library one after another:
- SpeechRecognition: This module is used for recognizing speech. It provides functions to capture audio from a microphone and convert it into text.
- gTTS: This module is used to convert text into speech. It interfaces with Google Translate’s text-to-speech API to generate spoken language from the provided text.
- IPython.display: This module provides tools for displaying audio, images, and HTML content directly within the IPython/Jupyter environment.
- time: This module is used to introduce delays or pause the execution of the program for a specified amount of time.
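Before going further, it can help to confirm that the modules above are importable in your environment. The snippet below is a minimal sketch that uses only the standard library; the helper name missing_modules and the REQUIRED_MODULES list are illustrative assumptions, not part of any of the libraries.

```python
from importlib.util import find_spec

# Import names (not pip package names) that this tutorial relies on.
REQUIRED_MODULES = ["speech_recognition", "gtts", "IPython"]


def missing_modules(names):
    """Return the top-level import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

If anything is reported as missing, install the corresponding package with pip before running the rest of the code.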
In the next section, we will create functions to handle input and output from both the user’s and the assistant’s perspective. Both the user and Google Assistant can work with textual and audio data, as illustrated by the featured image of this article.
Step 2: Create Functions to take Input from User
We will create three functions to achieve the input functionality from the user. One function will be the main function to take INPUT and the other two functions will be the helper functions namely for textual and audio input respectively.
FUNCTION 1: USER INPUT IN TEXT FORM
First, we take the message from the user using the input function. Then we validate it: if the user has entered an empty string, we need to ask again. If the user has entered a proper message, we build an HTML string (html) using an f-string. We display the HTML with the display and HTML functions and finally return the input text in lowercase.
Why are we using HTML content and not print statements? This is because we wish to have a proper chat message bubble format in the output screen. You will know what I mean when we look at the outputs of the function.
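Since the same bubble markup appears in several functions of this tutorial, one option is to build it in a single place. The sketch below is not part of the original code: build_bubble is a hypothetical helper, and escaping the message with html.escape is an added precaution so that characters such as < or & in a message are shown as text rather than interpreted as markup.

```python
from html import escape


def build_bubble(sender, message, name_color="purple"):
    """Return the HTML for one chat bubble (hypothetical helper)."""
    # Escape user-supplied text so it cannot break the surrounding markup.
    safe_sender = escape(sender)
    safe_message = escape(message)
    return (
        "<div style='background-color: #f0f0f0; padding: 10px; "
        "border-radius: 10px; margin: 10px;'>"
        f"<p style='color:{name_color}; font-weight: bold;'>{safe_sender}</p>"
        f"<p>{safe_message}</p>"
        "</div>"
    )


print(build_bubble("User", "Is 2 < 3?"))
```

In a notebook, the returned string could be passed to display(HTML(...)) exactly like the html variable used in the functions of this tutorial.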
def takeUserInput_TEXT():
    # Keep asking until the user types a non-empty message,
    # so the function never returns None
    while True:
        textInput = input("Your Message : ").strip()
        if textInput:
            break
        print("Sorry! I didn't catch that. Can you please repeat?")

    # Build the chat bubble for the user's message
    html = f"""
    <div style='background-color: #f0f0f0; padding: 10px; border-radius: 10px; margin: 10px;'>
        <p style='color:purple; font-weight: bold;'>User</p>
        <p>{textInput}</p>
    </div>
    """
    display(HTML(html))
    return textInput.lower()
Have a look at the sample output:
FUNCTION 2: USER INPUT IN AUDIO FORM
Now, recording input voice from the user is a little complex and involves a few complicated functions to perform necessary operations. Let’s divide the whole step into two separate sub-steps:
- Recording the audio from the user using the Microphone
- Recognizing the speech of the recorded audio
Recording the audio from the user using the Microphone
This step involves recording the audio input from the user using their microphone. To achieve this we have the following code snippet:
recognizer = sr.Recognizer()
with sr.Microphone() as source:
    print("Go ahead, I am Listening....")
    # Calibrate for background noise for one second
    recognizer.adjust_for_ambient_noise(source, duration=1)
    # Record until the user stops speaking
    audio = recognizer.listen(source)
First of all, we create an object of the Recognizer class, which is responsible for recognizing the voice and also provides functions to improve the audio being recorded. Next, we open the Microphone as the source. Called without arguments, it uses the system’s default microphone.
Check: Get voice input with microphone in Python using PyAudio and SpeechRecognition
Not everyone owns a perfect microphone or sits in a quiet place. To make sure that doesn’t hinder our assistant’s performance, we use the adjust_for_ambient_noise function, which takes two parameters. The first is the source, so it knows which device to listen to. The second is the duration: the number of seconds for which it samples the background noise to calibrate the recognizer (it is not the length of the user’s recording). Finally, the listen function captures the user’s speech from the source and stops once it detects a pause.
Recognizing the speech of the recorded audio
In this step, we will recognize what the user said in the captured audio. This involves more work than the previous step, so let’s dive into it. This step uses the Google Speech Recognition API, which converts the recorded speech into text.
For this block of code, we will use exception handling in Python so that, if the program fails to recognize the voice, the situation is handled cleanly.
Also Read: Find all the microphone names and device index in Python using PyAudio
Now what exactly are we going to do inside the try-except block? Let’s have a look at the code below:
try:
    print("Trying to Recognize what you just said...")
    recognized_Input = recognizer.recognize_google(audio)
    html = f"""
    <div style='background-color: #f0f0f0; padding: 10px; border-radius: 10px; margin: 10px;'>
        <p style='color:purple; font-weight: bold;'>User</p>
        <p>{recognized_Input}</p>
    </div>
    """
    display(HTML(html))
    return recognized_Input.lower()
except sr.UnknownValueError:
    # The speech could not be understood
    print("Sorry, I didn't catch that. Can you please repeat?")
except sr.RequestError as e:
    # The recognition service could not be reached or returned an error
    print(f"Sorry! Could not fetch results for you because of {e}")
# Reached only when recognition failed
return ""
The recognize_google function is the core of this code snippet. It sends the recorded audio to the Google Speech Recognition API and returns the transcribed text. The HTML format is the same as the one we used for textual input. Finally, we return the recognized text in lowercase.
You must have noticed the two except blocks, which handle two different exceptions, UnknownValueError and RequestError, described below:
- UnknownValueError handles the case where the recognizer couldn’t understand the speech of the user.
- RequestError handles the case where there is an error with the speech recognition service itself.
I hope things are clear so far. Let’s combine the steps to form a complete function that captures audio input.
def takeUserInput_AUDIO():
    recognizer = sr.Recognizer()

    # Step 1: record the user's speech from the microphone
    with sr.Microphone() as source:
        print("Go ahead, I am Listening....")
        recognizer.adjust_for_ambient_noise(source, duration=1)
        audio = recognizer.listen(source)

    # Step 2: convert the recorded speech to text
    try:
        print("Trying to Recognize what you just said...")
        recognized_Input = recognizer.recognize_google(audio)
        html = f"""
        <div style='background-color: #f0f0f0; padding: 10px; border-radius: 10px; margin: 10px;'>
            <p style='color:purple; font-weight: bold;'>User</p>
            <p>{recognized_Input}</p>
        </div>
        """
        display(HTML(html))
        return recognized_Input.lower()
    except sr.UnknownValueError:
        print("Sorry, I didn't catch that. Can you please repeat?")
    except sr.RequestError as e:
        print(f"Sorry! Could not fetch results for you because of {e}")
    # Reached only when recognition failed
    return ""
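When recognition fails, the function only prints a message, so the caller has to decide whether to ask again. One possible approach, sketched below, is a small retry wrapper. It assumes the capture function returns an empty string on failure; ask_until_heard and fake_capture are hypothetical names, and fake_capture merely stands in for takeUserInput_AUDIO so that the sketch runs without a microphone.

```python
def ask_until_heard(capture, max_attempts=3):
    """Call `capture` until it returns non-empty text or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        text = capture()
        if text:
            return text
        print(f"Attempt {attempt} of {max_attempts} gave no text.")
    # Every attempt failed: hand back an empty string, like the capture function
    return ""


# Stand-in for takeUserInput_AUDIO: fails twice, then "hears" a sentence.
_fake_results = iter(["", "", "what is the weather"])


def fake_capture():
    return next(_fake_results)


print(ask_until_heard(fake_capture))
```

In the notebook, ask_until_heard(takeUserInput_AUDIO) would give the user a few chances to repeat themselves before giving up.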
Let’s have a look at the output of the function below:
It’s fascinating, isn’t it? Now, the program needs to know when to call which function. So we can create a main function that calls either of the two functions based on what the user chooses.
def takeUserInput():
    while True:
        # Ask which kind of INPUT the user wants to give
        userInputType = input("Do you wish to give AUDIO input or TEXT input? ")
        if userInputType.lower() == 'text':
            return takeUserInput_TEXT()
        elif userInputType.lower() == 'audio':
            return takeUserInput_AUDIO()
        else:
            print("Please enter correct response!")
This code snippet is pretty simple and clear, so let’s move ahead.
Step 3: Create Functions for Google Assistant Output (Both Text and Audio)
In this section, we will work on the functions responsible for producing the assistant’s output. Like the user, Google Assistant can hold either a textual or an audio conversation. Let’s work on both functions one after another.
def GoogleAssistant_TEXT(response):
    # Show the assistant's reply as a chat bubble
    html = f"""
    <div style='background-color: #f0f0f0; padding: 10px; border-radius: 10px; margin: 10px;'>
        <p style='color:blue; font-weight: bold;'>Google Assistant</p>
        <p>{response}</p>
    </div>
    """
    display(HTML(html))
For the textual data, we simply create a function that takes the response for a particular message and displays it as HTML content. Let’s have a look at the sample output of the function when we call: GoogleAssistant_TEXT("Hello")
Now coming to the voice output let’s have a look at the code below.
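def GoogleAssistant_VOICE(response):
    # Convert the reply to speech and save it as an mp3 file
    speech = gTTS(response)
    speech.save("Google_Response.mp3")
    # Play the audio automatically and also show the reply as text
    display(Audio("Google_Response.mp3", autoplay=True))
    GoogleAssistant_TEXT(response)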
First of all, we create a gTTS object (gTTS stands for Google Text-to-Speech), which takes the response and converts it to speech. We save the result as an mp3 file using the save function and then play it with the Audio function, whose main argument here is the audio file that we wish to play. We also pass the autoplay parameter, which controls whether the audio plays by itself or waits for the user to play it manually. In this case, we need it to play by itself, hence the value is set to True.
Along with the audio, we also call the function that displays the message in textual format. Let’s have a look at the output of GoogleAssistant_VOICE("Hello").
Step 4: Creating the MAIN Google Assistant function
This is the final step in our journey of building the AI bot. It brings all the functions together and calls them in the right order. Let’s have a look at the complete code first and then break it down into steps.
def GoogleAssistant():
    googleSoundInput = input("Do you want Google Assistant to use Audio or Text? ")
    if googleSoundInput.lower() == 'audio':
        GoogleAssistant_VOICE("Hello! I am Google Assistant. How can I help you today?")
        while True:
            userInput = takeUserInput()
            # End the conversation when the user says 'stop'
            if 'stop' in userInput:
                GoogleAssistant_VOICE("Goodbye! Have a nice day!")
                break
    elif googleSoundInput.lower() == 'text':
        GoogleAssistant_TEXT("Hello! I am Google Assistant. How can I help you today?")
        while True:
            userInput = takeUserInput()
            # Use the same 'stop' keyword as the audio branch
            if 'stop' in userInput:
                GoogleAssistant_TEXT("Goodbye! Have a nice day!")
                break
    else:
        print("Please enter correct response!")

# Start the assistant
GoogleAssistant()
In this function, we are doing the following:
- Asking the user whether they want the assistant to respond in text or in audio.
- Depending on the answer, calling different functions using if-else statements. In both branches we follow the same simple steps: greet the user, then keep taking input until the user says ‘stop’.
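Because the two branches differ only in which output function they call, the loop itself can be written once and given the input and output functions as arguments. The following is a sketch of that idea, not the code used in this tutorial: chat_loop is a hypothetical name, and the scripted messages with print stand in for takeUserInput and the GoogleAssistant_TEXT or GoogleAssistant_VOICE functions so that the example runs outside a notebook.

```python
def chat_loop(take_input, speak, stop_word="stop"):
    """Greet, read messages until one contains `stop_word`, then say goodbye."""
    speak("Hello! I am Google Assistant. How can I help you today?")
    while True:
        user_input = take_input()
        if stop_word in user_input:
            speak("Goodbye! Have a nice day!")
            break


# Stand-ins: two scripted user messages, and print() as the output function.
scripted = iter(["hello there", "please stop"])
chat_loop(lambda: next(scripted), print)
```

In the notebook, the same loop could be started with chat_loop(takeUserInput, GoogleAssistant_VOICE) or chat_loop(takeUserInput, GoogleAssistant_TEXT), depending on the user’s choice.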
Let’s have a look at the output we have when we call the GoogleAssistant() function below:
Step 5: Adding Additional Outputs for the Assistant
We can’t just end the tutorial here without adding more responses to our assistant. As of now the assistant only gives a dynamic response when the user says ‘stop’, but what about other responses?
Let’s create a function that will take the userInput and respond according to the keyword that the program finds in the input given by the user. I hope this is clear.
def generateResponse(userInput):
    # Reply according to the first keyword found in the user's message
    if 'weather' in userInput:
        return "The weather is currently sunny and warm! ☁️"
    elif 'joke' in userInput:
        return "Why don't scientists trust atoms? Because they make up everything! "
    elif 'music' in userInput:
        return "I can recommend some great music. What genre are you in the mood for? "
    else:
        return "Sorry, I can't assist you with that! "
Now all we need to do is call this function at the right place as we have done in the code snippet below:
def GoogleAssistant():
    googleSoundInput = input("Do you want Google Assistant to use Audio or Text? ")
    if googleSoundInput.lower() == 'audio':
        GoogleAssistant_VOICE("Hello! I am Google Assistant. How can I help you today?")
        while True:
            userInput = takeUserInput()
            if 'stop' in userInput:
                GoogleAssistant_VOICE("Goodbye! Have a nice day!")
                break
            else:
                # Answer every other message with a keyword-based reply
                GoogleAssistant_VOICE(generateResponse(userInput))
    elif googleSoundInput.lower() == 'text':
        GoogleAssistant_TEXT("Hello! I am Google Assistant. How can I help you today?")
        while True:
            userInput = takeUserInput()
            if 'stop' in userInput:
                GoogleAssistant_TEXT("Goodbye! Have a nice day!")
                break
            else:
                GoogleAssistant_TEXT(generateResponse(userInput))
    else:
        print("Please enter correct response!")

# Start the assistant
GoogleAssistant()
And that’s all! We are all set to go! Let’s look at the following output:
Conclusions
In this tutorial, you developed your own Google Assistant-style chatbot using libraries, functions, and concepts available in Python. I hope you had fun building this assistant in a simple and easy way!
You can add more responses or edit the already present responses according to your preferences!
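One way to make adding responses easier is to keep the keywords and replies in a dictionary instead of a chain of if-elif statements. The sketch below assumes this table-driven variant; RESPONSES, FALLBACK, and generate_response are illustrative names, and the "time" entry is a made-up example of an extra response.

```python
# Keyword -> reply table; add a new line here to teach the bot a new response.
RESPONSES = {
    "weather": "The weather is currently sunny and warm!",
    "joke": "Why don't scientists trust atoms? Because they make up everything!",
    "music": "I can recommend some great music. What genre are you in the mood for?",
    "time": "Sorry, I don't have a clock yet!",  # hypothetical extra entry
}
FALLBACK = "Sorry, I can't assist you with that!"


def generate_response(user_input, responses=RESPONSES, fallback=FALLBACK):
    """Return the reply for the first keyword found in `user_input`."""
    text = user_input.lower()
    # Dictionaries keep insertion order, so earlier keywords take priority.
    for keyword, reply in responses.items():
        if keyword in text:
            return reply
    return fallback


print(generate_response("Tell me a joke"))
```

With this layout, extending the assistant means editing the table only, and the matching logic stays untouched.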
If you liked this tutorial, also read:
- Extract speech text from video in Python
- Speech Recognition in Python using CMU Sphinx
- Create an Audiobook from a PDF file using Python – Text-to-speech
Happy Learning and Coding!