Building a voice-controlled virtual assistant using Python
Hey there! In this tutorial, we will learn to create a simple voice-controlled virtual assistant in PyCharm using Python.
Below are the basic steps to create a virtual assistant that is capable of:
* Playing any video from YouTube
* Searching for any information on Wikipedia
Step 1: Installing libraries for creating a virtual voice assistant in Python
Open PyCharm and create a project titled Virtual_Assistant. Then, open the terminal and run the commands listed below to install the respective libraries.
pip install SpeechRecognition
pip install pyttsx3
pip install pipwin
pipwin install PyAudio
pip install pywhatkit
pip install wikipedia
- SpeechRecognition: To perform speech recognition
- pyttsx3: For text-to-speech conversion
- pipwin: A complementary tool for pip on Windows, used for installing unofficial Python package binaries
- PyAudio: A cross-platform audio I/O library. SpeechRecognition needs it to read audio from the microphone.
- pywhatkit: This library is mainly used for sending WhatsApp messages but supports other functionality as well. Here, the playonyt() function from this library is used to open YouTube in the default browser and play the requested video.
- wikipedia: To access and parse data from Wikipedia.
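Before moving on, it can help to confirm that the installs succeeded. The following is a minimal sketch and not part of the original project: the helper name missing_modules is hypothetical, and the import names in REQUIRED are assumed to be the ones these packages expose (for example, PyAudio is imported as pyaudio and SpeechRecognition as speech_recognition).

```python
import importlib.util

# Import names for the packages installed above (assumed; the import
# name can differ from the pip package name).
REQUIRED = ["speech_recognition", "pyttsx3", "pyaudio", "pywhatkit", "wikipedia"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All libraries are available.")
```

If any name is reported as not installed, re-run the matching install command from the list above before continuing to Step 2.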
Step 2: Python program for our assistant
Within the main.py file in this project, type the code below.
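import speech_recognition as SR
import pyttsx3
import pywhatkit
import wikipedia

# Text-to-speech engine used for all spoken output
james = pyttsx3.init()

def james_speak(content):
    # Speak the text aloud, then echo it to the console
    james.say(content)
    james.runAndWait()
    print(content)

listener = SR.Recognizer()

def listen_to_user():
    # Return the spoken command with the wake word removed,
    # or an empty string if nothing usable was heard
    user_input = ""
    try:
        james_speak("Hey there! I'm James, your virtual assistant.")
        with SR.Microphone() as source:
            james_speak("How can I help you?")
            user_audio = listener.listen(source)
            heard = listener.recognize_google(user_audio).lower()
            if "james" in heard:
                print(heard.upper())
                user_input = heard.replace("james", "").strip()
    except (SR.UnknownValueError, SR.RequestError, OSError):
        # Speech was unintelligible, the recognition service could not
        # be reached, or no microphone was available
        pass
    return user_input

command = listen_to_user()
if not command:
    james_speak("Sorry, I didn't catch that.")
elif "play" in command:
    command = command.replace("play", "").strip()
    james_speak("Playing " + command)
    pywhatkit.playonyt(command)
else:
    james_speak("Searching for " + command)
    info = wikipedia.summary(command, sentences=1)
    james_speak(info)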
- The pyttsx3.init() function returns a reference to a pyttsx3.Engine instance.
Within the james_speak() function, say() takes a string as its parameter and queues it for text-to-speech conversion. The runAndWait() function then blocks until all currently queued commands have been processed.
- The Recognizer instance, listener, is used to recognize speech. It is created with SR.Recognizer() right after james_speak() is defined.
- Within the listen_to_user() function,
– james_speak() is called so that the virtual assistant can introduce himself to the user.
– The with SR.Microphone() as source statement specifies that the default microphone is to be used as the audio source.
– The listen() function listens for a spoken phrase and captures it as audio data. The audio is then transcribed via Google Speech Recognition using the recognize_google() function.
– Only statements that contain ‘james’ are treated as user input to the virtual assistant. The word ‘james’ is removed and the rest is returned by listen_to_user().
- If the keyword ‘play’ is found in the user input, the playonyt() function is used to open YouTube in the default browser and play the video specified in the user input.
Otherwise, the summary() function is used to extract data from Wikipedia. It takes two arguments here: first, the title of the topic for which a summary is to be generated, and second, an optional parameter indicating the number of sentences to be returned.
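The wake-word check and the play/search decision described above can be pulled into one small function that is easy to try without a microphone. This is only a sketch under assumptions: parse_command and the action labels "youtube" and "wikipedia" are hypothetical names introduced here for illustration, not part of the tutorial code or of any library.

```python
def parse_command(transcript, wake_word="james"):
    """Split a recognized phrase into an (action, query) pair.

    Returns None when the wake word is absent, so the phrase is ignored.
    """
    text = transcript.lower()
    if wake_word not in text:
        return None
    # Drop the wake word, mirroring user_input.replace("james", "")
    text = text.replace(wake_word, "").strip()
    if "play" in text:
        # Requests containing "play" are routed to YouTube
        return ("youtube", text.replace("play", "").strip())
    # Everything else is treated as a Wikipedia lookup
    return ("wikipedia", text)


print(parse_command("James play lo-fi music"))  # ('youtube', 'lo-fi music')
print(parse_command("James Alan Turing"))       # ('wikipedia', 'alan turing')
print(parse_command("play something"))          # None
```

Note that the matching is by substring, as in the tutorial code, so a word such as "display" would also trigger the YouTube branch. Splitting the text into words and checking the first one would be a stricter alternative.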
Example of playing a YouTube video using a voice command:
Example of searching on Wikipedia: