Building a voice-controlled virtual assistant using Python

Hey there! In this tutorial, we will be learning to create a simple voice-controlled virtual assistant in PyCharm using Python.

Below are the basic steps to create a virtual assistant that is capable of:
* Playing any video from YouTube
* Searching for any information on Wikipedia

Step 1: Installing the libraries for the virtual voice assistant in Python

Open PyCharm and create a project titled Virtual_Assistant. Then open the terminal and run the commands below to install the required libraries.

pip install SpeechRecognition
pip install pyttsx3
pip install pipwin
pipwin install PyAudio
pip install pywhatkit
pip install wikipedia
  • SpeechRecognition: To perform speech recognition
  • pyttsx3: For text-to-speech conversion
  • pipwin: A complementary tool for pip on Windows, used to install unofficial Python package binaries (here, PyAudio).
  • PyAudio: A cross-platform audio I/O library. SpeechRecognition relies on it to read audio from the microphone.
  • pywhatkit: This library is mainly used for sending WhatsApp messages, but it supports other features as well. Here, its playonyt() function is used to open YouTube in the default browser and play the requested video.
  • wikipedia: To access and parse data from Wikipedia.
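Before moving on, it can help to confirm that every package landed in the interpreter PyCharm is using. The sketch below is our own addition, not part of the original project: the helper name `missing_packages()` is hypothetical, and it relies only on the standard library's `importlib.util.find_spec()`. Note that some packages are imported under a different name than the one passed to pip (SpeechRecognition is imported as `speech_recognition`, PyAudio as `pyaudio`).

```python
import importlib.util

# Import names (not pip names) of the libraries installed above
REQUIRED_MODULES = ["speech_recognition", "pyttsx3", "pyaudio", "pywhatkit", "wikipedia"]


def missing_packages(modules=REQUIRED_MODULES):
    """Return the module names the current interpreter cannot find.

    Hypothetical helper: find_spec() only locates a top-level module without
    importing it, so the check does not run any package's import-time code.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All libraries are installed.")
```

If anything is reported as missing, rerun the corresponding install command in the same terminal that PyCharm uses for the project.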

Step 2: Python program for our assistant

Create a new Python file in this project and type the code below.

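import speech_recognition as SR
import pyttsx3
import pywhatkit
import wikipedia

# Text-to-speech engine that gives James his voice
james = pyttsx3.init()

def james_speak(content):
    # Queue the text for speech, then block until it has been spoken
    james.say(content)
    james.runAndWait()

# Recognizer instance used to turn recorded speech into text
listener = SR.Recognizer()

def listen_to_user():
    user_input = ""
    james_speak("Hey there! I'm James, your virtual assistant.")
    # Use the default microphone as the audio source
    with SR.Microphone() as source:
        james_speak("How can I help you?")
        user_audio = listener.listen(source)
    try:
        # Transcribe with Google Speech Recognition and lower-case the result
        user_input = listener.recognize_google(user_audio).lower()
    except (SR.UnknownValueError, SR.RequestError):
        # Speech was unintelligible or the service could not be reached
        james_speak("Sorry, I didn't catch that.")
    # Remove the wake word so only the actual command remains
    if "james" in user_input:
        user_input = user_input.replace("james", "")
    return user_input.strip()

command = listen_to_user()
if "play" in command:
    # Drop the keyword and play the rest of the phrase on YouTube
    command = command.replace("play", "").strip()
    james_speak("Playing " + command)
    pywhatkit.playonyt(command)
elif command:
    # Otherwise, read out a one-sentence Wikipedia summary of the topic
    james_speak("Searching for " + command)
    info = wikipedia.summary(command, 1)
    james_speak(info)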
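The program above needs a microphone, speakers and a network connection, which makes its "play or search" decision awkward to test on its own. One option, sketched here as an assumption rather than the tutorial's own design, is to move that decision into a pure function. The name `route_command()` and its `(action, query)` return value are hypothetical; the main program would then call `pywhatkit.playonyt(query)` or `wikipedia.summary(query, 1)` depending on `action`.

```python
def route_command(command):
    """Decide what the assistant should do with a recognized phrase.

    Hypothetical helper mirroring the if/else in the main program:
    returns ("play", query) when the phrase contains "play",
    ("search", query) for any other non-empty phrase, and (None, "") otherwise.
    """
    text = command.lower().strip()
    if "play" in text:
        # Unlike the tutorial, remove only the first "play" so titles like "coldplay" survive
        return "play", text.replace("play", "", 1).strip()
    if text:
        return "search", text
    return None, ""


# Usage with strings standing in for recognize_google() output
print(route_command("play lofi hip hop"))  # ('play', 'lofi hip hop')
print(route_command("Albert Einstein"))    # ('search', 'albert einstein')
```

Keeping the speech and network calls out of this function means the routing rule can be checked with plain strings, while the microphone and browser code stay in the main program.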


  • The pyttsx3.init() function is used to get a reference to a pyttsx3.Engine instance.
    Within the james_speak() method, the say() function takes a string as its parameter and queues it to be converted from text to speech. The runAndWait() function then blocks until all currently queued commands have been processed, so the text is actually spoken before the program continues.
  • The recognizer instance, which is used to recognize speech, is created with SR.Recognizer() and stored in listener.
  • Within the listen_to_user() method,
    –  the james_speak() method is called so that the virtual assistant can introduce himself to the user.
    –  The with SR.Microphone() as source: block specifies that the default microphone is to be used as the audio source.
    –  The listen() function listens for a phrase and captures it as audio data. That audio is then transcribed by Google Speech Recognition through the recognize_google() function.
    –  If the transcribed phrase contains the wake word ‘james’, that word is removed before listen_to_user() returns the text, so only the actual command is processed.
  • If the keyword ‘play’ is found in the user input, the playonyt() function is used to open YouTube in the default browser and play the video named in the rest of the input.
    Otherwise, the summary() function is used to fetch data from Wikipedia. It takes two arguments: first, the title of the topic to be summarized, and second, an optional parameter giving the number of sentences to return.
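The program above strips the wake word ‘james’ when it hears it, but it still acts on phrases that do not contain it. If you want stricter behaviour, where only phrases addressed to James count as commands, a minimal sketch is a small filter applied to the recognized text. The function name `extract_command()` and its `wake_word` parameter are hypothetical, and this version keeps only what follows the wake word, so "hey james, who is Alan Turing" yields "who is alan turing".

```python
import re


def extract_command(text, wake_word="james"):
    """Return the command that follows the wake word, or None if it was not said.

    Hypothetical helper: matches the wake word as a whole word, case-insensitively,
    and returns everything after it, lower-cased, with leading/trailing spaces and commas removed.
    """
    match = re.search(rf"\b{re.escape(wake_word)}\b", text, flags=re.IGNORECASE)
    if match is None:
        return None  # not addressed to the assistant, so ignore it
    return text[match.end():].strip(" ,").lower()


print(extract_command("James, play despacito"))  # play despacito
print(extract_command("play despacito"))         # None
```

Matching on word boundaries avoids treating names such as "jameson" as the wake word, which a plain substring check like "james" in text would accept.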


Example of playing a YouTube video using a voice command

Example of searching on Wikipedia:

