Extract speech text from video in Python

In this tutorial, we are going to see how to extract speech text from a video in Python. We will first extract the audio from the video clip and then convert that audio into text. It is simple; let's see how it works.

For this, we are going to use two libraries available in Python: SpeechRecognition and MoviePy.

MoviePy is a Python library used for video editing: cutting, concatenations, title insertions, video compositing, video processing, and creation of custom effects.

The SpeechRecognition library, on the other hand, performs speech recognition, with support for several engines and APIs, online and offline.

Before proceeding with our task, we need to install these libraries on our system. We can do this by running the pip command shown below in a terminal or shell.

pip install SpeechRecognition moviepy

Yes, that's it! It's pretty simple. After executing the above command, both libraries will be installed on your machine, since MoviePy is included in the same command.

The SpeechRecognition module supports multiple recognition APIs. We are going to use the Google speech recognition API from it.
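To confirm that the installation worked, a small check like the one below can help. It is only a sketch: the helper name installed_version is made up for this example, and it relies on the standard importlib.metadata module (Python 3.8 or newer).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a pip package, or None if it is missing.

    The function name is hypothetical; it is not part of either library.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names as they were passed to pip
for name in ("SpeechRecognition", "moviepy"):
    print(name, installed_version(name) or "not installed")
```

If either line prints "not installed", run the pip command again in the same Python environment.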

In the next step, let's import these libraries into our Python code. The full code is given below.

# Step 1: Importing libraries
import speech_recognition as sr
# MoviePy 1.x import path; MoviePy 2.x dropped moviepy.editor (use "from moviepy import VideoFileClip")
import moviepy.editor as mp

# Step 2: Video to audio conversion

# Paths are relative to the current working directory
VidClip = mp.VideoFileClip("video.mp4")
# Write the clip's audio track to a WAV file that SpeechRecognition can read
VidClip.audio.write_audiofile("converted.wav")

# Step 3: Speech recognition

reco = sr.Recognizer()
audio = sr.AudioFile("converted.wav")
with audio as source:
    # Read the whole audio file into memory
    audio_file = reco.record(source)
# Send the audio to Google's recognizer (requires an internet connection)
result = reco.recognize_google(audio_file)

# Step 4: Finally exporting the result

with open("SpeechText.txt", mode="w") as file:
    file.write("Recognized Speech Text:")
    file.write("\n")
    file.write(result)
    print("Text file ready!")

We have divided our task into 4 steps, as you can see in the above code.

Step 1 :
As mentioned above, we import the libraries.

Step 2 :
Here we convert the video file to audio using MoviePy. First, we declare the VidClip variable and give it the path/location of our video file. Then, using the audio.write_audiofile function, we convert the .mp4 file to a .wav audio file.

For video, you can use any format that FFmpeg can read, such as mp4, m4v, 3GP, OGG, WMV, etc. MoviePy can write other audio formats such as mp3 as well, but the AudioFile class in SpeechRecognition reads only WAV, AIFF, and FLAC files, so we use wav here. As output, we get the audio file converted.wav.
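Step 2 can also be wrapped in a small reusable function. The following is a minimal sketch, not the only way to do it: the names audio_path_for and extract_audio are invented for this example, and the import assumes MoviePy 1.x (in MoviePy 2.x, VideoFileClip is imported from moviepy directly).

```python
from pathlib import Path


def audio_path_for(video_path, suffix=".wav"):
    """Return the video path with its extension swapped for an audio extension."""
    return Path(video_path).with_suffix(suffix)


def extract_audio(video_path):
    """Write the audio track of a video next to it as a WAV file and return its path."""
    # Imported here so the path helper above works even without MoviePy installed.
    # Assumption: MoviePy 1.x; in 2.x use "from moviepy import VideoFileClip".
    from moviepy.editor import VideoFileClip

    out_path = audio_path_for(video_path)
    clip = VideoFileClip(str(video_path))
    try:
        # clip.audio is None when the video has no audio track
        if clip.audio is None:
            raise ValueError(f"{video_path} has no audio track")
        clip.audio.write_audiofile(str(out_path))
    finally:
        # Release the FFmpeg reader processes held by the clip
        clip.close()
    return out_path
```

Calling extract_audio("video.mp4") would produce video.wav in the same folder. Checking for a missing audio track up front gives a clearer error than the AttributeError you would otherwise get.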

Step 3 :
In this step, our main task is speech recognition. First, let’s define the recognizer. As shown in the code, we have defined ‘reco’ as a recognizer. Next, we give the audio file we obtained in step 2 to the SpeechRecognition library as input.

The recognizer will try to understand the speech in that file and convert it to text. We are using Google’s speech recognition service for this task, so an internet connection is required. The recognized text will be stored in the ‘result’ variable.
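Recognition can fail, either because the speech is unintelligible or because the online service cannot be reached. SpeechRecognition reports these cases with UnknownValueError and RequestError. Here is a hedged sketch of handling both; the function name transcribe_wav and the choice to return None for unintelligible audio are assumptions made for this example.

```python
def transcribe_wav(wav_path):
    """Return the recognized text for a WAV file, or None if nothing was understood."""
    # Imported inside the function so this file can be loaded without the package
    import speech_recognition as sr

    reco = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        # Read the whole file into an AudioData object
        audio_data = reco.record(source)

    try:
        return reco.recognize_google(audio_data)
    except sr.UnknownValueError:
        # The service answered but could not make out any speech
        return None
    except sr.RequestError as err:
        # Network problem, quota issue, or the service is unavailable
        raise RuntimeError(f"Speech service request failed: {err}") from err
```

You could call it as text = transcribe_wav("converted.wav") and check whether text is None before writing the output file.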

Step 4 :
Finally, it’s time to export our result to an actual text file. Using file handling in Python, we open the text file in ‘w’ (write) mode and write to it with the ‘file.write’ function. As output, the ‘SpeechText.txt’ file will be saved in the given directory. In the end, “Text file ready!” will be printed so that we know the task is completed.
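The export step can be written as a small function too. This is just one possible sketch: the name save_transcript is made up here, and the explicit UTF-8 encoding is an addition so that non-English text is saved the same way on every platform.

```python
from pathlib import Path


def save_transcript(text, out_path="SpeechText.txt"):
    """Write a header line followed by the recognized text, and return the file path."""
    out_path = Path(out_path)
    # 'w' mode creates the file or overwrites an existing one
    with out_path.open(mode="w", encoding="utf-8") as file:
        file.write("Recognized Speech Text:\n")
        file.write(text + "\n")
    return out_path
```

For example, save_transcript(result) would write the recognized text from step 3 to SpeechText.txt in the current directory.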

Output :

Text file ready!

So, in this tutorial, we have successfully extracted speech text from a video with the help of Python programming.
