whisperX to create transcripts #28

Open
wants to merge 14 commits into base: main
1 change: 1 addition & 0 deletions README.md
@@ -47,6 +47,7 @@ If you already have a human-made SRT subtitles file for a video, this will:
- Open the script file with a text editor and change the values in the "User Settings" section at the top.
- This will label the tracks so the video file is ready to be uploaded to YouTube. HOWEVER, the multiple audio tracks feature is only available to a limited number of channels. You will most likely need to contact YouTube creator support to ask for access, but there is no guarantee they will grant it.
- **Optional:** You can use the separate `TitleTranslator.py` script if uploading to YouTube, which lets you enter a video's Title and Description, and the text will be translated into all the languages enabled in `batch.ini`. They will be placed together in a single text file in the "output" folder.
- **Optional:** You can use the separate `whisperx.py` script to create a transcription (.srt) of your English-language video, using the video path set in the `batch.ini` configuration file. For other languages, see [**WhisperX**](https://github.com/m-bain/whisperX). To speed up processing, use PyTorch (https://pytorch.org/get-started/locally/) with an NVIDIA graphics card.

----

7 changes: 7 additions & 0 deletions audio_builder.py
@@ -1,3 +1,4 @@
import re
import soundfile
import pyrubberband
import configparser
@@ -26,6 +27,12 @@
cloudConfig = configparser.ConfigParser()
cloudConfig.read('cloud_service_settings.ini')

# Get the video file name and create the output folder based on the original video file name
originalVideoFile = os.path.abspath(batchConfig['SETTINGS']['original_video_file_path'].strip("\""))
Contributor:
I see some reads of `original_video_file_path` from the settings file, but the config file itself remains unchanged. I think adding a commit that introduces this setting with a default value would avoid unexpected behavior for users who do not look at the code.

Author:
I think it's best to remove this for now
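
If the setting is kept rather than removed, the default value suggested in the review could be handled when the config is read. The following is a minimal sketch, not the project's code: `DEFAULT_VIDEO_PATH` and `read_video_path` are hypothetical names, and the default value itself is an assumption.

```python
import configparser

# Hypothetical default; the real project may prefer another value or a hard error.
DEFAULT_VIDEO_PATH = "video.mp4"


def read_video_path(config: configparser.ConfigParser) -> str:
    """Return original_video_file_path, or the default when the key or section is missing."""
    raw = config.get("SETTINGS", "original_video_file_path", fallback=DEFAULT_VIDEO_PATH)
    # The scripts in this PR strip surrounding double quotes from the configured path.
    return raw.strip('"')


# Usage: a config that has a [SETTINGS] section but no video path falls back to the default.
config = configparser.ConfigParser()
config.read_string("[SETTINGS]\nsrt_file_path = subs.srt\n")
print(read_video_path(config))
```

With `fallback=`, `ConfigParser.get` returns the default instead of raising when the section or option is absent, so an older `batch.ini` keeps working.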

fileName = os.path.basename(originalVideoFile).split(".")[0]
fileName = re.sub(r"[^\w\s-]", "", fileName)
outputFolder = outputFolder + "/" + fileName

# Get variables from configs
nativeSampleRate = int(config['SETTINGS']['synth_sample_rate'])
originalVideoFile = os.path.abspath(batchConfig['SETTINGS']['original_video_file_path'].strip("\""))
5 changes: 5 additions & 0 deletions main.py
@@ -72,6 +72,11 @@
originalVideoFile = os.path.abspath(batchConfig['SETTINGS']['original_video_file_path'].strip("\""))
srtFile = os.path.abspath(batchConfig['SETTINGS']['srt_file_path'].strip("\""))

# Create the output folder based on the original video file name
fileName = os.path.basename(originalVideoFile).split(".")[0]
Contributor:
Isn't this the same as the code in audio_builder.py?

Also, there's already code that handles the creation of output folders at line 479 of the main file. In my opinion, moving this implementation there (or replacing the existing one) would be a better fit for this code.
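
One way to address the duplication raised above is a single helper that both `main.py` and `audio_builder.py` could call. This is a sketch only: `output_folder_for` is a hypothetical name, and where the helper would live is left open. The body repeats the three lines from the diff unchanged.

```python
import os
import re


def output_folder_for(video_path: str, base_folder: str = "output") -> str:
    """Build '<base_folder>/<sanitized video name>' the same way the diff does."""
    # Keep only the part of the file name before the first dot.
    name = os.path.basename(video_path).split(".")[0]
    # Drop everything except word characters, whitespace, and hyphens.
    name = re.sub(r"[^\w\s-]", "", name)
    return base_folder + "/" + name


print(output_folder_for("/videos/My Talk (final).mp4"))  # output/My Talk final
```

Because the name is split on the first dot, a file such as `clip.v2.mp4` maps to `output/clip`; `os.path.splitext` would keep `clip.v2` if that is preferred.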

fileName = re.sub(r"[^\w\s-]", "", fileName)
outputFolder = outputFolder + "/" + fileName

# Validate the number of sections
for num in languageNums:
    # Check if section exists
1 change: 1 addition & 0 deletions requirements.txt
@@ -8,3 +8,4 @@ azure-cognitiveservices-speech
langcodes
language_data
numpy
git+https://github.com/m-bain/whisperx.git
36 changes: 36 additions & 0 deletions whisperx.py
@@ -0,0 +1,36 @@
import re
import subprocess
import os
import configparser

#---------------------------------------- Batch File Processing ----------------------------------------
batchConfig = configparser.ConfigParser()
batchConfig.read('batch.ini')

# MOVE THIS INTO A VARIABLE AT SOME POINT
outputFolder = "output"

# Get the original video file path from the batch config
originalVideoFile = os.path.abspath(batchConfig['SETTINGS']['original_video_file_path'].strip("\""))

# WhisperX: Whisper-based automatic speech recognition (ASR) with improved timestamp accuracy using forced alignment
def transcribe(videoFile, output):
    # Take the video file name (without extension) and use it as the folder name
    fileName = os.path.basename(videoFile).split(".")[0]
    fileName = re.sub(r"[^\w\s-]", "", fileName)  # Remove special characters
    outputFolder = output + "/" + fileName

    # Create the output folder
    if not os.path.exists(outputFolder):
        os.makedirs(outputFolder)

    # Extract the audio from the original video as mono, 48 kHz, 16-bit PCM WAV and save it in output/{original_video_name}
    # Arguments are passed as a list (no shell) so that paths containing spaces are handled correctly
    wavFile = f"{outputFolder}/original.wav"
    subprocess.call(["ffmpeg", "-i", videoFile, "-vn", "-acodec", "pcm_s16le", "-ac", "1", "-ar", "48000", "-f", "wav", wavFile])

    # If you want to install whisperx in another environment, use conda envs
    # os.system(f"conda activate whisperx && whisperx {outputFolder}/original.wav --model small.en --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --output_dir {outputFolder}")
    # Run whisperx
    subprocess.call(["whisperx", wavFile, "--model", "small.en", "--align_model", "WAV2VEC2_ASR_LARGE_LV60K_960H", "--output_dir", outputFolder])

transcribe(originalVideoFile, outputFolder)
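
The script above depends on two external command-line tools, `ffmpeg` and `whisperx`, being available on `PATH`. A pre-flight check along the following lines could report a missing tool before any work starts. This is a sketch and not part of the PR; `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Check the two executables that whisperx.py invokes.
missing = missing_tools(["ffmpeg", "whisperx"])
if missing:
    print("Not found on PATH: " + ", ".join(missing))
```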