🐾 FUWAMOCO Transcripts Repository

This repo holds a set of scripts for automatic transcription, as well as summaries, for FUWAMOCO content, starting with FWMC Morning.

This is a fan-made project. The contents of this repository follow the hololive production Derivative Works Guidelines set forth by Cover Corp.

Note

In their initial state, the transcriptions provided in this repository were generated automatically using speech recognition software, and the summaries were created with the aid of a large language model virtual assistant. As a result, these documents may contain inaccuracies or errors that do not represent the original content. If you notice significant errors, feel free to submit corrections or raise issues through the repository's issue tracker.

🌅 FUWAMOCO Morning

FUWAMOCO Morning is an online, short-format morning show hosted by the fuzzy and fluffy guard dog sisters FUWAMOCO. Their aim with this show is to bring a smile to everyone's face and to help them start the day on the right paw!

An index of all FWMC Morning episodes, summaries and transcripts can be found at morning/index.md.

⚒️ Building

Some of the files here are generated automatically.

  1. Media is extracted directly from YouTube along with its metadata. Audio is converted into .wav format for easier processing.
  2. Audio is transcribed using automatic speech recognition, plus some manual workarounds, to provide the most accurate transcript possible from the get-go.
  3. A basic summary is created. An LLM/RAG tool can be used to summarize the information from the transcript and help with the initial draft.
  4. Using the basic summary and the metadata for each video, a fancier summary document can be created programmatically.
  5. An index of all summaries can be automatically generated.
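
As a rough illustration of steps 1 and 2, the sketch below only builds the command lines that such a pipeline might run; it does not reproduce the scripts in this repository. The function name, the `work` directory, the output file layout and the `large-v2` model choice are all assumptions made for the example.

```python
from pathlib import Path


def build_pipeline_commands(video_url: str, video_id: str, work_dir: str = "work") -> list[list[str]]:
    """Return hypothetical commands for download (step 1) and transcription (step 2).

    Nothing is executed here; the caller decides when and how to run them.
    """
    out = Path(work_dir)

    # Step 1: download the audio as .wav and keep the metadata as <id>.info.json.
    download = [
        "yt-dlp",
        "--write-info-json",
        "--extract-audio",
        "--audio-format", "wav",  # yt-dlp relies on ffmpeg for this conversion
        "--output", str(out / "%(id)s.%(ext)s"),
        video_url,
    ]

    # Step 2: transcribe the extracted audio with the WhisperX command-line tool.
    # The model name is an assumption, not the repository's actual setting.
    transcribe = [
        "whisperx",
        str(out / f"{video_id}.wav"),
        "--model", "large-v2",
        "--output_dir", str(out),
    ]

    return [download, transcribe]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the prerequisites below are installed. Building the commands as lists rather than shell strings avoids quoting problems with URLs and paths.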

Prerequisites

If using Windows, you can install all these prerequisites with WinGet.

🔰 Quick start

Install prerequisites in Windows

Make sure that WinGet is installed, then run the following:

winget install Microsoft.PowerShell
winget install Oven-sh.Bun
winget install Anaconda.Miniconda3 -v py310_23.5.2-0
winget install Gyan.FFmpeg
winget install yt-dlp.yt-dlp
winget install Nvidia.CUDA -v 11.8
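
Once Python is available (for example, through Miniconda), a small check like the one below can confirm that the installed tools are reachable on `PATH`. This is a sketch: the executable names are assumed to match the packages above, and the helper name is made up for the example.

```python
import shutil

# Executable names assumed to correspond to the WinGet packages listed above.
REQUIRED_TOOLS = ["pwsh", "bun", "conda", "ffmpeg", "yt-dlp", "nvcc"]


def find_missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

A tool reported as missing right after installation may only need a new terminal session so that the updated `PATH` is picked up.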

Set up Miniconda environment

  1. Init Miniconda for PowerShell: conda init powershell
  2. Create environment: conda create --name whisperx
  3. Activate environment: conda activate whisperx
  4. Install requirements: conda install pytorch==2.1.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia
  5. Install WhisperX: pip install git+https://github.com/m-bain/whisperx.git

The latest supported PyTorch version in whisperX (as of version 3.1.1) is 2.1.2, which also restricts the Python version that can be installed in your Miniconda environment to 3.10. To use a newer PyTorch version, you can update pyannote.audio; for instance, upgrading to version 3.3.1 enables installation of PyTorch 2.3.1 and usage of Python 3.12.
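
To see which versions ended up in the environment, a short check such as the following can be run inside the activated `whisperx` environment. It is a minimal sketch; the function name is hypothetical, and it reports `None` for PyTorch if the package is not installed instead of failing.

```python
import importlib.util
import sys


def describe_environment() -> dict:
    """Report the Python version and, if installed, the PyTorch version and CUDA availability."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda_available": None,
    }
    # Only import torch when it is actually installed in this environment.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    print(describe_environment())
```

If `cuda_available` is `False` on a machine with an NVIDIA GPU, the installed PyTorch build and the CUDA toolkit version may not match.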

🎶 FUWAMOCO songs

A compilation of lyrics, fanchants & other content related to their songs.

✨ Thanks

  • Cover Corp.: For hololive and for giving FUWAMOCO a chance to shine through.
  • Dylan Mendes, Kami-bako: For the timestamps in each video's comments section; they were a useful resource for finding sections easily and comparing against my own timestamps.
  • FUWAMOCO: For their content and for being an inspiration to many, including myself!