This is a five-finger exercise in two things: code manipulation by AI assistants (aider), and using local models such as Qwen2-VL and Whisper (Hugging Face Interface) to extract English translations from video files that are, to me, rather exotic. The translations are derived by OCR-ing the Chinese and Indonesian subtitles and by transcribing the Chinese audio track.
The video files are not part of the repository.
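
OCR-ing subtitles frame by frame usually yields the same line many times in a row, so some step has to collapse those repeats into timed cues before translation. The following is only a minimal sketch of that idea, not the code in this repository: `Cue`, `collapse_frames`, and the `frame_step` parameter are hypothetical names, and the input is assumed to be a list of `(timestamp, text)` pairs already produced by the OCR model.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def collapse_frames(readings, frame_step):
    """Merge consecutive sampled frames with identical OCR text into cues.

    readings:   (timestamp_seconds, text) pairs in ascending time order
                (assumed output of the OCR step).
    frame_step: sampling interval in seconds (hypothetical parameter).
    """
    cues = []
    for timestamp, text in readings:
        text = text.strip()
        if not text:
            continue  # no subtitle visible in this frame
        last = cues[-1] if cues else None
        # Extend the previous cue only if the text matches and the
        # frame directly follows the end of that cue.
        if last and last.text == text and timestamp - last.end < frame_step / 2:
            last.end = timestamp + frame_step
        else:
            cues.append(Cue(timestamp, timestamp + frame_step, text))
    return cues


if __name__ == "__main__":
    sample = [(0.0, "你好"), (0.5, "你好"), (1.0, ""), (1.5, "谢谢"), (2.0, "谢谢 ")]
    for cue in collapse_frames(sample, frame_step=0.5):
        print(f"{cue.start:.1f}-{cue.end:.1f}: {cue.text}")
```

Comparing exact strings is the simplest possible rule; real OCR output tends to vary slightly between frames, so a fuzzy comparison would likely be needed in practice.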