This repository contains scripts to process preservation files, generate checksums, and create and move derivatives of The History Makers oral history interviews.
Download the official Git build for Windows here: https://git-scm.com/download/win
Install using the Git-[version].exe file, accepting the default (pre-filled) options
Download official Python 3.x build for Windows here: https://www.python.org/downloads/windows/
Open Downloads folder, locate python3.x.exe file, right-click and select "Run as Administrator" from pop-up menu
IMPORTANT - during install, select "Add Python to environment variables" option
to confirm the install, open a new instance of PowerShell, type "python" and hit enter (type "exit()" and hit enter to exit the Python interpreter shell that was opened)
Git is version control software for developers - GitHub is a website that integrates Git with other features that developers find handy. The code for this project is hosted on GitHub, and we'll use Git to download a copy of that code to the machine running the video processing, and to upload any changes back to GitHub.
GitHub no longer accepts account passwords for Git operations, so this project uses SSH authentication. Follow their guide for setting that up here: https://docs.github.com/en/authentication/connecting-to-github-with-ssh
Once SSH is set up, do an SSH clone of the repo to the video processing machine.
- open video-post-processing-config.txt in the text editor of your choice
- fill out fields per your local specifications
general format is:
[section_header]
variable_name = variable value
do not enclose paths with quotes, even if they have spaces - do not escape whitespace either
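The format above matches the INI style that Python's standard configparser module reads. The sketch below is only an illustration of why quotes and escapes are unnecessary: the section name, variable name, and path are hypothetical, and the README does not state which parser the scripts actually use.

```python
import configparser

# Hypothetical config text; the section and variable names are illustrative
# and are not taken from video-post-processing-config.txt.
EXAMPLE_CONFIG = r"""
[example_section]
example_folder = D:\some folder\with spaces\raw_captures
"""

# interpolation=None keeps "%" characters in paths from being treated specially.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(EXAMPLE_CONFIG)

# The value comes back as a plain string: spaces and backslashes are preserved,
# and any quotes typed in the file would become part of the path.
example_folder = parser["example_section"]["example_folder"]
print(example_folder)
```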
comma-separated list of acceptable file extensions for input files, each extension is enclosed in quotes
e.g. ".mov",".MOV"
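As a rough illustration of how a value in that shape could be turned into a list, the sketch below uses Python's csv module to strip the quotes. The function name `parse_extensions` is hypothetical, and this is not necessarily how the scripts read the field.

```python
import csv


def parse_extensions(raw_value):
    """Split a value like '".mov",".MOV"' into ['.mov', '.MOV']."""
    # csv.reader removes the double quotes around each extension;
    # skipinitialspace tolerates a space after each comma.
    row = next(csv.reader([raw_value], skipinitialspace=True))
    return [item.strip() for item in row if item.strip()]


extensions = parse_extensions('".mov",".MOV"')
print(extensions)
```

Extensions are compared as written, so ".mov" and ".MOV" need to be listed separately, as in the example above.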
this section contains filepaths for assets which are required in order to transcode derivative files
specifies the path to the main ingest directory. This directory can be considered "hot" in that the script will attempt to process every subfolder when it is run with no arguments. Individual accessions should be saved at this path in a folder named with the accession number - alternatively, the folder can have any name if an accession number is supplied at runtime (see Usage section of this document)
Example folder setup, tree view
/raw_captures
├── A2022_034_001_001
│ ├── DOH_HEJ_006_000.mov
│ ├── DOH_HEJ_006_001.mov
│ ├── DOH_HEJ_006_002.mov
│ └── DOH_HEJ_006.XML
├── A2022_034_001_002
│ ├── DOH_HEJ_007_000.mov
│ ├── DOH_HEJ_007_001.mov
│ ├── DOH_HEJ_007_002.mov
│ └── DOH_HEJ_007.XML
├── A2022_047_001_001
│ ├── 01275001.MOV
│ ├── 01275002.MOV
│ ├── 01275003.MOV
│ └── 01275004.MOV
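To make the folder layout above concrete, here is a minimal sketch of how each subfolder could be listed along with its input files. The function name `find_accessions` is hypothetical; it only assumes that every subfolder is an accession and that input files are picked by extension.

```python
from pathlib import Path


def find_accessions(ingest_dir, extensions):
    """Map each subfolder name to a sorted list of its matching input files."""
    accessions = {}
    for folder in sorted(Path(ingest_dir).iterdir()):
        # Loose files sitting directly in the ingest directory are ignored.
        if not folder.is_dir():
            continue
        accessions[folder.name] = sorted(
            item.name
            for item in folder.iterdir()
            if item.is_file() and item.suffix in extensions
        )
    return accessions
```

With the tree above and the extensions ".mov" and ".MOV", the .XML files would be left out of each accession's list.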
This section describes folder paths for derivatives
This section contains info for email notifications from the script
This section contains folder paths for the directory containing the logs, as well as the path of the lockfile that makevideos creates in order to allow only a single instance of the script to run at a time
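For readers unfamiliar with lockfiles, the sketch below shows one common way to implement the idea in Python. It is an assumption about the general technique, not a copy of what makevideos does, and the name `single_instance` is hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def single_instance(lockfile_path):
    """Hold a lockfile while the block runs; fail if one already exists."""
    # O_CREAT | O_EXCL makes creation atomic: os.open raises FileExistsError
    # if another instance has already created the lockfile.
    fd = os.open(lockfile_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        with os.fdopen(fd, "w") as handle:
            # Recording the process id helps when investigating a stale lock.
            handle.write(str(os.getpid()))
        yield
    finally:
        os.remove(lockfile_path)
```

If a run is killed before it can clean up, the lockfile is left behind and has to be deleted by hand before the next run can start.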
This section delineates the folderpath for MediaConch policies
ingest.py --options accession_number(s)
ingest.py -h
this script uses the venv python library to manage dependencies ("venv" is short for "virtual environment"). It must be enabled in order to be used, however. THM staff shouldn't have to do this too often, but after closing cmd.exe or after a restart it may be necessary.
you can tell you're in the virtual environment by looking to the left of the command prompt. For the THM processing machine, the prompt is D:\Users\archadmin\code\thm - if that line is preceded by (venv), you are in the virtual environment
This is what you want:
(venv) D:\Users\archadmin\code\thm:
This means you gotta activate it:
D:\Users\archadmin\code\thm:
To activate the virtual environment, run the below command in cmd.exe:
venv\Scripts\activate.bat
once that command completes, you should be good to go
the script will error and close if it is not being run in the virtual environment
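A check like this is usually done by comparing `sys.prefix` with `sys.base_prefix`, which differ only inside a venv. The sketch below shows that technique; the function names are hypothetical and the actual check in ingest.py may be written differently.

```python
import sys


def in_virtual_environment(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the base Python installation.
    return prefix != base_prefix


def require_virtual_environment():
    """Exit with a message when the script is run outside the venv."""
    if not in_virtual_environment():
        sys.exit("not running in the virtual environment, activate it first")
```

Calling `require_virtual_environment()` at the top of a script would stop it before any processing starts.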
ingest everything in raw_captures directory, as configured in config file
ingest.py
ingest a single accession, A2022_012_001_001
ingest.py A2022_012_001_001
ingest multiple accessions
ingest.py A2022_012_001_001 A2022_033_001_001
ingest without validating input files
ingest.py --no_input_validation A2022_012_001_001
you can run this script with more or less output to the terminal
note that these settings don't change what is logged, just what is printed
run in verbose mode
ingest.py -v A2022_012_001_001
run in quiet mode
ingest.py -q A2022_012_001_001
you can run this script without sending emails using the --no_email flag
ingest.py --no_email
you can run the script without copying files to the connected drives using the --no_copy flag
ingest.py --no_copy
these options can be strung together in a single command. the command below will process two accessions without input validation, printing every log entry to the terminal window, without copying files and without emailing anyone
ingest.py -v --no_input_validation --no_copy --no_email A2022_999_001_001 A2017_088_001_001
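The options shown above could be declared with Python's argparse roughly as follows. This is a sketch built only from the flags listed in this document: the attribute names, the help text, and making -v and -q mutually exclusive are assumptions, not a description of ingest.py's real parser.

```python
import argparse


def build_parser():
    """Build a parser for the flags listed in this README."""
    parser = argparse.ArgumentParser(prog="ingest.py")
    # Zero accession numbers means "process everything in the ingest directory".
    parser.add_argument("accessions", nargs="*", help="accession number(s)")
    # Assumption: verbose and quiet cannot be combined.
    volume = parser.add_mutually_exclusive_group()
    volume.add_argument("-v", dest="verbose", action="store_true")
    volume.add_argument("-q", dest="quiet", action="store_true")
    parser.add_argument("--no_input_validation", action="store_true")
    parser.add_argument("--no_copy", action="store_true")
    parser.add_argument("--no_email", action="store_true")
    return parser


args = build_parser().parse_args(
    ["-v", "--no_input_validation", "--no_copy", "--no_email",
     "A2022_999_001_001", "A2017_088_001_001"]
)
print(args.accessions)
```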
this script takes the raw video captures delivered by THM personnel and:
- concatenates the < 4GB files into 1 long file
- transcodes that file to flv, mp4, and mpeg
- embeds timecode and watermarks where appropriate
- hashmoves (see below) them to their destinations
- triggers script to embed those hashes into a FileMaker db named PBCore_Catalog
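The hashmove step in the list above can be pictured with the sketch below. It assumes that "hashmove" means copying a file, confirming that the copy's checksum matches the original, and only then deleting the original. The function names and the choice of SHA-256 are assumptions for illustration, not details taken from this repository.

```python
import hashlib
import shutil
from pathlib import Path


def file_hash(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large video files are never loaded whole."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashmove(source, destination_dir, algorithm="sha256"):
    """Copy source into destination_dir, verify the copy, then delete source."""
    source = Path(source)
    destination = Path(destination_dir) / source.name
    source_hash = file_hash(source, algorithm)
    shutil.copy2(source, destination)
    if file_hash(destination, algorithm) != source_hash:
        # Keep the original and discard the bad copy.
        destination.unlink()
        raise IOError(f"checksum mismatch while moving {source}")
    source.unlink()
    return destination, source_hash
```

Returning the checksum alongside the new path lets a later step record it, for example in a database.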
makevideos also checks to make sure that everything is plugged in and that all necessary files (like watermarks) are in their expected locations.
makevideos is triggered every 15 minutes, M-F, 7am-9pm local time by cron
makevideos can also be run manually by cd'ing into the repo directory (look for that in the config.txt file) and running "python makevideos.py"
this script checks the values in the config file against the configuration currently present on the workstation running the script. Predominantly, it verifies that filepaths specified in the config actually exist.
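A minimal sketch of that kind of check, assuming the config is read with configparser and that every value in the section being checked is a filepath. The function name `missing_paths` and the section and variable names are hypothetical.

```python
import configparser
from pathlib import Path


def missing_paths(parser, section):
    """Return the (name, value) pairs in a section whose paths do not exist."""
    return [
        (name, value)
        for name, value in parser[section].items()
        if not Path(value).exists()
    ]


# Hypothetical section: "." exists, the second value is assumed not to.
parser = configparser.ConfigParser(interpolation=None)
parser.read_dict(
    {"example_paths": {"here": ".", "absent": "no_such_folder_for_this_example"}}
)
for name, value in missing_paths(parser, "example_paths"):
    print(f"config problem: {name} points to a missing path: {value}")
```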
this script uses MediaConch validation to ensure that only valid input files are passed to the script for preservation/transcode. MediaConch policies are managed in the directory specified in the config file. Each input file is checked against the available policies in the MediaConch policies folder - if a match is found, that policy is used to validate all other input and output files for the accession.
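The policy selection described above can be sketched as two small functions. The `passes` argument is a hypothetical callable standing in for the actual MediaConch check, so nothing here shows how MediaConch itself is invoked, and the function names are illustrative.

```python
def choose_policy(files, policies, passes):
    """Return the first policy that any input file passes, or None."""
    for file in files:
        for policy in policies:
            if passes(file, policy):
                return policy
    return None


def failing_files(files, policy, passes):
    """List the files that do not pass the chosen policy."""
    return [file for file in files if not passes(file, policy)]
```

If `choose_policy` returns None, no policy matched and the accession's input files cannot be validated.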
if a file doesn't pass validation, follow these steps to find out why:
- open MediaConch
- in the "Checker" tab, use the dropdown menu to select the policy to check against -- see log for list of policies attempted
- still in the "Checker" tab, select a file to check against the policy from the previous step
- select "check file"
- MediaConch will analyze the file and add it to a list at the bottom of the window
- to view pass/fail for each field, click the eyeball icon
for more info, see official how-to's at this link
this script handles all calls to FileMaker database, requires ODBC
this script sends emails per info in config file
utility functions required by other scripts in this repository
This script uses Python's venv module to create a virtual environment. The venv folder contains configuration info for this virtual environment and should not need to be modified.