Track and extract data from the training system of Firefox Translations.
Logs are extracted from Marian training tasks running in Taskcluster.
This POC works offline, using a text log sample from the samples
directory. It outputs an instance of the TrainingLog
dataclass with the following attributes:
info: Marian information as a dict
configuration: Runtime configuration as a dict
training: List of Training dataclass instances, each with the fields epoch, up, sen, cost, time, rate, gnorm
validation: List of Validation dataclass instances, each with the fields epoch, up, chrf, ce_mean_words, bleu_detok
logs: Log lines as a dict, indexed by their header (e.g. marian, data, memory)
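The shape of the output can be pictured with a minimal sketch. The class and attribute names come from the list above; the field types, the group_by_header helper, the assumed "[timestamp] [header] message" line layout, and the sample lines are illustrative assumptions, not the package's actual code or real log output.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Training:
    # Field names from the README; the types are assumptions.
    epoch: int
    up: int
    sen: int
    cost: float
    time: float
    rate: float
    gnorm: float


@dataclass
class Validation:
    # Field names from the README; the types are assumptions.
    epoch: int
    up: int
    chrf: float
    ce_mean_words: float
    bleu_detok: float


@dataclass
class TrainingLog:
    info: dict
    configuration: dict
    training: List[Training]
    validation: List[Validation]
    logs: Dict[str, List[str]]


# Assumed line layout: "[timestamp] [header] message".
HEADER_RE = re.compile(r"^\[[^\]]*\] \[(\w+)\] (.*)$")


def group_by_header(lines: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: index log messages by their bracketed header."""
    grouped = defaultdict(list)
    for line in lines:
        match = HEADER_RE.match(line)
        if match:  # lines without a header are skipped in this sketch
            header, message = match.groups()
            grouped[header].append(message)
    return dict(grouped)


# Made-up lines and values, for illustration only.
sample = [
    "[2024-01-01 00:00:00] [marian] example version line",
    "[2024-01-01 00:00:01] [data] example vocabulary line",
    "[2024-01-01 00:00:02] [memory] example allocation line",
    "[2024-01-01 00:00:03] [data] another example line",
]

log = TrainingLog(
    info={},
    configuration={},
    training=[Training(epoch=1, up=1000, sen=50000, cost=3.2,
                       time=120.0, rate=9000.0, gnorm=1.1)],
    validation=[Validation(epoch=1, up=1000, chrf=40.0,
                           ce_mean_words=3.0, bleu_detok=15.0)],
    logs=group_by_header(sample),
)
print(sorted(log.logs))
```

Keeping training and validation as lists of small dataclasses, rather than nested dicts, makes each record easy to convert with dataclasses.asdict when exporting.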
In a virtual environment, you can install the package using pip:
$ pip install .
Run the parser with the local sample:
$ parse_tc_logs -i samples/<log_file>
Publish data to Weights & Biases:
$ parse_tc_logs -i samples/<log_file> --wandb-project <project> --wandb-group=<group> --wandb-run-name=<run>
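How the parser maps parsed records to Weights & Biases is not described here. As a rough sketch, one way to do it is to use the update counter (up) as the step and log the remaining fields as metrics. The to_metrics and publish helpers, the "training/" key prefix, and the field types are assumptions for illustration; only wandb.init, Run.log and Run.finish are real wandb calls, and wandb is imported lazily so the mapping can be tried without it.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Training:
    # Field names from the README; the types are assumptions.
    epoch: int
    up: int
    sen: int
    cost: float
    time: float
    rate: float
    gnorm: float


def to_metrics(records: Iterable[Training]) -> List[Tuple[int, Dict[str, float]]]:
    """Hypothetical mapping: one (step, metrics) pair per training record."""
    rows = []
    for record in records:
        values = asdict(record)
        step = values.pop("up")  # use the update counter as the step
        metrics = {f"training/{key}": value for key, value in values.items()}
        rows.append((step, metrics))
    return rows


def publish(records: Iterable[Training], project: str, group: str, run_name: str) -> None:
    """Hypothetical publisher; requires the third-party wandb package and a login."""
    import wandb  # imported lazily so to_metrics works without wandb installed

    run = wandb.init(project=project, group=group, name=run_name)
    for step, metrics in to_metrics(records):
        run.log(metrics, step=step)
    run.finish()


# Made-up values, for illustration only.
rows = to_metrics([
    Training(epoch=1, up=1000, sen=50000, cost=3.2, time=120.0, rate=9000.0, gnorm=1.1),
    Training(epoch=1, up=2000, sen=100000, cost=2.9, time=118.0, rate=9100.0, gnorm=1.0),
])
print(rows[0])
```

The three arguments of publish mirror the --wandb-project, --wandb-group and --wandb-run-name options shown above.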
Run the parser on a directory containing experiments and publish to Weights & Biases:
$ parse_experiment_dir -d models
A developer may want to install the package in editable mode (i.e. install from the local path directly):
$ pip install -e .
Pre-commit rules run automatically once the pre-commit hooks have been installed:
$ pip install pre-commit
$ pre-commit install
$ pre-commit run -a # Run pre-commit once