Prerequisite artifacts:
- A dataset (in a GCP bucket) that will be used to test the model's performance
- A pretrained damage segmentation model (in a GCP bucket) whose performance will be tested
Infrastructure that will be used:
- A GCP bucket where the segmented stacks will be accessed from
- A GCP bucket where the results of the analysis will be stored
- A GCP virtual machine to run the test on
- If the stacks are not in a GCP bucket, see the previous workflow: Copying the raw data into the cloud for storage and usage
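Before starting a VM, it can be worth confirming that the prerequisite dataset and model actually exist in the bucket. Below is a minimal sketch using `gsutil`; the `datasets/` and `models/` prefixes are assumptions about the bucket layout (not paths confirmed by this document), so adjust them to match your bucket.

```bash
# Hypothetical pre-flight check of the prerequisite artifacts.
# The datasets/ and models/ prefixes are assumed; adjust to your bucket layout.
gsutil ls "gs://<gcp_bucket>/datasets/<dataset_id>"   # dataset to test against
gsutil ls "gs://<gcp_bucket>/models/<model_id>"       # pretrained segmentation model
```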
Workflow:
- Use Terraform to start the appropriate GCP virtual machine (`terraform apply`).
- Once Terraform finishes, you can check the GCP virtual machine console to confirm that a virtual machine named `<project_name>-<user_name>` has been created, where `<project_name>` is the name of your GCP project and `<user_name>` is your GCP user name.
- To test a model, SSH into the virtual machine `<project_name>-<user_name>`, start tmux (`tmux`), `cd` into the code directory (`cd necstlab-damage-segmentation`), and run `python3 test_segmentation_model.py --gcp-bucket <gcp_bucket> --dataset-id <dataset_id> --model-id <model_id>`. Optional: use `--trained-thresholds-id <model_thresholds>.yaml` to test using pretrained class thresholds. (A condensed sketch of this sequence follows the list.)
- Once testing has finished, you should see that the folder `<gcp_bucket>/tests/<test_ID>` has been created and populated, where `<test_ID>` is `<dataset_id>_<model_id>`. Since testing can be performed multiple times using different prediction thresholds, the output `metadata.yaml` and `metrics.csv` filenames are appended with `_<timestamp>`.
- Use Terraform to terminate the appropriate GCP virtual machine (`terraform destroy`). Once Terraform finishes, you can check the GCP virtual machine console to confirm the virtual machine has been destroyed.
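For reference, the steps above condense to roughly the following shell session. The `gcloud compute ssh` command and its `--zone` value are assumptions (use whatever SSH method your setup provides), and the placeholders in angle brackets must be replaced with your own values.

```bash
# Create the VM defined by the Terraform configuration
terraform apply

# SSH into the VM (assumed: gcloud-based SSH; the zone is a hypothetical example)
gcloud compute ssh <project_name>-<user_name> --zone us-east1-b

# --- on the VM ---
tmux                                  # keeps the run alive if the SSH session drops
cd necstlab-damage-segmentation       # code directory on the VM
python3 test_segmentation_model.py \
    --gcp-bucket <gcp_bucket> \
    --dataset-id <dataset_id> \
    --model-id <model_id>
# optionally append: --trained-thresholds-id <model_thresholds>.yaml

# --- back on your local machine, once <gcp_bucket>/tests/<test_ID> is populated ---
terraform destroy
```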
Arguments to `test_segmentation_model.py`:
- `--gcp-bucket`: type=str, help='The GCP bucket where the raw data is located and to use to store the processed stacks.'
- `--dataset-id`: type=str, help='The dataset ID.'
- `--model-id`: type=str, help='The model ID.'
- `--batch-size`: type=int, default=16, help='The batch size to use during inference.'
- `--trained-thresholds-id`: type=str, default=None, help='The specified trained thresholds file id.'
- `--random-module-global-seed`: type=int, default=None, help='The setting of random.seed(global seed), where global seed is int or default None (no seed given).'
- `--numpy-random-global-seed`: type=int, default=None, help='The setting of np.random.seed(global seed), where global seed is int or default None (no seed given).'
- `--tf-random-global-seed`: type=int, default=None, help='The setting of tf.random.set_seed(global seed), where global seed is int or default None (no seed given).'
- `--message`: type=str, default=None, help='A str message the user wants to leave; the default is None.'
Example invocations:
- `python3 test_segmentation_model.py --gcp-bucket gs://sandbox --dataset-id dataset-composite_0123 --model-id segmentation-model-composite_0123_20200321T154533Z --batch-size 16`
- `python3 test_segmentation_model.py --gcp-bucket gs://sandbox --dataset-id dataset-composite_0123 --model-id segmentation-model-composite_0123_20200321T154533Z --batch-size 16 --trained-thresholds-id model_thresholds_20200321T181016Z.yaml`
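If a run needs to be reproducible or annotated, the optional seed and message flags documented above can be added to the same command. The seed value and message text below are arbitrary examples, not values from the repository.

```bash
python3 test_segmentation_model.py \
    --gcp-bucket gs://sandbox \
    --dataset-id dataset-composite_0123 \
    --model-id segmentation-model-composite_0123_20200321T154533Z \
    --batch-size 16 \
    --random-module-global-seed 42 \
    --numpy-random-global-seed 42 \
    --tf-random-global-seed 42 \
    --message "reproducibility check with fixed seeds"
```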
Notes:
- A batch size of 16 works with a P100 GPU (8 for a K80), but a batch size of 20 is too large for the P100.
- Metrics will be computed based on `global_threshold` in `metrics_utils.py` unless pretrained thresholds are specified.
- In the VM SSH session, use the `nano` text editor to edit scripts or configuration files previously uploaded to the VM, e.g., `nano configs/dataset-medium.yaml` to edit the text in `dataset-medium.yaml`.
- To create a VM without destroying others (assuming `terraform apply` would otherwise both create and destroy instances), use the `-target` flag: `terraform apply -lock=false -target=google_compute_instance.vm[<#>]` creates VM number `<#>`. Use similar syntax with `terraform destroy` to target a specific VM (see the sketch below).
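A minimal sketch of the targeted commands; the instance index `3` is an arbitrary example, and the resource address is quoted so the shell does not treat the brackets as a glob pattern.

```bash
# Create only VM #3, leaving the other instances untouched
terraform apply -lock=false -target='google_compute_instance.vm[3]'

# Destroy only VM #3, leaving the rest of the infrastructure up
terraform destroy -target='google_compute_instance.vm[3]'
```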