Describe the bug
When an existing Batch account is used, the deployment fails while running the test workflow if the account already contains a pool with an unsupported VM size.
Steps to Reproduce
Create a Batch account and a pool using the Basic_A0 VM size. Deploy a new CoA instance using the --BatchAccountName parameter. The deployment fails with "Test workflow failed", and the storage account contains a file in the workflows/failed directory with the following content:
..."FailureReason": "UnknownError",
"SystemLogs": [
"Object reference not set to an instance of an object.",
" at TesApi.Web.BatchScheduler.<>c__DisplayClass44_0.<CheckBatchAccountQuotas...
Additional context
This can also be reproduced in an existing Cromwell on Azure instance by creating a pool with an unsupported VM size and then submitting a workflow.
Solution
A possible fix is to ignore unsupported VM sizes when calculating the number of vCPUs in use. This may underreport usage if those pools have running nodes. Alternatively, use an API to get the current list of VM sizes for this calculation, while keeping the current hardcoded list of supported VM sizes for deciding which VM size to use for task execution. Or combine both methods, since it is possible to create pools with VM sizes that Batch does not list as supported.
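A minimal sketch of the combined approach, written in Python only for illustration. This is not the actual TesApi code: the `Pool` type, the `vcpus_in_use` function, and the two lookup tables are hypothetical names, and the vCPU values are example data. A VM size missing from the supported list is looked up in an optional fallback table (standing in for an API-provided list); if it is missing there too, the pool is skipped and reported instead of causing a null reference.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Pool(NamedTuple):
    """Hypothetical stand-in for a Batch pool: VM size name and node count."""
    vm_size: str
    node_count: int


def vcpus_in_use(
    pools: Iterable[Pool],
    supported_vcpus: Dict[str, int],
    fallback_vcpus: Optional[Dict[str, int]] = None,
) -> Tuple[int, List[str]]:
    """Return (total vCPUs, VM sizes that were skipped as unknown).

    supported_vcpus stands in for the hardcoded supported list;
    fallback_vcpus stands in for a list fetched from an API (assumption).
    """
    # Normalize keys so VM size lookups are case-insensitive.
    supported = {k.lower(): v for k, v in supported_vcpus.items()}
    fallback = {k.lower(): v for k, v in (fallback_vcpus or {}).items()}

    total = 0
    skipped: List[str] = []
    for pool in pools:
        key = pool.vm_size.lower()
        per_node = supported.get(key, fallback.get(key))
        if per_node is None:
            # Unknown size: skip it rather than fail; usage may be underreported.
            skipped.append(pool.vm_size)
            continue
        total += per_node * pool.node_count
    return total, skipped


# Example data (illustrative values only).
pools = [Pool("Standard_D2_v2", 3), Pool("Basic_A0", 2)]
supported = {"Standard_D2_v2": 2}

total_without_fallback, skipped_without_fallback = vcpus_in_use(pools, supported)
total_with_fallback, skipped_with_fallback = vcpus_in_use(
    pools, supported, fallback_vcpus={"Basic_A0": 1}
)
```

Returning the skipped sizes alongside the total lets the caller log a warning, so any underreporting from ignored pools stays visible.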