Update nf-google plugin to fix invalid machine type setting for Batch #4041

@ejseqera

Bug report

On Nextflow version 23.04.2, a few users have encountered issues when running jobs on Google Batch in us-central1. If a machine family is specified for machineType, the job fails and Nextflow logs an error of the form [GOOGLE BATCH] Cannot select machine type using cloud info for task: <process_name> | null

It seems a fix for this was already pushed in #3961, so an updated release of the nf-google plugin is required to make it available in Nextflow.

Expected behavior and actual behavior

With google.batch.spot = true and a VM family specified with machineType = 'n2-*', jobs should be submitted exclusively to n2-* instance types. If the pricing info can't be retrieved from the cloud info API for that region, Nextflow should fall back to letting Google Batch decide on a custom type based on the CPU and memory allocated for the task. Instead, the literal pattern n2-* is passed to Google Batch as the machine type, and the job fails.

I am currently only seeing this issue in the us-central1 region. I also tested in europe-north1 (Finland), where machine type selection works fine.

Steps to reproduce the problem

Run nextflow run nf-core/rnaseq -r 3.12.0 -profile test with the following configuration:

google {
    project = 'tower-cloud-testing'
    location = 'us-central1'
    batch {
        spot = true
    }
}

process {
    machineType = 'n2-*'
}
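
For comparison, a configuration that avoids the wildcard might look like the sketch below. This is an untested assumption, not a confirmed workaround and not part of the original report: it pins a concrete machine type, so the string passed to Google Batch should be a valid resource name even if the cloud info lookup fails. The value n2-standard-4 is an illustrative choice only. Pick a type that fits the CPU and memory requirements of your processes.

```groovy
// Hypothetical stopgap until an updated nf-google plugin is released.
// Untested assumption: a concrete machine type sidesteps the failed
// resolution of the 'n2-*' family wildcard.
google {
    project = 'tower-cloud-testing'
    location = 'us-central1'
    batch {
        spot = true
    }
}

process {
    // Concrete type instead of the 'n2-*' family pattern (illustrative value).
    machineType = 'n2-standard-4'
}
```

The trade-off is that every process is pinned to a single instance size, so the per-task selection that the family pattern is meant to provide is lost.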

Program output

Trimmed GCP batch job log for one of the failed jobs:

allocationPolicy:
  instances:
  - policy:
      machineType: n2-*
      provisioningModel: SPOT
status:
  state: FAILED
  statusEvents:
  - description: Job state is set from QUEUED to SCHEDULED for job projects/687213979415/locations/us-central1/jobs/nf-8b82962a-1687270780854.
    eventTime: '2023-06-20T14:19:49.943016717Z'
    type: STATUS_CHANGED
  - description: "Job gets no longer retryable information Batch Error: code - CODE_GCE_BAD_REQUEST,\
      \ description - googleapi: Error 400: Invalid value for field 'resource.properties.machineType':\
      \ 'n2-*'. Machine type 'n2-*' must be a valid resource name., invalid, already\
      \ retried 3 times, errors record CODE_GCE_BAD_REQUEST."

Nextflow log excerpt:

 DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Cannot select machine type using cloud info for task: `NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GUNZIP_GTF (genes.gtf.gz)` | Cannot invoke "java.lang.Iterable.iterator()" because "self" is null

Environment

  • Nextflow version: 23.04.2 build 5870
  • Operating system: Linux
  • nf-google plugin version: 1.7.3
