Error executing process GENOME_STATISTICS:SUMMARYSEQUENCE with invalid assembly accession #109

Open
mtammami opened this issue Mar 3, 2024 · 2 comments · May be fixed by #144

Comments

@mtammami

mtammami commented Mar 3, 2024

Description of the bug

I encountered an error while executing the SANGERTOL_GENOMENOTE:GENOMENOTE:GENOME_STATISTICS:SUMMARYSEQUENCE process in the sanger-tol/genomenote pipeline. The process fails with an "invalid or unsupported assembly accession" error message when attempting to generate a sequence summary JSON file using the datasets command. This issue arises despite following the pipeline's usage instructions and providing valid input parameters.

-[sanger-tol/genomenote] Pipeline completed with errors-
[0b/de2c7e] Submitted process > SANGERTOL_GENOMENOTE:GENOMENOTE:GENOME_STATISTICS:SUMMARYSEQUENCE (genome.1)
[f9/348360] Submitted process > SANGERTOL_GENOMENOTE:GENOMENOTE:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.csv)
[81/f56f5d] Submitted process > SANGERTOL_GENOMENOTE:GENOMENOTE:GENOME_STATISTICS:SUMMARYGENOME (genome.1)
[9e/883111] Submitted process > SANGERTOL_GENOMENOTE:GENOMENOTE:CONTACT_MAPS:SAMTOOLS_FAIDX (genome.1.fasta)
ERROR ~ Error executing process > 'SANGERTOL_GENOMENOTE:GENOMENOTE:GENOME_STATISTICS:SUMMARYSEQUENCE (genome.1)'

Caused by:
  Process `SANGERTOL_GENOMENOTE:GENOMENOTE:GENOME_STATISTICS:SUMMARYSEQUENCE (genome.1)` terminated with an error exit status (1)

Command executed:

  datasets \
      summary \
      genome \
      accession \
      genome.1 \
      --report sequence \
      > genome.1_sequence.json
  
  validate_datasets_json.py genome.1_sequence.json
  
  cat <<-END_VERSIONS > versions.yml
  "SANGERTOL_GENOMENOTE:GENOMENOTE:GENOME_STATISTICS:SUMMARYSEQUENCE":
      ncbi-datasets-cli: $(datasets --version | sed 's/^.*datasets version: //')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error: invalid or unsupported assembly accession: genome.1
  
  Use datasets summary genome accession <command> --help for detailed help about a command.

Work dir:
  /media/la_nube/tools/genomenotes/work/0b/de2c7e9f02030233c72ca802f6d05c

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

Command used and terminal output

nextflow run sanger-tol/genomenote \
   -profile docker \
   -r 1.1.1 \
   --input samplesheet.csv \
   --fasta genome.1.fasta \
   --outdir genomenote_results \
   --max_cpus 20 \
   --max_memory 200GB \
   --max_time '999h' \
   -resume

Relevant files

No response

System information

Pipeline Version: 1.1.1
Nextflow Version: 23.10.1
Execution Environment: Docker
Hardware: Workstation
OS: Linux Ubuntu
Executor: Local
Container engine: Docker, Singularity

@mtammami mtammami added the bug Something isn't working label Mar 3, 2024
@muffato
Member

muffato commented Mar 4, 2024

Hi @mtammami.
Thank you for the bug report. I think the problem is that v1 of the pipeline doesn't have a parameter for the accession number of the assembly, and assumes that the name of the FASTA file is the accession.
That will be addressed in v2, which is on the public_dev branch at the moment.
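
As an illustration of why `genome.1` is rejected: NCBI assembly accessions have the form `GCA_` or `GCF_`, followed by nine digits and a version suffix. The following is a minimal pre-flight check, assuming the pipeline derives the accession by stripping `.fasta` from the file name (which is what the log above suggests). The function name `looks_like_assembly_accession` is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: succeeds if the FASTA file's base name looks like
# an NCBI assembly accession (GCA_/GCF_ + 9 digits + ".version").
looks_like_assembly_accession() {
    name=$(basename "$1")
    # Assumption: the pipeline uses the file name minus ".fasta" as the accession.
    name=${name%.fasta}
    printf '%s\n' "$name" | grep -Eq '^GC[AF]_[0-9]{9}\.[0-9]+$'
}

if looks_like_assembly_accession "genome.1.fasta"; then
    echo "file name looks like an assembly accession"
else
    echo "file name does not look like an assembly accession"
fi
```

A name that passes this check can still be rejected by `datasets` if the accession does not exist at NCBI; the check only catches names with the wrong shape.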

In the meantime, if you rename the FASTA file so that its name is the assembly accession, or maybe make a symbolic link with that name, it may work. Once confirmed, I could push a change to the documentation to clarify that.
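
The symbolic-link variant of this workaround could look like the sketch below. It is unconfirmed, as noted above. The helper name `link_fasta_as_accession` is hypothetical, and the accession passed to it must be the real accession of your assembly (the `GCA_000000000.1` used in the usage note is only a placeholder).

```shell
# Hypothetical helper: create "<accession>.fasta" in the current directory
# as a symbolic link to an existing FASTA file.
link_fasta_as_accession() {
    fasta="$1"
    accession="$2"
    # Link to an absolute path so the link still resolves from other directories.
    case "$fasta" in
        /*) target="$fasta" ;;
        *)  target="$PWD/$fasta" ;;
    esac
    ln -s "$target" "${accession}.fasta"
}
```

For example, `link_fasta_as_accession genome.1.fasta GCA_000000000.1` creates `GCA_000000000.1.fasta`, which would then be passed to the pipeline as `--fasta GCA_000000000.1.fasta`. If the link does not work in your setup, renaming the file is the other option mentioned above.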

@muffato muffato added this to the 2.0.0 milestone Apr 27, 2024
@BethYates BethYates self-assigned this Jun 20, 2024
@BethYates BethYates linked a pull request Oct 8, 2024 that will close this issue
9 tasks
@BethYates
Collaborator

Release 2.0 will contain an --assembly parameter which will be used directly by the GENOME_STATISTICS:SUMMARYSEQUENCE process.
